"Jochen F. Rick" <[EMAIL PROTECTED]> wrote:
> Upgrade to a faster processor.
>
> Searching is slow. If the ramdisk doesn't help, I don't think anything
> (besides a massive code rewrite) will help.
Maybe it's not so massive. People have hacked things together in a day or two in the past, and simply not followed through with releasable code.

If you are willing to spend a couple of minutes at startup to build an initial index, then you need no additional disk files. That's nice! Then, you need to adjust the index after each page save. To adjust, you remove all entries for the old version of the page and then add entries for the new version:

	"remove entries for the old version of the page"
	(self keywordsIn: page text) do: [ :word |
		(index at: word ifAbsent: [ Set new ]) remove: page ifAbsent: [] ].
	"update the page"
	"add entries for the new version of the page"
	(self keywordsIn: page text) do: [ :word |
		(index at: word ifAbsentPut: [ Set new ]) add: page ]

Looking up the keywords could be very simple to begin with, something like:

	keywordsIn: aText
		^aText findTokens: String separators, '.,<>/?=+!@#$%^&*()'''

Finally, the searching algorithm can be adjusted to use the nice index. Here's a simple one which may well be fast enough:

	pagesMatchingSearch: aQuery
		| keywords matchingPages |
		keywords := self keywordsIn: aQuery.
		matchingPages := pages asSet.  "start with all pages"
		keywords do: [ :keyword |
			"intersect with pages matching this keyword"
			matchingPages := matchingPages
				intersection: (index at: keyword ifAbsent: [ #() ]) ].
		^matchingPages

So this is just a sketch of how to do it, but it looks like it should be possible if someone wants to blow a day or two putting it all together.

Bonus points come for saving the index on disk so that swiki startup remains fast, but that gets tricky in a hurry. Maybe a sufficient way would be to add an ordered shutdown for swikis (if it doesn't exist already) and to save the index then; when swikis are started, the on-disk file is deleted so that it doesn't mislead future invocations. Trying to keep the on-disk index up to date looks quite hard!

NOTE: if there is a complex in-memory data structure that involves multiple pages, then threading issues will be very significant.
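For completeness, the startup pass that builds the initial index could be sketched like this. It assumes the same `index` Dictionary and `pages` collection used in the snippets above; the `buildIndex` selector name is just a placeholder:

```smalltalk
buildIndex
	"Scan every page once at startup. This is the
	 couple-of-minutes cost mentioned above; after this,
	 the index lives entirely in memory."
	index := Dictionary new.
	pages do: [ :page |
		(self keywordsIn: page text) do: [ :word |
			(index at: word ifAbsentPut: [ Set new ]) add: page ] ]
```

If this runs once at startup and the per-save adjustment above is applied consistently, the in-memory index never needs a disk file at all.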
I think we should single-thread the processing of individual requests, at least up until the data is ready to be spooled out: on a uni-processor machine the processor will be slammed while processing one request anyway. It's the network I/O that we want to multiplex, and that part happens mostly at the beginning and the end of a request/response session.

-Lex