I've got a mod_perl application that's using swish-e. A query from swish
may return hundreds of results, but I only display them 20 at a time.
There's currently no session control on this application, and so when the
client asks for the next page (or to jump to page number 12, for example),
I have to run the original query again, and then extract out just the
results for the page the client wants to see.
Seems like some basic design problems there.
Anyway, I'd like to avoid the repeated queries in mod_perl, of course. So,
in the short term, I was thinking about caching search results (each result
set is just a sorted list of file names) using a simple file-system db --
that is, (carefully) building file names out of the queries and writing the
results to some directory tree. Then I'd use cron to purge LRU files every
so often. I think this approach will work fine instead of a dbm or rdbms
approach.
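Roughly what I have in mind for the read side -- all the names here
(cached_results, run_swish, the cache directory) are just stand-ins for
whatever I actually end up using:

    use Digest::MD5 qw(md5_hex);

    my $cache_dir = '/var/cache/swish';    # placeholder location

    sub cached_results {
        my ($query) = @_;
        my $key  = md5_hex($query);
        my $file = "$cache_dir/" . substr( $key, 0, 2 ) . "/$key";

        # Cache hit: the file is just the sorted list of file names.
        if ( open my $fh, '<', $file ) {
            chomp( my @results = <$fh> );
            close $fh;
            return @results;
        }

        # Cache miss: run the real swish-e query (run_swish stands in
        # for however the search is invoked now), cache it, return it.
        my @results = run_swish($query);
        store_results( $query, \@results );    # write side sketched below
        return @results;
    }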
So I'm asking for some advice:
- Is there a better way to do this?
- There was some discussion here in the past about performance and how many
files to put in each directory. Are there some commonly accepted numbers for
this?
- For file names, does it make sense to use an MD5 hash of the query string?
It would be nice to get an even distribution of files across the directories
(there's a sketch of this after the list).
- Can someone offer any help with the locking issues? I was hoping to
avoid shared locking during reading -- but maybe I'm worrying too much
about the time it takes to acquire a shared lock when reading. I could
wait a second for the shared lock, and if I don't get it I'll just run the
query again.
But it seems like if one process creates the file and starts writing
without holding LOCK_EX and then gets blocked, other processes might not
see the entire file when they read it.
Would it be better to avoid the locks entirely, write to a temp file when
creating, and then do an (atomic?) rename, as sketched below?
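To make that concrete, here's the kind of write side I'm picturing: an MD5
hash of the query for the file name, a temp file in the same directory, then
rename() into place so a reader only ever sees a complete file. Again,
store_results and the paths are just placeholders:

    use Digest::MD5 qw(md5_hex);
    use File::Path  qw(mkpath);

    my $cache_dir = '/var/cache/swish';    # same placeholder as above

    sub store_results {
        my ( $query, $results ) = @_;      # $results: array ref of file names

        my $key    = md5_hex($query);
        my $subdir = "$cache_dir/" . substr( $key, 0, 2 );  # spread files out
        mkpath($subdir) unless -d $subdir;

        # Write to a temp file unique to this process...
        my $tmp = "$subdir/$key.$$.tmp";
        open my $fh, '>', $tmp or die "can't write $tmp: $!";
        print $fh map { "$_\n" } @$results;
        close $fh or die "close failed on $tmp: $!";

        # ...then rename() into place.  rename(2) is atomic within one
        # filesystem, so a reader sees either the old complete file or
        # the new complete file, never a partial write.
        rename $tmp, "$subdir/$key" or die "can't rename $tmp: $!";
    }

If that's right, it seems like I could skip flock() on both the read and the
write side, since a cache file is never modified in place -- it's only ever
replaced whole.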
Thanks very much,
Bill Moseley
mailto:[EMAIL PROTECTED]