Marvin Humphrey wrote:
On May 20, 2006, at 12:01 AM, Robert Engels wrote:

Maybe don't cache the term pages, then, just cache the frequently requested
terms themselves.


That sounds like a winner. Search term frequencies follow a power law distribution. Cache the top 20% or so in an LRU and you'll cut down on disk seeks and linear scanning significantly.
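
A minimal sketch of the LRU idea, assuming Java: java.util.LinkedHashMap in access order evicts the least recently used entry once a fixed capacity is exceeded. The class name TermLruCache is hypothetical and not part of Lucene. The value type is left generic, since it is not settled here what should be cached per term.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache for frequently requested terms; not a Lucene class. */
class TermLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    TermLruCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the least recently used entry
        return size() > capacity;
    }
}
```

This map is not thread-safe. A searcher shared across threads would need to wrap it, for example with Collections.synchronizedMap, because in access order even get() modifies the map.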

Keep in mind that the .tis file is compressed: it uses far less memory per term than a TermInfo does. So, to minimize disk I/O, one should leave things compressed and cache portions of the .tis file instead. The OS's buffer cache should do this well for you. But if system call overhead is causing significant delay, then the .tis file could be memory-mapped. And if constructing and scanning TermInfos is the primary delay, then, of course, a cache of TermInfos might be indicated. In summary, there are lots of possible places to optimize here, but it's not clear which, if any, are warranted.
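
For the memory-mapping option, a rough sketch using plain java.nio might look like the following. The class name TisMapper and the method mapReadOnly are hypothetical and not Lucene APIs. The code only maps a file read-only and does not decode the .tis format.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper; maps a whole file read-only so reads avoid per-call system calls. */
class TisMapper {
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

Pages of the mapped file are still served from the OS buffer cache, so this mainly removes read() call overhead. Whether that is a measurable win would need benchmarking.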

Folks have benchmarked a TermInfo cache before and not found it advantageous. But perhaps your uses are sufficiently different that this is no longer the case.

Doug

