Marvin Humphrey wrote:
On May 20, 2006, at 12:01 AM, Robert Engels wrote:
Maybe don't cache the term pages, then, just cache the frequently
requested terms themselves.
That sounds like a winner. Search term frequencies follow a power law
distribution. Cache the top 20% or so in an LRU and you'll cut down on
disk seeks and linear scanning significantly.
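To make the LRU suggestion concrete, here is a minimal sketch built on java.util.LinkedHashMap in access order. The class name TermLruCache and the capacity parameter are made up for illustration and are not part of Lucene; the value type is left generic because how a cached entry would be stored is an open choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU cache, keyed by e.g. term text. Not a Lucene class.
 * LinkedHashMap with accessOrder=true moves an entry to the end of the
 * iteration order on every get/put, so the eldest entry is always the
 * least recently used one.
 */
final class TermLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    TermLruCache(int capacity) {
        super(16, 0.75f, true); // third argument: access order, not insertion order
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each insertion; true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

This class is not thread-safe; a searcher shared across threads would need to wrap it, for example with Collections.synchronizedMap.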
Keep in mind that the .tis file is compressed: it uses far less memory
per term than a TermInfo does. So, to minimize disk i/o, one should
leave things compressed and cache portions of the .tis file instead.
The OS's buffer cache should do this well for you. But if the system
call overhead is causing significant delay, then the .tis file could be
memory mapped. And if constructing and scanning TermInfos is the
primary delay, then, of course, a cache of TermInfos might be
indicated. In summary, there are lots of possible places to optimize
here, but it's not clear which, if any, are warranted.
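For the memory-mapping option, a minimal sketch using the standard java.nio API might look like the following. The class and method names (MappedFileSketch, mapReadOnly) are hypothetical, and this only shows how a file can be mapped read-only; it does not decode the .tis format or reflect how Lucene itself opens index files.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, not a Lucene class. */
final class MappedFileSketch {
    /**
     * Maps the whole file read-only. Reads from the returned buffer go
     * through the OS page cache without a system call per read.
     * A single mapping is limited to Integer.MAX_VALUE bytes.
     */
    static MappedByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

Whether this helps depends on whether system call overhead is actually the bottleneck, so it is worth measuring before adopting.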
Folks have benchmarked a TermInfo cache before and not found it
advantageous. But perhaps your uses are sufficiently different that this
is no longer the case.
Doug