John, thanks a lot for your excellent reply.
I found this sentence especially convincing:
> "Well, you _can_ be a lot better since you know what you're
> doing. You can also be a _lot_ worse when you get it wrong."
With such a high risk, I should probably try other
tricks to improve system performance before
rushing into implementing a cache.
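
One rough way to check first whether index I/O is really the bottleneck is to time a few reads of an index file. This is only a sketch: ReadTimingSketch and timeReadNanos are made-up names, not Lucene APIs, and the file path is whatever you pass on the command line. The first read may have to go to the disk, and repeat reads are likely to come from the OS cache, so the gap between them gives a feel for the milliseconds-versus-microseconds difference John describes below.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Lucene.
class ReadTimingSketch {

    // Reads the whole file once and returns the elapsed wall-clock time in nanoseconds.
    static long timeReadNanos(Path file) {
        long start = System.nanoTime();
        try {
            Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Pass the path of one index file on the command line.
        Path file = Paths.get(args[0]);
        // The first read may hit the disk; later reads are likely served
        // from the OS cache, unless the file was already cached.
        for (int i = 1; i <= 3; i++) {
            System.out.printf("read %d: %d us%n", i, timeReadNanos(file) / 1_000);
        }
    }
}
```

If the repeat reads are already fast compared with the rest of a search request, a cache inside the JVM heap is unlikely to buy much.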
thanks again,
Kan
--- John Haxby <[EMAIL PROTECTED]> wrote:
> Kan Deng wrote:
>
> >1. Performance.
> >
> >   Since all the cached disk data resides outside the JVM
> >heap space, access from Java objects to that cached
> >data cannot be very efficient.
> >
> >
> True, but you need to compare the relative speeds.  If data has to be
> pulled from a file, then you're talking several milliseconds to fetch
> from the disk.  If it's in the OS's cache (and here I'm rather assuming
> Linux since that's what I know about) you're talking about microseconds
> rather than milliseconds to fetch the data from the OS.  Once the data
> is in the JVM, but not in the CPU cache, then you're down to nanoseconds
> to get the data from main memory (how many depends on the hardware; some
> platforms take a while to get the data moving but when it comes, it's
> very quick; some systems are fast to get going but don't have the
> throughput).  It's not the absolute times that are important though:
> once you've got the data in the OS's cache then things like network
> latency, display update speed and scheduling overheads begin to make
> themselves felt and you won't make these any less by getting data into
> the JVM's memory.  Well, not much anyway.
>
> >2. Volatile.
> >
> >   The OS caches the disk data in a common area
> >shared by multiple processes, not only the JVM. If
> >other processes are doing disk I/O at the same
> >time, chances are the cached Lucene index data from
> >disk may be wiped.
> >
> >
> What you can do by hanging on to a lot of memory is make the overall
> machine performance worse.  In fact by denying other processes memory,
> you're going to force up the I/O rate and when you do need to go to the
> disk then it'll take much longer -- net result, things run slower.
> Generally speaking, because the OS has a more holistic view of resource
> management, you'll get better overall performance by leaving the
> caching to it.
>
> >Therefore, a more reliable and efficient cache should
> >reside inside the JVM heap space. But because the JVM heap
> >space is crowded, we have to manually "evict" the less
> >frequently used data from the cache.
> >
> >
> It's that last sentence that is the critical one.  Yes, you can do your
> own cache management, but how much better are you going to be than the
> OS?  Well, you _can_ be a lot better since you know what you're
> doing.  You can also be a _lot_ worse when you get it wrong.  Choosing
> the right point to flush data from the cache ("evict") is not all that
> straightforward: the OS buffer cache was introduced into BSD Unix in the
> early '80s and we're still seeing work going on to improve the basic
> strategy 20-odd years later.
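
For illustration, the simplest form of the in-heap cache with manual eviction discussed here is probably a least-recently-used (LRU) map. The sketch below is an assumption about what such a cache might look like, not anything Lucene provides: LruCacheSketch and maxEntries are made-up names, and LRU is only one of many possible eviction policies. It relies on java.util.LinkedHashMap in access order together with its removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, byte[]> cache = new LruCacheSketch<>(2);
        cache.put("a", new byte[1]);
        cache.put("b", new byte[1]);
        cache.get("a");                     // "a" is now the most recently used
        cache.put("c", new byte[1]);        // over capacity, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Even this small version shows where the risk lies: the eviction decision is a fixed entry count that knows nothing about entry sizes, heap pressure, or what the rest of the machine is doing.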
>
> If you find that you're spending an inordinate amount of time waiting
> for I/O for the index from the OS, then that is the time to start
> looking at caching strategies.  My own feeling is that you're going to
> find easier things to fix before you get that far.
>
> >Did I mis-understand anything?
> >
> >
> Probably not, it's just that performance needs more of a holistic
> approach, and an obvious, isolated change isn't going to have the
> effect that you want.
>
> jch
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]