On Fri, Mar 21, 2003 at 01:39:07AM +0100, Ulrich Weigand wrote:

> >64-bit clean code, this may be different, but I haven't gotten to beat
> >on the 64-bit Oracle for Z code yet.
> OK, I can see the benefit of increasing the total amount of memory
> available for cache.  However, note that even without having a 64-bit
> Oracle, you could still have an equivalent effect by running a 31-bit
> Oracle under a 64-bit Linux kernel, and giving that kernel lots of
> memory (to be used as page cache).

True. As I said, I don't have any data on operating in that type of
configuration. For 31-bit code and 31-bit OS, there's a
demonstrable advantage.  Since he's got a 9672, that's the
environment he's in.

Historically, I guess I'm somewhat suspicious of compatibility modes
like the mixed-mode 31/64-bit stuff. None of the implementations of
such code I've ever worked with (DEC Alpha, Cray, HP) was solid enough
for production-level reliability.  Perhaps you folks are better
programmers...8-).

> >The database level caching approach quickly runs into the process
> >size/disk buffer utilization problems that grow the virtual machine WSS
> >(even with raw I/O, you've just traded the system queuing the buffers
> >for the application queuing the buffers), and tends to generate the
> >problem of getting stuck in the bottom of E3 with nowhere to go;
> >symptom: database goes non-responsive.
>
> Well, this is kind of an unfair comparison.  Of course you'll have to
> make the memory that you use for MDC cache in the one scenario available
> in full to the guest for its use as cache in the other scenario.
> In any case, the original poster was running in LPAR where this issue
> doesn't even come up ...

Although he correspondingly loses significant manageability and
instrumentation capability, which was part of his trouble in the first
place. Again, I'm looking at 31-bit only environments; can't tell you
whether it would work better in a mixed-mode environment or in a
native 64-bit environment.

I also still find that the algorithms used in Linux for buffer
management are quite a bit less efficient than the ones used in VM --
that's no slight to the Linux folks, it's what they have to work with
-- and that getting VM to do something more efficiently than Linux can
is pretty much always a net win.

> >I would expect it to do so on Intel where there's no effective system
> >level disk cache. Since MDC allows Linux I/O to effectively return
> >asynchronously (even without async io in the linux driver and is active
> >even during the Linux raw disk io) you win some nice gains, esp if the
> >disk controllers also have the NVRAM DASDFW turned on.
>
> How is the Linux page/buffer cache any less asynchronous (or any less an
> 'effective system level disk cache') than VM's MDC?

Think about it this way:

In the Linux case, you have:

application --> system cache --> disk

or

application --> disk (in the case of raw disk io)

This is all fine and good, subject to the fairly weak buffer
management capability in the stock Linux environment. In the VM case,
you have:

application --> Linux system cache --> MDC (read)  --> cntl unit --> disk
                                                       DASDFW

or

application --> MDC  --> cntl unit --> disk   (in the case of raw io)
                         DASDFW

with a stronger cache and I/O optimization algorithm managing the real
disk I/O across multiple machines, which would be needed in an HA
situation (I don't know of anyone running Oracle for real that uses a
single server to host a critical database; if they do, they've got
bigger problems than this one).

If your application issues a read (either buffered or non-buffered) in
the VM case and MDC has pre-cached the response by doing a full-track
read, or has previously cached the record, the response time for I/O
completion is significantly better than going direct to
disk. Simplistic, until you consider that if the same database table
is active on multiple database machines, you can do quite a bit of
I/O avoidance that isn't possible in the Linux-only scenario. Net
win. You also gain the early I/O completion notification from
virtualizing DASDFW, although that's more a hardware feature than a VM
feature. It does have an impact on write performance in that the write
I/O completes much more quickly, and is guaranteed via the NVRAM.
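
A crude way to see the write side of that is to time a small
write-plus-fsync with DASDFW on and again with it off; the completion
should come back noticeably faster when the control unit can post it
from NVRAM. A quick-and-dirty sketch (the file name is made up, and
the numbers only mean anything relative to each other):

    /* Time one synchronous write.  With DASDFW/NVRAM active on the
     * control unit, the fsync() should return considerably sooner,
     * since completion is posted once the data is in non-volatile
     * storage rather than on the platter. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        struct timeval t0, t1;
        int fd = open("/oracle/testfile", O_WRONLY | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        memset(buf, 0x5a, sizeof(buf));

        gettimeofday(&t0, NULL);
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf)) {
            perror("write"); return 1;
        }
        if (fsync(fd) < 0) { perror("fsync"); return 1; }
        gettimeofday(&t1, NULL);

        printf("write+fsync: %ld usec\n",
               (t1.tv_sec - t0.tv_sec) * 1000000L
               + (t1.tv_usec - t0.tv_usec));

        close(fd);
        return 0;
    }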

Is it better? Maybe not. It does, however, give you a lot more knobs
for manipulating the performance of the process. I'm of the opinion
that the I/O optimization code in VM has had more time to mature, and
I find it more tunable than the Linux code. It also has some
very inspired hardware feature exploitation code in it that Linux
hasn't inherited yet.  Time will tell -- you've got plenty of work to
keep you busy nights...8-).

> In any case, the point I was trying to make is that while VM is of
> course great for running many guests (and provides a lot of advantages
> w.r.t. configuration and system administration etc.), in the usual case
> I would not expect a *single* Linux workload to perform *better* when
> running under VM as compared to running under LPAR ...

I guess I don't ever find "single" applications. Once Linux is present
in production, the operational issues outweigh the VM
overhead. Consider also the HA requirement for real database
production workloads and it starts to get much more interesting.

> If you can find examples that contradict this, I'd be very interested
> to hear about them, as this would to me imply that there's either a
> problem in the setup or else a deficiency in the Linux kernel that
> we'd need to fix.

My favorite example is VSE running better under VM than it does
native. I know, not Linux, but the principle is the same -- be able to
do everything alone, but let VM do the heavy lifting when possible.

-- db
