RE: SAN

John Whelan Fri, 16 Aug 2002 08:00:35 -0700


 So if I have a problem with my mainframe Adabas databases or my Natural or
Predict environments, where do I post?


 To come back to the SAN side of things: we have a number of SANs. Basically,
the expensive one works extremely well with Oracle databases. Split out the
database files, the log files and the redo log files as you normally would.
Get the SAN configured so these go to different LUNs, or whatever your SAN
vendor calls them. Set the block size to match the Oracle database block
size, normally 8K.

 What we have found on the Unix side is that the disk drives in the SAN are
much faster than the drives we still have attached to the Unix servers. We
also have people who actually tune the SAN. Within the SAN we have disk
drives with different characteristics: cheap, slow and big; expensive, fast
and small. The SAN tuning transparently moves the clients who have a lot of
traffic to the faster drives.

 When reading, a cache hit ratio of 95% in Oracle, plus the hardware cache on
the disk controller, plus the hardware cache on the drive itself, means that
roughly only 2 or 3 reads in 100 actually make it all the way to the hard
drive. The ideal configuration is then to have as many spindles as possible
so that the heads don't have to move too much. RAID 5 works very well for
this. Some applications make heavy read and write use of a temporary
tablespace, so watch out for this.
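To make the arithmetic above concrete, here is a minimal sketch that
multiplies the miss rates of each cache layer, since each layer only sees
the misses of the layer above it. Only the 95% Oracle hit ratio comes from
the text; the 40% controller and 10% drive-cache hit ratios are hypothetical
values picked to land in the quoted "2 or 3 in 100" range, not measurements.

```python
def reads_reaching_disk(requests: int, hit_ratios: list[float]) -> float:
    """Return how many of `requests` reads miss every cache layer.

    Each layer only sees the misses of the layer above it, so the
    overall miss rate is the product of the per-layer miss rates.
    """
    remaining = float(requests)
    for hit in hit_ratios:
        remaining *= (1.0 - hit)
    return remaining


# Oracle buffer cache 95% (from the text); the controller (40%) and
# drive cache (10%) hit ratios are hypothetical, for illustration only.
layers = [0.95, 0.40, 0.10]
print(round(reads_reaching_disk(100, layers), 2))  # 2.7
```

The point of the product form is that each extra cache layer only has to
catch a fraction of a small remainder to make a visible difference.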

 When writing, each committed transaction has to reach the hard drive (the
redo log files at commit, the data blocks later), so the work on the hard
drives is much heavier. The same applies when rebuilding indexes. RAID 5 does
not work well here, since to write one block you first have to read existing
blocks to recompute the parity, then write both the data drive and the
parity drive. Mirrored drives are best. Your SAN people should understand
this.
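The RAID 5 write penalty can be sketched with XOR parity. This is a
simplified model under stated assumptions: one stripe, tiny byte-string
blocks, and the common read-modify-write path (read old data and old
parity, then write new data and new parity). Some controllers instead read
the other data blocks in the stripe, and write caches can hide part of the
cost. The same parity is what lets an array rebuild a block from the other
drives after a hard read error.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_small_write(data: list[bytes], parity: bytes, index: int,
                      new_block: bytes) -> tuple[list[bytes], bytes, int]:
    """Read-modify-write of one data block in a RAID 5 stripe.

    Returns the updated data blocks, the new parity and the number of
    physical I/Os the update costs.
    """
    old_block = data[index]          # I/O 1: read old data block
    old_parity = parity              # I/O 2: read old parity block
    new_parity = xor_blocks(old_parity, old_block, new_block)
    updated = list(data)
    updated[index] = new_block       # I/O 3: write new data block
    # I/O 4: write new_parity back to the parity drive
    return updated, new_parity, 4


def mirrored_write_ios() -> int:
    """A mirrored (RAID 1) write just writes the block to both copies."""
    return 2


def rebuild_missing(survivors: list[bytes], parity: bytes) -> bytes:
    """Recompute a lost data block from the surviving blocks plus parity."""
    return xor_blocks(parity, *survivors)


stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(*stripe)
stripe, parity, ios = raid5_small_write(stripe, parity, 1, b"\xff\x00")
print(ios, mirrored_write_ios())                                      # 4 2
print(parity == xor_blocks(*stripe))                                  # True
print(rebuild_missing([stripe[0], stripe[2]], parity) == stripe[1])  # True
```

Four physical I/Os per small write against two for a mirror is the usual
way the RAID 5 write penalty is stated.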

 Our expensive SAN has something like 64 GB of cache memory, and we have
actually seen 30-fold performance increases on some databases.

 Just a final comment: we currently have a UNIX server running an Oracle
database that keeps having hardware problems on the disk drive holding the
redo log files. Downtime means unhappy clients, and I'd love to see them
move to the SAN. A second UNIX client had a disk controller failure in which
the controller reported that it was writing the data to the RAID 5 disk
system; unfortunately it wasn't. That took us a while to spot.

 I have another client who resisted the idea of the central SAN on cost
grounds. He's running on a Compaq server with a brand X disk subsystem that
runs RAID 5. He has a large load that takes 15 hours and writes the whole
time, including building indexes, which really hammers the disk drives.
Compaq actually designs its drives specially to run cool even when they are
writing all the time; brand X does not seem to be quite so concerned.

 If you get a hardware read error on a disk, the server will normally try to
reread it to get the data off; you just see a delay. Nice systems note which
drives are having errors and signal "please replace this drive before it
fails". Brand X just struggles on. If it gets a really hard error, it
calculates the value from the other drives. The only trouble is that you get
to the point where the technician tells you, "You have so many errors on
this RAID system that I suggest you back it up immediately, and I'll
reformat the drives to keep it going." It takes five hours to back that
server up. Needless to say, the biggest file, 15 GB, wouldn't back up because
there were hard errors across the drives, and that file was of course the
large database file. Fortunately this happened on a Friday; it took us most
of the weekend to clean it up. So for me reliability is important, and at
the moment the SAN reliability is good.

 Cheerio John

-----Original Message-----
Sent: August 13, 2002 3:29 PM
To: Multiple recipients of list ORACLE-L


Nice to see someone who knows what ADABAS is.

Ron S.

-----Original Message-----
Sent: Tuesday, August 13, 2002 1:50 PM
To: Multiple recipients of list ORACLE-L


You wrote:
>
> Will the "smarter algorithm" look inside the contents of a file before
> reading it? If it does not, then how will it be able to "intelligently"
> read ahead what data Oracle wants from inside its datafile? If it does,
> how does it decipher Oracle's way of storing data?
>

How about using a variation of the algorithm ADABAS (the database) uses for
sequential user reads:
1) Return the block requested.
2) If the next block is requested within a short period, read the next 5
    blocks into the controller cache.
3) If those are read in order and requests keep coming for the next block,
    read 10 blocks each time.

This is just off the top of my head, but it would help greatly for full
table scans of big tables. Also remember that in Oracle before 9i, blocks
read by full table scans are put at the LRU end of the list, so they are
prime candidates to be overwritten and you will read them again and again.
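The three steps above can be sketched as a small sequential-detection state
machine. This is a hypothetical model, not ADABAS code or any SAN vendor's
firmware: the "short period" timing check from step 2 is left out, the
class and method names are invented for illustration, and the window sizes
5 and 10 are taken from the steps themselves.

```python
from typing import List, Optional, Set


class SequentialReadAhead:
    """Hypothetical read-ahead policy following the three steps above."""

    def __init__(self, first_window: int = 5, steady_window: int = 10) -> None:
        self.first_window = first_window    # step 2: read ahead after the first sequential request
        self.steady_window = steady_window  # step 3: read ahead once the pattern holds
        self.last_block: Optional[int] = None
        self.run_length = 0                 # consecutive "next block" requests seen
        self.cache: Set[int] = set()        # blocks already read ahead into the controller cache

    def request(self, block: int) -> List[int]:
        """Return the block numbers this request causes to be read from disk."""
        sequential = self.last_block is not None and block == self.last_block + 1
        self.run_length = self.run_length + 1 if sequential else 0
        self.last_block = block

        # Step 1: the requested block is always returned; it only costs
        # a disk read if it was not already read ahead.
        reads = [] if block in self.cache else [block]
        self.cache.discard(block)

        # Random access, or the read-ahead window is still ahead of us.
        if self.run_length == 0 or (block + 1) in self.cache:
            return reads

        window = self.first_window if self.run_length == 1 else self.steady_window
        ahead = [b for b in range(block + 1, block + 1 + window) if b not in self.cache]
        self.cache.update(ahead)
        return reads + ahead


ra = SequentialReadAhead()
print(ra.request(100))                           # [100]
print(ra.request(101))                           # [101, 102, 103, 104, 105, 106]
print([ra.request(b) for b in range(102, 106)])  # [[], [], [], []]
print(ra.request(106))                           # [107, 108, 109, 110, 111, 112, 113, 114, 115, 116]
print(ra.request(500))                           # [500]
```

A random request resets the run, so isolated index lookups never trigger
read-ahead, while a full table scan quickly escalates to the larger window.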

Yechiel Adar
Mehish
----- Original Message -----
To: Multiple recipients of list ORACLE-L <[EMAIL PROTECTED]>
Sent: Tuesday, August 13, 2002 5:30 PM


>
> Tim, Jared and Kirti
>
> Jared, Kirti: Thanks a lot for your input, and yes, I read the Sane SAN
> paper.
>
> Tim: Many thanks for pointing out some of the big "mis"assumptions
> I had made. I have corrected most of the points you mentioned except
> for these:
>
> >     * I'm less clear on whether SANs themselves perform read-ahead and
> > the conditions under which they do so.  I'm pretty sure that they are
> > smarter about it than what you describe;  usually read-ahead mechanisms
> > are triggered by detected patterns of usage, not algorithms as simple as
> > described...
>
> Will the "smarter algorithm" look inside the contents of a file before
> reading it? If it does not, then how will it be able to "intelligently"
> read ahead what data Oracle wants from inside its datafile? If it does,
> how does it decipher Oracle's way of storing data?
>
> >     * your example about "read-ahead" conflicts makes some invalid
> > assumptions, namely about space being allocated in blocks not extents
> > (when does *that* ever happen?) and about "read-ahead being set to 3
> > blocks" (again, when does *that* ever happen?).
>
> It does not happen. However, I am going to be talking to a bunch of
> non-Oracle folks and management, so I want to keep it as simple as
> possible.
>
> > Altogether, empirical evidence (i.e. many successful SAN implementations
> > under Oracle over several years) does not lend credence to your basic
> > assertion that "SAN and Oracle don't go well together".  It is a fact
> > that they do...
>
> I am not trying to make the statement "SAN and Oracle don't go well
> together". I am trying to convince my management that buying a SAN does
> not mean that we never need to worry about IO any more. Even a SAN needs
> to be configured. Currently they are under the impression that there are
> no IO problems, but my database IO waits are 50% of the total response
> time. All my index and table data are scattered all over the disks, many
> on the same disk, and the answer I get is "No, we are not tasking the SAN
> at all. There are no IO issues."
>
>
> Thanks a lot
>
> Babu
>
>
>
>
>
> Tim Gorman <[EMAIL PROTECTED]>@fatcity.com on 08/13/2002 07:58:29 AM
>
> Please respond to [EMAIL PROTECTED]
>
> Sent by:    [EMAIL PROTECTED]
>
>
> To:    Multiple recipients of list ORACLE-L <[EMAIL PROTECTED]>
> cc:
>
>
>
> Babu,
>
> Is it possible that you are confusing the term "SAN" with the term "NAS"?
> As I read through your email, I couldn't help thinking that you were
> discussing "network-attached storage" rather than "storage-area networks".
> If so, some of my comments below might change slightly, but not much...
>
> ---
>
> Most of your major assumptions are correct, but there are important
> errors...
>
>     * DBWR only does RW, never RR or SR.  Mostly Oracle server processes
> do RR and SR, but ARCH also does SR, as do backup processes (whatever they
> are);  everyone always forgets to add backup processes to the mix...
>
>     * SW is characteristic of LGWR and ARCH, but also of processes
> performing sorting (i.e. "direct writes" wait-event).  I think you'll
> agree that databases generating a lot of redo (and archived redo) and
> performing lots of sorting are not necessarily "misconfigured".  The
> amount of redo generated is really a characteristic of the application
> itself, not the database configuration.  High amounts of sorting can
> possibly be tuned, but that too is more a characteristic of the
> application and users' usage of it than of database configuration...
>
>     * I'm not sure what your conclusions regarding RAID5 chunks or RAID0
> stripes are, but I suspect they are incorrect.  Oracle DB_BLOCK_SIZE
> should not come close to matching RAID5 chunksize or RAID0 stripesize;
> even "(DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT)", which denotes the
> largest I/O requests (for full-table scans) generated by Oracle, should be
> much smaller than RAID5 chunksize or RAID0 stripesize for most databases.
> So, whatever conclusions you are drawing from that point about sizes are
> likely incorrect...
>
>     * I'm less clear on whether SANs themselves perform read-ahead and the
> conditions under which they do so.  I'm pretty sure that they are smarter
> about it than what you describe;  usually read-ahead mechanisms are
> triggered by detected patterns of usage, not algorithms as simple as
> described...
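Tim's point that a stripe is a unit of placement rather than a unit of
transfer can be illustrated with a plain RAID 0 address map. The 256K
stripe size is the figure from Babu's original message; the four-disk
array, the 8K block size and DB_FILE_MULTIBLOCK_READ_COUNT = 16 are
illustrative assumptions. The hypothetical `raid0_extents` function shows
which disk byte ranges a single Oracle I/O actually touches: only the
requested bytes, never the whole stripe unit.

```python
KB = 1024


def raid0_extents(offset: int, length: int, stripe_unit: int,
                  disks: int) -> list[tuple[int, int, int]]:
    """Split one logical I/O into (disk, disk_offset, nbytes) pieces.

    Simple RAID 0 model: logical stripe unit k lives on disk k % disks,
    at row k // disks. Only the requested bytes are transferred.
    """
    pieces = []
    end = offset + length
    while offset < end:
        unit, within = divmod(offset, stripe_unit)
        nbytes = min(stripe_unit - within, end - offset)
        disk = unit % disks
        disk_offset = (unit // disks) * stripe_unit + within
        pieces.append((disk, disk_offset, nbytes))
        offset += nbytes
    return pieces


block = 8 * KB           # assumed DB_BLOCK_SIZE
fts_io = block * 16      # assumed DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT = 128K
stripe_unit = 256 * KB   # stripe size quoted in Babu's message

print(raid0_extents(40 * block, block, stripe_unit, 4))   # [(1, 65536, 8192)]
print(raid0_extents(0, fts_io, stripe_unit, 4))           # [(0, 0, 131072)]
print(raid0_extents(192 * KB, fts_io, stripe_unit, 4))    # [(0, 196608, 65536), (1, 0, 65536)]
```

A single-block read transfers 8K, not 256K, and even the largest
full-scan read stays on one disk unless it happens to straddle a stripe
boundary.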
>
> Anyway, based on your mistaken assumptions, your list of conflicts
> between SAN and Oracle is quite mistaken as well...
>
>     * the difference between the "stripe width" and DB_BLOCK_SIZE is not
> "excess I/O" at the SAN level;  the disk drives do not necessarily read
> the entire "stripe" or "chunk";  they merely *store* data in those extents
> on the device.  They don't have to read/write in those increments...
>
>     * your example about "read-ahead" conflicts makes some invalid
> assumptions, namely about space being allocated in blocks not extents
> (when does *that* ever happen?) and about "read-ahead being set to 3
> blocks" (again, when does *that* ever happen?).  You do have some of the
> basic ideas right, but please remember that your assumptions may be
> overly simplistic or just unlikely.  Moreover, remember that some of your
> basic assumptions (especially regarding SW and database configuration)
> are just plain wrong...
>
>     * your points about caching are mostly correct, except for DBWR doing
> reads again.  Also, even though LGWR uses something called the "log
> buffer", please be aware that this data structure is not a "cache".  A
> "buffer" is a data structure into which data is written once and read
> only once;  a "cache" is a data structure into which data is (hopefully)
> written once and (hopefully) read many times.  So, I/O in the LGWR stream
> is *not* cached by Oracle at all.  The "buffer" mechanism is there purely
> to facilitate concurrency and the multiplexing of multiple server
> processes generating redo into the single LGWR process performing the
> write to online redo log files.  Lastly, your comment that "SAN's buffer
> can never really provide to Oracle the data it reads most: it's already
> there in Oracle" is just plain incorrect.  Please remember the
> distinction between a "buffer" and a "cache", first of all.  Second,
> remember that not all I/O is cached by Oracle (e.g. redo).  Third, please
> remember that database performance "health" is not guaranteed by a high
> BCHR (buffer cache hit ratio) in Oracle anyway...
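The "buffer" role Tim describes, many server processes feeding a single
writer with each record passing through exactly once, can be sketched as a
producer/consumer queue. This is a hypothetical Python model, not Oracle
internals: `queue.Queue` stands in for the log buffer, producer threads
stand in for server processes generating redo, and one consumer thread
stands in for LGWR. The function name is invented for illustration.

```python
import queue
import threading


def run_log_writer_demo(sessions: int, records_per_session: int) -> list[str]:
    """Many producer threads feed one writer thread through a bounded buffer.

    Each record is written into the buffer once and read out once;
    nothing is kept around to be re-read later.
    """
    log_buffer = queue.Queue(maxsize=64)  # bounded, like a fixed-size log buffer
    written = []                          # stands in for the online redo log file
    done = object()                       # sentinel telling the writer to stop

    def session(sid: int) -> None:
        for n in range(records_per_session):
            log_buffer.put(f"s{sid}:r{n}")  # blocks while the buffer is full

    def writer() -> None:
        while True:
            record = log_buffer.get()
            if record is done:
                return
            written.append(record)          # sequential write; record consumed once

    lgwr = threading.Thread(target=writer)
    lgwr.start()
    producers = [threading.Thread(target=session, args=(i,)) for i in range(sessions)]
    for p in producers:
        p.start()
    for p in producers:
        p.join()
    log_buffer.put(done)  # all redo is already queued, so the sentinel arrives last
    lgwr.join()
    return written


records = run_log_writer_demo(sessions=4, records_per_session=100)
print(len(records), len(set(records)))  # 400 400
```

The writer never looks a record up again after writing it, which is the
sense in which redo I/O passes through a buffer but is not cached.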
>
> ---
>
> Many of the concepts discussed here are not characteristic only of SANs;
> they also pertain to file-systems, logical volume managers, JBOD, and NAS,
> not just SANs.  Please rethink some of the concepts you are thinking
> about...
>
> Altogether, empirical evidence (i.e. many successful SAN implementations
> under Oracle over several years) does not lend credence to your basic
> assertion that "SAN and Oracle don't go well together".  It is a fact that
> they do...
>
> ---
>
> If my original supposition that you are confusing "SAN" with "NAS" is
> correct, then I would agree with you that "NAS and Oracle don't go well
> together" in most situations, especially those involving high volumes of
> I/O.  NAS is great for non-DBMS uses (e.g. file serving) and for uses with
> low volumes of I/O from DBMSs (e.g. DEV environments), but there is lots
> of empirical evidence out there indicating that NAS stinks for high
> volumes of I/O from DBMSs.  Of course, that opinion is based on the
> current state of affairs -- many NAS vendors have significant advances in
> technology in the pipeline that may dramatically alter that assessment in
> the future, even the near future...
>
> Hope this helps...
>
> -Tim
>
> ----- Original Message -----
> To: "Multiple recipients of list ORACLE-L" <[EMAIL PROTECTED]>
> Sent: Monday, August 12, 2002 11:48 PM
>
>
> All
>
> I have a meeting tomorrow where I am going to point out why SAN and Oracle
> do not go very well together. Here are my thoughts. Can you pick holes in
> this argument, modify it or suggest any changes?
>
> TIA
>
> Babu
>
> SAN and Oracle - Conflicting IO behavior
> *     There are four types of IO in Oracle:
> 1.    Random Reads (RR) - DBWR - Using indexes
> 2.    Sequential Reads (SR) - DBWR - Full table scans
> 3.    Random Writes (RW) - DBWR - Writing dirty blocks
> 4.    Sequential Writes (SW) - LGWR, ARCH - Writing redo logs and redo
> archival + control files
> *     The bulk of any Oracle database's IO is done in RR, SR and RW. If SW
> is very high it denotes configuration problems.
> *     A SAN (or for that matter any RAID device) is configured for writing
> or reading large chunks at a time.  The stripe size on most SANs and RAID
> devices is 256K or more. Compare this to the Oracle block size of 4K/8K in
> most databases (going up to 32K in data warehouses).
> *     SANs do *read ahead*. If one block is requested, they read more than
> one block *while at the disk*, hoping that the same process will request
> the other blocks some time soon.
>
> Here is the conflict.
> *     Whenever Oracle does RR, SR or RW, it reads/writes randomly and not
> sequentially.  It will read/write one block at a time in the case of RR
> and RW, and 'x' blocks (where x = db_file_multiblock_read_count) in the
> case of SR. Therefore only during SR will Oracle use the entire stripe
> width. In all other cases, the difference between the stripe width and
> db_block_size will be excess IO.
> *     Why *read ahead* will cause a conflict:
> *     The internal structure of a datafile could be as follows. The file
> consists of 10 blocks. These are occupied by 3 tables.  The blocks shown
> below are numbered using table_number.block_number
>
> |-----+-----+-----+-----+-----+-----+-----+-----+-----+-----|
> | 1.1 | 1.2 | 2.1 | 3.1 | 3.2 | 3.3 | 2.2 | 1.3 | 2.3 | 3.4 |
> |-----+-----+-----+-----+-----+-----+-----+-----+-----+-----|
>
>
>
> *     The first block of the datafile is the first block of table 1, the
> second block is the second block of table 1, the third block is the
> first block of table 2, and so on. (For simplicity's sake, I am assuming
> Oracle will allocate space in blocks and not in extents.)
> *     Now assume Oracle requests the first block of table 1.  Assume read
> ahead is set to three blocks (three blocks will be read instead of 2
> blocks). In this case the SAN will read 2.1, 3.1, 3.2.
> *     The blocks 3.1 and 3.2 will be entirely useless, as Oracle is never
> going to read them. The SAN cannot tell that block 2.2, which Oracle
> might possibly request next, is the 7th block in the datafile, so it can
> never *read ahead* intelligently.
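The scenario can be replayed in a few lines of Python. This is a
hypothetical model of a purely position-based read-ahead that knows nothing
about Oracle segments; the function names are invented for illustration.
With a depth of three starting after block 1.2 it returns exactly the 2.1,
3.1, 3.2 listed above (starting after 1.1 would instead give 1.2, 2.1, 3.1).
As Tim points out in his reply, real SANs detect access patterns and Oracle
allocates extents rather than single blocks, so this only illustrates the
simplified argument.

```python
# Datafile layout from the example: position i (0-based) holds "table.block".
LAYOUT = ["1.1", "1.2", "2.1", "3.1", "3.2", "3.3", "2.2", "1.3", "2.3", "3.4"]


def naive_read_ahead(layout: list[str], requested: int, depth: int) -> list[str]:
    """Blocks a position-based read-ahead pulls in after `requested` (0-based)."""
    return layout[requested + 1 : requested + 1 + depth]


def same_table(block: str, other: str) -> bool:
    """True if both 'table.block' labels belong to the same table."""
    return block.split(".")[0] == other.split(".")[0]


def position_of(layout: list[str], block: str) -> int:
    """1-based position in the datafile, as used in the text."""
    return layout.index(block) + 1


prefetched = naive_read_ahead(LAYOUT, requested=1, depth=3)     # after block 1.2
print(prefetched)                                               # ['2.1', '3.1', '3.2']
print([b for b in prefetched if same_table("1.2", b)])          # []
print(position_of(LAYOUT, "1.3"), position_of(LAYOUT, "2.2"))   # 8 7
```

None of the prefetched blocks belong to table 1, and the next block of
table 1 (1.3) sits at position 8, beyond the read-ahead window.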
>
> Why does the SAN's buffer have very little impact on Oracle read
> performance?
> *     Oracle has its own buffering for all IO types
> *     DBWR reads and writes use the DB buffer cache
> *     LGWR uses the log buffer
> *     The DB buffer cache is managed by an LRU algorithm (touch count from 9i).
> *     The bulk of the IO done by Oracle is logical IO (LIO) and not
> physical IO (PIO).
> *     Assume the buffer cache hit ratio is 80%. This means that only 20% of
> the IO calls are PIO, so only 20% of the calls ever hit the SAN's cache.
> Since this 20% is probably the least requested/never requested data
> (going by Oracle's LRU algorithm), it's quite likely that the SAN's
> buffers don't have it either.
> *     Given that Oracle is going to cache even this 20% in its buffers, the
> next PIO call is going to be for something totally different, which is
> not in the SAN's buffer.
> *     Couple this with the read-ahead (discussed earlier), and the SAN's
> buffer is now populated with lots of data that Oracle might never use a
> PIO to retrieve.
> *     Thus the SAN's buffer can never really provide to Oracle the data it
> reads most: it's already there in Oracle.
> To be fair, the SAN's huge buffers will come as a boon to small databases,
> where the entire database can be cached in the SAN's buffers.
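The "double caching" argument can be explored with a toy two-level LRU
simulation. This is a hypothetical sketch: both caches use plain LRU
(Oracle's touch-count scheme and any SAN vendor's policy are not
modelled), the workload is synthetic, and the capacities are arbitrary.
It shows the structural point that the SAN cache only ever sees the buffer
cache's misses; whether those misses then hit in the SAN cache depends on
its size relative to the working set.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


class LRU:
    """Minimal LRU cache of block numbers."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block: int) -> bool:
        """Return True on a hit; on a miss, load the block and evict if full."""
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def simulate(workload: Iterable[int], oracle_capacity: int,
             san_capacity: int) -> Tuple[int, int, int, int]:
    """Return (oracle_hits, san_requests, san_hits, disk_reads)."""
    oracle, san = LRU(oracle_capacity), LRU(san_capacity)
    oracle_hits = san_requests = san_hits = disk_reads = 0
    for block in workload:
        if oracle.access(block):
            oracle_hits += 1
            continue
        san_requests += 1                    # only buffer cache misses reach the SAN
        if san.access(block):
            san_hits += 1
        else:
            disk_reads += 1
    return oracle_hits, san_requests, san_hits, disk_reads


# Synthetic workload: 50 hot blocks read over and over, interleaved
# with a single pass over 1,000 cold blocks that are each read once.
workload = []
for i in range(1000):
    workload.append(i % 50)
    workload.append(1000 + i)

print(simulate(workload, 150, 400))  # (950, 1050, 0, 1050)
print(simulate(workload, 40, 400))   # (0, 2000, 950, 1050)
```

In the first run the hot set fits in the buffer cache, so the SAN cache
only sees first-time reads and never hits, which is the situation Babu
describes. In the second the buffer cache is too small for the hot set and
the SAN cache absorbs the repeated reads. Which case applies depends on the
real workload and cache sizes, which a toy model like this cannot tell you.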
>
>
> --
> Please see the official ORACLE-L FAQ: http://www.orafaq.com
> --
> Author:
>   INET: [EMAIL PROTECTED]
>
> Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
> San Diego, California        -- Public Internet access / Mailing Lists
> --------------------------------------------------------------------
> To REMOVE yourself from this mailing list, send an E-Mail message
> to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
> the message BODY, include a line containing: UNSUB ORACLE-L
> (or the name of mailing list you want to be removed from).  You may
> also send the HELP command for other information (like subscribing).
>
> --
> Author: Tim Gorman
>

-- 
Author: Yechiel Adar
-- 
Author: Smith, Ron L.