Hi All, Well after 6 wekks of testing here is the basic way to operate SUN T3's as efficiently as possible.
Be prepared for arguments with High priests from the cult of SAME. SUN T3's are fiber attached hardware RAID 5 arrays with a "modern cache". The hardware engineers argue that if you need more I/Os/sec just add another array as a "concatenated volume". The theory being the hardware is intelligent enough to use the cache to increase throughput. It actually works as they claim. Never did explain why it wasn't a single point of failure in the end though. My hardware was 3 4810's, each with 4 - 8 cpus, each with 4 - 8GB, 2 - 4 bricks per machine First insist that multiple bricks be mounted on at least 2 mount points. (D2 and D3). DO NOT USE the forcedirectio option. I don't know why but I have been unable to take less than a 40% throughput hit with it turned on. And I don't care what other people say, no matter how much respect I have for them Insist on at least one JBOD for oracle binaries and configs Insist on at least one JBOD for redo logs (D1) This a bare minimum. One set of redo on D1 One set of redo on D2 Archive logs, Rollback and Temp on D3 All data files where needed on D2 Next Level up Add another JBOD for redo and move redo on to it Move Rollback and Temp to D2 At this point to get more throughput you have to take the JBOD to raw devices. Or try forcedirectio on these devices :) If even better performance is needed, more JBOD, for rollback and redo. If more disk spaces is needed, get another brick. Which leads me to the recent discussion on "proper way to tune" Huh????? Why make it so complex? Tuning from a blue collar DBA perspective: Assess the machine first No matter what your ratios or what your waiting for: sar to see if the machine is ever pinned vmstat to see your queues and paging iostat to see disk activity top at timed intervals to catch rogue jobs read your logs and config files Then talk to the users Is the "system slow" or is it specific jobs?" log on run ratio reports and query v$system_event Any ratio that is out of range needs to be tuned: Especially disk sorts to memory sorts For the infamous buffer cache ratio: < 10% throw memory at it > 97% take memory away For wait states here's a quick drive through for those who look at the number and say "Yeah but what do they mean" Time Wait Total Time Average Event # Waits Timeout In Hndrds Time -------------------------- --------- ------- ---------------------- SQL*Net more data to client ######### 0 680421 .005 SQL*Net message to client ######### 0 17590 0.000 SQL*Net message from client ######### 0 3953399703 35.511 - These are all communication to the client. ignore db file sequential read 39562523 0 12300885 .311 - Data read, 0.0003 seconds average wait, ignore. This number will climb if - IO is bottlenecked or inappropriate (ie using FTS for joins) rdbms ipc message 12440441 ####### 2774129387 222.993 - Internal machine communication ignore db file scattered read 12264223 0 6202885 .506 - Data read, 0.0005 seconds average wait, ignore. This is higher due to type of read. - Increase in this time indicates an IO bottleneck log file parallel write 4724477 67 2212249 .468 log file sequential read 2097709 0 1712615 .816 - Redo logs, with 2 pure raw JBOD I have got this down to about 0.25 hundredths buffer busy waits 1548548 0 408235 .264 - Memory latch contention 0.0002 seconds ignore - If Timeout or average increase, need to determine why contention is increasing control file parallel write 669234 0 376491 .563 pmon timer 662092 662074 203382329 307.181 - Internal waits ignore direct path read 573442 0 423920 .739 - Reads from tempfiles (sorting). Each segment is 10M in this db so ignore. log file sync 551716 15 459036 .832 - See redo above. db file parallel write 201166 0 610793 3.036 - writing updates and inserts 0.003 seconds ignore undo segment extension 100516 100507 27 0.000 - Don't know what this is, hope it's not critical ;) SQL*Net break/reset to client 92904 0 9522 .102 - Client communication ignore LGWR wait for redo copy 92844 7 736 .008 - Affected by the archiver not keeping up. The alert log and an infamous ratio - are better ways of detecting this. file open 76910 0 1874 .024 - note the number of occurances is getting very small, average time is low, ignore direct path write 69706 0 1408596 20.208 - writes for sorts, even though average time is high, research indicates that - the client does not wait for this, so internal ignore SQL*Net message to dblink 48680 0 7 0.000 SQL*Net message from dblink 48680 0 108414 2.227 - network or machine dependant ignore control file sequential read 45198 0 31664 .701 latch free 44849 30305 28693 .640 - Memory latch contention, notice rate of Timeout to # of waits - If average time increases, need to determine why contention is increasing - QUICKLY SQL*Net more data from client 43851 0 229528 5.234 enqueue 19946 19380 5973595 299.488 - I used to worry but ...., nothing ever happened so ignore - And the rest happen so infrequently ignore file identify 6961 0 496 .071 smon timer 6639 6612 203366288 30632.066 log file single write 2770 0 2998 1.082 rdbms ipc reply 1290 0 2302 1.784 log buffer space 1104 1 5507 4.988 db file single write 924 0 162 .175 log file switch completion 743 8 17454 23.491 refresh controlfile command 722 0 413 .572 pipe get 377 213 81693 216.692 library cache pin 316 0 152 .481 control file single write 231 0 164 .710 library cache load lock 72 6 877 26.069 single-task message 68 0 49 .721 switch logfile command 45 0 815 18.111 process startup 16 0 94 5.875 SQL*Net more data from dblink 16 0 0 0.000 row cache lock 5 0 0 0.000 db file parallel read 3 0 7 2.333 instance state change 2 0 0 0.000 Null event 1 1 410 410.000 reliable message 1 0 0 0.000 sort segment request 1 1 103 103.000 And the more you study your database the more you will understand of the above :) After you are aware of your systems problems, fix your config files and file positions and then chase down SQL issues. >From your users, capture the SQL run explain plan Run top, catch processes that use a full cpu for more than 30 seconds Capture the sql, run explain plan I have always ogled women. When I got married, (well started going out) I explained to my wife that I was making sure I had the best. But really, she's a good wife, I'm even allowed to have opinions. If she approves of them I'm allowed to have them :) The digest hit 983K on Friday, if I'm kind, 100K was content. >From the titles I see that there were performance problems with partitioned tables and bitmap indices? I can't help those who won't help themselves. And I don't receive HTML email. TTFN Off to figure out the relationship between multiblock_read_count and those index_optimizer thingies Dave -- Dave Morgan [EMAIL PROTECTED] 403 399 2442 -- Please see the official ORACLE-L FAQ: http://www.orafaq.com -- Author: Dave Morgan INET: [EMAIL PROTECTED] Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051 San Diego, California -- Public Internet access / Mailing Lists -------------------------------------------------------------------- To REMOVE yourself from this mailing list, send an E-Mail message to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).