Hey Ethan,

Our problem with 64 CPUs is the Oracle licensing cost. $2.5M exceeds our entire IT budget (I think). Not to mention that machines that can handle 64 CPUs -- HP's PA-RISC and Alphas and Sun's Ultras among others -- are prohibitively expensive for us.

Also, if the server cluster ain't DEC's (Compaq/HP), it ain't right. DEC invented and perfected it. I worked on a VAXCluster in the '80s and it was very stable (two 11/750s and an 11/780 -- 2.3 whole VUPs between the three!). There's my 10 years of DEC SA bigotry sneaking in... :)

As a knee-jerk reaction, I would agree with your Top Three Downtime Causes, which I would lump into a single "Human Error" category. There is only so much one can do to prevent this, no matter what hardware or DB is in use. We've encountered all three on our "big" DB, with relatively minor whiplash.

Our DBs are small in the Oracle World. The largest is only 30GB on disk. With proper setup, complete recovery could take only 1 or 2 hours. That could negate the need for incurring the high cost of an HA cluster (aside from RAC).

I appreciate the feedback! This is exactly the stuff I'm looking for! Like I mentioned, this kinda thing really needs to be a good shouting match over beer.

Thanks!
Rich

Rich Jesse
System/Database Administrator
[EMAIL PROTECTED]
Quad/Tech International, Sussex, WI USA

-----Original Message-----
Sent: Thursday, February 13, 2003 11:14 AM
To: Multiple recipients of list ORACLE-L

If you asked me last week I might not have formulated much of an opinion, but I have been tainted by Mogens' presentation on RAC or Not To RAC. Here are some questions you need to ask...

Why not go with a box capable of the CPUs you will eventually need? Why add machines when adding CPUs might be just fine? Will these apps really not run on 64 CPUs? The added complexity of RAC and its administration needs to be a factor in calculating your target uptime.

My experience has been that most database downtime is a result of the following items.

1. DBA/Unix admin errors.
2. Application errors (runaway batch jobs)
3. User errors (truncate table)

RAC doesn't fix any of these things. However, a stand-by running a few hours behind could provide feasible solutions to most of these items.

Just recently I saw an HACMP cluster (not RAC) come down, causing a one-hour outage, as a result of the instructions provided directly from an IBM support rep to the Unix admin.
The complexity of HA was the issue, so point #1 only becomes more likely as you add the complexity of running RAC to your environment. If you could chart all this stuff, I've got to feel that at some point the likelihood of one of the issues above surpasses the likelihood of an actual hardware failure causing an outage.

I think another point made during the presentation is that some very unusual and hard-to-pinpoint errors can arise from running RAC. Don't be surprised if the answer back from Oracle is very vague (e.g. perhaps parameter X is set too high when circumstance Y happens...).

My 2 cents...

- Ethan

-----Original Message-----
Sent: Thursday, February 13, 2003 9:40 AM
To: Multiple recipients of list ORACLE-L

With all this discussion on "Why RAC?", I thought I'd chime in with our reasoning, at least as it stands before any testing.

We currently have a few "major" databases for our ERP/MRP system, Engineering drawings, and "legacy" (I loathe that word) data. These databases are spread across three larger systems: Solaris, HP-UX, and OpenVMS. They are set up as three independent systems with their own disks, own CPUs, own memory, etc. These relatively expensive systems are underutilized and are beginning to show their age (up to six years old).

By combining these systems under a single system, we will be saving money in hardware cost (future upgrades and repair) as well as in service contracts, not to mention the ultimate savings -- computer room floorspace! What I don't want to do is have the consolidation negatively affect the DBs in performance or downtime (perceived or real).

So, the idea right now is to use "commodity" (read: "inexpensive") servers, dual Intel (AMD???) 1Us, with a SAN, and 9iRAC.
The theory is that while we'll take an initial kick in the fiscal crotch with the Oracle licensing, since we currently refuse to let go of our Concurrent User licensing, we'll come out ahead in the long run with the added performance and unlimited-user (per-CPU) licensing. Also, with the commodity servers, we can switch out the server for faster CPUs without incurring more licensing cost should the need arise (yes, Cary, I'm well aware of the "CPU Upgrade Myth"!).

With our testing, I hope to see that we'll be able to provide better uptime and performance with RAC than the total sum of the current boxes (save for the uptime on the OpenVMS box, which has 10 minutes of total downtime in the past 770+ days).

Any comments on this? In the interest of bandwidth and brevity, I've been way too brief here. This should really be discussed over Guinness.

Thx!
Rich

Rich Jesse
System/Database Administrator
[EMAIL PROTECTED]
Quad/Tech International, Sussex, WI USA

--
Please see the official ORACLE-L FAQ: http://www.orafaq.net
--
Author: Jesse, Rich
  INET: [EMAIL PROTECTED]

--
Please see the official ORACLE-L FAQ: http://www.orafaq.net
--
Author: Post, Ethan
  INET: [EMAIL PROTECTED]

Fat City Network Services -- 858-538-5051 http://www.fatcity.com
San Diego, California -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from). You may
also send the HELP command for other information (like subscribing).