NT CLUSTERING On some platforms, Windows NT can use up to 32 processors through symmetric multiprocessing (SMP). However, because the Oracle server runs on Windows NT as a single multithreaded process, the 2 GB address space limit of that process caps the number of users a single server can support. NT clustering addresses these scalability limits through Oracle Parallel Server (OPS), Oracle Fail Safe (OFS), and Oracle Parallel Query (OPQ). Oracle Corporation maintains a list of certified hardware and software combinations for use with Oracle Parallel Server and Oracle Fail Safe. OFS provides availability only: if the primary node fails, a secondary node automatically takes over using the same virtual network address. OFS automatically restarts the Oracle database on the surviving node of the cluster and rolls back any transactions that were uncommitted at the time of the failure. OPS provides both availability and scalability on NT clusters. Each node in the cluster runs an Oracle instance, and all the instances access the same database on shared disks. Every user on every node has simultaneous access to the entire database, and OPS keeps transactions synchronized across the instances, so the workload is distributed among the nodes. When using OPS, partition application data carefully to minimize pinging and false pinging. Oracle Parallel Query divides large tasks, such as full table scans, into smaller tasks that can be processed in parallel; combined with OPS, a parallel query can span nodes. The Distributed Lock Manager (DLM) is integrated into the Oracle kernel, so DLM-related issues can be resolved quickly without depending on a third party. Cluster management software is still provided by the OS vendor and is used along with Group Membership Services (GMS). The default locking mechanism is DBA (data block address) locking.
Deploying, Managing, and Administering the Oracle Internet Platform Paper #260 / Page 8
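To make the Parallel Query idea more concrete, here is a toy Python model of splitting a full table scan into granules that are scanned independently and then combined. It is a sketch under stated assumptions, not Oracle's implementation: the table is modeled as a list of blocks (each a list of rows), the contiguous block-range split and the helper names (`make_granules`, `scan_granule`, `parallel_count`) are hypothetical, and a thread pool stands in for the parallel query slaves. Under CPython the threads illustrate the decomposition rather than deliver real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical model: a table is a sequence of blocks, each block a list of rows.
Row = int
Block = List[Row]


def make_granules(num_blocks: int, degree: int) -> List[range]:
    """Split the block range [0, num_blocks) into at most `degree` contiguous granules."""
    size, extra = divmod(num_blocks, degree)
    granules, start = [], 0
    for i in range(degree):
        # The first `extra` granules get one additional block each.
        end = start + size + (1 if i < extra else 0)
        if start < end:
            granules.append(range(start, end))
        start = end
    return granules


def scan_granule(table: Sequence[Block], granule: range,
                 predicate: Callable[[Row], bool]) -> int:
    """One 'slave': count matching rows in its own block range only."""
    return sum(1 for b in granule for row in table[b] if predicate(row))


def parallel_count(table: Sequence[Block], predicate: Callable[[Row], bool],
                   degree: int = 4) -> int:
    """'Coordinator': hand out granules, then add up the partial counts."""
    granules = make_granules(len(table), degree)
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partials = pool.map(lambda g: scan_granule(table, g, predicate), granules)
        return sum(partials)


if __name__ == "__main__":
    # 100 blocks of 10 rows each, holding the values 0..999.
    table = [list(range(b * 10, b * 10 + 10)) for b in range(100)]
    print(parallel_count(table, lambda r: r % 2 == 0))  # 500
```

In this model each granule is a disjoint piece of the table, so the partial results can simply be added; with OPS, such pieces could in principle be handed to slaves on different nodes.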
BACKUP AND RECOVERY STRATEGIES In very large database systems, backup and recovery procedures have to be simplified to minimize the cost of downtime and the time taken by backup and recovery operations. The size of the backup files should be proportional to the volume of transactional changes, not to the size of the datafiles, and recovery time should be proportional to the amount of data recovered.
- Backup Manager: NTBACKUP does not back up open files, so Oracle provides OCOPY##.exe to copy open files to disk, from which NTBACKUP can copy them to tape. The Oracle8 Recovery Manager allows server-based recovery, which reduces recovery problems by automating the backup, restore, and recovery process and by recording backup information in a recovery catalog.
- Starting with Oracle8, an integrated tool called Recovery Manager (RMAN) is provided for creating, managing, and restoring database backups while the database remains available and continues to perform well. RMAN can back up the entire database or a subset of it in one operation, avoiding operator errors and checking for database corruption. Other features include automatic parallelization of backup and recovery, minimal redo generation, support for both hot and cold backups, and tape backups in conjunction with vendor-supplied tape management software such as Legato or Epoch. With RMAN in Oracle8i, recovering the entire database or part of it is straightforward because RMAN restores the appropriate backups and archived logs as needed. Information about the backups and archived logs is kept in a recovery catalog, from which reports of all backup and recovery activity can be generated. One recovery catalog can hold information on multiple Oracle8 databases.
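The principle that backup size should track the volume of changes rather than the size of the datafiles is what incremental backups exploit. The toy Python model below illustrates it: each block carries the SCN (system change number) of its last change, a full backup copies every block, and an incremental backup copies only blocks changed after the previous backup's SCN. The `Block` type and the function names are hypothetical, and using the highest block SCN as the backup's checkpoint SCN is a simplifying assumption; the sketch does not reproduce RMAN's actual backup format or algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Block:
    scn: int   # SCN of the last change to this block
    data: str


BackupSet = Dict[int, Block]  # block number -> copied block


def full_backup(datafile: List[Block]) -> Tuple[BackupSet, int]:
    """Copy every block; the highest block SCN stands in for the checkpoint SCN."""
    return dict(enumerate(datafile)), max((b.scn for b in datafile), default=0)


def incremental_backup(datafile: List[Block], since_scn: int) -> Tuple[BackupSet, int]:
    """Copy only blocks changed after `since_scn`, so backup size tracks the changes."""
    changed = {i: b for i, b in enumerate(datafile) if b.scn > since_scn}
    new_scn = max([since_scn] + [b.scn for b in datafile])
    return changed, new_scn


def restore(base: BackupSet, incrementals: List[BackupSet]) -> List[Block]:
    """Start from the full backup, then apply each incremental in order."""
    image = dict(base)
    for inc in incrementals:
        image.update(inc)
    return [image[i] for i in sorted(image)]


if __name__ == "__main__":
    datafile = [Block(10, "a"), Block(12, "b"), Block(15, "c"), Block(11, "d")]
    base, scn = full_backup(datafile)
    datafile[1] = Block(20, "b2")              # one block changes after the full backup
    inc, scn = incremental_backup(datafile, scn)
    print(len(base), len(inc))                 # 4 1
    print(restore(base, [inc]) == datafile)    # True
```

In this model, a restore needs the full backup plus every incremental taken since, in order; keeping track of which backups those are is the kind of bookkeeping a recovery catalog automates.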
To avoid a single point of failure, the recovery catalog of one database should be placed in a second database, and the recovery catalog of the second database placed in the first. Larger sites with multiple databases may use one recovery catalog for all of them, which simplifies administration of the catalog. For smaller databases, RMAN can run without a recovery catalog, taking all the information it needs from the control file. In this mode, point-in-time recovery is not possible, and neither is automatic recovery when the control file is not current. See Oracle Bulletin 108898.604, 'Automating Cold Backups on Windows NT', for examples of a script-based approach as well.
CONFIGURING AND TUNING Overall system performance depends on tuning the hardware, the Oracle server, the operating system, and the application. Refer to the Server Application Developer's Guide for information on tuning the application; this section focuses on the operating system and the Oracle8i server. The Windows NT Performance Monitor can be used to track Oracle threads and determine how system resources are used, and Oracle8i adds a number of database-specific counters to it. Currently, Performance Monitor can monitor only one Oracle instance at a time; that instance is identified by the HOSTNAME parameter in the Oracle hive of the NT registry. The registry is read on the fly, but take extreme care before modifying it, and update the Emergency Repair Disk before making any change. An optimally tuned Oracle8i server on Windows NT has the following characteristics, which can be verified with the corresponding Performance Monitor counters:
- Little or no waiting on I/O, meaning the CPU still has work to do while I/Os are outstanding:
  - Processor utilization (Processor: % Processor Time)
  - Disk utilization (LogicalDisk/PhysicalDisk: Disk Transfers/sec)
  - Length of the processor queue (System: Processor Queue Length)
  - Threads performing I/O (Thread: % Processor Time)
- Most of the CPU utilization is consumed by the shadow threads that serve user sessions, not by the background threads:
  - CPU utilization at the thread level (Thread: % Processor Time)
- Most of the CPU utilization is in user mode, not privileged mode:
  - CPU time spent in user versus privileged mode for the Oracle process (Process: % User Time / % Privileged Time)
- Good response time:
  - This is highly dependent on application and network tuning.
- The system is CPU-bound. A CPU-bound system can achieve high scalability by adding processors.
Verify the following to ensure that the system is not I/O-bound:
- Oracle8 uses the asynchronous I/O capabilities of Windows NT, so only one DBWR is needed.
- Isolate sequential I/O on its own controller volume. The redo logs are accessed in a sequential, write-only manner, so they must be placed on their own disk.
- Balance random I/O across drives. Datafiles are accessed randomly and should be striped if possible.
- Mirror the disks containing the redo logs.
- Do not exceed the I/O rate capabilities of the drives. Based on Compaq testing, random I/O should not exceed 60 I/Os/sec for 4 GB drives, 50 I/Os/sec for 2 GB drives, or 40 I/Os/sec for 1 GB and 500 MB drives. Use Performance Monitor to determine the number of I/Os/sec for each logical volume, then compute the I/Os per disk from the fault-tolerance level of the volume:
  - No fault tolerance: (Disk reads + Disk writes) / number of drives
  - Mirroring: (Disk reads + 2 * Disk writes) / number of drives
  - Data guarding: (Disk reads + 4 * Disk writes) / number of drives
Tuning the hardware is crucial; consult the system documentation for the platform in use.
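To show how the write multiplier changes the load each drive sees, here is a small Python sketch that applies the per-disk formulas above and compares the result with the Compaq random-I/O guidelines quoted in the text. It assumes the reads and writes per second come from the LogicalDisk `Disk Reads/sec` and `Disk Writes/sec` counters for the volume; the function names and the fault-tolerance and drive-size keys are hypothetical labels chosen for this sketch.

```python
# Multiplier applied to writes for each fault-tolerance level, per the formulas above.
WRITE_MULTIPLIER = {
    "none": 1,           # no fault tolerance: one physical write per write
    "mirroring": 2,      # each write goes to both drives of the mirror
    "data_guarding": 4,  # parity: read old data and parity, write new data and parity
}

# Compaq guideline quoted above: maximum random I/Os per second per drive.
MAX_RANDOM_IOS_PER_SEC = {"4GB": 60, "2GB": 50, "1GB": 40, "500MB": 40}


def ios_per_disk(reads_per_sec: float, writes_per_sec: float,
                 drives: int, fault_tolerance: str) -> float:
    """Average physical I/Os per second that each drive in the volume must serve."""
    if drives <= 0:
        raise ValueError("drives must be a positive integer")
    multiplier = WRITE_MULTIPLIER[fault_tolerance]
    return (reads_per_sec + multiplier * writes_per_sec) / drives


def within_guideline(reads_per_sec: float, writes_per_sec: float, drives: int,
                     fault_tolerance: str, drive_size: str) -> bool:
    """True if the per-drive load does not exceed the guideline for that drive size."""
    load = ios_per_disk(reads_per_sec, writes_per_sec, drives, fault_tolerance)
    return load <= MAX_RANDOM_IOS_PER_SEC[drive_size]


if __name__ == "__main__":
    # Example volume: 4 drives, 120 reads/sec and 40 writes/sec.
    for level in ("none", "mirroring", "data_guarding"):
        load = ios_per_disk(120, 40, 4, level)
        print(level, load, within_guideline(120, 40, 4, level, "4GB"))
```

For this example volume the per-drive load is 40, 50, and 70 I/Os/sec respectively, so a workload that fits on mirrored 4 GB drives exceeds the guideline under data guarding.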