from Technet... HTH
Jack

Understanding -1018 Errors

Many Microsoft Exchange Server 5.5 customers have reported a particular error, called a "-1018," during day-to-day operation of Exchange Server. This market bulletin describes the conditions that cause this error, and explains how you can use the error to ensure higher levels of reliability and scalability for your Exchange server.

The -1018 Error and Why It Occurs

Rather than accept damaged data that could compromise your mail system and potentially cause systematic crashes over time, Exchange Server displays the -1018 error to alert you to the possible existence of bad data before it is backed up. Exchange Server displays this error if it detects potential data damage that is caused either by the operating system or by the failure of one of the underlying hardware subsystems on which Exchange Server runs.

When Exchange Server makes a request to the hardware to read data from the database, it compares the page number it requested and the checksum it calculates from the returned data with the page number and checksum that were placed on the page when it was written. If a mismatch is detected, a -1018 error is generated. Note that this warning indicates only that the verification has failed. It is possible that the data on disk has not been damaged, and that a transient error has occurred in the data delivery system instead. To determine if the error was transient, Exchange Server attempts to read the data sixteen times. If the correct data is still not returned, Exchange logs an event ID 200, which states that the read retry has failed. An event ID 201 indicates that the read retry has succeeded.
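
The check-and-retry behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual Exchange code: the 4 KB page size, the header layout, the use of CRC-32 as the checksum, and all function names are assumptions made for the example. It also treats "sixteen times" as sixteen reads in total.

```python
import struct
import zlib

PAGE_SIZE = 4096                 # assumed page size, for illustration only
HEADER = struct.Struct("<II")    # hypothetical header: checksum, page number
MAX_READS = 16                   # the bulletin says the read is attempted sixteen times


class PageVerifyError(Exception):
    """Stand-in for a -1018 error."""


def build_page(page_number, payload):
    """Write path: stamp the page with its number and a checksum."""
    body = payload.ljust(PAGE_SIZE - HEADER.size, b"\0")
    # CRC-32 is a stand-in; the real checksum algorithm is not given in the bulletin.
    checksum = zlib.crc32(struct.pack("<I", page_number) + body)
    return HEADER.pack(checksum, page_number) + body


def verify_page(raw, requested_page_number):
    """Read path: recompute the checksum and compare it and the page number."""
    stored_checksum, stored_page_number = HEADER.unpack_from(raw)
    body = raw[HEADER.size:]
    calculated = zlib.crc32(struct.pack("<I", stored_page_number) + body)
    return (stored_page_number == requested_page_number
            and calculated == stored_checksum)


def read_page(read_fn, page_number, event_log):
    """Read a page, re-reading on verification failure to rule out transient errors."""
    for attempt in range(1, MAX_READS + 1):
        raw = read_fn(page_number)
        if verify_page(raw, page_number):
            if attempt > 1:
                event_log.append(201)   # read retry succeeded
            return raw
    event_log.append(200)               # read retry failed
    raise PageVerifyError("-1018: page %d failed verification" % page_number)
```

Here `read_fn` is a hypothetical callable that returns the raw bytes for a page; passing it in makes it easy to simulate a flaky controller that returns bad data on the first few reads and good data afterwards.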

What to Do When a -1018 Error Occurs

If additional confirmation of data damage is required, Microsoft recommends that you run the Esefile utility, which is available in the Support directory of Exchange Server 5.5 Service Pack 3.

Get on the 'phone to PSS before you touch this... ;-) 

Because the -1018 error indicates a likely problem in a hardware subsystem, full backups of Exchange Server will not be completed while this problem exists. However, Exchange Server will continue to run until you can investigate the cause of the warning.

Why -1018 Errors are Crucial for Successful Data Backup

The -1018 error is used to provide an effective early warning for subsystem failure. By carefully checking the validity of data returned by the operating system and hardware subsystems, Exchange Server 5.5 ensures that your system contains valid data. In particular, all data is checked during backup to guarantee that it contains no errors. Without this level of error detection, your data might be damaged by a failing subsystem without your knowledge, which could result in the backup of damaged and unrecoverable data.
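
As a rough illustration of why a backup stops rather than copying a bad page, the sketch below copies a file page by page and aborts at the first page that fails verification. It is not how Exchange backup is implemented: the page size, header layout, CRC-32 checksum, and function names are all assumptions made for the example.

```python
import io
import struct
import zlib

PAGE_SIZE = 4096                 # assumed page size, for illustration only
HEADER = struct.Struct("<II")    # hypothetical header: checksum, page number


def make_page(page_number, payload=b""):
    """Build a page stamped with its number and a stand-in CRC-32 checksum."""
    body = payload.ljust(PAGE_SIZE - HEADER.size, b"\0")
    checksum = zlib.crc32(struct.pack("<I", page_number) + body)
    return HEADER.pack(checksum, page_number) + body


def page_is_valid(raw, expected_page_number):
    """Recompute the checksum and compare it and the page number with the header."""
    stored_checksum, stored_page_number = HEADER.unpack_from(raw)
    calculated = zlib.crc32(struct.pack("<I", stored_page_number) + raw[HEADER.size:])
    return stored_page_number == expected_page_number and calculated == stored_checksum


def backup(source, dest):
    """Copy pages from source to dest; abort on the first page that fails the check."""
    page_number = 0
    while True:
        raw = source.read(PAGE_SIZE)
        if not raw:
            return page_number   # number of pages copied
        if len(raw) != PAGE_SIZE or not page_is_valid(raw, page_number):
            raise IOError("-1018: page %d failed verification, backup aborted"
                          % page_number)
        dest.write(raw)
        page_number += 1
```

Both `source` and `dest` are ordinary binary file objects, so the sketch can be tried against `io.BytesIO` buffers without touching a real database file.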

How to Avoid -1018 Errors

Customers should have a well-thought-out response plan for dealing with -1018 errors, and that plan should include escalation paths for hardware issues both within their organization and with their system vendor.

The following are deployment and administration areas that should be carefully reviewed when a system is experiencing -1018 errors:

Hardware fault tolerance. The most common cause of -1018 errors is a lack of hardware fault tolerance at an Exchange site. Typically, backup is the only means of fault tolerance, and some part of the hardware subsystem fails. In this case, we recommend that customers adopt a higher level of hardware fault tolerance.

Operational practices. Good operational practices can reduce or eliminate other situations that give rise to -1018 messages. Specific attention should be paid to:

· Correct SCSI termination. In the case of a -1018 error, both SCSI termination and SCSI equipment should be checked for failure—SCSI problems commonly lead to data loss and corresponding -1018 errors.

· Caching controller management. If a caching controller is used with Exchange Server, it is critical that this controller be fully fault tolerant. This means that the controller must have a battery backup so that data in its cache will not be lost during a power failure, and that the cache can be moved to a new card if the controller card itself fails. Cache mirroring is also recommended so that memory errors can be easily corrected.

· Drive replacement. If a server is turned off without a clean shutdown and a drive is replaced, the RAID rebuild of the drive when the server is restarted will cause cached data to be lost.

Third-party tools. In rare cases, third-party software tools have been known to cause -1018 errors, particularly in cases where these programs directly modify the Exchange database files. Any direct modification to these files will cause a -1018 error. Contact your third-party vendor for more information.

Hardware bugs. Controller and drive firmware bugs can also be the cause of -1018 errors. Generally, these problems are unknown prior to subsystem shipment and cannot be caught by the vendor's fault tolerance system or by their standard diagnostic tools. If you suspect that you have a lower level fault-tolerance failure, it is important to contact your hardware vendor and start a detailed investigation to find the root cause of the error. Hard disks reporting high levels of soft "recovered" errors should be considered for replacement.

Conclusion

Exchange Server 5.5 is one of the few applications that provide such an advanced level of data integrity verification. Its -1018 errors provide proactive reporting of possible hardware subsystem errors, which allows you to better ensure the continued reliability of your messaging infrastructure.

-----Original Message-----
From: MHR(Michael Ross) [mailto:[EMAIL PROTECTED]]
Sent: 27 March 2002 14:55
To: MS-Exchange Admin Issues
Subject: RE: datbase

got info on those errors?