Greetings,
As you have likely noticed, Tucows has experienced extended periods of
intermittent outages affecting a broad range of our systems over the
last 24 hours. We now believe that we have rectified the problem
causing the degraded service.
It is important to note that the following systems were not affected by
these outages:
- All international systems including the Tucows mirror network.
- All registry systems operated by LibertyRMS, including .info.
The following systems were affected by these outages:
- OpenSRS
- Domain Direct
- www.tucows.com main site
- tucows.com and opensrs.org email
For your benefit, the following is a rundown of the specific issues that
we encountered, including the time period during which service was
affected and the final resolution of the problem.
- Yesterday afternoon, all services located in our primary
datacenter were subject to intermittent network connectivity
problems. A Cisco 6500 series switch appeared to be at the
center of this issue and was quickly brought back within
normal operating parameters.
- Shortly before 2300 EDT May 28 2001 (0300 UTC May
29 2001), we began to notice significantly degraded
performance from all systems located within our primary
datacenter, followed quickly by complete failure. Further
investigation indicated that the core switching gear had
failed. While system redundancy is in place, the erratic
behaviour was sufficient to cause a core failure.
Working in conjunction with Cisco, our network operations
team determined that complete replacement of the gear in
question was the most expedient solution. Replacement
hardware was shipped out from Nashville at roughly 0300 EDT
May 29 (0700 UTC) and arrived promptly at 0800 EDT (1200
UTC). The equipment was brought online at approximately 1230
EDT (1630 UTC) and the system was fully operational at 1300
EDT (1700 UTC).
Tucows has not yet determined the cause of the problem, but
our staff has narrowed it down to either repeated failures of
our Cisco gear, or a denial-of-service attack.
It is important to note that while we are completely satisfied that this
is an isolated issue, we are continuing our investigation until we can
ensure that the steps taken to solve the problem are lasting.
We sincerely apologize for the inconvenience that this matter may have
caused you. Please be assured that we are working with all relevant
parties to ensure that we can continue to provide you with the highest
possible level of service going forward.
Sincerely,
Charles Daminato
TUCOWS Product Manager
OpenSRS - Special Operations