While trying to help relieve the overload on Daedalus (Greg Ames and I were trying to move part of its load onto Nagoya), I _seriously_ screwed up Nagoya... I'm sorry...
I started off noticing that Nagoya was collapsing (or rather, the network was collapsing) when traffic reached roughly 5 Mbps, although the system itself was doing just fine (load average 1.4 with GUMP running). I figured it had something to do with the interfaces, so I had Justyna change some cables over in the lab. After seeing that the interfaces were still running at 10 Mbps half-duplex (for some odd reason that I still don't understand), I forced them to 100 Mbps full-duplex (the switch apparently supports it)...

Well, that was the root of all the problems... Forcing the interface to 100 Mbps made the whole network around Nagoya collapse (so I suspect the switch is broken), including Nagoya's console... Without access to the console, and with the network interfaces sending random packets onto the ethernet segment, the only possible solution was to physically power off the system and hope the interfaces would auto-configure better at boot-up...

Once that was done (thanks again, Justy), I had access to the (serial) console, but Nagoya didn't want to boot properly: it wasn't seeing the SCSI/fiber-optics disk array... That's where I noticed that I fucked up big time... Some time ago (about 6 months), I removed a bunch of Solaris packages as documented in the Sun Blueprints ("Security through System Minimization"), and since everything kept working, I never actually thought that anything bad could have happened... Well, what happened was that although the running Solaris kernel still had the modules for the disk array loaded in memory, they were no longer available on disk, hence the major pain at the next reboot...

Now, I managed to restore the modules, reconfigure the system and get it up and running once again, but I didn't fix the problem afflicting the network...
Therefore, Nagoya is up and running as it was before, but it can't really handle more than 10 Mbps of half-duplex traffic (so about 5 Mbps of real bandwidth), because of some random stuff happening on the ethernet segment... I'm sorry that everything got really fucked up this afternoon; hopefully the situation will get back to normal once the various queues on the different mail systems get flushed...

Justy is going to file a request with Sun's hardware support to try to figure out why, out of the 155 Mbps of network we have available for Nagoya, we can use only 5 Mbps (and why the switch and/or Nagoya's interfaces are behaving so strangely). In the meantime, if anyone has experience with a "3Com SuperStack II 1000 Switch" and Sun HME interfaces, please let me know...

Really sorry about what happened, but I'm a moron, and that's more than I need to say...

Pier