Hey everyone, just wanted to mention that we reproduced this problem on
all three of our SunFire X4140s as well.
We had Solaris on them, but decided to switch them to Debian 7.1.0 to
run KVM, as SmartOS doesn't support AMD virtualization hardware (yet).
Here's how easy it was to reproduce:
1) Plug in an Ethernet cable with Internet access and DHCP into the
first port (eth0).
2) Install Debian using a boot CD made from
debian-7.1.0-amd64-netinst.iso with the default options.
# uname -a
Linux master 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1+deb7u1 x86_64 GNU/Linux
3) Log in and install network bridging.
# apt-get install bridge-utils
4) Change the default /etc/network/interfaces file from:
----cut---------cut---------cut---------cut---------cut---------cut-----
# The primary network interface
allow-hotplug eth0
iface eth0 inet dhcp
----cut---------cut---------cut---------cut---------cut---------cut-----
to:
----cut---------cut---------cut---------cut---------cut---------cut-----
# The primary network interface
auto eth0
iface eth0 inet manual
# The bridged network interface
auto br0
iface br0 inet static
address 192.168.1.163
network 192.168.1.0
netmask 255.255.255.0
broadcast 192.168.1.255
gateway 192.168.1.1
dns-nameservers 208.67.222.222 208.67.220.220
bridge_ports eth0
bridge_fd 0
bridge_hello 2
bridge_maxage 12
bridge_stp off
----cut---------cut---------cut---------cut---------cut---------cut-----
5) Reboot the server
# reboot
As it boots, the system will reset when it tries to configure the
network, and the BIOS will log "Hypertransport sync flood error."
Additional reboots do the same thing. The only way to get the server up
and running is to unplug your Ethernet cable from eth0, and then once
you see the main console login come up plug the cable back in. From
then on the server works as expected.
I've used Salvatore's little trick of adding in "pre-up /sbin/ifconfig
eth0 up" right before the "bridge_ports eth0" line in the br0 section,
and that allows the server to boot with the cable still in eth0, both
from a warm boot and from a cold boot. This makes me think that the
problem involves some sort of timing issue. So a big thanks to
Salvatore for what appears to be a usable workaround!
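For anyone who wants to script the workaround rather than edit the file by
hand, here is a minimal sketch. It assumes the br0 stanza looks like the one
shown above, with a line starting with "bridge_ports eth0". The function name
add_preup is made up for illustration and is not part of any Debian tool. Try
it on a copy of the file first.

```shell
#!/bin/sh
# Sketch only: insert the pre-up workaround line directly before
# "bridge_ports eth0" in the given interfaces file.
# add_preup is a hypothetical helper name, not an existing command.
add_preup() {
    file="$1"

    # Do nothing if the workaround line is already present.
    if grep -q 'pre-up /sbin/ifconfig eth0 up' "$file"; then
        return 0
    fi

    # Write to a temporary file, then replace the original, so a failed
    # sed run does not leave a half-written interfaces file behind.
    sed '/^[[:space:]]*bridge_ports eth0/i\
pre-up /sbin/ifconfig eth0 up' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example (run as root, after making a backup):
#   cp /etc/network/interfaces /etc/network/interfaces.bak
#   add_preup /etc/network/interfaces
```

Afterwards the br0 section should contain "pre-up /sbin/ifconfig eth0 up"
on the line directly above "bridge_ports eth0". Running the function a
second time leaves the file unchanged.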
Just FYI, according to the SunFire Server Diagnostics Guide, when the CPU
detects one of the following errors, it reboots immediately, and then on
startup the BIOS inspects the machine registers and logs "Hypertransport
sync flood error":
1) The CPU detects an uncorrectable multi-bit DIMM error
2) CRC or link error on one of the Hypertransport links
3) System or parity error on a PCI bus
I would be willing to test any updates that attempt to fix this bug, as
I understand not everyone has X4140s lying around (lol).
Thanks!
Boyd