On 27/05/2009 01:52, Samiuela LV Taufa wrote:
Simon Morvan wrote the following on 27/05/2009 2:28 AM:
Hello all,
I've set up two OpenBSD boxes to act as redundant firewalls in front of
our network and I am seeing some strange behavior:

After a couple of hours/days one of the boxes stops functioning properly:
no ping, no more SSH access, but I still capture CARP advertisements on
the network segments (when it occurs on the master). As a result, when it
happens on the master, the slave does not take over.

When it happens on the slave, the switch intermittently "sees" the
virtual CARP MAC on the slave port, which disturbs the master's routing
operations.

When I hook up a screen to the machine, I get the login screen back but
everything is frozen.

I really don't know where to start looking to troubleshoot the issue.

Here's the dmesg; the two boxes are identical. I do VLAN routing on em0
and pfsync on re0 (at 100BaseFD, to be sure there's no issue with the
re(4) driver):

OpenBSD 4.5 (GENERIC) #1749: Sat Feb 28 14:51:18 MST 2009
      dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
RTC BIOS diagnostic error 80<clock_battery>
cpu0: Intel(R) Atom(TM) CPU 330 @ 1.60GHz ("GenuineIntel" 686-class)
1.60 GHz
cpu0:
FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,TM2,CX16,xTPR
real mem  = 2135666688 (2036MB)
avail mem = 2056806400 (1961MB)
RTC BIOS diagnostic error 80<clock_battery>
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 12/31/08, SMBIOS rev. 2.4 @
0xe3590 (23 entries)
bios0: vendor Intel Corp. version "LF94510J.86A.0140.2008.1231.0012"
date 12/31/2008
bios0: Intel Corporation D945GCLF2
acpi0 at bios0: rev 0
acpi0: tables DSDT FACP APIC WDDT MCFG ASF!
acpi0: wakeup devices SLPB(S4) P32_(S4) UAR1(S4) UAR2(S4) PEX0(S4)
PEX1(S4) PEX2(S4) PEX3(S4) PEX4(S4) PEX5(S4) UHC1(S3) UHC2(S3) UHC3(S3)
UHC4(S3) EHCI(S3) AC9M(S4) AZAL(S4)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: apic clock running at 134MHz
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 2 pa 0xfec00000, version 20, 24 pins
ioapic0: misconfigured as apic 0, remapped to apid 2
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P32_)
acpiprt2 at acpi0: bus 1 (PEX0)
acpiprt3 at acpi0: bus -1 (PEX1)
acpiprt4 at acpi0: bus 2 (PEX2)
acpiprt5 at acpi0: bus 3 (PEX3)
acpiprt6 at acpi0: bus -1 (PEX4)
acpiprt7 at acpi0: bus -1 (PEX5)
acpicpu0 at acpi0
acpibtn0 at acpi0: SLPB
bios0: ROM list: 0xc0000/0xae00! 0xcb000/0x1000 0xcc000/0x1000
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 "Intel 82945G Host" rev 0x02
vga1 at pci0 dev 2 function 0 "Intel 82945G Video" rev 0x02
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
intagp0 at vga1
agp0 at intagp0: aperture at 0x80000000, size 0x10000000
inteldrm0 at vga1: apic 2 int 16 (irq 11)
drm0 at inteldrm0
azalia0 at pci0 dev 27 function 0 "Intel 82801GB HD Audio" rev 0x01:
apic 2 int 22 (irq 9)
azalia0: codecs: Realtek ALC662
audio0 at azalia0
ppb0 at pci0 dev 28 function 0 "Intel 82801GB PCIE" rev 0x01: apic 2 int
17 (irq 255)
pci1 at ppb0 bus 1
re0 at pci1 dev 0 function 0 "Realtek 8168" rev 0x02: RTL8168C/8111C
(0x3c00), apic 2 int 16 (irq 11), address 00:1c:c0:c3:40:fa
rgephy0 at re0 phy 7: RTL8169S/8110S PHY, rev. 2
ppb1 at pci0 dev 28 function 2 "Intel 82801GB PCIE" rev 0x01: apic 2 int
18 (irq 255)
pci2 at ppb1 bus 2
ppb2 at pci0 dev 28 function 3 "Intel 82801GB PCIE" rev 0x01: apic 2 int
19 (irq 255)
pci3 at ppb2 bus 3
uhci0 at pci0 dev 29 function 0 "Intel 82801GB USB" rev 0x01: apic 2 int
23 (irq 10)
uhci1 at pci0 dev 29 function 1 "Intel 82801GB USB" rev 0x01: apic 2 int
19 (irq 11)
uhci2 at pci0 dev 29 function 2 "Intel 82801GB USB" rev 0x01: apic 2 int
18 (irq 9)
uhci3 at pci0 dev 29 function 3 "Intel 82801GB USB" rev 0x01: apic 2 int
16 (irq 11)
ehci0 at pci0 dev 29 function 7 "Intel 82801GB USB" rev 0x01: apic 2 int
23 (irq 10)
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb3 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xe1
pci4 at ppb3 bus 4
em0 at pci4 dev 0 function 0 "Intel PRO/1000GT (82541GI)" rev 0x05: apic
2 int 21 (irq 10), address 00:1b:21:38:77:25
ichpcib0 at pci0 dev 31 function 0 "Intel 82801GB LPC" rev 0x01: PM disabled
pciide0 at pci0 dev 31 function 1 "Intel 82801GB IDE" rev 0x01: DMA,
channel 0 configured to compatibility, channel 1 configured to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 ignored (disabled)
pciide1 at pci0 dev 31 function 2 "Intel 82801GB SATA" rev 0x01: DMA,
channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide1: using apic 2 int 19 (irq 11) for native-PCI interrupt
wd0 at pciide1 channel 0 drive 0:<TS32GSSD25S-M>
wd0: 1-sector PIO, LBA48, 30560MB, 62586880 sectors
wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5
ichiic0 at pci0 dev 31 function 3 "Intel 82801GB SMBus" rev 0x01: apic 2
int 19 (irq 11)
iic0 at ichiic0
admtm0 at iic0 addr 0x2d: 47m192
spdmem0 at iic0 addr 0x50: 2GB DDR2 SDRAM non-parity PC2-6400CL5
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci2: USB revision 1.0
uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb4 at uhci3: USB revision 1.0
uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at ichpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0:<PC speaker>
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
mtrr: Pentium Pro MTRR support
softraid0 at root
root on wd0a swap on wd0b dump on wd0b

You might like to add your CARP configuration, as it may be relevant:

/etc/hostname.carpX
I'm replying to the list, as I know that some people want to track this issue as well.

Here's the network config.

em0 is connected to the switch, with the admin VLAN (108) as native and the other VLANs tagged (802.1Q).
re0 is connected to the other firewall (direct connection).

[ ISP Router ] <-- vlan 107 --> [ Firewalls ] <-- vlan 103 --> [ Servers ]

Here's the hostname.if config

===== fire01 =====

# cat /etc/hostname.em0
inet 10.19.0.2 255.255.255.0 NONE

# cat /etc/hostname.re0
inet 10.4.0.9 255.255.255.248 NONE media 100baseTX

# cat /etc/hostname.vlan107
inet 89.185.54.114 255.255.255.248 NONE vlan 107 vlandev em0

# cat /etc/hostname.vlan103
inet 10.16.4.2 255.255.255.0 NONE vlan 103 vlandev em0

# cat /etc/hostname.carp108
inet 10.19.0.1 255.255.255.0 NONE vhid 108 carpdev em0 pass ****

# cat /etc/hostname.carp107
inet 89.185.54.113 255.255.255.248 NONE vhid 107 carpdev vlan107 pass *****

# cat /etc/hostname.carp103
inet 10.16.4.1 255.255.255.0 NONE vhid 103 carpdev vlan103 pass ***
inet alias 89.185.54.97 255.255.255.240
inet alias 212.43.237.49 255.255.255.240

# cat /etc/hostname.pfsync0
up syncpeer 10.4.0.10 syncdev re0

===== fire02 =====

# cat /etc/hostname.em0
inet 10.19.0.3 255.255.255.0 NONE

# cat /etc/hostname.re0
inet 10.4.0.10 255.255.255.248 NONE media 100baseTX

# cat /etc/hostname.vlan107
inet 89.185.54.115 255.255.255.248 NONE vlan 107 vlandev em0

# cat /etc/hostname.vlan103
inet 10.16.4.3 255.255.255.0 NONE vlan 103 vlandev em0

# cat /etc/hostname.carp108
inet 10.19.0.1 255.255.255.0 NONE vhid 108 advskew 100 carpdev em0 pass oing7Quo

# cat /etc/hostname.carp107
inet 89.185.54.113 255.255.255.248 NONE vhid 107 advskew 100 carpdev vlan107 pass tae2Ir2u

# cat /etc/hostname.carp103
inet 10.16.4.1 255.255.255.0 NONE vhid 103 advskew 200 carpdev vlan103 pass Eumo2cei
inet alias 89.185.54.97 255.255.255.240
inet alias 212.43.237.49 255.255.255.240

# cat /etc/hostname.pfsync0
up syncpeer 10.4.0.9 syncdev re0

================
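
As a quick sanity check of the addressing above, here is a minimal Python sketch that verifies that the two re0 addresses sit in the same subnet and that each box's syncpeer points at the other box's re0 address. It is only an illustration: the dictionary layout and the function name check_sync_link are assumptions made for this example, not part of any OpenBSD tooling, and the check says nothing about the hang itself.

```python
import ipaddress

# Values copied from the hostname.re0 and hostname.pfsync0 files quoted above.
# The dictionary layout is an assumption made for this sketch.
FIRE01 = {"re0": ("10.4.0.9", "255.255.255.248"), "syncpeer": "10.4.0.10"}
FIRE02 = {"re0": ("10.4.0.10", "255.255.255.248"), "syncpeer": "10.4.0.9"}


def check_sync_link(a, b):
    """Return a list of addressing problems on the pfsync link (empty if none)."""
    # ip_interface accepts "address/netmask" and exposes both .ip and .network.
    if_a = ipaddress.ip_interface("/".join(a["re0"]))
    if_b = ipaddress.ip_interface("/".join(b["re0"]))
    problems = []
    if if_a.network != if_b.network:
        problems.append(f"re0 networks differ: {if_a.network} vs {if_b.network}")
    if ipaddress.ip_address(a["syncpeer"]) != if_b.ip:
        problems.append(f"first box syncpeer {a['syncpeer']} is not {if_b.ip}")
    if ipaddress.ip_address(b["syncpeer"]) != if_a.ip:
        problems.append(f"second box syncpeer {b['syncpeer']} is not {if_a.ip}")
    return problems


if __name__ == "__main__":
    issues = check_sync_link(FIRE01, FIRE02)
    print("\n".join(issues) if issues else "pfsync link addressing is consistent")
```

With the values posted here the function returns an empty list, so the sync link addressing itself does not look like the culprit.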
Further debugging ideas?
No, I have had no new hang since my last post.
When the error situation occurs, what happens if you 'tap/push' the
interface with an ifconfig (e.g. ifconfig em0)?
Since the local console is frozen when I hook up a screen, I don't even get a chance to type a command...
The Intel interfaces on OpenBSD are pretty much rock solid, but in 4.3
the bge interfaces showed similar behaviour, which could be temporarily
fixed by invoking ifconfig on the interface. If this is the same
behaviour, then having that information may help the developers.
In my initial setup, the VLAN routing was on the re0 card and pfsync on the em0 card. As I suspected a problem with re0, I switched the config to what it is now. I'd be glad if you could confirm that it's better to use the Intel card for business traffic and the Realtek one for sync purposes.

Sam T.

Thanks for your concern

--
Simon.
