High Interrupt Mode Reported by 'Top' for Soekris 4801
I am a new owner of two Soekris 4801s running OpenBSD 3.7 (GENERIC) with pf/pfsync/carp for redundant firewalling. I've run into a problem with high interrupts (and some packet loss). I've perused the online FAQ and forums without finding anything that matches the symptoms I've observed, so I am now looking for pointers on how to isolate the problem and perhaps fix it.

I have sis0 in use for the outer interface, sis2 for the inner, and sis1 for pfsync. There is an inner CARP'd interface address (carp0) and an outer one (carp1). The configuration generally follows the FAQ and man pages.

When traffic through the Soekris reaches approximately 4 Mb/s, the interrupt mode reported by top reaches 75% or higher and there is measurable packet loss (roughly 1% to 5%). In 'pfctl -si' output, the congestion counter climbs rapidly when interrupts are highest. Interrupt mode rises as traffic volume rises, and drops to about 1% when I fail over to the other firewall. After failover, I observe exactly the same behavior on the newly active firewall.

Checking the forums, I see past reports of very high interrupts on the sis device with OpenBSD on Soekris, but I read that these were all corrected in recent OpenBSD releases (and the problem I read about applied only when one sis interface was left 'down', which is not my situation, since all interfaces are in use). I've checked with Soekris, and they've not heard of symptoms like these with OpenBSD 3.7. I've not noticed anything amiss in dmesg or /var/log/messages. (All sis devices share IRQ 10, but this is normal on a 4801; the FAQ states that it is not a problem, and other 4801 users haven't reported symptoms like mine.)

I haven't posted dmesg or other info in this message (I thought it might be rude to do so without being asked). Can anyone offer pointers on how I might go about isolating this problem?
Bill -- William Bloom| Systems Engineer|M P H A S I S Architecting Value | Eldorado Computing 5353 North 16th Street, Suite 400 Phoenix, Az 85016 | Direct: +11-602-604-3100 | Fax: +11-602-604-3115| http://www.eldocomp.com -- CONFIDENTIALITY NOTICE -- Information transmitted by this e-mail is proprietary to MphasiS and/or its Customers and is intended for use only by the individual or entity to which it is addressed, and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If you are not the intended recipient or it appears that this mail has been forwarded to you without proper authority, you are notified that any use or dissemination of this information in any manner is strictly prohibited. In such cases, please notify us immediately at [EMAIL PROTECTED] and delete this mail from your records.
Re: High Interrupt Mode Reported by 'Top' for Soekris 4801
I wondered that as well, but judging from other postings I found with Google, there appear to be lots of 4801s running OpenBSD and doing essentially the same thing I am (Soekris with carp/pf/pfsync). Yet, AFAICT, I'm the only one who has posted about this symptom. If the problem were indeed that the 4801 processor is too wimpy, wouldn't there be more reports like mine on the lists? And I'm running into high interrupts at only about 4 Mb/s of throughput, while others have claimed much higher figures.

Before the firewall I have now, I used m0n0wall on FreeBSD. I chose OpenBSD over m0n0wall/FreeBSD because of m0n0wall's state table limitations and its lack of mature redundancy features. But m0n0wall handled this much traffic, and more, with a relatively low interrupt mode. Given how widely OpenBSD is used for firewalling on Soekris compared with m0n0wall/FreeBSD, and with relatively few problems, I'm still not quite ready to rule out a setup flaw somewhere on my end. I just can't figure out where it could be.

Bill

Theo de Raadt wrote:
>>>If the Soekris did not come with ethernet chipsets which are just
>>>slightly over the bar of rl(4), the wimpy processor in the machine
>>>might be able to cope.
>>
>>Throughput is only marginally better using an em in the pci slot of a
>>4801. I think there's some other problem.
>
>
> Yeah -- the super wimpy processor.
Re: ssh key question
I've done exactly this sort of thing (duplicating the SSH host key) on a set of Solaris machines that participate in a cluster. I can't think of any reason why it wouldn't be reasonable for you to do the same in the circumstances you describe. Decide which machine's SSH host key will be used for both machines, then copy /etc/ssh/ssh_host*key* from that machine to the other. You may want to first save the target machine's old keys in a backup directory as a fallback, in case an unexpected problem arises later.

Once this is done, any SSH clients that connected to the second machine in the past, while it was still using its original host key, may still have that old public key in their 'known hosts' list. That's OK, but the user of such a client may see a warning that host spoofing is suspected as soon as he or she tries to connect after the host key has been replaced. So you might get a few phone calls. If possible and practical, it would be good to check all the SSH clients' 'known hosts' lists and remove the obsolete entry (it will be recreated automatically during the next SSH connection).

Bill

[EMAIL PROTECTED] wrote:
> Maybe this is slightly off topic because it is more of an ssh question,
> sorry.
>
> I have two openbsd boxes running sshd. They are mirrors of each other, and
> we switch between them every two weeks. They have their own IP numbers,
> 10.1.1.42, and 10.1.1.43, but whichever machine is the production box gets
> the IP number 10.1.1.44 and you can no longer get to that machine via it's
> own IP number.
>
> Currently all employee's telnet into the production box. I want to get that
> switched over to ssh. The trouble is the host key appearing to change every
> two weeks. Can I just duplicate the host key from one box onto the other
> box? And which key file[s] would that be that I need to copy? Or do I need
> to see about turning off host key checking on our client?
>
> --ja
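The key-copy procedure from the reply above can be sketched in Python. This is a minimal sketch, assuming the chosen machine's key files have already been transferred to a staging directory on the target machine; the function name copy_host_keys and its directory arguments are hypothetical. It backs up the target's existing ssh_host*key* files into a timestamped directory, then copies the chosen machine's keys into place.

```python
import glob
import os
import shutil
import time


def copy_host_keys(src_dir: str, dst_dir: str, backup_root: str) -> list[str]:
    """Back up dst_dir's ssh_host*key* files, then copy src_dir's over them.

    Hypothetical helper: src_dir is a staging directory holding the chosen
    machine's keys; dst_dir is normally /etc/ssh on the target machine.
    """
    # Save the target's existing host keys in a timestamped backup directory
    # so they can be restored if something unexpected happens later.
    stamp = time.strftime("ssh-host-keys-%Y%m%d-%H%M%S")
    backup_dir = os.path.join(backup_root, stamp)
    os.makedirs(backup_dir)
    for path in glob.glob(os.path.join(dst_dir, "ssh_host*key*")):
        shutil.copy2(path, backup_dir)

    # Copy the chosen machine's host keys (private keys and .pub files).
    # copy2 preserves permission bits, which matters because sshd ignores
    # private host keys whose permissions are too open.
    copied = []
    for path in sorted(glob.glob(os.path.join(src_dir, "ssh_host*key*"))):
        shutil.copy2(path, dst_dir)
        copied.append(os.path.basename(path))
    return copied
```

For example, copy_host_keys("/tmp/keys-from-primary", "/etc/ssh", "/root/ssh-key-backup") (hypothetical paths), run as root and followed by a restart of sshd so that it loads the new keys. On the client side, `ssh-keygen -R hostname` removes a stale entry from the user's known_hosts file.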
Re: CARP interface incorrectly comes up as INIT on boot
If I'd had this experience, I'd be tempted to run tcpdump on whichever physical interface is the carpdev for the suspect carp interface, in order to verify that multicast is enabled on your switch. With the carp interfaces up, you should see periodic multicast messages. If you don't see any, then you've found your problem (and you need to revisit the switch configuration to fix it). I've seen a few cases in the past where action had to be taken to enable multicast on a switch; otherwise CARP (or VRRP, or HSRP) naturally fails to transition to the proper states.

You mention that this machine is not a firewall and that you're not using pfsync, but you didn't say outright whether you're running pf. Probably you're not, but there are some non-firewall applications where pf is used, and I don't know whether your machine falls into that category. If it does, also make sure the pf ruleset isn't blocking multicast.

Bill

Tim wrote:
> I'm using CARP under 3.7 release version on two boxes that aren't firewalls,
> so no pfsync involved and CARP configured as described in the FAQ. What I'm
> seeing is that the box I've designated as BACKUP always boots with carp0 as
> INIT and carp1 and carp2 both come up BACKUP as expected. The other box
> always boots with all 3 carp interfaces correctly as MASTER. On the backup
> box, I can execute 'ifconfig carp0 up' and the interface correctly
> transitions to BACKUP.
> To prove to myself that this was not a problem with that particular box, I
> tried switching the roles making the backup the master and vice versa and
> the problem moves to the other box. Here's the output of ifconfig -A on the
> backup box and I can supply more info if needed:
>
> lo0: flags=8049 mtu 33224
>         inet 127.0.0.1 netmask 0xff00
>         inet6 ::1 prefixlen 128
>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
> pflog0: flags=0<> mtu 33224
> pfsync0: flags=0<> mtu 2020
> enc0: flags=0<> mtu 1536
> dc0: flags=8943 mtu 1500
>         address: 00:10:a4:c7:51:4e
>         media: Ethernet autoselect (100baseTX full-duplex)
>         status: active
>         inet 192.168.0.3 netmask 0xff00 broadcast 192.168.0.255
>         inet6 fe80::210:a4ff:fec7:514e%dc0 prefixlen 64 scopeid 0x5
>         inet 192.168.0.12 netmask 0xff00 broadcast 192.168.0.255
>         inet 192.168.0.22 netmask 0xff00 broadcast 192.168.0.255
>         inet 192.168.0.42 netmask 0xff00 broadcast 192.168.0.255
> carp0: flags=8802 mtu 1500
>         carp: INIT carpdev dc0 vhid 4 advbase 1 advskew 100
>         inet 192.168.0.20 netmask 0xff00 broadcast 192.168.0.255
> carp1: flags=8843 mtu 1500
>         carp: BACKUP carpdev dc0 vhid 3 advbase 1 advskew 100
>         inet 192.168.0.10 netmask 0xff00 broadcast 192.168.0.255
> carp2: flags=8843 mtu 1500
>         carp: BACKUP carpdev dc0 vhid 6 advbase 1 advskew 100
>         inet 192.168.0.40 netmask 0xff00 broadcast 192.168.0.255
>
> Tim
Zero PF Counters
Perhaps I've misread the man page, but it's not obvious to me how to zero the PF counters. For example, 'pfctl -si' shows a non-zero congestion counter, and I'd like to clear that counter once I think the congestion issue is remedied. But I see no way to do that (apart from a reboot). How is this done?

Change of subject... One odd symptom I've experienced is that permitted users log in (via SSH) to a host behind the firewall successfully, work on the system for a few minutes, and then are suddenly disconnected. When I run tcpdump on the login host, I see the session established successfully and work begin. Then, a few minutes after traffic has been flowing successfully in both directions, the user's desktop sends a long flurry of TCP resets as the connection is lost. When I disable PF (pfctl -d) on the firewall, the symptom vanishes. Now, if the ruleset had handled the TCP state wrongly, I would have expected the connection not to survive long enough for the user to get several minutes of work done. The firewall's pflog (block log) shows no packets dropped for these connections, and there are no entries for packets dropped due to congestion. How should this be interpreted? I am baffled for the moment.

Another change of subject... The PF man page gives meager detail about the congestion counter, and the only FAQ items I can find about it relate to queueing (and I don't have queues in my ruleset). What does a non-zero congestion counter mean, and what action does PF take when the congestion counter is incremented?

Bill
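Short of zeroing the counter, one workaround is to watch how much the congestion counter grows over an interval instead of its absolute value. This is a minimal sketch, assuming that each counter in `pfctl -si` output sits on its own line as a name followed by its Total value (the layout is assumed, not verified against every release); pf_counter and counter_delta are hypothetical names.

```python
import re


def pf_counter(si_text: str, name: str) -> int:
    """Return the Total column for one counter line of `pfctl -si` text."""
    # Match lines such as "  congestion            120            0.0/s".
    m = re.search(rf"^\s*{re.escape(name)}\s+(\d+)", si_text, re.MULTILINE)
    if m is None:
        raise KeyError(name)
    return int(m.group(1))


def counter_delta(before: str, after: str, name: str = "congestion") -> int:
    """How much a counter grew between two snapshots, without zeroing it."""
    return pf_counter(after, name) - pf_counter(before, name)
```

On the firewall, the two snapshots could come from `subprocess.run(["pfctl", "-si"], capture_output=True, text=True).stdout`, taken a few minutes apart; a delta near zero after a change suggests the congestion has stopped, without needing a reboot.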
Re: "keep state" and PF Queues
The PF queueing page of the FAQ at http://www.openbsd.org has a wealth of info that nicely clarifies the pf.conf man page. I recall that the FAQ contains an example much like what you describe (as I recall, specifying a queue for -incoming- traffic will indeed cause that traffic to be processed through the named queue when it is -outgoing-).

Bill

Brian A. Seklecki wrote:
> Would anyone like to elaborate on the impacts of using "keep state" on
> conjunction with pass rules that assign traffic to queues?
>
> One might assume that inverted traffic flows would also be queued,
> however that would break the "traffic can only be queued egress an
> interface" rule...
>
> There should be some remarks on this in pf.conf(5)
>
> TIA,
>
> ~BAS
Relative Firewall Performance: 3.7 and 4.0
I recently upgraded a Soekris 4801 firewall from OpenBSD 3.7 to 4.0. The firewall configuration (pf.conf) is unchanged. On 3.7, at peak throughput I normally saw perhaps 65% to 76% interrupt mode and little or no congestion. On 4.0, however, with similar traffic levels I see 85% to 95% interrupt mode, and the congestion counter increments fairly rapidly. Of course, one cannot expect top performance from a Soekris given its Ethernet chipsets, but it was -adequate- on 3.7.

I've spent a little time googling for observations on performance differences between 3.7 and 4.0 and have found nothing useful so far. Have other list members had this experience, or do you know of anyone who has? If so, has anyone had favorable performance tuning experiences that might help me out? So far, the only tuning change I've made for 4.0 was to increase net.inet.ip.ifq.maxlen from 50 to 150, but this appears to have had negligible impact.

Bill
Re: Relative Firewall Performance: 3.7 and 4.0
Hmm, I'm rereading the product description for the Soekris lan1621, which would go into my 4801's PCI slot and give me two Ethernet ports. It claims a 'High performance PCI busmaster interface with large buffers and interrupt holdoff'. Would you have higher hopes for this than for the on-board Ethernet ports? Do you know whether the OpenBSD 4.0 sis driver supports the interrupt holdoff feature? Has anyone on this list actually tried a lan1621 in a Soekris 4801 to boost performance, and were you satisfied with the results?

Bill

Stuart Henderson wrote:
> On 2007/02/23 16:27, William Bloom wrote:
>> I recently upgraded a Soekris 4801 firewall from OpenBSD 3.7 to 4.0. The
>> configuration for firewalling (pf.conf) is unchanged. On 3.7, at peak
>> throughput I normally saw maybe 65% - 76% interrupt mode and little or no
>> congestion. However, on 4.0 with similar traffic levels I see 85% - 95%
>> interrupt mode and the congestion counter increments fairly rapidly.
>
> you might get a small improvement if you optimize the pf ruleset.
>
>> Of course, one cannot expect best performance from a Soekris due to the
>> Ethernet chipsets, but it was -adequate- on 3.7.
>
> ethernet chipsets make little difference, plug an em(4) in and you'll
> see pretty much the same. it's the PCI controller (or lack thereof)
> that's the problem.
>
> fwiw, WRAP manage about a 1/3 more throughput from a similar processor,
> but I'm not quite sure how.
Re: Relative Firewall Performance: 3.7 and 4.0
How about the Commell EMB-564VG, then, as an alternative to Soekris? I've seen a few postings that seem to hold it in high regard.

Bill

On Feb 24, 2007, at 3:49, Stuart Henderson wrote:
> On 2007/02/23 18:58, William Bloom wrote:
>> Hmm, I'm rereading the product description for the Soekris lan1621,
>> which would go into my 4801's PCI slot and give me 2 enet ports. It
>> claims 'High performance PCI busmaster interface with large buffers and
>> interrupt holdoff'. Would you have higher hopes for this than the
>> on-board enet ports?
>
> No, sorry. lan1621 and onboard use the same chip.
>
>> Do you know whether the OpenBSD 4.0 sis driver would support the
>> interrupt holdoff feature?
>
> You can modify it to fairly easily, but I didn't find it helping very
> much and it increases latency a bit when traffic is low.
>
>> Has anyone on this list actually tried a lan1621 on a Soekris 4801 in an
>> effort to boost performance, and were you satisfied with the results?
>
> Not 1621, but using a half-decent gig nic doesn't improve performance, I
> doubt it would help very much. (mind you the gig nic is designed for a
> standard system and is unlikely to do very much interrupt mitigation at
> such low traffic levels as max out the Geode cpu so there may be some
> tuning that could be done, nothing one-size-fits-all though).
>
> I think if you have sufficient traffic that this is a problem, you could
> really do with something faster to leave yourself some headroom to cater
> for 'unusual' situations (worm activity, etc) too. There are quite a few
> single-board computers to choose from that might be suitable.
Re: site-to-site vpn 4.0 to cisco 3000
time - 3600seconds

Lan-to-Lan connection
interface - external(2.2.2.2)
connection type - bi-directional
peer - 1.1.1.1
presharedkey - openbsdrules
authentication - esp/sha/hmac160
local network - 10.10.0.0 (wildcard mask 0.0.255.255)
remote network - 192.168.1.0 (wildcard mask 0.0.0.255)

SA
authentication - esp/sha/hmac160
encryption - 3DES-168
mode - tunnel
Lifetime - 1200seconds

On the OpenBSD box I start isakmpd with 'isakmpd -K', then ipsecctl -f /etc/ipsec.conf

After a bit of time I see this in /var/log/messages

isakmpd[18700]: ipsec_validate_id_information: dubious ID information accepted

And the cisco log shows this

2 02/25/2007 10:37:16.280 SEV=5 IKE/172 RPT=7394 1.1.1.1
Group [1.1.1.1]
Automatic NAT Detection Status:
   Remote end is NOT behind a NAT device
   This end is NOT behind a NAT device

6 02/25/2007 10:37:16.380 SEV=4 IKE/119 RPT=6680 1.1.1.1
Group [1.1.1.1]
PHASE 1 COMPLETED

7 02/25/2007 10:37:16.380 SEV=4 AUTH/22 RPT=6575 1.1.1.1
User [1.1.1.1] Group [1.1.1.1] connected, Session Type: IPSec/LAN-to-LAN

9 02/25/2007 10:37:16.380 SEV=4 AUTH/84 RPT=52
LAN-to-LAN tunnel to headend device 1.1.1.1 connected

10 02/25/2007 10:37:16.500 SEV=5 IKE/25 RPT=9162 1.1.1.1
Group [1.1.1.1]
Received remote Proxy Host data in ID Payload:
Address 1.1.1.1, Protocol 0, Port 0

13 02/25/2007 10:37:16.500 SEV=5 IKE/24 RPT=27 1.1.1.1
Group [1.1.1.1]
Received local Proxy Host data in ID Payload:
Address 2.2.2.2, Protocol 0, Port 0

16 02/25/2007 10:37:16.500 SEV=4 IKE/61 RPT=27 1.1.1.1
Group [1.1.1.1]
Tunnel rejected: Policy not found for Src:1.1.1.1, Dst: 2.2.2.2!

18 02/25/2007 10:37:16.500 SEV=4 IKEDBG/97 RPT=52 1.1.1.1
Group [1.1.1.1]
QM FSM error (P2 struct &0xe7ed120, mess id 0xac462db5)!

Any ideas why I'm getting the "tunnel rejected" error? Does anyone see any glaring mistakes? After searching the archives and google'ing, I gather other folks are doing this without issue.
I have complete control over both devices so if there's any other info I can provide let me know. I realize this isn't a cisco support list so if it's the cisco's fault I'll go bother someone else. I appreciate your time, thank you. please cc me as I'm not subscribed to the list.

-- William Bloom [EMAIL PROTECTED]
Re: site-to-site vpn 4.0 to cisco 3000
The man page for isakmpd.conf indeed sheds some light; there's an example in that page that shows how to specify lifetimes for both phases:

[General]
Default-phase-1-lifetime= 3600,60:86400
Default-phase-2-lifetime= 1200,60:86400

At this point, if the lifetimes do indeed agree, then I myself would be a little puzzled over why the proposal is being rejected. Are both endpoints configured to use the peer address as the ID? At first blush, your settings all seem kosher. I would agree, though, that there must still be some sort of inconsistency between the proposals.

Another suggestion... It appears that you've been trying to initiate the VPN from one end, perhaps the OpenBSD end, probably by sending a ping from the first site to the second. Restart both ends to clear out any SAs that have been negotiated, and then try to ping from the -other- end to see what happens when the VPN negotiation is initiated in the opposite direction. The log entries might show something useful. Also, did the OpenBSD logs show any detail of the failure from the last attempts, apart from the mismatched SA queries?

Bill

On Feb 25, 2007, at 14:48, c l wrote:
Hello, thanks for the reply, it helped if I'm not mistaken. I think I'm getting closer but still no joy. See below.

From: William Bloom <[EMAIL PROTECTED]>
To: c l <[EMAIL PROTECTED]>
CC: misc@openbsd.org
Subject: Re: site-to-site vpn 4.0 to cisco 3000
Date: Sun, 25 Feb 2007 14:02:13 -0700

I've set up maybe 78 LAN-to-LAN VPNs between my datacenter and the sites of customers and partners. However, I haven't yet had occasion to use OpenBSD as a VPN endpoint, and I'm not an expert on the IKE/IPsec features of OpenBSD. Having said that, I've done quite a bit of VPN troubleshooting in the past, so I'll take a stab at this in general terms...
My reading of the three 'ike esp' statements in ipsec.conf is that you've declared three sets of SAs on the OpenBSD endpoint, all to peer 2.2.2.2: one between the interior address spaces of the two locations, a second between the endpoint address of the 1st location and the interior address space of the 2nd, and a third between the endpoint addresses. That third one certainly catches my attention, since I know that -some- pieces of equipment (particularly the PIX, the ASA, and I believe the Juniper, although I've never confirmed this for a Cisco 3000) hate the idea of having their own endpoint address included in the encryption domain. This seems a likely cause of the rejection. It is something that IKE might negotiate on -some- manufacturers' equipment but not on others. In most cases, there's no need for the endpoints to participate in the encryption domain, since they aren't application servers; they only need to exchange IKE messages and then simply pass IPsec to and from their respective protected address spaces. So my suggestion would be to strike that third 'ike esp' statement and then see what difference that makes in the log.

As a special note, be aware that this means you probably won't be able to ping the 2.2.2.2 address from the 1st site (encryption enforcement on the Cisco will deny this, since you're pinging from an address space at the 1st site that's covered by the VPN proposal, and yet 2.2.2.2 is not in the encryption domain). If you need to troubleshoot the Cisco by pinging it, you'll have to do so from a point -outside- the OpenBSD VPN endpoint.

This did in fact change the behavior. First I did as you suggested and struck the statement for the two end points. The logs showed a similar message as before but this time it complained about the src: 1.1.1.1 dst: 10.10.x.x tunnel. So I removed that one as well. So now my ipsec.conf has just the one line in it.
ike esp from 192.168.1.0/24 to 10.10.0.0/16 peer 2.2.2.2 \
    main auth hmac-sha1 enc 3des group modp768 psk openbsdrules

This gives me a different result. Here is the output from the cisco log.

2 02/25/2007 15:28:21.210 SEV=5 IKE/172 RPT=7437 1.1.1.1
Group [1.1.1.1]
Automatic NAT Detection Status:
   Remote end is NOT behind a NAT device
   This end is NOT behind a NAT device

6 02/25/2007 15:28:21.310 SEV=4 IKE/119 RPT=6722 1.1.1.1
Group [1.1.1.1]
PHASE 1 COMPLETED

7 02/25/2007 15:28:21.310 SEV=4 AUTH/22 RPT=6617 1.1.1.1
User [1.1.1.1] Group [1.1.1.1] connected, Session Type: IPSec/LAN-to-LAN

9 02/25/2007 15:28:21.310 SEV=4 AUTH/84 RPT=86
LAN-to-LAN tunnel to headend device 1.1.1.1 connected

10 02/25/2007 15:28:21.400 SEV=5 IKE/35 RPT=30 1.1.1.1
Group [1.1.1.1]
Received remote IP Proxy Subnet data in ID Payload:
Address 192.168.1.0, Mask 255.255.255.0, Protocol 0, Port 0

13 02/25/2007 15:28:21.400 SEV=5 IKE/34 RPT=9176 1.1.1.1
Group [1.1.1.1]
Received local IP Proxy Subnet data in ID Payload: A
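The lifetimes in the isakmpd.conf example can be checked against the VPN 3000 settings quoted earlier in this thread (1200 seconds for the SA). This is a minimal sketch, assuming (from my reading of isakmpd.conf(5)) that a value such as 3600,60:86400 means a default lifetime of 3600 seconds with an accepted range of 60 to 86400 seconds; parse_lifetime and accepts are hypothetical names.

```python
def parse_lifetime(spec: str) -> tuple[int, int, int]:
    """Split 'default,min:max' (e.g. '3600,60:86400') into three integers."""
    default, _, bounds = spec.strip().partition(",")
    if not bounds:
        # This sketch only handles the full 'default,min:max' form.
        raise ValueError(f"expected 'default,min:max', got {spec!r}")
    low, _, high = bounds.partition(":")
    return int(default), int(low), int(high)


def accepts(spec: str, proposed: int) -> bool:
    """True if a peer-proposed lifetime falls within the min:max bounds."""
    _, low, high = parse_lifetime(spec)
    return low <= proposed <= high


# Values from the Default-phase-N-lifetime example lines.
PHASE1 = "3600,60:86400"
PHASE2 = "1200,60:86400"
```

With these values, the phase 2 default of 1200 seconds equals the VPN 3000's SA lifetime, and a proposal of 3600 seconds for phase 1 would fall inside the accepted range.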
Re: site-to-site vpn 4.0 to cisco 3000
On further study of the iskampd.conf man page, I am thinking that you may be correct by turning you attention to the isakmpd.conf as a possible trouble spot. I notice that you specified group mod768 (Diffie -Hellman group 1)in your ipsec statements. As I said, not having had occasion to run a VPN before using OpenBSD as an endpoint, I am having to generalize from all the other VPN setups that I have done. Generally, the Diffie-Hellman group is only relevant in two places, one being the 'main' mode for phase 1 and the other being for PFS in phase 2 (-if- PFS is enabled). I see that the isakmpd normally uses DH2 by default for 'main' mode (so claims the man page), and this can be defined otherwise (e.g. DH 1) if preferred. I -suspect- that the DH group specified in the ipsecd.conf is not relevant to 'main' mode; it is perhaps used only when PFS is configured. So a possible problem might be that site #1 is using DH 2 for its proposal and an edit to isakmpd.conf may be the solution. Or, an alternative might be to reconfigure the VPN 3000 to use DH 2. If you have PFS enabled, deconfigure it for now in order to simplify things. Once you've got the VPN running, you can play around with PFS enablement if you really need it. I'm afraid I'm a bit hindered by having not used OpenBSD in any of the 70+ VPNs I've set up in the past. Mostly, I've worked with CheckPoint, Juniper, Cisco routers/PIX/ASA/VPN300, SonicWall, Nortel, and LinkSys. None of my customers have selected OpenBSD for VPN, yet. Bill On Feb 25, 2007, at 14:48, c l wrote: Hello, thanks for the reply, it helped if I'm not mistaken. I think I'm getting closer but still no joy. See below. From: William Bloom <[EMAIL PROTECTED]> To: c l <[EMAIL PROTECTED]> CC: misc@openbsd.org Subject: Re: site-to-site vpn 4.0 to cisco 3000 Date: Sun, 25 Feb 2007 14:02:13 -0700 I've setup maybe 78 LAN-to-LAN VPNs between my datacenter and other sites of customers and partners. 
However, I haven't had occasion to use OpenBSD as a VPN endpoint yet and I'm not an expert on the ike/ ipsec features of OpenBSD. Having said that, I've done quite a bit of VPN troubleshooting in the past, so I'll take a stab at this in general terms... My reading of the three 'ike esp' statements in ipsec.conf is that you've declared three sets of SAs on the OpenBSD endpoint, all to peer 2.2.2.2 - one SA between the interior address spaces of the two locations, a second between the endpoint address of the 1st location and the interior address space of the 2nd, and a third between the endpoint addresses. That third one certainly catches my attention since I know that -some- pieces of equipment (particularly the PIX, ASA, and I believe the Juniper although I've never confirmed this for a Cisco 3000) hate the idea of having their own endpoint address included in the encryption domain. This seems likely to me as a cause for the rejection. This is something that IKE might negotiate on -some- manufacturer's equipment but not others. In most cases, there's no need for the endpoints to participate in the encryption domain since they aren't application servers - they only need to exchange IKE messages and then simply pass IPsec to/from their respective protected address spaces. So my suggestion would be to strike that third 'ike esp' statement and then see what difference that makes in the log. As a special note, do be aware that this means that you probably won't be able to ping the 2.2.2.2 address from the 1st site (encryption enforcement on the Cisco will deny this, since you're pining from an address space at the 1st site that's covered by the VPN proposal and yet 2.2.2.2 is not in the encryption domain). If you need to troubleshoot the Cisco by pinging it, then you'll have to do so from a point -outside- the OpenBSD VPN endpoint. This did in fact change the behavior. First I did as you suggested and struck the statement for the two end points. 
The logs showed a similar message as before, but this time it complained about the src: 1.1.1.1 dst: 10.10.x.x tunnel. So I removed that one as well. Now my ipsec.conf has just the one line in it:

    ike esp from 192.168.1.0/24 to 10.10.0.0/16 peer 2.2.2.2 \
        main auth hmac-sha1 enc 3des group modp768 psk openbsdrules

This gives me a different result. Here is the output from the cisco log:

    2 02/25/2007 15:28:21.210 SEV=5 IKE/172 RPT=7437 1.1.1.1
    Group [1.1.1.1]
    Automatic NAT Detection Status:
       Remote end is NOT behind a NAT device
       This end is NOT behind a NAT device

    6 02/25/2007 15:28:21.310 SEV=4 IKE/119 RPT=6722 1.1.1.1
    Group [1.1.1.1]
    PHASE 1 COMPLETED

    7 02/25/2007 15:28:21.310 SEV=4 AUTH/22 RPT=6617 1.1.1.1
    User [1.1.1.1] Group [1.1.1.1] connected, Session Type: IPSec/LAN-to-LAN

    9 02/25/2007 15:28:21.310 SEV=4 AUTH/
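Bill's DH-group reasoning at the top of this message can be expressed as a check that each phase agrees on its own group. The sketch below assumes the standard IKE group numbering: modp768 is group 1, modp1024 is group 2, modp1536 is group 5, and modp2048 is group 14. It uses None to mean "no PFS" in phase 2. The peer dictionaries are hypothetical inputs, not something parsed from isakmpd or the VPN 3000. Later in this thread, it turns out that ipsec.conf sets the phase 1 and phase 2 groups in separate 'main' and 'quick' substatements.

```python
# Standard IKE MODP group numbers (RFC 2409 groups 1-2, RFC 3526 groups 5 and 14).
MODP_TO_IKE_GROUP = {"modp768": 1, "modp1024": 2, "modp1536": 5, "modp2048": 14}


def ike_group(name):
    """Map an ipsec.conf-style group name to its IKE group number ('none' -> None)."""
    if name == "none":
        return None
    return MODP_TO_IKE_GROUP[name]


def dh_mismatches(local, remote):
    """List the phases whose DH groups differ between two peers.

    Each peer is a dict with 'phase1' (main mode group) and 'phase2'
    (PFS group, or None when PFS is disabled).
    """
    problems = []
    for phase in ("phase1", "phase2"):
        if local[phase] != remote[phase]:
            problems.append(f"{phase}: local DH {local[phase]} vs remote DH {remote[phase]}")
    return problems


if __name__ == "__main__":
    cisco = {"phase1": 1, "phase2": None}  # hypothetical: DH group 1, PFS off
    # Bill's hypothesis: main mode left at isakmpd's default group 2.
    suspected = {"phase1": 2, "phase2": ike_group("modp768")}
    print(dh_mismatches(suspected, cisco))
    # The settings that eventually worked: main modp768, quick group none.
    working = {"phase1": ike_group("modp768"), "phase2": ike_group("none")}
    print(dh_mismatches(working, cisco))  # []
```

Keeping phase 1 and phase 2 as separate keys reflects the point of the thread: a proposal can match in main mode and still fail in quick mode.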
Re: site-to-site vpn 4.0 to cisco 3000 SOLVED
Ah. Disregard my last post. I didn't realize that the 'ipsec' configuration specifies main mode (phase 1 negotiation) and quick mode (phase 2 negotiation) in separate substatements. Good find. That makes perfect sense.

Bill

On Feb 25, 2007, at 19:06, c l wrote:

Finally got this to work. Here's the config that ended up working. I'm not sure why I didn't notice before, but the quick mode stuff wasn't set up correctly.

ipsec.conf

    ike esp from 192.168.1.0/24 to 10.10.0.0/16 peer 2.2.2.2 \
        main auth hmac-sha1 enc 3des group modp768 \
        quick auth hmac-sha1 enc 3des group none psk openbsdrules

cisco IKE proposal

    authentication mode - presharedkeys
    authentication algorithm - sha/hmac-160
    encryption - 3DES-168
    DH Group - 1 (768-bits)
    Lifetime - 3600 seconds

Lan-to-Lan connection

    interface - external (2.2.2.2)
    connection type - bi-directional
    peer - 1.1.1.1
    presharedkey - openbsdrules
    authentication - esp/sha/hmac160
    local network - 10.10.0.0 (wildcard mask 0.0.255.255)
    remote network - 192.168.1.0 (wildcard mask 0.0.0.255)

SA

    authentication - esp/sha/hmac160
    encryption - 3DES-168
    mode - tunnel
    Lifetime - 1200 seconds

Now I just have to figure out the routing :)

From: William Bloom <[EMAIL PROTECTED]>
To: c l <[EMAIL PROTECTED]>
CC: misc@openbsd.org
Subject: Re: site-to-site vpn 4.0 to cisco 3000
Date: Sun, 25 Feb 2007 18:53:12 -0700

The man page for isakmpd.conf indeed sheds some light. There's an example in that page that shows how to specify lifetimes for both phases...

    [General]
    Default-phase-1-lifetime= 3600,60:86400
    Default-phase-2-lifetime= 1200,60:86400

At this point, if the lifetimes indeed agree, then I myself would be a little puzzled over why the proposal would be rejected. Are both endpoints configured to use the peer address as the ID? At first blush, your settings seem all kosher. I would agree, though, that there certainly appears to be some sort of inconsistency remaining between the proposals. Another suggestion...
It appears that you've been trying to initiate the VPN from one end, perhaps the OpenBSD end. Probably by sending a ping from the 1st site to the 2nd. Restart both ends to clear out any SAs that have been negotiated and try to ping from the -other- end in order to see what happens when the VPN negotiation is initiated the opposite direction. The log entries might show something useful. Also, did the OpenBSD logs show any detail of the failure from the last attempts apart from the mismatched SA queries? Bill
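The two things compared in this exchange, the traffic selectors and the lifetimes, can be checked mechanically. The sketch below is illustrative only. It converts the VPN 3000's address/wildcard-mask pairs to CIDR so they can be compared with the ipsec.conf flow. It also reads an isakmpd.conf lifetime value such as 3600,60:86400 as "offered seconds, then acceptable minimum:maximum". That reading of the syntax is an assumption and should be confirmed against isakmpd.conf(5). The helper names are hypothetical.

```python
import ipaddress


def wildcard_to_network(addr, wildcard):
    """Convert a Cisco address + wildcard (inverse) mask to an IPv4Network."""
    inverse = int(ipaddress.IPv4Address(wildcard))
    netmask = ipaddress.IPv4Address(0xFFFFFFFF ^ inverse)
    return ipaddress.IPv4Network(f"{addr}/{netmask}")


def selectors_mirror(obsd_src, obsd_dst, cisco_local, cisco_remote):
    """True if the OpenBSD flow is the mirror image of the Cisco local/remote pair."""
    return (ipaddress.ip_network(obsd_src) == cisco_remote
            and ipaddress.ip_network(obsd_dst) == cisco_local)


def parse_lifetime(spec):
    """Split an 'offer,min:max' lifetime string (assumed syntax) into three ints."""
    offer, bounds = spec.split(",")
    low, high = bounds.split(":")
    return int(offer), int(low), int(high)


def lifetime_acceptable(spec, peer_seconds):
    """True if a peer's proposed lifetime lies within the assumed min:max range."""
    _, low, high = parse_lifetime(spec)
    return low <= peer_seconds <= high


if __name__ == "__main__":
    local = wildcard_to_network("10.10.0.0", "0.0.255.255")   # 10.10.0.0/16
    remote = wildcard_to_network("192.168.1.0", "0.0.0.255")  # 192.168.1.0/24
    print(selectors_mirror("192.168.1.0/24", "10.10.0.0/16", local, remote))  # True
    print(lifetime_acceptable("3600,60:86400", 3600))  # True: Cisco IKE lifetime
    print(lifetime_acceptable("1200,60:86400", 1200))  # True: Cisco SA lifetime
```

The "mirror" check reflects the usual LAN-to-LAN rule: one side's local network is the other side's remote network.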
Congestion and Dropping of Gratuitous ARP (GARP)
I have a case where a large (60+), rapid flurry of incoming GARP messages is evidently being only partially processed. Specifically, the first 25 GARP messages are applied to the ARP cache and the remainder are ignored. I'm looking for a better understanding of why this happens, as well as suggestions on how to work around or solve the problem.

Background... We have a double-firewalled arrangement. The inner firewall is a Checkpoint ClusterXL pair of Resilience Ndurant modules running Checkpoint NG AI R55 HFA_17. The outer firewall is a pair of Soekris 4801 boxes running OpenBSD 3.7 in a PF/PFsync/CARP pair. Our DMZ sits between the two firewall pairs, and our trusted net sits behind the inner firewall. The ClusterXL pair runs as active/standby (as does the OpenBSD pair). Nominally, this all works fine.

Now, the ClusterXL pair assigns a single virtual IP address to the outer interface of the active cluster member. The active cluster member provides proxy ARP for all NAT'd hosts. Whenever the active member sends an ARP reply for its virtual outer IP address or for one of the NAT'd hosts, it uses the MAC address of its physical ethernet interface. Whenever a cluster failover occurs, the new active member assigns the outer virtual IP address to itself. It then sends GARP messages out the outer interface for the virtual IP address and for the NAT address of each NAT'd host. Since we have about 61 NAT'd hosts, that's 62 GARP messages. This set of GARP messages is sent quickly as a flurry, taking less than 7 ms for all 61 messages.

We observed this flurry arriving at the outer firewall using "tcpdump -i sis2 -ne 'arp'". The ARP cache entries for the 1st through the 25th GARP messages are indeed updated with the new MAC address. ARP entries for any GARP message after the 25th are not updated. These entries retain their stale MAC value until they expire several minutes later.
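For reference, a gratuitous ARP differs from an ordinary one only in its address fields: the sender and target protocol addresses are the same. The sketch below packs and inspects the 28-byte Ethernet/IPv4 ARP payload laid out in RFC 826. It uses the common "sender IP equals target IP" test for GARP, which covers both request-style and reply-style announcements. The MAC and the 192.0.2.x addresses are hypothetical documentation values. Nothing here reflects how ClusterXL actually builds its messages.

```python
import socket
import struct

# Ethernet/IPv4 ARP payload (RFC 826): htype, ptype, hlen, plen, oper,
# sender MAC, sender IP, target MAC, target IP -> 28 bytes, network byte order.
ARP_FMT = "!HHBBH6s4s6s4s"
ARP_REQUEST, ARP_REPLY = 1, 2


def build_arp(oper, sha, spa, tha, tpa):
    """Pack an Ethernet/IPv4 ARP payload; MACs are 6-byte bytes, IPs dotted strings."""
    return struct.pack(ARP_FMT, 1, 0x0800, 6, 4, oper,
                       sha, socket.inet_aton(spa), tha, socket.inet_aton(tpa))


def is_gratuitous(payload):
    """True if the ARP payload announces its own address (sender IP == target IP)."""
    _htype, ptype, _hlen, _plen, _oper, _sha, spa, _tha, tpa = struct.unpack(ARP_FMT, payload)
    return ptype == 0x0800 and spa == tpa


if __name__ == "__main__":
    mac = bytes.fromhex("001122334455")  # hypothetical new MAC after failover
    garp = build_arp(ARP_REQUEST, mac, "192.0.2.10", bytes(6), "192.0.2.10")
    normal = build_arp(ARP_REQUEST, mac, "192.0.2.1", bytes(6), "192.0.2.10")
    print(len(garp), is_gratuitous(garp), is_gratuitous(normal))  # 28 True False
```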
However, if we manually send a lone GARP message (using the 'garp' command) from the inner firewall for one of these stale ARP entries, then it -is- updated. So the only affected ARP cache entries are those whose GARP message is preceded by at least 25 other GARP messages within the last few milliseconds. A lone GARP message (not part of a 'flurry') works fine. This sounds an awful lot like some sort of congestion. We found that other machines (mainly FreeBSD) that sit alongside the outer firewall's inner interface on the DMZ don't suffer from this symptom. But then, these machines are not firewalls.

Now, I realize that on BSD systems, incoming ARP messages are handled by a network software interrupt (NETISR) dedicated to ARP, separate from the NETISR for IP traffic. And I -believe- that the receive queue for the ARP NETISR is not very big (room for 50 mbufs, I recall). Also, PF does rule processing in interrupt mode, so a busy OpenBSD firewall can typically be seen computing almost entirely in interrupt mode when not idle. Indeed, our active OpenBSD firewall is often 30% to 50% idle, with pretty much all other computing in interrupt mode. I wonder whether this behavior can starve the ARP NETISR somewhat.

I know that NETISR receive queues have a drop counter that counts mbufs dropped when the queue is full. As far as I know, though, there's no way to inspect the ARP receive queue drop count. The kernel maintains the count, but no tool displays it (netstat, for example, doesn't show it). I've checked through /var/log/pflog and see no evidence that PF dropped any ARP messages.

Could the large amount of time a firewall spends in interrupt mode for PF processing somehow cause the ARP receive queue to fill more easily than otherwise? Why is '25' the magic number for seemingly dropped GARP messages instead of '50'?
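The starvation hypothesis above can be illustrated with a toy tail-drop queue. The sketch assumes that nothing drains the ARP input queue while the burst arrives, which is the suspected effect of PF keeping the CPU in interrupt mode. It also assumes that a full queue discards the newest packet and increments a drop counter, loosely mirroring the BSD IF_QFULL/IF_DROP pattern. The queue limit is a parameter. A limit of 25 reproduces the observed cutoff, but the recalled limit of 50 does not, so the model alone does not explain why the number is 25.

```python
from collections import deque


class TailDropQueue:
    """FIFO that discards new arrivals once full and counts the discards."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.items = deque()
        self.drops = 0

    def enqueue(self, pkt):
        if len(self.items) >= self.maxlen:
            self.drops += 1  # queue full: drop the newcomer, keep what is queued
            return False
        self.items.append(pkt)
        return True

    def drain(self):
        """Hand every queued packet to the consumer (the ARP handler in this model)."""
        out = list(self.items)
        self.items.clear()
        return out


def burst(n_packets, maxlen):
    """Deliver n_packets back to back with no draining, then drain once.

    Returns (packets processed, packets dropped).
    """
    q = TailDropQueue(maxlen)
    for seq in range(n_packets):
        q.enqueue(seq)
    return len(q.drain()), q.drops


if __name__ == "__main__":
    print(burst(62, 25))  # (25, 37)
    print(burst(62, 50))  # (50, 12)
```

For scale, 62 GARPs in under 7 ms is an arrival rate of roughly 8,900 per second (62 / 0.007 is about 8,857). In this model, a drain stall of only a few milliseconds would let the whole burst pile up against the queue limit.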
This is 100% reproducible at exactly 25. Is there any way that anyone can think of that I can inspect the ARP NETISR drop count? Any ideas on how to workaround or fix this? I don't see any sysctl settings in 3.7 for the ARP receive queue size. And there appears to be no way to throttle or pace the Checkpoint ClusterXL GARP messages that I can find. Bill -- William Bloom| Snr Systems Engineer|M P H A S I S Architecting Value | Eldorado Computing 5353 North 16th Street, Suite 400 Phoenix, Az 85016 | Direct: +11-602-604-3100 | Fax: +11-602-604-3115| http://www.eldocomp.com