High Interrupt Mode Reported by 'Top' for Soekris 4801

2005-10-06 Thread William Bloom
I am a new owner of two Soekris 4801s running OpenBSD 3.7 (generic) with
pf/pfsync/carp for redundant firewalling.  I've encountered a problem with high
interrupt load (and some packet loss).  Having perused the on-line FAQ/forums
and found nothing that I could identify as matching the symptoms I've observed,
I am now looking for pointers on how to isolate the problem and perhaps fix it.

I have sis0 in use for the outer interface, sis2 for the inner, and sis1 for
pfsync.  There is an inner carp'd interface address (carp0) and an outer one
(carp1).  The configuration is generally along the lines of the FAQ and man
pages.

When traffic through the Soekris reaches approximately 4 Mb/s, the interrupt
mode reported by top reaches 75% or higher and there is measurable packet loss
(1% - 5% or so).  From 'pfctl -si', the congestion counter goes up rapidly when
the interrupts are highest.  The interrupt mode increases as the traffic volume
increases, and goes down to about 1% when I fail over to the other firewall.
When I fail over, I observe exactly the same behavior on the newly active
firewall.

Checking forums, I see that there have been reports of very high interrupts on
the sis device in the past for OpenBSD on Soekris, but I read that these were
all corrected in recent OpenBSD releases (and the problem I read about only
applied whenever one sis interface was left 'down', which is not the case for
my circumstances since all interfaces are in use).

I've checked with Soekris, and they've not heard of symptoms such as I describe
with OpenBSD 3.7.  I've not noticed anything amiss in dmesg or
/var/log/messages (well, all sis devices are sharing IRQ 10, but this is normal
on a 4801; the FAQ states that this is not a problem, and other 4801 users
haven't reported symptoms like the ones I describe).  I haven't posted dmesg or
other info in this message (I thought it might be rude to do so without being
asked).

Can anyone offer pointers on how I might go about isolating this problem?


Bill
-- 
William Bloom | Systems Engineer | M P H A S I S Architecting Value | Eldorado Computing
5353 North 16th Street, Suite 400 Phoenix, Az 85016 | Direct: +11-602-604-3100 | Fax: +11-602-604-3115 | http://www.eldocomp.com

-- CONFIDENTIALITY NOTICE --

Information transmitted by this e-mail is proprietary to MphasiS and/or its 
Customers and is intended for use only by the individual or entity to which it 
is addressed, and may contain information that is privileged, confidential or 
exempt from disclosure under applicable law. If you are not the intended 
recipient or it appears that this mail has been forwarded to you without proper 
authority, you are notified that any use or dissemination of this information 
in any manner is strictly prohibited. In such cases, please notify us 
immediately at [EMAIL PROTECTED] and delete this mail from your records.



Re: High Interrupt Mode Reported by 'Top' for Soekris 4801

2005-10-06 Thread William Bloom
I wondered that as well, but judging from other postings I found using google,
there appear to be lots of 4801s in use with OpenBSD, doing essentially the
same thing as myself (Soekris w/ carp/pf/pfsync).  Yet, AFAICT, I'm the only
one who's posted about this symptom.  Since there are lots of people who do
what I do, if the problem were indeed that the 4801 processor is too wimpy,
wouldn't there be more problems like mine mentioned in the lists?  And I'm
running into high interrupts with only about 4 Mb/s throughput while others
have claimed much higher values.

Before the firewall that I have now, I used m0n0wall on FreeBSD.  I chose
OpenBSD over m0n0wall/FreeBSD due to m0n0wall state table limitations and lack
of mature redundancy features.  But the m0n0wall handled this much traffic, and
more, with a relatively low interrupt mode.  Given how widely OpenBSD is used
on Soekris for firewalling compared to m0n0wall/FreeBSD, and with relatively
few problems, I'm still not quite ready to rule out a setup flaw somewhere.  I
just can't figure out where it could be.


Bill

Theo de Raadt wrote:
>>>If the Soekris did not come with ethernet chipsets which are just
>>>slightly over the bar of rl(4), the wimpy processor in the machine
>>>might be able to cope.
>>
>>Throughput is only marginally better using an em in the pci slot of a 
>>4801. I think there's some other problem.
> 
> 
> Yeah -- the super wimpy processor.




Re: ssh key question

2005-10-07 Thread William Bloom
I've done this precise sort of thing (duplicated the SSH host key) on a set of
Solaris machines that participate in a cluster.  There is no reason I can
imagine why this wouldn't be a reasonable thing for you to do in the
circumstances you describe.  Decide which machine's SSH host key is the one to
be used for both machines, then copy /etc/ssh/ssh_host*key* from that machine
to the other.  You may like to first save the old keys from the target machine
in a backup directory for fallback, just in case an unexpected problem arises
later.

Once this is done, any SSH clients that connected to the 2nd machine in the
past, while it was still using its original host key, may still have that old
public key in their private 'known hosts' list.  That's OK, but the user of the
SSH client may see a warning that a host-spoof is suspected as soon as he/she
tries to connect (after the host key has been replaced).  So you might get a
few phone calls.  If possible and practical, it would be good to check all the
SSH clients' 'known hosts' lists and remove the obsolete entry (it will get
recreated automatically during the next SSH connection).
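
The steps above can be sketched roughly as follows.  This is only an
illustration: the peer address 10.1.1.43 and shared address 10.1.1.44 come
from the question quoted below, while the backup directory name and the use of
root logins over SSH are assumptions to adapt to the real setup.

```shell
#!/bin/sh
# Sketch only; run as root on the machine whose host key is being kept.
# PEER and BACKUP_DIR are illustrative assumptions, not known values.
PEER=10.1.1.43
BACKUP_DIR=/etc/ssh/hostkeys.orig

backup_peer_keys() {
    # Keep the peer's original host keys for fallback.
    ssh "root@$PEER" "mkdir -p $BACKUP_DIR && cp -p /etc/ssh/ssh_host*key* $BACKUP_DIR/"
}

copy_keys_to_peer() {
    # Copy private and public host keys; -p preserves modes and times.
    scp -p /etc/ssh/ssh_host*key* "root@$PEER:/etc/ssh/"
    # Afterwards, restart sshd on the peer so it loads the new keys.
}

forget_host() {
    # Run on a client: remove the stale known_hosts entry for one address.
    ssh-keygen -R "$1"
}

# Example (not executed here):
#   backup_peer_keys && copy_keys_to_peer
#   forget_host 10.1.1.44
```

Nothing runs until the functions are called, so the file can be read and
adjusted before use.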


Bill


[EMAIL PROTECTED] wrote:
> Maybe this is slightly off topic because it is more of an ssh question,
> sorry.
> 
> I have two openbsd boxes running sshd.  They are mirrors of each other, and
> we switch between them every two weeks.  They have their own IP numbers,
> 10.1.1.42, and 10.1.1.43, but whichever machine is the production box gets
> the IP number 10.1.1.44 and you can no longer get to that machine via its
> own IP number.
> 
> Currently all employees telnet into the production box.  I want to get that
> switched over to ssh.  The trouble is the host key appearing to change every
> two weeks.  Can I just duplicate the host key from one box onto the other
> box?  And which key file[s] would that be that I need to copy?  Or do I need
> to see about turning off host key checking on our client?
> 
> --ja
> 




Re: CARP interface incorrectly comes up as INIT on boot

2005-10-07 Thread William Bloom
If I'd had this experience, I'd be tempted to use tcpdump on whichever physical 
interface is carpdev for the suspect carp interface in order to verify that 
multicast is enabled on your switch.  With carp interfaces up, you should see 
periodic multicast messages.  If you don't see any, then you've found your 
problem (and you need to revisit the switch configuration in order to fix it). 
I've seen a few cases in the past where action had to be taken in order to 
enable multicast on a switch; else CARP (or VRRP, or HSRP) of course fails to 
transition to proper states.

You mention that this machine is not a firewall and you're not using pfsync,
but you didn't say right out loud whether you're running pf.  Probably you're
not, but there do exist some non-firewall applications where pf is used, and I
don't know whether your machine falls into that category.  If it does, then
also make sure the pf ruleset isn't blocking multicast.
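
As a rough sketch of both checks: the interface name dc0 is taken from the
ifconfig output quoted below, and the capture filter relies on CARP
advertisements being IP protocol 112 sent to the multicast address 224.0.0.18.
The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. CARPDEV is the carpdev shown in the quoted ifconfig output.
CARPDEV=dc0

carp_filter() {
    # CARP advertisements: IP protocol 112, destination 224.0.0.18.
    echo "ip proto 112 and dst host 224.0.0.18"
}

watch_carp() {
    # -n: no name resolution, -e: print link-level headers.
    # Periodic packets from the master should show up on the backup box.
    tcpdump -n -e -i "$CARPDEV" "$(carp_filter)"
}

check_pf_rules() {
    # Only meaningful if pf is enabled: list loaded rules mentioning carp.
    pfctl -s rules | grep -i carp
}

# Example (not executed here):
#   watch_carp
#   check_pf_rules
```

If watch_carp stays silent on the backup while the master is up, the switch
(or a filter in between) is the first suspect.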


Bill

Tim wrote:
> I'm using CARP under 3.7 release version on two boxes that aren't firewalls,
> so no pfsync involved and CARP configured as described in the FAQ.  What I'm
> seeing is that the box I've designated as BACKUP always boots with carp0 as
> INIT and carp1 and carp2 both come up BACKUP as expected.  The other box
> always boots with all 3 carp interfaces correctly as MASTER.  On the backup
> box, I can execute 'ifconfig carp0 up' and the interface correctly
> transitions to BACKUP.  To prove to myself that this was not a problem with
> that particular box, I tried switching the roles, making the backup the
> master and vice versa, and the problem moves to the other box.  Here's the
> output of ifconfig -A on the backup box and I can supply more info if needed:
> 
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33224
>         inet 127.0.0.1 netmask 0xff000000
>         inet6 ::1 prefixlen 128
>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
> pflog0: flags=0<> mtu 33224
> pfsync0: flags=0<> mtu 2020
> enc0: flags=0<> mtu 1536
> dc0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
>         address: 00:10:a4:c7:51:4e
>         media: Ethernet autoselect (100baseTX full-duplex)
>         status: active
>         inet 192.168.0.3 netmask 0xffffff00 broadcast 192.168.0.255
>         inet6 fe80::210:a4ff:fec7:514e%dc0 prefixlen 64 scopeid 0x5
>         inet 192.168.0.12 netmask 0xffffff00 broadcast 192.168.0.255
>         inet 192.168.0.22 netmask 0xffffff00 broadcast 192.168.0.255
>         inet 192.168.0.42 netmask 0xffffff00 broadcast 192.168.0.255
> carp0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
>         carp: INIT carpdev dc0 vhid 4 advbase 1 advskew 100
>         inet 192.168.0.20 netmask 0xffffff00 broadcast 192.168.0.255
> carp1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         carp: BACKUP carpdev dc0 vhid 3 advbase 1 advskew 100
>         inet 192.168.0.10 netmask 0xffffff00 broadcast 192.168.0.255
> carp2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>         carp: BACKUP carpdev dc0 vhid 6 advbase 1 advskew 100
>         inet 192.168.0.40 netmask 0xffffff00 broadcast 192.168.0.255
> 
> Tim
> 




Zero PF Counters

2005-10-10 Thread William Bloom
Perhaps I've misread the man page, but it's not obvious to me how to zero the
PF counters.  For example, 'pfctl -si' shows a non-zero congestion counter, and
I'd like to clear that counter after I think the congestion issue is remedied.
But I see no way to do that (apart from a reboot).  How to do this?
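
For reference, a sketch of the commands involved.  It assumes that pfctl's
'-F info' modifier (described in pfctl(8) as flushing the filter statistics
that are not bound to rules) also resets the counters listed by 'pfctl -si';
that is an assumption to verify on the release in use.  The helper names are
made up for illustration.

```shell
#!/bin/sh
# Sketch only; the reset behaviour of '-F info' is an assumption to verify.

pf_counter() {
    # Read 'pfctl -s info' output on stdin and print the total for the
    # counter named in $1 (second column of the matching line).
    awk -v name="$1" '$1 == name { print $2 }'
}

show_congestion() {
    pfctl -s info | pf_counter congestion
}

reset_pf_stats() {
    pfctl -F info      # global statistics not bound to rules
}

reset_rule_stats() {
    pfctl -z           # per-rule counters
}

# Example (not executed here):
#   show_congestion; reset_pf_stats; show_congestion
```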

Change in subject...

One odd symptom I've experienced is that permitted users will log in (SSH) to a
host behind the firewall successfully, work with the system for a few minutes,
then get disconnected suddenly.  When I run tcpdump on the login host, I see
the session established successfully and work begins.  Then, after a few
minutes of successful traffic flow in both directions, the user's desktop sends
a long flurry of TCP resets as the connection is lost.  When I disable PF
(pfctl -d) on the firewall, the symptom vanishes.  Now, if the ruleset had
handled the TCP state wrongly, then I would not have expected the TCP
connection to survive long enough for the user to get several minutes of work
done.  The firewall's pflog (block log) shows no packets dropped for these
connections, and there are no entries for packets dropped due to congestion.

What's an interpretation of this?  I am baffled for the moment.

Another change in subject...

The PF man page gives meager detail about the congestion counter.  And the only 
FAQ items for this that I can find are related to queueing (and I don't have 
queues in my ruleset).  What is the meaning of a non-zero congestion counter, 
and what action is PF taking when the congestion counter is incremented?


Bill



Re: "keep state" and PF Queues

2005-10-19 Thread William Bloom
The PF queueing FAQ page at http://www.openbsd.org has a wealth of info that
seems to nicely clarify the pf.conf man page.  I recall that the FAQ contains
an example much as you describe (as I recall, specifying a queue for -incoming-
traffic will indeed cause that traffic to be processed through the named queue
as it is -outgoing-).


Bill

Brian A. Seklecki wrote:
> Would anyone like to elaborate on the impacts of using "keep state" on 
> conjunction with pass rules that assign traffic to queues?
> 
> One might assume that inverted traffic flows would also be queued, 
> however that would break the "traffic can only be queued egress an 
> interface" rule...
> 
> There should be some remarks on this in pf.conf(5)
> 
> TIA,
> 
> ~BAS
> 




Relative Firewall Performance: 3.7 and 4.0

2007-02-23 Thread William Bloom
I recently upgraded a Soekris 4801 firewall from OpenBSD 3.7 to 4.0.  The
configuration for firewalling (pf.conf) is unchanged.  On 3.7, at peak
throughput I normally saw maybe 65% - 76% interrupt mode and little or no
congestion.  However, on 4.0 with similar traffic levels I see 85% - 95%
interrupt mode and the congestion counter increments fairly rapidly.

Of course, one cannot expect best performance from a Soekris due to the
Ethernet chipsets, but it was -adequate- on 3.7.

I've spent a little time google'ing for any observations on a difference in
performance between 3.7 and 4.0 and have found nothing useful so far.

Have other list members had this experience or know of anyone else who has?
If so, has anyone had any favorable performance tuning experiences that might
help me out?  So far, the only tuning change I've made for 4.0 was to increase
net.inet.ip.ifq.maxlen from 50 to 150, but this appears to have had negligible
impact.
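
A minimal sketch of that tuning change.  net.inet.ip.ifq.maxlen and the value
150 are from the text above; watching net.inet.ip.ifq.drops alongside it is an
added assumption about which counter shows whether the queue length is the
limiting factor.  The function names are illustrative.

```shell
#!/bin/sh
# Sketch only; function names are made up for illustration.

ifq_status() {
    # Current input queue limit and the number of packets dropped from it.
    sysctl net.inet.ip.ifq.maxlen net.inet.ip.ifq.drops
}

set_ifq_maxlen() {
    # Change the limit on the running system (needs root).
    sysctl net.inet.ip.ifq.maxlen="$1"
}

sysctl_conf_line() {
    # Line to add to /etc/sysctl.conf so the setting survives a reboot.
    echo "net.inet.ip.ifq.maxlen=$1"
}

# Example (not executed here):
#   ifq_status
#   set_ifq_maxlen 150
#   sysctl_conf_line 150 >> /etc/sysctl.conf
```

If the drops counter is not increasing under load, a longer queue is unlikely
to change much, which would be consistent with the negligible impact observed.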


Bill



Re: Relative Firewall Performance: 3.7 and 4.0

2007-02-23 Thread William Bloom
Hmm, I'm rereading the product description for the Soekris lan1621, which would
go into my 4801's PCI slot and give me 2 enet ports.  It claims 'High
performance PCI busmaster interface with large buffers and interrupt holdoff'.
Would you have higher hopes for this than the on-board enet ports?  Do you know
whether the OpenBSD 4.0 sis driver would support the interrupt holdoff feature?

Has anyone on this list actually tried a lan1621 on a Soekris 4801 in an effort
to boost performance, and were you satisfied with the results?


Bill

Stuart Henderson wrote:
> On 2007/02/23 16:27, William Bloom wrote:
>> I recently upgraded a Soekris 4801 firewall from OpenBSD 3.7 to 4.0.  The
>> configuration for firewalling (pf.conf) is unchanged.  On 3.7, at peak
>> throughput I normally saw maybe 65% - 76% interrupt mode and little or no
>> congestion.  However, on 4.0 with similar traffic levels I see 85% - 95%
>> interrupt mode and the congestion counter increments fairly rapidly.
>
> you might get a small improvement if you optimize the pf ruleset.
>
>> Of course, one cannot expect best performance from a Soekris due to the
>> Ethernet chipsets, but it was -adequate- on 3.7.
>
> ethernet chipsets make little difference, plug an em(4) in and you'll
> see pretty much the same. it's the PCI controller (or lack thereof)
> that's the problem.
>
> fwiw, WRAP manage about a 1/3 more throughput from a similar processor,
> but I'm not quite sure how.
>




Re: Relative Firewall Performance: 3.7 and 4.0

2007-02-24 Thread William Bloom
How 'bout the Commell EMB-564VG then, as an alternative to Soekris?   
I've seen a few postings that seem to show high regard.



Bill


On Feb 24, 2007, at 3:49, Stuart Henderson wrote:

> On 2007/02/23 18:58, William Bloom wrote:
>> Hmm, I'm rereading the product description for the Soekris lan1621, which
>> would go into my 4801's PCI slot and give me 2 enet ports.  It claims 'High
>> performance PCI busmaster interface with large buffers and interrupt
>> holdoff'.
>> Would you have higher hopes for this than the on-board enet ports?
>
> No, sorry. lan1621 and onboard use the same chip.
>
>> Do you know whether the OpenBSD 4.0 sis driver would support the
>> interrupt holdoff feature?
>
> You can modify it to fairly easily, but I didn't find it helping very
> much and it increases latency a bit when traffic is low.
>
>> Has anyone on this list actually tried a lan1621 on a Soekris 4801 in an
>> effort to boost performance, and were you satisfied with the results?
>
> Not 1621, but using a half-decent gig nic doesn't improve
> performance, I doubt it would help very much. (mind you the gig nic
> is designed for a standard system and is unlikely to do very much
> interrupt mitigation at such low traffic levels as max out the
> Geode cpu so there may be some tuning that could be done, nothing
> one-size-fits-all though).
>
> I think if you have sufficient traffic that this is a problem,
> you could really do with something faster to leave yourself some
> headroom to cater for 'unusual' situations (worm activity, etc)
> too. There are quite a few single-board computers to choose
> from that might be suitable.







Re: site-to-site vpn 4.0 to cisco 3000

2007-02-25 Thread William Bloom
time - 3600seconds

Lan-to-Lan connection
interface - external(2.2.2.2)
connection type - bi-directional
peer - 1.1.1.1
presharedkey - openbsdrules
authentication - esp/sha/hmac160
local network - 10.10.0.0  (wildcard mask 0.0.255.255)
remote network - 192.168.1.0 (wildcard mask 0.0.0.255)

SA
authentication - esp/sha/hmac160
encryption - 3DES-168
mode - tunnel
Lifetime - 1200seconds


On the OpenBSD box I start isakmpd with 'isakmpd -K', then
'ipsecctl -f /etc/ipsec.conf'.


After a bit of time I see this in /var/log/messages
isakmpd[18700]: ipsec_validate_id_information: dubious ID information accepted



And the cisco log shows this

2 02/25/2007 10:37:16.280 SEV=5 IKE/172 RPT=7394 1.1.1.1
Group [1.1.1.1]
Automatic NAT Detection Status:
  Remote end is NOT behind a NAT device
  This   end is NOT behind a NAT device

6 02/25/2007 10:37:16.380 SEV=4 IKE/119 RPT=6680 1.1.1.1
Group [1.1.1.1]
PHASE 1 COMPLETED

7 02/25/2007 10:37:16.380 SEV=4 AUTH/22 RPT=6575 1.1.1.1
User [1.1.1.1] Group [1.1.1.1] connected, Session Type: IPSec/LAN-to-LAN

9 02/25/2007 10:37:16.380 SEV=4 AUTH/84 RPT=52
LAN-to-LAN tunnel to headend device 1.1.1.1 connected

10 02/25/2007 10:37:16.500 SEV=5 IKE/25 RPT=9162 1.1.1.1
Group [1.1.1.1]
Received remote Proxy Host data in ID Payload:
Address 1.1.1.1, Protocol 0, Port 0

13 02/25/2007 10:37:16.500 SEV=5 IKE/24 RPT=27 1.1.1.1
Group [1.1.1.1]
Received local Proxy Host data in ID Payload:
Address 2.2.2.2, Protocol 0, Port 0

16 02/25/2007 10:37:16.500 SEV=4 IKE/61 RPT=27 1.1.1.1
Group [1.1.1.1]
Tunnel rejected: Policy not found for Src:1.1.1.1, Dst: 2.2.2.2!

18 02/25/2007 10:37:16.500 SEV=4 IKEDBG/97 RPT=52 1.1.1.1
Group [1.1.1.1]
QM FSM error (P2 struct &0xe7ed120, mess id 0xac462db5)!



Any ideas why I'm getting the "tunnel rejected" error?  Does anyone see any
glaring mistakes?  After searching the archives and google'ing, I gather other
folks are doing this without issue.

I have complete control over both devices so if there's any other info I can
provide let me know.

I realize this isn't a cisco support list so if it's the cisco's fault I'll go
bother someone else.

I appreciate your time, thank you.
Please cc me as I'm not subscribed to the list.








Re: site-to-site vpn 4.0 to cisco 3000

2007-02-25 Thread William Bloom

The man page for isakmpd.conf indeed sheds some light; there's an
example in that page that shows how to specify lifetimes for both
phases...

   [General]
   Default-phase-1-lifetime=   3600,60:86400
   Default-phase-2-lifetime=   1200,60:86400

At this point, if the lifetimes indeed agree, then I myself would be
a little puzzled over why the proposal would be rejected.  Both
endpoints are configured to use the peer address as the ID?  At first
blush, your settings seem all kosher.

I would agree, though, that it certainly appears that there must
still be some sort of inconsistency between the proposals.

Another suggestion...

It appears that you've been trying to initiate the VPN from one end,
perhaps the OpenBSD end.  Probably by sending a ping from the 1st
site to the 2nd.  Restart both ends to clear out any SAs that have
been negotiated and try to ping from the -other- end in order to see
what happens when the VPN negotiation is initiated the opposite
direction.  The log entries might show something useful.

Also, did the OpenBSD logs show any detail of the failure from the
last attempts apart from the mismatched SA queries?
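
On the OpenBSD side, the restart suggested above might look roughly like this.
The 'isakmpd -K' and 'ipsecctl -f /etc/ipsec.conf' invocations are the ones
quoted from the original poster; using 'ipsecctl -F' to flush existing SAs and
flows and 'pkill' to stop the daemon are assumptions about a reasonable way to
do it, and the function names are illustrative.  The Cisco end has to be
cleared through its own management interface, which is not shown.

```shell
#!/bin/sh
# Sketch only; run as root on the OpenBSD endpoint.

restart_ipsec() {
    ipsecctl -F                     # flush negotiated SAs and flows
    pkill isakmpd                   # stop the running daemon, if any
    sleep 1
    isakmpd -K                      # start it as in the original message
    ipsecctl -f /etc/ipsec.conf     # reload the 'ike esp' rules
}

show_ipsec() {
    # List flows and SAs once the other end has tried to initiate.
    ipsecctl -s all
}

# Example (not executed here):
#   restart_ipsec
#   (ping from a host behind the Cisco, then)
#   show_ipsec; tail /var/log/messages
```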


Bill


On Feb 25, 2007, at 14:48, c l wrote:


Hello,  thanks for the reply, it helped if I'm not mistaken.  I
think I'm getting closer but still no joy.  See below.


From: William Bloom <[EMAIL PROTECTED]>
To: c l <[EMAIL PROTECTED]>
CC: misc@openbsd.org
Subject: Re: site-to-site vpn 4.0 to cisco 3000
Date: Sun, 25 Feb 2007 14:02:13 -0700

I've set up maybe 78 LAN-to-LAN VPNs between my datacenter and
other sites of customers and partners.  However, I haven't had
occasion to use OpenBSD as a VPN endpoint yet and I'm not an
expert on the ike/ipsec features of OpenBSD.  Having said that,
I've done quite a bit of VPN troubleshooting in the past, so I'll
take a stab at this in general terms...

My reading of the three 'ike esp' statements in ipsec.conf is
that you've declared three sets of SAs on the OpenBSD endpoint,
all to peer 2.2.2.2 - one SA between the interior address spaces
of the two locations, a second between the endpoint address of
the 1st location and the interior address space of the 2nd, and a
third between the endpoint addresses.  That third one certainly
catches my attention since I know that -some- pieces of equipment
(particularly the PIX, ASA, and I believe the Juniper, although
I've never confirmed this for a Cisco 3000) hate the idea of
having their own endpoint address included in the encryption
domain.  This seems likely to me as a cause for the rejection.
This is something that IKE might negotiate on -some-
manufacturer's equipment but not others.  In most cases, there's
no need for the endpoints to participate in the encryption domain
since they aren't application servers - they only need to
exchange IKE messages and then simply pass IPsec to/from their
respective protected address spaces.

So my suggestion would be to strike that third 'ike esp'
statement and then see what difference that makes in the log.  As
a special note, do be aware that this means that you probably
won't be able to ping the 2.2.2.2 address from the 1st site
(encryption enforcement on the Cisco will deny this, since you're
pinging from an address space at the 1st site that's covered by
the VPN proposal and yet 2.2.2.2 is not in the encryption
domain).  If you need to troubleshoot the Cisco by pinging it,
then you'll have to do so from a point -outside- the OpenBSD VPN
endpoint.


This did in fact change the behavior.  First I did as you suggested
and struck the statement for the two end points.  The logs showed a
similar message as before but this time it complained about the
src: 1.1.1.1 dst: 10.10.x.x  tunnel.  So I removed that one as
well.  So now my ipsec.conf has just the one line in it.

ike esp from 192.168.1.0/24 to 10.10.0.0/16 peer 2.2.2.2 \
   main auth hmac-sha1 enc 3des group modp768 psk openbsdrules

This gives me a different result.  Here is the output from the
cisco log.

2 02/25/2007 15:28:21.210 SEV=5 IKE/172 RPT=7437 1.1.1.1
Group [1.1.1.1]
Automatic NAT Detection Status:
  Remote end is NOT behind a NAT device
  This   end is NOT behind a NAT device

6 02/25/2007 15:28:21.310 SEV=4 IKE/119 RPT=6722 1.1.1.1
Group [1.1.1.1]
PHASE 1 COMPLETED

7 02/25/2007 15:28:21.310 SEV=4 AUTH/22 RPT=6617 1.1.1.1
User [1.1.1.1] Group [1.1.1.1] connected, Session Type: IPSec/LAN-to-LAN

9 02/25/2007 15:28:21.310 SEV=4 AUTH/84 RPT=86
LAN-to-LAN tunnel to headend device 1.1.1.1 connected

10 02/25/2007 15:28:21.400 SEV=5 IKE/35 RPT=30 1.1.1.1
Group [1.1.1.1]
Received remote IP Proxy Subnet data in ID Payload:
Address 192.168.1.0, Mask 255.255.255.0, Protocol 0, Port 0

13 02/25/2007 15:28:21.400 SEV=5 IKE/34 RPT=9176 1.1.1.1
Group [1.1.1.1]
Received local IP Proxy Subnet data in ID Payload:
A

Re: site-to-site vpn 4.0 to cisco 3000

2007-02-25 Thread William Bloom

On further study of the isakmpd.conf man page, I am thinking that you
may be correct in turning your attention to isakmpd.conf as a
possible trouble spot.

I notice that you specified group modp768 (Diffie-Hellman group 1) in
your ipsec statements.  As I said, not having had occasion to run a
VPN before using OpenBSD as an endpoint, I am having to generalize
from all the other VPN setups that I have done.  Generally, the
Diffie-Hellman group is only relevant in two places, one being the
'main' mode for phase 1 and the other being for PFS in phase 2 (-if-
PFS is enabled).  I see that isakmpd normally uses DH 2 by default
for 'main' mode (so claims the man page), and this can be defined
otherwise (e.g. DH 1) if preferred.  I -suspect- that the DH group
specified in ipsec.conf is not relevant to 'main' mode; it is
perhaps used only when PFS is configured.

So a possible problem might be that site #1 is using DH 2 for its
proposal and an edit to isakmpd.conf may be the solution.  Or, an
alternative might be to reconfigure the VPN 3000 to use DH 2.  If you
have PFS enabled, deconfigure it for now in order to simplify
things.  Once you've got the VPN running, you can play around with
PFS enablement if you really need it.
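
When comparing the two ends, it helps to translate between the modp names used
in ipsec.conf and the DH group numbers that vendor interfaces usually show.  A
small sketch of that mapping (the function name is made up for illustration;
the mapping itself is the standard one: 768-bit MODP is group 1, 1024-bit is
group 2, 1536-bit is group 5):

```shell
#!/bin/sh
# Sketch only: map a modp group name to its Diffie-Hellman group number.

dh_group_number() {
    case "$1" in
        modp768)  echo 1 ;;
        modp1024) echo 2 ;;
        modp1536) echo 5 ;;
        *) echo "unknown group: $1" >&2; return 1 ;;
    esac
}

# Example:
#   dh_group_number modp768    # the group named in the ipsec.conf line
```

So 'group modp768' on the OpenBSD side corresponds to DH group 1 on the
VPN 3000, and a default of DH 2 corresponds to modp1024.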

I'm afraid I'm a bit hindered by not having used OpenBSD in any of
the 70+ VPNs I've set up in the past.  Mostly, I've worked with
CheckPoint, Juniper, Cisco routers/PIX/ASA/VPN 3000, SonicWall, Nortel,
and LinkSys.  None of my customers have selected OpenBSD for VPN, yet.


Bill

On Feb 25, 2007, at 14:48, c l wrote:


Hello,  thanks for the reply, it helped if I'm not mistaken.  I
think I'm getting closer but still no joy.  See below.


From: William Bloom <[EMAIL PROTECTED]>
To: c l <[EMAIL PROTECTED]>
CC: misc@openbsd.org
Subject: Re: site-to-site vpn 4.0 to cisco 3000
Date: Sun, 25 Feb 2007 14:02:13 -0700

I've set up maybe 78 LAN-to-LAN VPNs between my datacenter and
other sites of customers and partners.  However, I haven't had
occasion to use OpenBSD as a VPN endpoint yet and I'm not an
expert on the ike/ipsec features of OpenBSD.  Having said that,
I've done quite a bit of VPN troubleshooting in the past, so I'll
take a stab at this in general terms...

My reading of the three 'ike esp' statements in ipsec.conf is
that you've declared three sets of SAs on the OpenBSD endpoint,
all to peer 2.2.2.2 - one SA between the interior address spaces
of the two locations, a second between the endpoint address of
the 1st location and the interior address space of the 2nd, and a
third between the endpoint addresses.  That third one certainly
catches my attention since I know that -some- pieces of equipment
(particularly the PIX, ASA, and I believe the Juniper, although
I've never confirmed this for a Cisco 3000) hate the idea of
having their own endpoint address included in the encryption
domain.  This seems likely to me as a cause for the rejection.
This is something that IKE might negotiate on -some-
manufacturers' equipment but not others.  In most cases, there's
no need for the endpoints to participate in the encryption domain
since they aren't application servers - they only need to
exchange IKE messages and then simply pass IPsec to/from their
respective protected address spaces.

So my suggestion would be to strike that third 'ike esp'
statement and then see what difference that makes in the log.  As
a special note, do be aware that this means that you probably
won't be able to ping the 2.2.2.2 address from the 1st site
(encryption enforcement on the Cisco will deny this, since you're
pinging from an address space at the 1st site that's covered by
the VPN proposal and yet 2.2.2.2 is not in the encryption
domain).  If you need to troubleshoot the Cisco by pinging it,
then you'll have to do so from a point -outside- the OpenBSD VPN
endpoint.


This did in fact change the behavior.  First I did as you suggested
and struck the statement for the two end points.  The logs showed a
similar message as before but this time it complained about the
src: 1.1.1.1 dst: 10.10.x.x  tunnel.  So I removed that one as
well.  So now my ipsec.conf has just the one line in it.

ike esp from 192.168.1.0/24 to 10.10.0.0/16 peer 2.2.2.2 \
   main auth hmac-sha1 enc 3des group modp768 psk openbsdrules

This gives me a different result.  Here is the output from the
cisco log.

2 02/25/2007 15:28:21.210 SEV=5 IKE/172 RPT=7437 1.1.1.1
Group [1.1.1.1]
Automatic NAT Detection Status:
  Remote end is NOT behind a NAT device
  This   end is NOT behind a NAT device

6 02/25/2007 15:28:21.310 SEV=4 IKE/119 RPT=6722 1.1.1.1
Group [1.1.1.1]
PHASE 1 COMPLETED

7 02/25/2007 15:28:21.310 SEV=4 AUTH/22 RPT=6617 1.1.1.1
User [1.1.1.1] Group [1.1.1.1] connected, Session Type: IPSec/LAN-to-LAN

9 02/25/2007 15:28:21.310 SEV=4 AUTH/

Re: site-to-site vpn 4.0 to cisco 3000 SOLVED

2007-02-25 Thread William Bloom

Ah.   Disregard my last post.  I didn't realize that the 'ipsec'
configuration specifies main mode (phase 1 negotiation) and quick
mode (phase 2 negotiation) in separate substatements.  Good find.
That makes perfect sense.


Bill

On Feb 25, 2007, at 19:06, c l wrote:


Finally got this to work.  Here's the config that ended up working.

I'm not sure why I didn't notice before but the quick mode stuff
wasn't setup correctly.

ipsec.conf
ike esp from 192.168.1.0/24 to 10.10.0.0/16 peer 2.2.2.2 \
   main auth hmac-sha1 enc 3des group modp768 \
   quick auth hmac-sha1 enc 3des group none psk openbsdrules


cisco
IKE proposal
authentication mode - presharedkeys
authentication algorithm - sha/hmac-160
encryption - 3DES-168
DH Group - 1 768-bits
Lifetime - 3600seconds

Lan-to-Lan connection
interface - external(2.2.2.2)
connection type - bi-directional
peer - 1.1.1.1
presharedkey - openbsdrules
authentication - esp/sha/hmac160
local network - 10.10.0.0  (wildcard mask 0.0.255.255)
remote network - 192.168.1.0 (wildcard mask 0.0.0.255)

SA
authentication - esp/sha/hmac160
encryption - 3DES-168
mode - tunnel
Lifetime - 1200seconds
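
The two sides above describe the same networks in different notations:
ipsec.conf uses CIDR prefixes, while the Cisco side uses wildcard masks.
A minimal Python sketch, using only the standard ipaddress module, to
confirm that the two notations agree.  The function name is made up for
this example.

```python
import ipaddress


def wildcard_to_cidr(network, wildcard):
    """Convert a network plus Cisco-style wildcard mask to CIDR notation."""
    # A wildcard mask is the bitwise inverse of the netmask.
    netmask_int = int(ipaddress.IPv4Address(wildcard)) ^ 0xFFFFFFFF
    netmask = ipaddress.IPv4Address(netmask_int)
    # ip_network() raises ValueError if the mask is not contiguous
    # or if the address has host bits set.
    return str(ipaddress.ip_network(f"{network}/{netmask}"))


print(wildcard_to_cidr("10.10.0.0", "0.0.255.255"))   # 10.10.0.0/16
print(wildcard_to_cidr("192.168.1.0", "0.0.0.255"))   # 192.168.1.0/24
```

Both results match the 'from' and 'to' networks in the ipsec.conf line,
so the phase 2 IDs offered by each end describe the same address spaces.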



Now I just have to figure out the routing :)





From: William Bloom <[EMAIL PROTECTED]>
To: c l <[EMAIL PROTECTED]>
CC: misc@openbsd.org
Subject: Re: site-to-site vpn 4.0 to cisco 3000
Date: Sun, 25 Feb 2007 18:53:12 -0700

The man page for isakmpd.conf indeed sheds some light; there's an
example in that page that shows how to specify lifetimes for
both phases...

   [General]
   Default-phase-1-lifetime=   3600,60:86400
   Default-phase-2-lifetime=   1200,60:86400
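
A small Python sketch of how such a value can be read, assuming the
format means "default,minimum:maximum" in seconds (check the man page
before relying on that reading).  The function names are made up for
this example, and the peer lifetimes passed in are only sample values.

```python
def parse_lifetime(value):
    """Split 'default,min:max' into three integers (seconds)."""
    default, _, bounds = value.partition(",")
    low, _, high = bounds.partition(":")
    return int(default), int(low), int(high)


def acceptable(value, peer_seconds):
    """True if the peer's lifetime lies within the min:max range."""
    _, low, high = parse_lifetime(value)
    return low <= peer_seconds <= high


print(parse_lifetime("3600,60:86400"))     # (3600, 60, 86400)
print(acceptable("1200,60:86400", 1200))   # True
print(acceptable("3600,60:86400", 30))     # False
```

Under that reading, a peer lifetime anywhere from one minute to one day
would fall inside the accepted range for either phase.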

At this point, if the lifetimes indeed agree, then I myself would
be a little puzzled over why the proposal would be rejected.
Both endpoints are configured to use the peer address as the ID?
At first blush, your settings seem all kosher.

I would agree, though, that it certainly appears that there must
still be some sort of inconsistency between the proposals.

Another suggestion...

It appears that you've been trying to initiate the VPN from one
end, perhaps the OpenBSD end.  Probably by sending a ping from
the 1st site to the 2nd.  Restart both ends to clear out any SAs
that have been negotiated and try to ping from the -other- end in
order to see what happens when the VPN negotiation is initiated
from the opposite direction.  The log entries might show something
useful.

Also, did the OpenBSD logs show any detail of the failure from
the last attempts apart from the mismatched SA queries?


Bill



Congestion and Dropping of Gratuitous ARP (GARP)

2006-07-25 Thread William Bloom
I have a case where a large (60+), rapid flurry of incoming GARP messages is
evidently being only partially processed.  Specifically, the first 25 GARP
messages are applied to the ARP cache and the remainder are ignored.  I'm
looking for a better understanding of why this happens as well as suggestions on
how to work around or solve the problem.

Background...

We have a double-firewalled arrangement.  The inner firewall is a Checkpoint
ClusterXL pair of Resilience Ndurant modules running Checkpoint NG AI R55
HFA_17, and the outer firewall is a pair of Soekris 4801 boxes running OpenBSD
3.7 in a PF/PFsync/CARP pair.  Between the two pairs of firewalls is our DMZ,
and behind the inner firewall is our trusted net.  The ClusterXL pair runs as
active/standby (as does the OpenBSD pair).

Nominally, this all works fine.

Now, the ClusterXL pair assigns a single virtual IP address to the outer
interface of the active cluster member.  The active cluster member provides
proxy ARP for all NAT'd hosts.  Whenever the active member sends an ARP reply
for its virtual outer IP address or one of the NAT'd hosts, it uses the MAC
address for the physical ethernet interface.  Whenever a cluster failover
occurs, the new active member assigns the outer virtual IP address to itself,
then sends GARP messages out the outer interface for the virtual IP address as
well as for the NAT address of each NAT'd host.  Since we have about 61 NAT'd
hosts, that's 62 GARP messages that get sent.  This set of GARP messages is sent
fairly quickly as a flurry, less than 7 ms for all 62 messages.

We find that when this flurry of GARP messages arrives at the outer firewall
(observed using "tcpdump -i sis2 -ne 'arp'"), the relevant ARP cache entries
for the first through the 25th GARP messages are indeed updated with the new MAC
address.  ARP entries for any GARP message after the 25th are not updated -
these entries retain their stale MAC value until their ARP entry expires several
minutes later.

However, if we manually send a lone GARP message (using the 'garp' command) from
the inner firewall for one of these stale ARP entries, then it -is- updated.

So the only affected ARP cache entries are those for which a GARP message is
preceded by at least 25 GARP messages within the last few milliseconds, and a
lone GARP message (not part of a 'flurry') works fine.

This sounds an awful lot like some sort of congestion.

We found that other machines (mainly FreeBSD) that are peers alongside the outer
firewall's inner interface on the DMZ don't suffer from this symptom.  But then,
these machines are not firewalls.

Now, I realize that incoming ARP messages are handled on BSD systems with a
network interrupt (NETISR) especially dedicated to ARP, separate from the NETISR
for IP traffic.  And I -believe- that the receive queue for the ARP NETISR is
not too big (room for 50 mbufs, I recall).  Also, since an OpenBSD firewall uses
PF for filtering, and PF does rule processing in interrupt mode, then a busy
OpenBSD firewall can typically be seen to be computing almost entirely in
interrupt mode when not idle.  Indeed, our active OpenBSD firewall is often 30%
to 50% idle with pretty much all other computing in interrupt mode.  I wonder if
this behavior can starve the ARP NETISR somewhat.

I know that NETISR receive queues have a drop counter associated with them (to
count drops of an mbuf when the receive queue is full), but as far as I know
there's actually no way to inspect the ARP receive queue drop count - the count
is maintained by the kernel but there is no tool that displays it (e.g. netstat
doesn't show this).
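
To make the hypothesis concrete, here is a toy Python simulation of a
bounded, drop-on-full receive queue.  It is not the OpenBSD kernel code
and it does not explain why the observed limit is 25 rather than 50.  It
only shows what a starved consumer would look like from the outside.  The
function name, its parameters, and the drain model are assumptions made
for this example.

```python
from collections import deque


def receive_burst(burst_size, queue_limit, drained_during_burst=0):
    """Simulate a burst arriving at a drop-on-full queue.

    drained_during_burst is how many queued packets the consumer manages
    to process while the burst is still arriving (0 = fully starved).
    Returns (sequence numbers accepted, number of packets dropped).
    """
    queue = deque()
    accepted, dropped = [], 0
    # Spread the consumer's work evenly across the burst.
    if drained_during_burst:
        drain_every = max(1, burst_size // drained_during_burst)
    else:
        drain_every = 0
    for seq in range(1, burst_size + 1):
        if len(queue) >= queue_limit:
            dropped += 1            # queue full: packet is discarded
        else:
            queue.append(seq)
            accepted.append(seq)
        if drain_every and seq % drain_every == 0 and queue:
            queue.popleft()         # consumer handles one packet
    return accepted, dropped


for limit, drained in ((50, 0), (25, 0), (25, 62)):
    ok, lost = receive_burst(62, limit, drained)
    print(limit, drained, len(ok), lost)
# 50 0 50 12
# 25 0 25 37
# 25 62 62 0
```

With no draining during the burst, exactly the first queue_limit messages
survive and the rest are dropped, which matches the shape of the symptom
(the first 25 applied, the remainder lost).  A consumer that keeps pace
with the burst loses nothing.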

I've checked through /var/log/pflog and I see no evidence that any ARP messages
were dropped by PF.

Could it be that the large amount of time a firewall dwells in interrupt mode
for PF processing can somehow cause the ARP receive queue to get full more
easily than otherwise?  Why is '25' the magic number for seemingly dropped GARP
messages instead of '50'?  This is 100% reproducible at exactly 25.  Is there
any way that anyone can think of that I can inspect the ARP NETISR drop count?
Any ideas on how to work around or fix this?  I don't see any sysctl settings in
3.7 for the ARP receive queue size.  And there appears to be no way to throttle
or pace the Checkpoint ClusterXL GARP messages that I can find.


Bill
-- 
William Bloom| Snr Systems Engineer|M P H A S I S Architecting Value | Eldorado
Computing
5353 North 16th Street, Suite 400 Phoenix, Az 85016 | Direct: +11-602-604-3100 |
Fax: +11-602-604-3115| http://www.eldocomp.com
