IPSEC offloading on Intel PRO/100 S

2002-03-01 Thread Bruce M Simpson

Guys,

A few questions regarding the Intel PRO/100 S Ethernet adapter:

o Has anybody experimented with using the IPSEC ESP 3DES hardware offload
  capability of the Intel 82550EY ASIC used within the above NIC?

o Have Intel ever released specs for this ASIC publicly?

o Would anybody be interested in my adding support for this beast's crypto
  features to the fxp driver?

Please let me know your thoughts.

Regards,
Bruce.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-net" in the body of the message



Re: IPSEC offloading on Intel PRO/100 S

2002-03-01 Thread Bruce M Simpson

On Fri, Mar 01, 2002 at 04:33:04AM -0600, Len Conrad wrote:
> 
> > > o Would anybody be interested in my adding support for this beast's crypto
> > >   features to the fxp driver?
> >
> >Yes. :)
> 
> Is there ANY hardware encryption support in FreeBSD?

Things are gradually being rearranged to facilitate this, as part of SMPng.
At the moment the IP stack runs solely as a software interrupt. Since 4.4,
there has been the notion of interface capabilities. Normally this is only
used for TCP/UDP/IP checksum offloading.

BMS




Re: IPSEC offloading on Intel PRO/100 S

2002-03-01 Thread Bruce M Simpson

Pardon my lack of caffeine; I have a pint mug of tea on my desk now.

On Fri, Mar 01, 2002 at 10:40:35AM +, Bruce M Simpson wrote:
> > 
> > Is there ANY hardware encryption support in FreeBSD?
> 
> Things are gradually being rearranged to facilitate this, as part of SMPng.
> At the moment the IP stack runs solely as a software interrupt. Since 4.4,
   ^
That should be 'solely within a software interrupt context, splnet()'.
It's also going to require some reworking of the imported KAME tree.

BMS




Re: ALTQ integration in FreeBSD

2002-03-04 Thread Bruce M Simpson

On Sat, Mar 02, 2002 at 12:36:49PM +0200, Adrian Penisoara wrote:
> Hi,
> 
>   For my diploma exam I will study the state of QoS in today's
> networking and further directions and I probably will concentrate on
> ALTQ in FreeBSD (as I'm pretty familiar w/ FreeBSD).
> 
>   I see that most of today's OSes have a default QoS implementation (at
> least Win2000 and OpenBSD come to my mind) and there is a growing need
> for QoS integration into the OS. I was wondering what kept FreeBSD from
> integrating a QoS implementation (as it did with Kame IPv6) -- for
> example what are pro and cons of integration ALTQ in FreeBSD (I saw
> there was a thread launched sometime in the beginning of May 2001). I
> also saw that ALTQ on FreeBSD seems to be used pretty much even in
> production.
> 
>   What other QoS implementation alternatives are available for FreeBSD ?
> Why didn't FreeBSD follow tracks along OpenBSD's integration of ALTQ ?

You might want to check out the materials on Lucent's ECLIPSE project, which
added layer 2 and 3 QoS as well as disk I/O scheduling to FreeBSD 3.x.

http://www.bell-labs.com/project/eclipse/release/

BMS




Re: GRE on 4.x

2002-06-06 Thread Bruce M Simpson

Barry,

I have a working GRE driver (tested against 4.5-RELEASE) which we are
using as part of Consume (www.consume.net). I would be happy to post the
code publicly for peer review, as I'd like to contribute it to FreeBSD.

Regards,
BMS.

On Mon, Jun 03, 2002 at 12:52:02PM +0200, Barry Irwin wrote:
> Hi All
> 
> I'm trying to integrate with a business partner here who have a requirement
> we use GRE tunnel inside of IPSEC for a number of reasons.  While the gif(4)
> device provides IPIP tunneling (proto 4) this doesn't work when the remote
> side is expecting true GRE (proto 47).
> 
> Has anyone had any experience working with GRE tunnels under FBSD?  I have
> tried the following two resources I found but not much success on either.




Re: Vimage howto

2008-12-08 Thread Bruce M. Simpson

Julian,

Thank you (and Marko) very much for preparing this document.

The VIMAGE import has had me at something of an impasse re: the IGMPv3 
branch and clearly written documentation is a big help indeed.


Julian Elischer wrote:
> Well not completely, but I've had a number of questions over the
> last few months about what it is, so, as Marko and I have written
> the following "how to virtualize your module" document, I've been
> directing people to it. After another couple of questions I think
> this could do with wider distribution.


Thank you also for providing it here on the list, as opposed to relying
on Perforce alone. Whilst I understand committers rate p4 for
experimental work, sadly it is simply not accessible to the
not-so-silent majority in the FreeBSD sphere who are not committers,
which makes its continued use questionable at best.


regards,
BMS
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: how to program a driver?

2008-12-09 Thread Bruce M. Simpson

Espartano wrote:

Actually i know how to program with C language in a basic level but i
don't know nothing about hardware or computer organization, what
topics i should study for gain knowledges about net-drivers ? or if
someone can recommend me books about this topic  i will be very
thankful.
  


Try "The Indispensable PC Hardware Book" by Hans-Peter Messmer for a 
general overview of PC architecture.


cheers
BMS


Re: how to program a driver?

2008-12-09 Thread Bruce M. Simpson

[Resend to list for everyone]

Espartano wrote:

Actually i know how to program with C language in a basic level but i
don't know nothing about hardware or computer organization, what
topics i should study for gain knowledges about net-drivers ? or if
someone can recommend me books about this topic  i will be very
thankful.
  


   The seminal work is TCP/IP Illustrated Volume 2 (Gary Wright and W. 
Richard Stevens, Addison-Wesley). Whilst dated it will give you an 
overview of how all the parts in the BSD networking stack fit together.
   It really needs to be updated, however enough things are in flux 
right now that summarising all the changes would be difficult until say 
after FreeBSD 8.0 dust is settled.


   For computer architecture, probably best to learn PC architecture 
these days -- x86 is here to stay, kids, and Netbooks are something of a 
reactionary response triggered by the One-Laptop-Per-Child (OLPC) 
project. In my day, I learned 68000 assembly and C on the Amiga.


   Hans-Peter Messmer's "The Indispensable PC Hardware Book" is a huge 
book which cost me about 50 GBP new when I first bought it -- I was 
working in a reasonably well paid job at the time, but it can be found 
second hand no doubt around the world.
   Cover to cover it will tell you what you need to know about how the 
PC architecture fits together, but if you need more detail e.g. on stuff 
like FreeBSD network drivers, again, it's best to refer back to the 
source code itself.


Hope this helps.

cheers
BMS


Re: Heads up --- Thinking about UDP and tunneling

2008-12-11 Thread Bruce M. Simpson

Hi,

I am missing context of what Max's suggestion was, do you have a 
reference to an old email thread?


Style bugs:
* needs style(9) and whitespace cleanup.
* C typedefs should be suffixed with _t for consistency with other 
kernel typedefs.

* Function typedefs usually named like foo_func_t (see other subsystems)

Have you looked at m_apply() ? It already exists for stuff like this 
i.e. functions which act on an mbuf chain, although it doesn't 
necessarily expect chain heads.


cheers
BMS



Re: last call for L2/L3 rewrite code review

2008-12-11 Thread Bruce M. Simpson

Hi,

Just skimming this I notice it uses the if_afdata[AF_INET] pointer 
purely for lltbl purposes; this clashes with the IGMPv3 code drop.


Please look in the bms_netdev branch, where I introduce a 'struct 
ip_ifinfo' to make more general use of that slot. IGMPv3 needs to store 
per-interface state for AF_INET, so this slot really needs to be shared 
with other AF_INET stuff.


Looks like it needs to be updated for VIMAGE also, hopefully others more 
familiar with this can help -- I am busy enough with non-programming 
activity as it is to get up to speed on this, although I have at least 
managed to print Julian's write-up...


Other than that, it looks like a much needed improvement and we are all
very grateful for your work on this.


thanks
BMS


Re: Having problems with limited broadcast

2009-01-07 Thread Bruce M. Simpson

Peter Steele wrote:

..

Based on the discussion in the link above, it doesn't seem like the
problem was entirely resolved by the patches mentioned in this thread.
Has anything been done since this discussion took place. Surely there
must be a way to get limited broadcast to work under FreeBSD.
  


You will need to go to the pcap layer to send limited broadcasts w/o any 
IPv4 addresses configured in a BSD stack for now. If you have an IP on 
the interface, you can just use IP_ONESBCAST.
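
A minimal sketch of what "going to the pcap layer" involves: the
application has to build the whole frame itself, from the Ethernet header
up. The helper names below are made up for illustration, the source IP of
0.0.0.0 assumes an unconfigured interface, and actually writing the frame
out through bpf/pcap is left out.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over `data` (padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def limited_broadcast_frame(src_mac: bytes, payload: bytes,
                            sport: int, dport: int) -> bytes:
    """Ethernet + IPv4 + UDP frame from 0.0.0.0 to 255.255.255.255."""
    udp_len = 8 + len(payload)
    # A UDP checksum of 0 means "not computed", which IPv4 permits.
    udp = struct.pack("!HHHH", sport, dport, udp_len, 0) + payload
    # version/IHL, TOS, total length, id, flags/frag, TTL, proto, checksum,
    # source address (all zeroes), destination address (all ones)
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len, 0, 0,
                      64, 17, 0, bytes(4), b"\xff" * 4)
    # Patch the header checksum into bytes 10-11.
    hdr = hdr[:10] + struct.pack("!H", inet_checksum(hdr)) + hdr[12:]
    # Broadcast destination MAC, EtherType 0x0800 (IPv4).
    eth = b"\xff" * 6 + src_mac + struct.pack("!H", 0x0800)
    return eth + hdr + udp
```

The point of the sketch is that nothing in the frame depends on the
routing table; the choice of interface is made entirely by which device
the frame is written to.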


thanks
BMS
 




Re: Having problems with limited broadcast

2009-01-08 Thread Bruce M. Simpson

Peter Steele wrote:

...
It's really a matter of time. We didn't anticipate limited broadcast
being broken in FreeBSD and we're scrambling to come up with a solution.
To be quite frank I haven't done anything with IPv6 before so it would
be more research to get up to speed on this option. It seems our best
option is scapy, which unfortunately I also haven't used before...
  


It's not broken -- it has always been this way in all BSD derived 
networking stacks.


Limited broadcast addresses just don't contain any information about 
where the datagram should go, and this is the case in all other 
implementations. They are similar to multicast addresses in that regard.


Linux has a knob, SO_BINDTODEVICE, which is partly there to work around
this problem; however, it isn't the ideal semantic fit.


The folk who point out that link-local addresses could be used, have an 
interesting suggestion which might work for you.


thanks
BMS


Re: Having problems with limited broadcast

2009-01-08 Thread Bruce M. Simpson

Peter Steele wrote:
>> The folk who point out that link-local addresses could be used, have
>> an interesting suggestion which might work for you.
>
> It's definitely interesting, but it is very likely that some of our
> customers will want to be able to set their own IP ranges and not be
> limited to 169.254/16. So we need a more generic solution.


Sounds like it's bpf/pcap city for you guys.

A similar bump-in-the-stack to SO_BINDTODEVICE, e.g. let's call it 
IP_SENDIF has been on the drawing board, but it needs appropriate 
security screening -- the ability to bypass the forwarding tables, 
whilst specifying an interface e.g. by index or name, would be desirable 
only for certain privileged processes.


BTW: If you guys are already looking at scapy, you may also wish to give 
pcs.sourceforge.net a look as an alternative.


It is a Python project which I did some hacking on with George
Neville-Neil, who started it. It has BPF/PCAP support out of the box and
has a number of powerful features, including a packet-level expect() 
facility, which works in a very similar manner to pexpect (Python expect 
for text streams).


I added a scapy-like concatenation syntax ('/' operator) to it as that 
makes plugging packet chains together that much easier.
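
For anyone unfamiliar with the idiom, here is a rough sketch of how a '/'
concatenation operator can be built in Python. The Layer and Chain classes
are hypothetical and are not the actual PCS or scapy classes; they only
show the operator-overloading trick.

```python
class Layer:
    """Hypothetical packet layer; not a PCS or scapy class."""

    def __init__(self, name: str, data: bytes = b""):
        self.name = name
        self.data = data

    def __truediv__(self, other):
        # layer / x: promote self to a one-element chain, then append x.
        return Chain([self]) / other


class Chain:
    """Ordered list of layers, outermost first."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __truediv__(self, other):
        # chain / layer appends one layer; chain / chain concatenates.
        extra = other.layers if isinstance(other, Chain) else [other]
        return Chain(self.layers + extra)

    def to_bytes(self) -> bytes:
        return b"".join(layer.data for layer in self.layers)
```

With this in place, Layer("ether") / Layer("ip") / Layer("udp") reads
left to right in the same order the headers appear on the wire.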


I have the beginnings of an IGMPv3 test suite in my home repo written
using PCS; it uses pcap capture. I imagine a DHCP-like protocol could
easily be implemented using PCS too.


cheers
BMS



Re: Having problems with limited broadcast

2009-01-08 Thread Bruce M. Simpson

Peter Steele wrote:

...
I personally like this idea, but I'm not sure I can sell it to the
others. Are there any restrictions to these 169.254.x.y addresses?
  


169.254.0.0/16 must never appear outside a link -- it is strictly scoped 
to that link.


Currently the IPv4 BSD stack has no concept of link-scoped addresses, 
but IPv6 does. Link is a realized concept there because of KAME's 
support for the % syntax. Internally, interface indexes get used.


In practice this shouldn't be an issue as long as you can guarantee 
different addresses are used for the 169.254.0.0/16 block on each 
interface, however, it would mean any app using sockets would need to 
explicitly bind to the local address to ensure the correct interface is 
used. Furthermore, we effectively need to be able to support multiple 
next-hops for the 169.254.0.0/16 prefix, otherwise we can support only 
one such interface w/o significant kernel code rewrites.


So, really, LL may not buy you anything at all, and it's likely you need 
to go straight to pcap for your app. These restrictions have existed for 
years, and the fact that they haven't been addressed has largely been 
because there has been no community strategy to deal with it. I 
speculate some BSD-using organisations might have already solved these 
problems, however, without evidence (and code sharing), that's pure 
speculation.


cheers
BMS


Re: Having problems with limited broadcast

2009-01-08 Thread Bruce M. Simpson

Bruce M. Simpson wrote:
> Peter Steele wrote:
>> ...
>> I personally like this idea, but I'm not sure I can sell it to the
>> others. Are there any restrictions to these 169.254.x.y addresses?
>
> 169.254.0.0/16 must never appear outside a link -- it is strictly
> scoped to that link.


P.S. I checked in a change to ip_forward() a while back which enforces
this, as forwarding such traffic between interfaces without NATting it
or otherwise proxying it is a really bad idea (and also breaks the IPv4
link-local RFC, RFC 3927).
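
The rule itself is simple enough to sketch in a few lines. This is only an
illustration of the check, not the code that went into ip_forward(); the
function name is made up.

```python
import ipaddress

# The IPv4 link-local block; addresses in it are only valid on one link.
LINK_LOCAL_V4 = ipaddress.ip_network("169.254.0.0/16")


def may_forward(src: str, dst: str) -> bool:
    """False if either address is IPv4 link-local and so must stay on-link."""
    return not any(ipaddress.ip_address(a) in LINK_LOCAL_V4
                   for a in (src, dst))
```

A forwarding path would drop the datagram whenever may_forward() returns
False, regardless of what the routing table says.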



Re: howto determine network device unit number? device.hints?

2009-01-15 Thread Bruce M. Simpson

Yony Yossef wrote:
> Thanks for the explanation.
>
> So there's no way to determine this in advance..
> I must build a script that contains my own mapping between MAC addresses and
> the wanted interface names and run it after each driver load, rename the
> interfaces if necessary.
> It seems quite wrong, don't you agree?
>
> And how come the unit number is given an arbitrary value? Is there a good
> reason for that?


Normally the PCI probe runs in the opposite direction from that of 
Linux. It's largely to do with how the NEWBUS code walks the PCI bus. 
From a systems management point of view, yeah, it's irritating, however 
it would probably take more effort (i.e. kernel code) to try to patch it 
to work differently, and not everyone has free time to sit down and 
patch the kernel.


That and (unlike Solaris) there is no *direct* mapping between the
card's device number on the bus and its network driver unit number.


In your case I'm not sure why your two cards would flip order. Could it 
be how your BIOS and hardware set up the PCI IDSEL lines at boot?
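
For what it's worth, the MAC-to-name mapping script described in the
quoted question does not have to be large. Below is a rough sketch of the
planning step only; the function name and the interface names used with it
are placeholders, it assumes the wanted names never collide with a name
still in use, and collecting the current MAC addresses is left to the
caller.

```python
def plan_renames(current: dict, wanted: dict) -> list:
    """current: {ifname: mac}; wanted: {lower-case mac: desired ifname}.

    Returns ifconfig(8) argument lists for interfaces whose name differs
    from the wanted one. Interfaces with unknown MACs are left alone.
    """
    commands = []
    for ifname, mac in sorted(current.items()):
        target = wanted.get(mac.lower())
        if target and target != ifname:
            # FreeBSD's ifconfig renames with: ifconfig <old> name <new>
            commands.append(["ifconfig", ifname, "name", target])
    return commands
```

Each returned list can be handed to something like subprocess.run() after
the driver module has been loaded.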





Re: howto determine network device unit number? device.hints?

2009-01-15 Thread Bruce M. Simpson

Yony,

Bruce M. Simpson wrote:
>> And how come the unit number is given an arbitrary value? Is there a
>> good reason for that?
>
> ...
>
> In your case I'm not sure why your two cards would flip order. Could
> it be how your BIOS and hardware set up the PCI IDSEL lines at boot?


If this is the case on your system, then you really need to provide more 
data about your hardware, i.e. motherboard, BIOS, vendor information 
etc. as others point out.


Based on the data you've provided about the issue to date, my best guess
is that something in the above is different on your system (which is why
I mentioned IDSEL lines -- the mechanism PCI uses to actually select
device numbers electrically).


Normally the behaviour of FreeBSD's bus probes is well known -- nexus is 
walked for child buses, then these buses are plumbed into NEWBUS, e.g. 
cpu0...cpuN on nexus itself, PCI buses, and PCI subordinate buses in 
that order.


* You mention you don't encounter the issue with Linux, but you may 
already be aware that udev can tie driver instance number(s) to specific 
MAC addresses, although this process isn't fully automatic and any given 
distro may or may not create the persistent udev rules on a first run -- 
so this is comparing apples with oranges.


* [PCI-Express is a special case though, and I've had to sit down and do 
some work with commercial clients to make sure their appliance was able 
to detect devices being in particular slot numbers. Again, though, it's 
just as subject to the PCI enumeration order further up on the bus 
hierarchy as non-PCI-Express drivers.]


So your issue may not be a simple matter of "this seems wrong, this 
doesn't work", though I am sorry to hear it isn't working for you right now.


There are a lot of dynamic factors in the overall picture of the system,
and what works as expected for many users may not work for you. When folk
see things like this happening, we really need basic hardware information
for any volunteer(s) out there to get a true picture of what's actually
going on in your specific case, let alone come up with the right
solution.


thanks
BMS


Re: howto determine network device unit number? device.hints?

2009-01-15 Thread Bruce M. Simpson

Eygene Ryabinkin wrote:
> ...
> I wanted to stress only one point: simple 'kldunload ' and
> 'kldload ' makes devices to flip for Yony's case.  This means
> that unless some PCI hotplug stuff is here (which I don't believe to be
> present, because no physical cards are touched and there is actually a
> small amount of PCI hotplug support in FreeBSD), no physical PCI devices
> get added or removed from the PCI child tree.  It looks like that
> something goes wrong during the PCI tree reprobe on the driver module
> loading.
  


BTW: Thanks for looking further at the software layer first.

VIM is a wee bit easier to use than a bus analyzer.

Most motherboards don't support PCI geographical addressing, so... I 
wager it's the network driver code which may be the source of the 
problem, based on your analysis!


If this code is just doing a blind bump of an instance count and using that
as a "unit number"... well, that's OK and expected for software virtual
devices, but is counter-intuitive for something like hardware.


But I don't have any mtnic source, so this is pure speculation on my part.


> Correct me if I am wrong, but pci_driver_added from /sys/pci/pci.c will
> invoke device_get_children() to get the list of the attached devices,
> and for PCI case the list should be static.
  


Yup, that's right.


> I guess that when Yony will enable verbose boot and will show us kernel
> messages from two successive kldunload/kldload sequences, we will get
> some additional information about what's going on.
  


Hopefully he will chime in...

[bms does some google searching *before* he thinks about throwing his
toys out of the pram at the Original.Poster.]


ding :-) [a light bulb above bms' head]

So... Yony, you're writing a driver.
Maybe there's a bug in it?
That's cool, dude.
Hope it's a nice card and you plan on sharing the sweets with the rest 
of the class. ;-)


But seriously, please mention that you are writing a driver in general 
questions you might ask about the whole system, otherwise, FreeBSD 
volunteers will run around going "Is core code broken?" and that's not 
so good for community stress levels as a whole.


with lemonade,
BMS


Re: IGMP+WiFi panic on recent kernel - in igmp_fasttimo()

2009-03-14 Thread Bruce M Simpson

Sam,

Sam Leffler wrote:
> This patch avoids the crash.  Not sure how ifma_protospec is
> supposed to be handled so I'm not committing it.


Thanks for this.

I have a test machine ready to be prepped but it's missing a CF card (I 
have none) so need to pick one up from a friend. I have a pci-cardbus 
adapter + a ral(4) CardBus card, but no CardBus ath(4) -- I imagine this 
ain't specific to ath(4) so that should be fine.


I'll try to look at this Sun/Mon, I have a -CURRENT image built for the 
1U box now that just needs bootstrapping (it has a CF slot).


thanks,
BMS


Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP or IP does not work

2009-03-23 Thread Bruce M Simpson

Matthias Apitz wrote:

I went today evening with my EeePC and CURRENT on USB key
to that Greek restaurant; DHCP does not get IP in CURRENT either;
this is somehow good news, isn't it :-)
  


This may be orthogonal, but:
   A lab colleague and I have been seeing a sporadic problem where the 
ath0 exhibits the symptoms of being disassociated from its AP. We are 
running RELENG_7 on the EeePC 701 since the open source HAL merge.
   In the behaviour we're seeing, we don't see any problem with the 
initial dhclient run, the ath0 just seems to get disassociated within 
5-10 minutes of associating.


If we leave 'ping ' running in the background, we don't 
see this problem.


   We have yet to produce a tcpdump to catch it 'in the act' and 
observe the DLT_IEEE80211 traffic when it actually happens, I have only 
seen the symptoms. The AP does not show the EeePC units as being 
associated any more at this point, but ath0 still shows 'status: 
associated'. The AP involved is a Netgear WG602 V2, and is running the 
vendor's firmware.


I'll try to get set up with 'tcpdump -y ieee802_11' from initial boot 
(including dhcp and anything we bump into).


cheers
BMS


Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP or IP does not work

2009-03-23 Thread Bruce M Simpson
The following reply was made to PR kern/132722; it has been noted by GNATS.

From: Bruce M Simpson 
To: Matthias Apitz 
Cc: bug-follo...@freebsd.org, Sam Leffler , 
 freebsd-net@freebsd.org, "Sean C. Farley" 
Subject: Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP
 or IP does not work
Date: Mon, 23 Mar 2009 18:44:42 +

 Matthias Apitz wrote:
 > I went today evening with my EeePC and CURRENT on USB key
 > to that Greek restaurant; DHCP does not get IP in CURRENT either;
 > this is somehow good news, isn't it :-)
 >   
 
 This may be orthogonal, but:
 A lab colleague and I have been seeing a sporadic problem where the 
 ath0 exhibits the symptoms of being disassociated from its AP. We are 
 running RELENG_7 on the EeePC 701 since the open source HAL merge.
 In the behaviour we're seeing, we don't see any problem with the 
 initial dhclient run, the ath0 just seems to get disassociated within 
 5-10 minutes of associating.
 
 If we leave 'ping ' running in the background, we don't 
 see this problem.
 
 We have yet to produce a tcpdump to catch it 'in the act' and 
 observe the DLT_IEEE80211 traffic when it actually happens, I have only 
 seen the symptoms. The AP does not show the EeePC units as being 
 associated any more at this point, but ath0 still shows 'status: 
 associated'. The AP involved is a Netgear WG602 V2, and is running the 
 vendor's firmware.
 
 I'll try to get set up with 'tcpdump -y ieee802_11' from initial boot 
 (including dhcp and anything we bump into).
 
 cheers
 BMS


ath0 apparent silent disassociation

2009-03-23 Thread Bruce M Simpson

[Repost without attachment]

OK. We've managed to reproduce this set of symptoms now in our work area.

[If anyone needs to see a pcap, please Cc: me offlist.]

Timebase: beginning of the pcap is in sync with a bringup from
single-user mode; the tcpdump runs in the background from init whilst
the system is brought up.

OK, so I timed the apparent loss of connectivity as 6m 30s from the
point I hit the stopwatch, to when I hit it again when the AP's Web GUI
no longer shows the STA affected as being associated.
Obviously such a timing is subject to human/visual jitter, and how
often Netgear's firmware pulls the STA association list from the AP into
the web GUI.

What stands out in the pcap is that 302.291s in (almost 5m exactly),
the STA (ath0) sends an IEEE 802.11 NULL frame to the AP with the PWR
MGT bit set (I'm going to sleep!). This more or less coincides with a
normal beacon from the Netgear AP. It does not advertise Automatic Power
Save Delivery (APSD); that bit is 0.
This is puzzling as we don't enable power management by default. As
I understand it, this may be an AP feature in some environments... I can
try reproducing this with an explicit 'ifconfig ath0 -powersave' and see
if it reoccurs.
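
For anyone wanting to pick such frames out of their own capture, here is a
small sketch of the Frame Control decode involved. It assumes the bytes
passed in start at the 802.11 header (any radiotap or similar capture
header already stripped), and the function names are made up.

```python
import struct


def parse_frame_control(frame: bytes) -> dict:
    """Decode the 16-bit little-endian Frame Control field of an 802.11 frame."""
    (fc,) = struct.unpack_from("<H", frame, 0)
    return {
        "type": (fc >> 2) & 0x3,       # 0 management, 1 control, 2 data
        "subtype": (fc >> 4) & 0xF,    # 4 is Null (no data) for data frames
        "to_ds": bool(fc & 0x0100),    # set on STA-to-AP frames
        "pwr_mgt": bool(fc & 0x1000),  # sender is going into power save
    }


def is_power_save_null(frame: bytes) -> bool:
    """True for a data/Null frame with the power management bit set."""
    fc = parse_frame_control(frame)
    return fc["type"] == 2 and fc["subtype"] == 4 and fc["pwr_mgt"]
```

Running is_power_save_null() over every frame sent by the STA should be
enough to find the moment it announces it is going to sleep.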

You'll see that after this NULL frame is sent, there is another
Probe Request, and the Netgear AP does Probe Respond, but this makes no
difference (I ended the capture around 150s after the NULL frame was sent).

At this point we can't send traffic from the ath0, or rather, the AP
is acting as though it never even heard the STA. The STA learns the AP's
IP address/MAC mapping through passive ARP -- we still see broadcasts on
the SSID -- but the AP has started to totally ignore the STA, and seemed
to have ignored its ARP requests also.
We are using MAC address ACL control with this AP, and the ath0
affected is definitely listed in its ACL table, configured up, rebooted etc.

It is as though the STA is entering power saving mode when not
explicitly told to, and the AP is not waking up the STA as it should.

If any more information is needed, or you can suggest where to look, please
let me know what's involved (I MFCed the change after all, so I'll help where
I can until I'm on holiday this week...)

My lab colleague is just working around this with 'ping ' for
now, that keeps things up, as does OpenVPN...

cheers
BMS





Re: kern/124282: [libc] socket(2): INP_PORTHIGH and INP_ONESBCAST share same value

2009-03-23 Thread Bruce M. Simpson

bru...@freebsd.org wrote:

Synopsis: [libc] socket(2): INP_PORTHIGH and INP_ONESBCAST share same value

Responsible-Changed-From-To: freebsd-bugs->freebsd-net
Responsible-Changed-By: brucec
Responsible-Changed-When: Mon Mar 23 21:45:54 UTC 2009
Responsible-Changed-Why: 
Over to maintainer(s).
  


rwatson@ saw this crop up in -CURRENT and I believe he has a fix. Not 
sure about MFC but it clearly needs to get fixed...


cheers,
BMS


Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP or IP does not work

2009-03-23 Thread Bruce M Simpson

John Hay wrote:

I found doing a -bgscan before it happens, make it not happen. I now
have -bgscan in my rc.conf.
  


That's exactly the workaround I needed. Thanks John.

As Sam points out, the root fix is probably already in HEAD; it would be 
nice to find time to backport, but this works for us for now as a 
workaround (we are just using ath0 as a STA for testing in the lab at 
the moment, it is likely we will use hostap later).


cheers,
BMS


Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP or IP does not work

2009-03-23 Thread Bruce M Simpson
The following reply was made to PR kern/132722; it has been noted by GNATS.

From: Bruce M Simpson 
To: John Hay 
Cc: Matthias Apitz , freebsd-net@freebsd.org, 
 Sam Leffler ,
 "Sean C. Farley" , bug-follo...@freebsd.org
Subject: Re: kern/132722: [ath] Wifi ath0 associates fine with AP, but DHCP
 or IP does not work
Date: Tue, 24 Mar 2009 01:08:33 +

 John Hay wrote:
 > I found doing a -bgscan before it happens, make it not happen. I now
 > have -bgscan in my rc.conf.
 >   
 
 That's exactly the workaround I needed. Thanks John.
 
 As Sam points out, the root fix is probably already in HEAD; it would be 
 nice to find time to backport, but this works for us for now as a 
 workaround (we are just using ath0 as a STA for testing in the lab at 
 the moment, it is likely we will use hostap later).
 
 cheers,
 BMS


Re: CARP as a module; followup thoughts

2009-04-22 Thread Bruce M. Simpson

Hi,

Will Andrews wrote:

Hello,

I've written a patch (against 8.0-CURRENT as of r191369) which makes
it possible to build, load, run, & unload CARP as a module, using the
GENERIC kernel.  It can be obtained from:

http://firepipe.net/patches/carp-as-module-20090421.diff
  


There's no need to implement the in*_proto_register() stuff in that 
patch, you should just be able to re-use the encap_attach_func() 
functions. Look at how PIM is implemented in ip_mroute.c for an example.


Other than that it looks like a good start... but I would hold off on
committing as-is. The more general case of registering a MAC address on
an interface should be considered.


cheers,
BMS


Re: routing local traffic w/o using loopback interface

2007-08-17 Thread Bruce M. Simpson

rajneesh rana wrote:

> hello all,
>
> i am opening up two tap interfaces, both connected to bridge, assigning them
> IP addresses and want to open up tcp connection b/w them without using
> loopback interface, so i bind client socket to first tap using
> SO_BINDTODEVICE option and socket server listening on other tap device.
> The problem is that when i m calling connect, it is giving timeout error.

I am confused by your question because to the best of my knowledge the 
SO_BINDTODEVICE socket option does not exist in FreeBSD.

> Is it possible to route traffic b/w two interfaces of same machine w/o
> using loopback interface and kernel hacking.

Yes, I use if_bridge for this on a daily basis.

regards,
BMS


Re: Failover default route?

2007-08-18 Thread Bruce M. Simpson

Tuc at T-B-O-H.NET wrote:

> In my case, as always, it's a bit "special". I have
> 2 OPENVPN tunnels, which I sent over different transits to
> the same end host. On that host, I do my NAT. SO, without
> getting into all sorts of hot/heavy things, is there a simple
> program to install to ping something via the first tunnel,
> and if it can't then switch my default route to the second
> tunnel? Or, do I just use a script like here :

As Bill correctly points out, reachability detection using a routing 
protocol is often the preferred method; however, this isn't always 
available. Pinging is NOT the best practice, see RFC 1122 3.3.1.4:
http://www.freesoft.org/CIE/RFC/1122/56.htm


You could use ifstated to detect changes in the tunnel interface status 
and switch default routes accordingly, though it doesn't significantly 
reduce the amount of manual scripting you have to do.


Microsoft's TCP implementation performs dead gateway detection based on 
triggered reselection as per RFC 816, however, they have a multipath 
capable FIB which can hold the multiple next-hops and their state -- 
something to consider for later.


An incremental, piecemeal change which folks might find OK may be to add 
cost metrics back to the kernel radix trie, but that still has all the 
aggro of changing the API.


regards
BMS






Re: Route caching ?

2007-08-22 Thread Bruce M. Simpson

Ivo Vachkov wrote:

> Does FreeBSD rtalloc*() (or any other) functions implement route
> caching and how ? I looked at the code but it's not exactly easiest
> thing to read / understand :)

Not really, at least, not in the way one would think. rtalloc() is a 
legacy function.


ip_output() will still call rtalloc() if you pass it a filled out 
'struct route', a structure which is not a route, but an internal 
request to look up a route.


rtalloc() is a wrapper for rtalloc_ign(), which in turn is a wrapper for 
rtalloc1(), the function which does the actual lookup.


rtalloc_ign() is pretty straightforward. Note however that this approach 
only checks the RTF_UP flag and ifp, nothing more. This makes it 
suitable for implementing floating statics, but nothing more dynamic 
than that.
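
A rough user-space model of that check, for illustration only. It is not 
the kernel source: the sk_* names, the single global "table" and the 
counter are invented for the sketch. It shows why a cached entry that is 
still up and still has an interface keeps being used even after the 
table's preferred answer has changed.

```c
#include <assert.h>
#include <stddef.h>

#define SK_RTF_UP 0x1               /* hypothetical stand-in for RTF_UP */

struct sk_ifnet { int unit; };

struct sk_rtentry {
    int flags;                      /* SK_RTF_UP when usable */
    struct sk_ifnet *ifp;           /* outgoing interface, NULL if gone */
    int refcnt;                     /* references held by callers */
};

/* Stand-in for 'struct route': a lookup request plus its cached result. */
struct sk_route {
    struct sk_rtentry *ro_rt;       /* must start out NULL */
};

static struct sk_rtentry *sk_best;  /* what the "table" currently prefers */
static int sk_slow_lookups;         /* how often the slow path ran */

/* Slow path, modelled on the role rtalloc1() plays in the text. */
static struct sk_rtentry *
sk_slow_lookup(void)
{
    sk_slow_lookups++;
    if (sk_best != NULL)
        sk_best->refcnt++;
    return sk_best;
}

/* Reuse the cached entry if it is up and has an interface, else look up. */
static void
sk_route_alloc(struct sk_route *ro)
{
    struct sk_rtentry *rt = ro->ro_rt;

    if (rt != NULL) {
        if (rt->ifp != NULL && (rt->flags & SK_RTF_UP))
            return;                 /* nothing else is checked */
        rt->refcnt--;               /* drop the stale cached entry */
        ro->ro_rt = NULL;
    }
    ro->ro_rt = sk_slow_lookup();
}
```

In this model the request structure has to be zeroed before first use, 
since the cached pointer is consulted before anything else.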


regards,
BMS


Re: Allocating AF constants for vendors.

2007-08-22 Thread Bruce M. Simpson
I second Max. If you are going to introduce a bunch of AF_* constants 
into the tree you have to be very careful as AF_MAX is used to size 
arrays and figure out how many radix trie heads to allocate.


It could be argued this wastes a bunch of CPU time and memory, though I 
speculate 'not much' at the moment; I am just a bit concerned that we 
have ifnet->if_afdata which is also sized based on AF_MAX, 37, even 
though most of the protocols in it are never attached to ifnets.


The only domain I've seen which really uses if_afdata is PF_INET6. 
PF_INET does not use it at all. In my opinion, there are structures 
per-family per-ifnet which really belong hung-off ifnet on a 1:1 basis 
and would simplify some of the lazy allocations we have further down in 
the stack.


If AF_MAX increases significantly so will wasted memory. If you are 
going to make any significant changes here, please consider moving 
this stuff to a more dynamic method of allocation.
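
As one possible shape for "more dynamic", here is a small user-space 
sketch in which each interface only carries nodes for the families that 
are actually attached, instead of an array with AF_MAX slots. All names 
(sk_ifnet, sk_afdata, sk_af_*) are hypothetical, and this is not how the 
kernel does it today.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per address family actually attached to the interface. */
struct sk_afdata {
    int family;                 /* AF_* style constant, any value */
    void *data;                 /* per-family, per-interface state */
    struct sk_afdata *next;
};

struct sk_ifnet {
    struct sk_afdata *afdata;   /* empty list costs one pointer */
};

/* Attach per-family data; returns 0 on success, -1 if out of memory. */
static int
sk_af_attach(struct sk_ifnet *ifp, int family, void *data)
{
    struct sk_afdata *e;

    assert(ifp != NULL);
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->family = family;
    e->data = data;
    e->next = ifp->afdata;
    ifp->afdata = e;
    return 0;
}

/* Find the data for a family, or NULL if that family is not attached. */
static void *
sk_af_lookup(const struct sk_ifnet *ifp, int family)
{
    const struct sk_afdata *e;

    for (e = ifp->afdata; e != NULL; e = e->next)
        if (e->family == family)
            return e->data;
    return NULL;
}

/* Free every node; the data pointers remain owned by the caller. */
static void
sk_af_detach_all(struct sk_ifnet *ifp)
{
    while (ifp->afdata != NULL) {
        struct sk_afdata *e = ifp->afdata;

        ifp->afdata = e->next;
        free(e);
    }
}
```

The trade-off is a short list walk on lookup in exchange for memory use 
that no longer depends on the largest family number.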


On the other hand, if you don't need to reference these constants in the 
kernel at all, and they will all exist beyond AF_MAX, then you can 
disregard what I've said and append them to the rest of the list.


That is pretty much what happens for the libpcap/bpf DLT constants 
(which are not an exact analogue of the AF constants - we don't allocate 
other, larger kernel structures based on their value).


regards,
BMS


Re: Route caching ?

2007-08-22 Thread Bruce M. Simpson

Ivo Vachkov wrote:

> Actually there is:
>
> struct  route_in6 ip6_forward_rt;
>
> that "caches" the last route used (thanks blue !!!) but i think this
> technique is pointless in a multiflow traffic.
  


Yes, this is why OpenBSD got rid of this form of 'route caching'.


> Is it reasonable to believe that route caches can improve networking
> performance or we should leave it up to the routing table itself ?
  


I believe that if one goes beyond a single radix trie, as is needed for 
multi-pathing with multicast and source policy routing, route caching is 
*required* to achieve good performance.


Also, if FreeBSD moves ARP and NDP out of the radix trie, a route cache 
would be highly preferable as it amortizes the lock acquisition which 
would otherwise be required for ARP/NDP/other layer 2 next-hop resolution.
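
To make the amortization argument concrete, here is a toy single-entry 
cache in front of a "locked" lookup. Everything here is an assumption 
made for the sketch: the sk_* names, the generation counter used for 
invalidation, and the counter that stands in for taking a table lock.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_uint sk_table_gen = 1;        /* bumped on every table change */
static unsigned sk_lock_acquisitions;       /* times the "lock" was taken */
static unsigned sk_default_nexthop = 0x0a000001;

/* Per-caller cache: one destination, its next hop, and the table
 * generation it was learned under. Zero-initialize before use. */
struct sk_rcache {
    unsigned dst;
    unsigned nexthop;
    unsigned gen;
};

/* Slow path: stands in for a lookup done under the table lock. */
static unsigned
sk_locked_lookup(unsigned dst)
{
    (void)dst;                              /* toy table: one default route */
    sk_lock_acquisitions++;
    return sk_default_nexthop;
}

/* Table update: change the route and invalidate every cache. */
static void
sk_table_change(unsigned new_nexthop)
{
    sk_default_nexthop = new_nexthop;
    atomic_fetch_add(&sk_table_gen, 1);
}

/* Fast path: no lock while the destination and generation both match. */
static unsigned
sk_cached_lookup(struct sk_rcache *c, unsigned dst)
{
    unsigned gen = atomic_load(&sk_table_gen);

    if (c->gen == gen && c->dst == dst)
        return c->nexthop;
    c->nexthop = sk_locked_lookup(dst);
    c->dst = dst;
    c->gen = gen;
    return c->nexthop;
}
```

A one-entry cache like this only pays off when consecutive packets share 
a destination, which is the multiflow objection raised earlier in the 
thread; a real design would need more entries and a proper lock.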


BMS


Re: Route caching ?

2007-08-22 Thread Bruce M. Simpson

Claudio Jeker wrote:

> Just because you believe that route caches are great doesn't mean it is
> true. Show some real code and include benchmarks with various workloads
> (e.g. a core router that is hit by many many many sessions).
  


It is a reasonable approach, for a uniprocessor design, to focus on 
optimizing the route lookup as much as possible. Does this approach 
scale to SMP, though? This is still a very much open question and from 
what I have seen of the OpenBSD implementation, it only addresses the 
uniprocessor case - again please correct me here if I have missed any 
details.


I believe the Linux dst cache is strongly tied to the IBM-patented 
Read-Copy-Update (RCU) algorithm, based on what I've read about their 
LC-trie implementation.



> Until now all caching solutions resulted in very bad performance on busy
> boxes. Remember ip_fastforward or how was it called? Another example are
> all crappy L3 switches that burn down if the CAM (cache) is flooded.
  


I assume you are referring to NetBSD's flow-based IP forwarding cache, 
which was implemented outside of the scope of SMP; spl-style interrupt 
priority masking was still in use at that time.


It is established that saturating content-addressable memory is going to 
lead to the slow path being taken, however, that's the trade-off one 
makes with these designs.



> IMO it is better to make the route lookup faster and forget about caching.
  

My concern is that you may be comparing apples with oranges here.

In the case of SMP, locking does become a consideration, and caches, if 
carefully implemented, are one way of addressing this.


On the other hand, CPU affinity has been proposed as a limited solution, 
however it depends how this is implemented - affinity for lookups, 
forwarding, or both?


Perhaps there is something I am missing about how the OpenBSD 
implementation deals with SMP, as I am not as familiar with their code 
as FreeBSD's.


regards,
BMS


Re: quagga 0.99.8 on current, tcpmd5 config confusion

2007-08-24 Thread Bruce M. Simpson

Randy Bush wrote:

just did a cvsup build and portupgrade of a six month old -current
i386 system running quagga.  quagga cranked to 0.99.8.  i got
slammed by bgp tcpmd5 requirement.

bgpd[469]: can't set sockopt TCP_MD5SIG 0 to socket 17
bgpd[469]: can't set sockopt TCP_MD5SIG 0 to socket 18
bgpd[469]: can't set sockopt TCP_MD5SIG 0 to socket 22

madly googled and found that i needed to hack kernel for tcp md5
hash, even though i am not using md5 auth (these are not really
infrastructure peerings.  yes i know better for production).
  


This I haven't seen before, then again, it's been years since I've used 
Zebra/Quagga let alone hacked the patch for md5 support, which is now 
~3.5 years old. It was only ever intended as a belt-and-braces attempt 
at getting things up in a way which the sponsor was satisfied with, with 
no other refinements.


I wasn't 100% happy about how I ended up doing the kernel support, and 
had to go with what I had working in my tree because of that old demon 
'economics', rather than doing things 'the right way': i.e. in the IPSEC 
Security Policy Database (SPD), with the routing daemon loading the 
keys, rather than the Security Associations Database (SADB) and keys 
loaded manually using setkey(8).


Other individuals have since made changes to this code. Now that we have 
settled on FAST_IPSEC thanks to gnn's hard work, it will be easier for 
Someone(tm) to pick this up, as KAME IPSEC and FAST_IPSEC interfaced to 
key sockets differently enough to change the implementation of the SPD.



with this kernel, i got a lot of whining about no keys

tcp_signature_compute: SADB lookup failed for 666.42.69.96
  


I remember putting in the SADB lookup failed message to help people 
track down problems with their configuration. If TCP_MD5SIG is not 
enabled on the tcp socket, no SADB lookup should happen, so you 
shouldn't be seeing this message.


It sounds to me as though Quagga may be enabling the TCP_MD5SIG option 
unconditionally based on all of the output you've posted. This is 
obviously incorrect. I can't speak for Quagga, though it seems 
reasonable to suggest that it shouldn't be doing that unless you tell it 
to. I believe the MD5 patches only get pulled in if you request them, 
and that md5 auth specifically needs to be enabled per peer.


Still, this is nearly 4 years on and I have other things going on now.

regards,
BMS


Re: nc captures 1024 bytes

2007-08-28 Thread Bruce M. Simpson

Looks like a netcat bug, if it doesn't tune buffers to the interface MTU.

I'm not sure if nc has a 'de facto' maintainer; however, I believe it is 
something which was recently imported into the FreeBSD base system.


Still, it is better to try to field patches with the upstream maintainer 
before filing a FreeBSD PR with your patches.


regards,
BMS


Re: [EMAIL PROTECTED]: Re: rtfree: 0xffffff00036fb1e0 has 1 refs]

2007-08-28 Thread Bruce M. Simpson

Christian S.J. Peron wrote:

I am not sure who has their hands in the routing code these days so
I figured I would just forward this message off here.  Does the
following look reasonable?
  

I'm looking, but mostly with long range goggles on.

Yes, this looks like the right change. rtalloc1() always returns an 
rtentry with the mutex for that rtentry held.


regards
BMS


Re: nc captures 1024 bytes

2007-08-28 Thread Bruce M. Simpson

Weiguang Shi wrote:

> nc might be waiting on all the interfaces; enumerating MTUs and choosing the
> largest sounds complicated, especially when some interfaces can be configured
> to receive jumbo frames. Why not just use something like 64KB as the other
> user suggested or something even larger?
  


That is the easy fix, yes. :^)

If the socket's pcb laddr is bound to an IP, and the IP to which it is 
bound stays on the same physical interface, then the MTU may easily be 
obtained. If it's INADDR_ANY, or you expect the IP to be dynamically 
reconfigured on another interface, then auto-tuning is not possible.
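
For anyone wondering what the easy fix buys, here is a small user-space 
sketch. I have not looked at what nc actually does internally, so this 
only shows the general effect: a datagram read into a 1024-byte buffer is 
cut off at 1024 bytes, while a 64KB buffer returns it whole. The helper 
name and the use of a local socketpair are choices made for the demo.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send one datagram of 'len' bytes over a local socketpair and read it
 * back into a buffer of 'bufsize' bytes. Returns the number of bytes
 * recv() delivered, or -1 on error or out-of-range arguments.
 */
static ssize_t
sk_recv_with_buffer(size_t len, size_t bufsize)
{
    static char payload[8192];
    static char buf[65536];
    int sv[2];
    ssize_t n;

    if (len > sizeof(payload) || bufsize > sizeof(buf))
        return -1;
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    memset(payload, 'x', len);
    if (send(sv[0], payload, len, 0) != (ssize_t)len) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Anything beyond 'bufsize' in this datagram is discarded. */
    n = recv(sv[1], buf, bufsize, 0);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Calling it with a 4000-byte datagram and buffer sizes of 1024 and 65536 
shows the difference.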


regards,
BMS


Re: [EMAIL PROTECTED]: Re: rtfree: 0xffffff00036fb1e0 has 1 refs]

2007-08-29 Thread Bruce M. Simpson
BTW: Casual inspection with kscope suggests there is a similar 
free-while-locked issue in nd6_ns_input() (netinet6/nd6_nbr.c) and 
in_arpinput() (netinet/if_ether.c).


nd6_ns_input() references rt->rt_gateway after rtfree(), a potential 
race not to mention a use-after-free.


I haven't checked Coverity for this, but it just doesn't look right.

BMS



Re: vlan stacking

2007-08-29 Thread Bruce M. Simpson

Ivan Alexandrovich wrote:

> Hi
>
> I'm wondering is anybody using double vlans ("q-in-q",
> "vlan stacking", any name you like) on production hosts?
> Does it play well with common ethernet device drivers in freebsd
> (concerning the frame size) -  fxp, em, for example?
>
> Looks like that almost nobody mentions q-in-q in freebsd
> maillists/forums, except that nesting ng_vlan can be used to
> implement it.


I'm sure you or someone else can come up with a creative solution for 
Q-in-Q or arbitrary nesting levels. It's not something I use, so, I pass.


The mainline code doesn't support it without Netgraph; it would be 
necessary to allow vlan(4) to be nested. The ether_input() code demuxes 
802.1q encapsulation, but only one level deep. The reason for this is 
that the outer VLAN tag got moved into the mbuf pkthdr structure for 
if_bridge to be able to process it.
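
To show what demuxing more than one level involves, here is a user-space 
sketch that walks every tag at the front of a raw Ethernet frame. It is 
not ether_input(); the function name and SK_* constants are made up, and 
it assumes the usual 0x8100 (802.1Q) and 0x88a8 (802.1ad) tag types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETHERTYPE_VLAN 0x8100    /* 802.1Q customer tag */
#define SK_ETHERTYPE_QINQ 0x88a8    /* 802.1ad service tag */

/*
 * Walk the VLAN tags of a frame, outermost first. Up to 'max_tags'
 * VLAN IDs are stored in 'vids'; the inner protocol type is stored in
 * '*ethertype'. Returns the number of tags seen (which may exceed
 * 'max_tags'), or -1 if the frame is too short.
 */
static int
sk_vlan_walk(const uint8_t *frame, size_t len, uint16_t *vids,
    int max_tags, uint16_t *ethertype)
{
    size_t off = 12;                /* skip destination and source MAC */
    int ntags = 0;

    for (;;) {
        uint16_t type, tci;

        if (off + 2 > len)
            return -1;
        type = (uint16_t)((frame[off] << 8) | frame[off + 1]);
        if (type != SK_ETHERTYPE_VLAN && type != SK_ETHERTYPE_QINQ) {
            *ethertype = type;      /* reached the payload protocol */
            return ntags;
        }
        if (off + 4 > len)
            return -1;
        tci = (uint16_t)((frame[off + 2] << 8) | frame[off + 3]);
        if (ntags < max_tags)
            vids[ntags] = tci & 0x0fff;     /* low 12 bits are the VID */
        ntags++;
        off += 4;                   /* each tag is 4 bytes long */
    }
}
```

A frame carrying an outer tag of 100 and an inner tag of 200 yields two 
tags from this walk, where a one-level demux would stop after the first.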


I can't comment on the netgraph solution however.

regards
BMS


Re: Allocating AF constants for vendors.

2007-09-03 Thread Bruce M. Simpson

Alfred Perlstein wrote:

> Ok, I'm not really sure what to do here.  At Juniper we have approx
> 20 additional entries for AF_ constants.  We also have theoretical
> but not practical "problems" with sparseness and utility of this
> list, meaning we have plenty of arrays in our version of ifnets and
> route entries that are also "bloated" as well.
  


Can you merge them into the list in such a way that AF_MAX does not need 
to slide forward?

Or do they need to be referenced from within the kernel tree itself?

Prevention of code bloat is better than the cure.  Not having the code 
in front of me I couldn't say for sure if we're talking about a dozen 
bytes or several pages potentially being wasted, so it is impossible to 
judge.


One of my concerns is that we have ifnet.if_afdata, we're not really 
using it, it makes sense to use it for some things.


Help from big companies as well as little folks is always appreciated, 
providing we can reach consensus.

Otherwise one other policy would be to specify an allocation
policy such that new AF_ constants are allocated only for even
numbers where odd numbers are left to vendors.

This would slow the "bloat" and still provide vendors with something
useful.

How does that sound?
  


EPARSE? I don't follow this at all.

BMS


Re: Network stack locking question

2007-09-04 Thread Bruce M. Simpson

Ivo Vachkov wrote:

panic: mtx_lock() of spin mutex 'some_strange_chars' ../../../net/route.c:114

ether_demux() at ether_demux+
my_func() at my_func+
rtalloc_ign() at rtalloc_ign+
_mtx_lock_flags() at _mtx_lock_flags+
panic() at panic+

I do not include GIANT_REQUIRED in my code. Can you propose a solution
or a pointer to information where I can make myself familiar with the
networking code locking ... besides 'man 9 locking' and related.
  


It really isn't as simple as 'read this doc' because the code is subject 
to change - the code *is* the reference - it is constantly evolving. If 
you want to contribute docs, please feel free, Robert may have something 
lying around.


How is ether_demux() calling your function, and does ether_input() 
appear in this call trace? This is counterintuitive and I don't really 
have enough data to go on.


Looking at the code, it seems your backtrace hits the RTFREE() call when 
trying to allocate an rtentry through rtalloc_ign(), are you attempting 
to cache the results of a previous call which may still be locked?


On a more general note.

I suggest that you *do not* hold any locks when calling ether_demux() 
for whatever reason. I wouldn't recommend calling that function directly 
- the only things outside of the ethernet paths which do this are 
dummynet and netgraph. tap(4) doesn't - it dispatches through ether_input().


When re-entering the bottom of the stack in this way, you *should not* 
hold any locks. rtalloc_ign() currently acquires a lock on its rtentry 
by default, please release it before reentering the bottom half of the 
network stack.


regards,
BMS


Re: Allocating AF constants for vendors.

2007-09-04 Thread Bruce M. Simpson

Alfred Perlstein wrote:
>> Can you merge them into the list in such a way that AF_MAX does not need 
>> to slide forward?
>>
>> Or do they need to be referenced from within the kernel tree itself?
>
> They are referenced inside the kernel.
  


Let me rephrase that: are protocol domains attached in the kernel using 
these constants?


Pull an AF_MAX 'referenced from' list from kscope or fxr.watson.org and 
you'll see what I mean - we use this constant in a LOT of places, 
particularly in the socket path, it is used to bound loops and allocate 
static array space.


1 extra slot won't make much difference, 20 will -- and this is why it's 
an invasive change.


>> Prevention of code bloat is better than the cure.  Not having the code 
>> in front of me I couldn't say for sure if we're talking about a dozen 
>> bytes or several pages potentially being wasted, so it is impossible to 
>> judge.
>
> Well, for the most part it's going to be something like 32*sizeof(void*)
> so 128 or 256 bytes depending on arch.
  


Adding bloat to ifnet is far from ideal. It isn't a show stopper for 
most use cases, but there are folk out there who use FreeBSD as a VPN 
access concentrator, and creating 10,000 ifnets for their purposes is 
not out of the question for them.


There is also the case of smaller embedded systems. It might not seem 
like a lot now, but it could be a show stopper for some.
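
As back-of-the-envelope arithmetic, using only the figures mentioned in 
this thread (roughly 20 extra constants, one pointer per slot, 10,000 
ifnets). The helper name is made up, and it counts only an if_afdata 
style array, not the other places sized by AF_MAX.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Extra bytes consumed when a per-ifnet pointer array grows by
 * 'extra_slots' entries, for 'nifnets' interfaces and pointers of
 * 'ptr_size' bytes each.
 */
static size_t
sk_afdata_growth(size_t extra_slots, size_t ptr_size, size_t nifnets)
{
    return extra_slots * ptr_size * nifnets;
}
```

With 8-byte pointers that is 160 bytes per ifnet, or 1,600,000 bytes 
across 10,000 of them; with 4-byte pointers, half that.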


>> One of my concerns is that we have ifnet.if_afdata, we're not really 
>> using it, it makes sense to use it for some things.
>
> I'll have to look into this.
  


To give examples:

Consider for example the case of the 'router_info' structure in IGMP. 
Its relationship cardinality w.r.t struct ifnet is 1:1 - for each ifnet, 
there is a router_info - but only if AF_INET is attached to that ifnet.


This is where if_afdata is typically used. I would like to take the 
per-link IPv4 stack structures and hang them off ifnet directly. Reasons:


1. Lazy allocation of router_info leads to potential races in igmp.

These are worked around by constantly checking if a struct is allocated 
or deallocated upon any reference to them. Easy to do in C++. Tedious 
and error prone in C. The struct also needs to be garbage collected when 
an ifnet goes away.


2. We can potentially move protocol domain addresses for upper layers 
into their own struct.


We would then not need to lock a global address list every time we 
perform an address operation.


3. Right now, we can't detach and re-attach protocol stacks to ifnets.

A number of folks are affected, particularly for IPv6, where there are 
perfectly legitimate reasons to bounce the stack in this way, say, 
link-local addresses get screwed up, or even just need to be enabled.




> As you can see we are deferring the "bloat".
> Does that make sense?
  


I follow but it still doesn't really make sense.

Granted, you are deferring the growth of arrays sized off AF_MAX but 
only ever by 1 slot.

What if Vendor Z wants to add 25 entries at once?

We would also be tying ourselves down to the notion of a vendor in any 
AF_ allocation. Is this an avenue that people are happy to pursue?


regards,
BMS



Re: kern/116077: 6.2-STABLE panic during use of multi-cast networking client

2007-09-04 Thread Bruce M Simpson
The following reply was made to PR kern/116077; it has been noted by GNATS.

From: Bruce M Simpson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc:  
Subject: Re: kern/116077: 6.2-STABLE panic during use of multi-cast networking
 client
Date: Tue, 04 Sep 2007 14:17:53 +0100

 I wrote this, but I may not have time to fix it, because I need to do 
 work other than FreeBSD to support myself.
 
 I have no idea what an elvin or avis is. It isn't clear to me how you 
 are triggering this panic, it looks like you are removing or tearing 
 down interfaces from the system? Are you using a network driver which 
 has IFF_NEEDSGIANT set?
 
 Unfortunately because the ifp lock has to be taken before other locks if 
 IFF_NEEDSGIANT is set, it dereferences the ifp provided which may have 
 already gone away.
 
 The link layer multicast code will try to invalidate the ifp pointer in 
 the underlying ifma. However in this case the cached ifp used is the one 
 in struct in_multi.
 
 Try the following. Change
 
 1063 ifp = inm->inm_ifp;
 1064 IFF_LOCKGIANT(ifp);
 1065 IN_MULTI_LOCK();
 ...
 
 to
 
 ifp = inm->inm_ifma->ifp;
 if (ifp != NULL)
   IFF_LOCKGIANT(ifp);
 ...
 and put
 if (ifp != NULL)
   IFF_UNLOCKGIANT(ifp);
 
 at the end of the function.
 
 It is safe to deref inm->inm_ifma as ifma is refcounted.
 
 
 The real fix is to either eliminate Giant completely or to implement 
 reference counting for struct ifnet.
 
 I should point out that this code gets rewritten for IGMPv3.
 
 Please let me know if this works around the issue. If it doesn't, I'll 
 leave it to someone else for now - there should be enough in here to go on.


Re: Killing IPTOS_CE and IPTOS_ECT

2007-09-04 Thread Bruce M. Simpson

Rui Paulo wrote:
Well, I was asking for comments regarding on the usage of these flags. 
I was hoping to commit ip.h along with TCP ECN.


This doesn't really need to be before the branch, I think.


Looks fine to me. ECN would be a useful feature to have. AFAIK nothing 
else uses these flags.


Although I do remember it being possible to fingerprint Solaris boxes 
based on their response to the ECN Echo, this was around 6 years ago.


I second Andre's request for a full unified diff if you want it to go in 
ASAP.


regards,
BMS


Re: Network stack locking question

2007-09-05 Thread Bruce M. Simpson

Ivo Vachkov wrote:

My lookup code looks like the following:

struct sockaddr_in6 *dst = NULL;
struct route_in6 out_rt;

/* ... */

dst = (struct sockaddr_in6 *)&out_rt.ro_dst;
bzero(dst, sizeof(*dst));
dst->sin6_len = sizeof(struct sockaddr_in6);
dst->sin6_family = AF_INET6;
dst->sin6_addr = ip6->ip6_dst;

rtalloc((struct route *)&out_rt);
  


You need to remember to drop the lock which rtalloc() acquires on your 
behalf using RTFREE() before leaving the function or possibly calling a 
function which needs exclusive/write access to the rtentry.


If your code needs this rtentry to remain in the system, a call to 
RT_ADDREF() with the lock held may be necessary, although you should 
remember to RT_REMREF() with the lock held when done with the rtentry.


See <net/route.h> for more info.
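
The discipline described above, taking and dropping references only 
while the entry's lock is held, can be modelled in user space along these 
lines. This is only a model: the sk_* names are invented, the "lock" is a 
flag checked with assert() rather than a real mutex, and the real macros 
live in the header mentioned above.

```c
#include <assert.h>

struct sk_rtentry {
    int locked;     /* 1 while the (modelled) per-entry lock is held */
    int refcnt;     /* number of references held on the entry */
};

static void
sk_rt_lock(struct sk_rtentry *rt)
{
    assert(!rt->locked);
    rt->locked = 1;
}

static void
sk_rt_unlock(struct sk_rtentry *rt)
{
    assert(rt->locked);
    rt->locked = 0;
}

/* Reference changes are only legal with the lock held. */
static void
sk_rt_addref(struct sk_rtentry *rt)
{
    assert(rt->locked);
    rt->refcnt++;
}

static void
sk_rt_remref(struct sk_rtentry *rt)
{
    assert(rt->locked);
    assert(rt->refcnt > 0);
    rt->refcnt--;
}

/* Keep the entry around beyond the current function. */
static struct sk_rtentry *
sk_rt_hold(struct sk_rtentry *rt)
{
    sk_rt_lock(rt);
    sk_rt_addref(rt);
    sk_rt_unlock(rt);       /* never return with the lock still held */
    return rt;
}

/* Give the reference back once done with the entry. */
static void
sk_rt_release(struct sk_rtentry *rt)
{
    sk_rt_lock(rt);
    sk_rt_remref(rt);
    sk_rt_unlock(rt);
}
```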

regards,
BMS



Re: Thoughts on vlan filter

2007-09-05 Thread Bruce M. Simpson

Jack Vogel wrote:

I had an idea, I was debugging a problem on my new 10G driver a week back,
and found I had the hardware vlan filter enabled by accident, this led me to
wonder about supporting this hardware feature in the driver...

I have done some experimentation, and find that when the vlan device is
configured, ultimately the SETMULTI ioctl will happen in my driver, this
means I could add code that checks the trunk, finds there is a vlan and
then sets the tag into the filter.

Any interest, or thoughts ya or nay about my doing this?
  


I can't say for sure what the right answer here is. It seems reasonable 
to have a means of checking whether the SETMULTI is coming from a stacked 
vlan(4) instance. I think it is reasonable to support this for only two 
levels of nesting, i.e. Q-in-Q, in the mainline stack, and to encourage 
folks to use Netgraph if they need arbitrary nesting levels.


Kip raised some performance related concerns about the driver lock being 
taken whenever multicast address list changes happen, thus deferring or 
delaying packet flows on other transmit queues, perhaps he can chime in?


regards,
BMS


Re: (forw) Re: Allocating AF constants for vendors.

2007-09-06 Thread Bruce M. Simpson

Alfred Perlstein wrote:

Bruce, I haven't heard back from you on this.  can you please comment?

I'd like to add the policy to the header.
  


I'm not 100% happy with this suggestion, however, it is a loosely 
working compromise.


I would be happier if the static index dependency on AF_MAX is ironed 
out at some later date.


regards,
BMS


Re: Strange behaviour of route command

2007-09-07 Thread Bruce M. Simpson

Tom Judge wrote:

Hi,

While making some changes to the routing table on one of our routers 
today I noticed that "route add" was showing some strange behaviour. 
When adding a route for 128/8 to the table rather than adding 
128.0.0.0/8 it would add 0.0.0.0/8, however adding 10/9 works correctly.


Is this a bug in route or the routing table?


Run 'route -nv monitor' in another shell while you do this and post the 
output to this list so someone can get more of a handle on it.


BMS


Re: Quagga as border router

2007-09-21 Thread Bruce M. Simpson

Folks have been asking about XORP in this thread.

XORP can take a full BGP feed just fine as long as you have enough 
memory; for a full default-free-zone feed, you are looking at something 
in the region of 1GB - 1.5GB, perhaps less if you use aggregation.


If you look at the NSDI '05 paper you'll see that it has a number of 
benefits over existing designs, BGP route propagation in particular 
should be faster:

   http://www.usenix.org/events/nsdi05/tech/handley.html

The architecture is deliberately structured so that forwarding 
functionality may be implemented in hardware. I believe XORP may work 
with the NetFPGA but don't have firm information about this.


IPv6 support is strong as XORP was designed to route IPv6 from the start 
as a whole suite - multicast support is also strong.


regards,
BMS

[Note: my opinion may be biased as I served on XORP core team for a few 
years, and still actively contribute code to the project.]



Re: Routing problems

2007-10-25 Thread Bruce M. Simpson

LiuJiusheng wrote:

> Linux takes 6.6.6.2 as gateway for route 4.4.4/24. But some OSes have the 
> gateway 2.2.2.2. (treat 4.4.4/24 as a recursive route).
> Is there any standard for this? 

No, this is entirely implementation specific. Some implementations of IP 
forwarding resolve the next-hop recursively. Some don't. There is no de 
facto requirement for them to do so in any published standard I'm aware of.


The fact that FreeBSD doesn't is largely a matter of keeping the 
implementation simple - if the code were to perform recursive resolution 
of the next-hop, then safe bounds would need to be found for the 
recursion. It is cheaper to do this on a forwarding entry add of course, 
but routes can and do change.
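
For illustration, a bounded recursive resolver over a toy table might 
look like the sketch below. None of this is FreeBSD code: the sk_* names, 
the depth limit of 8 and the linear longest-prefix match are all 
assumptions, as is the 6.6.6/24 via 2.2.2.2 route, which is inferred from 
the question rather than stated in it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_IP(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))
#define SK_MAX_DEPTH 8              /* arbitrary bound on the recursion */

struct sk_fib_entry {
    uint32_t prefix;
    uint32_t mask;                  /* contiguous netmask */
    uint32_t gateway;               /* 0 means directly connected */
};

/* Longest-prefix match by linear scan; fine for a toy table. */
static const struct sk_fib_entry *
sk_lpm(const struct sk_fib_entry *fib, size_t n, uint32_t dst)
{
    const struct sk_fib_entry *best = NULL;
    size_t i;

    for (i = 0; i < n; i++)
        if ((dst & fib[i].mask) == fib[i].prefix &&
            (best == NULL || fib[i].mask > best->mask))
            best = &fib[i];
    return best;
}

/*
 * Follow gateways until one is on a directly connected network.
 * Returns that on-link next hop, or 0 if there is no route, a loop,
 * or the chain is deeper than SK_MAX_DEPTH.
 */
static uint32_t
sk_resolve(const struct sk_fib_entry *fib, size_t n, uint32_t dst)
{
    uint32_t hop = dst;
    int depth;

    for (depth = 0; depth < SK_MAX_DEPTH; depth++) {
        const struct sk_fib_entry *e = sk_lpm(fib, n, hop);

        if (e == NULL)
            return 0;
        if (e->gateway == 0)
            return hop;             /* on-link: this is the next hop */
        hop = e->gateway;           /* recurse on the gateway address */
    }
    return 0;                       /* gave up: loop or chain too deep */
}
```

With 2.2.2/24 connected, 6.6.6/24 via 2.2.2.2 and 4.4.4/24 via 6.6.6.2, 
a destination in 4.4.4/24 resolves to 2.2.2.2, and two routes pointing at 
each other hit the depth bound instead of spinning.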


BMS


Re: pppoa connection

2007-10-26 Thread Bruce M. Simpson

Nikos Vassiliadis wrote:

> flakey fingers...
>
> On Friday 26 October 2007 10:06:30 Kim Shrier wrote:
>> Other people successfully use this modem to connect to their ISP
>> when the ISP accepts pppoe connections and the modem is configured
>> as a bridge.  Unfortunately, my ISP doesn't support pppoe, only
>> pppoa.
>
> The only way to do PPPoA is to have a device that does the DSL and
> ATM layers and hands the rest to FreeBSD.


Nope - there are devices out there such as the D-link single port 
DSL-5xxx modems which are able to bridge ethernet to PPPoA, which allows 
you to *not* use NAT on the device.


They do this by running a DHCP client on the outward face and a DHCP 
server on the inward face, and allocating themselves YourIP+1/24 on that 
face. Your machine inside then gets assigned YourIP.


This is a hybrid form of router/bridging which relies on the IP 
addressing trick. Obviously the subnet mask is wrong - I haven't figured 
out if this can be changed in the firmware - which means trouble if you 
have to route to folk in the same net block.
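
A small sketch of why the /24 bites, using documentation addresses and 
made-up helper names. It only models the address arithmetic described 
above, not any particular modem's firmware.

```c
#include <assert.h>
#include <stdint.h>

#define SK_IP(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* The modem takes the address one above the one handed to the host. */
static uint32_t
sk_modem_addr(uint32_t your_ip)
{
    return your_ip + 1;
}

/* True if 'a' and 'b' look like the same link under 'mask'. */
static int
sk_on_link(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) == (b & mask);
}
```

With the /24 mask, another customer such as 203.0.113.200 looks on-link 
to a host at 203.0.113.10, so the host will try to reach it directly 
rather than via the modem.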


Most consumer DSL 'access devices' force you to use NAT because they 
don't know how to bridge in this way.


However, if you really need to do native PPPoA in BSD, you need an ngATM 
supported device - /usr/sbin/ppp knows about pppoa devices and should 
suffice for running it over a single VC. This support was originally 
added for the Alcatel Speedtouch.


Of course if you have an ngatm supported ATM card, and an ATM25-to-ADSL 
modem (such beasts exist) you can do it that way too - this is how you 
plug an old Cisco 4xxx into consumer ADSL, by the way.


[I'm not sure if MPD groks PPPoA too, but that would let you channel 
bond with multiple physical circuits.]


I should point out that the use of ATM over xDSL is actually part of the 
G.DMT-lite specs... inquiring individuals can make their own minds up 
about this and why that happened.


regards,
BMS


Re: QinQ

2007-10-26 Thread Bruce M. Simpson

Jon,

Thanks for the patch.

Jon Otterholm wrote:

I was wondering about the possibility of adding support for QinQ
("Double tagged frames" / "Nested vlans"). Attached is a patch against
-STABLE to add this support. I have not tested this but was told it
should work.

Would it be possible to get this into CURRENT?
  


In the 7.x train, I made some changes to always decode the VLAN tags and 
embed the information in the mbuf header. I did this to support 802.1p 
quality-of-service in the stack - VLAN 0 frames mean 'the whole subnet, 
not its vlan', and previously the stack just ignored these.


I can't remember off the top of my head if I merged this to 6.x - it 
means the patch herein may not even be needed, unless you need to do 
demux of vlan tags to arbitrary depth, something I think is best left to 
netgraph.
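
To make the tag layout concrete, the following Python sketch decodes a
stack of VLAN tags from a raw Ethernet frame. It only illustrates the
802.1Q tag format (3-bit priority, 1-bit DEI, 12-bit VLAN ID) and is not
the code in if_ethersubr.c; the function name and the sample frame are
invented for the example, and 0x88A8 is assumed as the outer tag type
although some equipment uses other values.

```python
import struct

TPID_8021Q = 0x8100   # ordinary (inner / customer) VLAN tag
TPID_8021AD = 0x88A8  # assumed outer (service) tag type for Q-in-Q

def decode_vlan_tags(frame):
    """Return ([(tpid, pcp, dei, vid), ...], ethertype) for a frame."""
    offset = 12  # skip destination and source MAC addresses
    tags = []
    while True:
        (ethertype,) = struct.unpack_from("!H", frame, offset)
        if ethertype not in (TPID_8021Q, TPID_8021AD):
            return tags, ethertype
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        # TCI layout: priority (3 bits), DEI (1 bit), VLAN ID (12 bits)
        tags.append((ethertype, tci >> 13, (tci >> 12) & 1, tci & 0x0FFF))
        offset += 4  # each tag is 4 bytes: TPID + TCI

# Hypothetical frame: outer tag VLAN 100, inner priority tag (VLAN 0,
# priority 5), then an IPv4 ethertype.
sample = (bytes(12)
          + struct.pack("!HH", TPID_8021AD, 100)
          + struct.pack("!HH", TPID_8021Q, 5 << 13)
          + struct.pack("!H", 0x0800))
tags, ethertype = decode_vlan_tags(sample)
```

The inner tag in the sample carries VLAN 0 with a non-zero priority,
which is the 802.1p case described above; the loop also shows what
demux to arbitrary depth amounts to.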


What I can tell you is that if you look at the comments in 
if_ethersubr.c, I left q-in-q as a possible todo item. I don't use it 
myself - however - the same approach might be considered for cards which 
have q-in-q support in their hardware tag/queue processing.


Kip Macy may be able to advise further - I understand the newer 10gbps 
cards are quite programmable in this respect.


However I believe it means we may not need to apply vlan(4)'s notion of 
having to call if_promisc() if the card already knows to supply the 
stack with frames for VLAN 'foo', i.e. if VLAN 'bar' is nested in 'foo'. 
Promiscuous mode is best avoided particularly with high rates of 
packets-per-second.


regards,
BMS


Re: icmp echo_user

2007-10-26 Thread Bruce M. Simpson

Matus Harvan wrote:

Hi,

I was wondering if I could get some feedback about the patch and
whether others think it could be committed.
  


Thanks for your hard work on mtund.  I'm not keen on this patch going 
into a mainline kernel, though.


It stomps on bandwidth limitation if that's in effect -- which is a 
possible DoS vector -- and also stops updating icmp protocol counters.


I believe we should track echo requests in netstat -p regardless of 
whether the kernel calls icmp_reflect() or not, as it can readily be 
inferred if a) your diversion to SOCK_RAW is in effect or b) the kernel 
processed the echo request.


I also believe that a user who installs and configures the tunneling 
daemon is in a position to know that the ICMP thresholds need to be changed.


Assuming the tunneling daemon doesn't process echoes unrelated to its 
tunneling (I haven't read the code), then the fact that rip_input() may 
exhaust its socket input buffer will provide a basic form of hysteresis, 
however I would suggest that if you intend to deploy this on the open 
Internet that the daemon either a) provides its own hysteresis too, b) 
tunes itself around the bandwidth limit in effect or c) tunes the 
bandwidth limit itself.
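
Option (a) could take many forms. One common one is a token bucket,
sketched below in Python. This is a generic, swapped-in technique
rather than anything from the mtund patch; the class name and the rate
and burst values are assumptions for illustration.

```python
import time

class TokenBucket:
    """Allow at most `rate` events per second with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.clock = clock          # injectable so it can be tested
        self.last = clock()

    def allow(self):
        """Return True if one more reply may be sent right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A daemon using something like this would simply drop, or delay, echo
replies whenever allow() returns False.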


A better approach would be to conditionalise the 'goto raw' next to the 
'goto reflect'.


regards,
BMS


Re: UDP catchall

2007-10-26 Thread Bruce M. Simpson

Matus Harvan wrote:

Hi,

I was wondering if I could get some feedback about the patch and
whether others think it could be committed.
  


The UDP catchall patch as submitted here clashes with the blackhole 
functionality, and also bypasses the update of the protocol statistics 
and unreachable port rate limiting. It is not yet suitable for a 
production kernel.


It probably shouldn't trigger the log_in_vain message, however that log 
message is misleading anyway (the reception of UDP datagrams destined 
for unbound ports is not a 'connection attempt').


I would argue that the UDP and TCP catchall feature should perhaps have 
a configurable port range as well, under 
net.inet.ip.portrange.relayhigh/relaylow. This would allow the inpcb 
code to avoid allocating sockets from that range at all -- as well as 
allowing inbound packets for that range to be immediately relayed to 
mtund without the cost of a hash lookup.
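
The port range idea can be sketched as follows. The sysctl names above
are only a proposal, so the constants and function names here are
hypothetical and the numbers are arbitrary.

```python
# Hypothetical stand-ins for the proposed relaylow/relayhigh knobs.
RELAY_LOW, RELAY_HIGH = 40000, 40999

def in_relay_range(port, low=RELAY_LOW, high=RELAY_HIGH):
    """Cheap check: would this port be handed straight to the relay?"""
    return low <= port <= high

def pick_ephemeral(first, last, in_use, low=RELAY_LOW, high=RELAY_HIGH):
    """Return the first free port in [first, last] outside the relay
    range, or None if there is none."""
    for port in range(first, last + 1):
        if port in in_use or in_relay_range(port, low, high):
            continue
        return port
    return None
```

The point of the design is that the relay decision becomes two integer
comparisons on the inbound path, and the allocator never hands out a
port that the relay would swallow.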


Whether or not multicasts are O.K. for catchall should also be 
configurable (bypassing the noportbcast check), however the only way for 
this to work reliably without running multicast forwarding on the same 
node is for the mtund to explicitly join multicast groups -- because the 
code which maps inbound multicasts to sockets has to run further up in 
another block inside udp_input().


If you needed to intercept multicasts on a multicast router, note that 
there is an upcall mechanism. This works along similar lines to the 
RTF_XRESOLVE flag -- the multicast forwarding table is implemented in 
FreeBSD as a hash-table based cache which does not hold all of the 
state, and it communicates with userland using custom IGMP messages on a 
raw socket which never actually appear on the wire in the IGMP protocol. 
Some implementations of PIM depend on this too.


Unfortunately these only go to one socket, ip_mrouter -- however with 
some code changes you could tell mtund about these IGMP upcalls as well.


regards,
BMS


Re: TCP listenall

2007-10-26 Thread Bruce M. Simpson

Matus Harvan wrote:

Hi,
  
I was wondering if I could get some feedback about the patch and
whether others think it could be committed. A slightly updated version
of the patch is at the end of this email.
  


I have mixed feelings about this patch.

The idea of a TCP socket which magically loses its TCP semantics is 
unattractive -- SOCK_RAW is traditionally where we've put things which 
don't fit the rest of the BSD socket API -- however in this case I don't 
see we have much choice, if what we desire is the ability for a client 
to establish a connection to any ephemeral port with the mtund returning 
from an accept() as usual.


We are bending the rules of the usual TCP semantics here, but that is OK 
because if we directed tlistenall to be a raw IP socket, we'd need a way 
to say to TCP, 'I'd like to create a socket which is already in SYN_RCVD 
state with a SYN whose mbuf has now gone to lunch', assuming we wish to 
create TCP streams business as usual.


The relay port idea I pointed out in my message about udp catchall would 
be especially applicable here -- we may not always want catchalls for 
the entire 16-bit tcp port space.


listenallr is static and is going to get trashed by concurrent threads, 
unless there is a serialization with a lock, which I don't see.


How will inp_tlistenall appear in netstat output? Perhaps assigning a 
LISTEN_ALL state would be helpful for an administrator to clearly see 
that a listenall socket is active? Perhaps checking for TCP_LISTENALL 
set on an unbound socket in tcp_usr_listen() when listen() is called is 
the way to go instead of, or in addition to, using inp_tlistenall?


Again, good work, but needs more polish before it can go into mainline 
(IMHO).


best,
BMS


Re: TCP listenall

2007-10-26 Thread Bruce M. Simpson

Bruce M. Simpson wrote:


The relay port idea I pointed out in my message about udp catchall 
would be especially applicable here -- we may not always want 
catchalls for the entire 16-bit tcp port space.

...
How will inp_tlistenall appear in netstat output? Perhaps assigning a 
LISTEN_ALL state would be helpful for an administrator to clearly see 
that a listenall socket is active? Perhaps checking for TCP_LISTENALL 
set on an unbound socket in tcp_usr_listen() when listen() is called 
is the way to go instead of, or in addition to, using inp_tlistenall?


P.S. This is probably how you get INET6 support for little cost. Hint 
hint. ;-)


BMS


Re: MPLS implementatrion!

2007-10-28 Thread Bruce M. Simpson

Ermal Luçi wrote:

I was wondering why this implementation of MPLS isn't integrated into FreeBSD?!
http://www.info.ucl.ac.be/~iannone/Files/MPLS-Complete.zip
  


At least three reasons spring to mind:
1. It seems to be targeted at FreeBSD 4.2, which is very old indeed.
2. No mention of it in GNATS or the mailing list that I can see or recall.
3. I'd certainly never heard of it until now, and I've been keeping my 
eyes peeled for these things.


Also the work doesn't seem to be complete: I'm really not sure that the 
ability to open an MPLS socket is useful in anything other than an 
experimental context.


MPLS is not a protocol which is designed with end-stations in mind -- 
it's for routers -- and like any form of traffic engineering, it depends 
on a packet filtering engine at the ingress point. pf could offer such a 
filtering engine.


Whilst it's very cool that someone appeared to have done some of the 
work...  Matthew Luckie came forward a few months back and volunteered 
to work on porting Ayame to modern FreeBSDs.


Ayame is more likely a better fit for FreeBSD and other projects which 
can build on it, so I think it is best we hold off for now.


regards,
BMS



Re: UDP catchall

2007-10-29 Thread Bruce M. Simpson

Brooks Davis wrote:

While I think this idea has some merit, I think we specifically want
the current wildcard ability to allow for a system that requires
minimal configuration.  The problem with a range is that it doesn't
allow disjoint sets and it requires that if you really do want all the
ports you need to produce a list of currently allocated ports to avoid
allocating.  A more (over)engineered solution holds some attraction, but
I'm not yet convinced the fact that it could exist precludes the current
implementation.


Actually I concur with you on this point, based solely on the disjoint 
sets point.


Another vector of attack would be to put the relay functionality into 
PF, which can do the packet matching. However this of course suffers 
from the problem that if you just want a plain old UDP socket for mtund, 
you won't get that unless you go to the inpcb layer anyway.


But who says mtund needs to use sockets for its traffic relay? There is 
definite appeal in *not* doing it in the socket layer at all -- an 
adaptation of pf's log socket may suffice...


Just my 2c for now...
BMS


Re: Interface address sourced packets go thru default gateway on another interface

2007-11-16 Thread Bruce M. Simpson

Brian Hawk wrote:
Then what would be the reason to bind a connection to a specific 
source address? We do

ping -S A.B.C.D x.y.z.t
to make ping send packets to x.y.z.t over A.B.C.D's interface (and 
source address) or

telnet -s A.B.C.D x.y.z.t

I believe binding an IP's source address to an interface address 
(instead of INADDR_ANY) is to make packets go out from *that* 
interface, not the default gw.


Nope, this has never been the case.

Binding a socket to an address does just that -- it does NOT bind a 
socket to an interface.


The source address during an accept() or bind() is chosen based on the 
address provided to the bind() call, or on the address from which the 
SYN you are accept()-ing originated; the kernel will then choose the 
address 'nearest' to the node which sent the SYN for further 
communication, by doing a route lookup.


During ip_output() the actual interface pointer lookup will take place 
based on the destination address. Then and only then is the actual 
interface selected.
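
The distinction is visible from userland. The Python sketch below uses
only the loopback address so that it runs anywhere; the function names
are invented for the example, and it demonstrates the general sockets
behaviour rather than anything specific to the FreeBSD internals
described above.

```python
import socket

def bound_source(addr, port=0):
    """bind() fixes the local address (and port); no interface is named."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind((addr, port))
        return s.getsockname()

def kernel_chosen_source(dest, port=9):
    """Leave the socket unbound and let the kernel pick the source.

    connect() on a UDP socket sends no packet; it records the peer and
    has the kernel choose a local address for reaching that destination.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest, port))
        return s.getsockname()[0]
```

Nothing in the bind() call refers to an interface; which interface a
packet actually leaves by is still decided by the destination lookup at
output time.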


This is a set of behaviours which will have to change in netinet in 
order to support stuff like bind-to-interface, scoped addresses and the 
169.254.0.0/16 link-local block correctly -- we SHOULD be looking at the 
address to which the socket is bound before doing anything (compare with 
Linux's SO_BINDTODEVICE option; which causes layer pollution and I would 
suggest should NOT be implemented in the same way in FreeBSD).


As other contributors have suggested, if you really need source routing, 
use pf or similar for that. I believe ipf also supports route-to on the 
outbound.


cheers,
BMS


Re: plans for multiple routing tables

2007-11-27 Thread Bruce M. Simpson

mosaic wrote:

I would like to ask, whether there are any plans to implement multiple
route tables, like OpenBSD did:

http://archives.neohapsis.com/archives/openbsd/2006-10/2665.html
  


Yup, we're aware of these changes.

[The feature you're referring to is actually the ability to have 
multiple choices for next-hops, not multiple routing tables -- that's 
just how the next-hops might conceptually be presented to the user.]



I'm well aware of the fact that I can do policy routing via pf/ipf/ipfw, as well as

http://imunes.tel.fer.hr/virtnet/
  


Again, not entirely the same thing. IMUNES is overkill for most people's 
requirements, and is about more than 'just' the forwarding plane; it is 
however a novel and interesting way of doing network simulation or 
virtualization.


There are a whole bunch of potential issues with implementing multipath 
right.


I would suggest, for now, that we just import the OpenBSD changes to the 
existing BSD FIB, as it is a relatively small change in terms of code.


I've responded to Julian off-list about his plans as a number of groups 
and individuals have been looking at this issue.


I would like to see this work out OK, but I do not have 'copious free 
time' in which to do this at the moment -- I gotta earn a living!


regards,
BMS


Re: TCP ECN patch for review

2007-11-27 Thread Bruce M. Simpson

I'm very pleased to see ECN finally being implemented in FreeBSD.

Whilst I can't offer technical assistance in testing or review at this 
time, I would like to thank you for the clearly professional level of 
effort you have put into this.


regards,
BMS


Re: tcp md5 checksums broken in 7.0-beta3

2007-11-29 Thread Bruce M. Simpson

Bjoern,

Thanks very much for tracking down and fixing this regression.

When I originally worked on tcp-md5 around 4 years ago, I didn't have 
the luxury of fast enough machines to run VMs, and open source VMs were 
considerably less mature.


One idea that's occurred to me, working on my current project, is to be 
able to run FreeBSD in a virtual machine type emulator (such as QEMU or 
Bochs) as part of a battery of regression tests.


This has the advantage that no invasive changes are needed to regression 
test the networking code, other than customising the kernel config for 
the tests, and hooking up the appropriate software 'test probes' to the 
kernel under test.


It has the disadvantage that some form of temporary store for the root 
filesystem needs to be presented to the kernel under test. I guess this 
could be dealt with by using some kind of NFS server -- again, this also 
has the disadvantage that it goes through the networking code, so being 
able to slap together a very minimal root fs image file would be useful 
here.


Thanks again...
BMS


Re: per-socket keep-alive options for TCP

2007-11-30 Thread Bruce M. Simpson

Andrew Alcheyev wrote:

I have recently examined the keep-alive mechanism in FreeBSD's TCP
stack and found out that it has no tunable variables for keep-alive on
a per-socket basis.
Is anyone interested in a patch like this one?
http://mail-index.netbsd.org/tech-net/2007/06/19/0001.html

Alternatively, a patch for FreeBSD may introduce a new kernel option.
I would appreciate any suggestions.
  


Seems reasonable. This thread talks about the Solaris implementation and 
the general background to keep-alives:
   
http://jj.tingiris.net/archives/6-TCP_KEEPALIVE-and-SO_KEEPALIVE-on-Solaris.html


And this thread mentions its use in PostgreSQL:
   
http://qaix.com/postgresql-database-development/336-230-re-implement-support-for-tcp-keepcnt-tcp-keepidle-tcp-keepintvl-read.shtml


I'm a bit wary of importing new features into a sensitive and heavily 
used module like TCP without regression tests, though, and it should 
probably default to the current sysctl defaults in use (default to 
keepalives on for each new tcp socket) for traversing stateful firewalls 
on the path.


However in this case we are merely introducing new knobs for fine-tuning 
the keep-alive behaviour, so no big worry here.
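
For reference, this is roughly what such per-socket knobs look like
from an application, sketched in Python. The option names follow the
TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT naming mentioned in the
links above; whether a given platform provides them is checked at run
time, and the helper name and the timings are assumptions for the
example.

```python
import socket

def enable_keepalive(sock, idle=None, interval=None, count=None):
    """Turn keep-alives on and apply per-socket tuning where available.

    Returns a dict of the options that were actually set.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    wanted = (("TCP_KEEPIDLE", idle),       # seconds idle before probing
              ("TCP_KEEPINTVL", interval),  # seconds between probes
              ("TCP_KEEPCNT", count))       # probes before giving up
    for name, value in wanted:
        opt = getattr(socket, name, None)   # absent on some platforms
        if opt is not None and value is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied
```

Sockets that never call this keep following the system-wide sysctl
defaults, which matches the point about not changing default behaviour.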


Being able to tune on a per-socket basis is *somewhat* useful, however 
what would be useful in the bigger picture is the ability to tune TCP 
behaviour based on path selection, where the path currently chosen has 
radically different characteristics from the general case (e.g. GPRS, 
UMTS, satellite systems).


Cheers,
BMS


Re: TDMA / Interrupts / Pre-emptible

2007-12-07 Thread Bruce M. Simpson

Len Gross wrote:

I have built a "user land" prototype of a custom network protocol for an RF
network.  It is based on Netgraph and using Ethernet rather than real RF.

Eventually, all the code will go into a special piece of hardware, but the
first hardware really will look like an Ethernet card that puts messages out
N microseconds after they are put into its memory. Since the protocol employs
some TimeDivisionMultipleAccess (TDMA), "precise" feeding of the board is
important.

In "userland" I seem to have about 1 ms of "delay"/variability from when I
schedule a timer and when it wakes up a thread.  I think this is pretty much
expected behavior and is fine for algorithm testing.

When I move my userland code to "driver/kernel-land" and set a timer to send
a packet to some hardware how much delay / variability will I see in that
timer?  I think the question is more/less equivalent to the pre-emptibility
of driver code and interrupts in general.
  


1ms sounds about right, re the amount of userland scheduler jitter.

I had a horrible experience with the MS Windows userland scheduler. 
Achieving low latency and jitter is particularly difficult there unless 
you go to the kernel. I know there are various methods to get smaller 
timer granularity which I've tried. It was just very variable, appeared 
to be nondeterministic, and it was often off by 10ms or more.


In FreeBSD the userland story is far better: you should be able to just 
crank up HZ, and it sounds like you've already done this to arrive at 
the 1ms figure.
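
A quick way to put a number on userland timer lateness is to ask for a
short sleep repeatedly and record how late each wakeup is. The Python
sketch below does that; the function name is invented, and the figures
it produces depend on the OS, the load and the timer settings, so no
particular result is claimed here.

```python
import time

def sleep_overshoot(interval_s, samples):
    """Sleep `samples` times for `interval_s` seconds and return how
    late each wakeup was, in seconds."""
    late = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        late.append(time.perf_counter() - start - interval_s)
    return late

# Example run: 1 ms requested, 20 samples; report the spread.
results = sleep_overshoot(0.001, 20)
spread = max(results) - min(results)
```

The spread between the best and worst sample is a crude jitter figure;
it says nothing about in-kernel timers, which is the part of the
question left open above.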


I can't comment on kernel scheduler jitter though, so someone who is 
working directly in that area will hopefully respond -- arch@ or 
hackers@ might be a better place to field that question.


I believe microsecond resolution for your app should be possible in the 
kernel. If it isn't, I'd like to know why. [It would be really, really 
nice to have better real-time support in FreeBSD, i.e. a deadline 
scheduler.]


cheers
BMS


Re: ifconfig: BRDGADD vr1: Invalid argument

2007-12-12 Thread Bruce M. Simpson
My shot from the hip, although I'm pretty much away from this stuff at 
the moment.


Randy Bush wrote:

# ifconfig bridge0 addm ath0 addm vr1 up
ifconfig: BRDGADD ath0: Invalid argument
  
ath0 is IFT_ETHER, so it should be OK to attach it to the bridge -- 
although you won't get the 802.11 frames bridged.

Could be that for whatever reason, bridge fails to put ath into 
promiscuous mode. Try turning on the dev.ath.0.debug sysctl taps and/or 
watch what src/tools/tools/ath/athdebug/athdebug.c does...


BMS


Re: arp rewrite...

2007-12-12 Thread Bruce M. Simpson

Julian Elischer wrote:


I think that breaking the arp code from the routing code
needs to proceed.


I agree wholeheartedly. The coupling of the ARP code to the forwarding 
code in the BSDs has been largely historical. Other implementations have 
done this, and it generally simplifies the layer 3 forwarding code.


If done carefully, the performance impact should be minimal. rwlocks 
might be the way to go here.


In my opinion this kind of change has been needed for a long time; 
sadly, I can't offer any resources to help move this along just now.


Best regards
BMS


Re: bikeshed for all!

2007-12-13 Thread Bruce M. Simpson

Julian Elischer wrote:


I need a word to use to describe the network view one is currently on..
e.g. if you are using the second routing table, you could say I've set 
xxx to 1 (0 based)..


currently in my code I'm using 'universe' but I don't like that..


I would really really like it if we could stop using the term "routing" 
here.

The kernel forwards, it does not route -- routing protocols route.

I know that when BSD started out the distinction was not so clear, but 
it is in most modern implementations: Windows, IOS etc. all draw a 
distinction between the currently winning routes used for forwarding, 
and the routes which are actually exchanged or learnt.


So my vote is for "forwarding domain".

I understand that this feature is something which swaps in a different 
forwarding table for the application one is currently running?


And that it works in a manner similar to chroot() ? Is this different or 
the same as the pf/ipf/ipfw tag you mention?


Also, can we retain compatibility with OpenBSD for now, for any 
equal-cost path stuff we do?


Cheers...
BMS


Re: bikeshed for all!

2007-12-13 Thread Bruce M. Simpson

How about "setfib"?

I strongly believe we should deprecate the use of the term "routing" 
where the BSD forwarding plane is concerned; whilst familiar to many, it 
is misleading as to what that part of the system is actually doing.


2c
BMS


Re: bikeshed for all!

2007-12-13 Thread Bruce M. Simpson

Hi,

Just to chime in and agree with Bjoern, I'm finishing up a routing 
protocol right now so this discussion is somewhat timely.


I disagree that this is a "bikeshed", quite the contrary -- the visual 
and the verbal have to live together, and it's easy for those of us who 
have the semantic map in our minds right now to dismiss the discussion 
as such.


Try walking away from it for 6-9 months, come back, and try to get back 
into it -- choosing good terminology upfront DOES make a difference to 
maintainability of code, and it will make it easier for others 
(students, newbies, other folk) to get involved.


Anyway:

Some folk (e.g. Marko) prefer the term table, though any way you look at 
it, the fib usually uses a trie as its backend data structure -- 
although the TRASH structure Linux has been using is a cross between the 
trie and the hash table.


So perhaps there is some merit in, say... setroutetbl.

After all, folk tend to call a "forwarding table entry" a route for the 
sake of brevity.


Bjoern A. Zeeb wrote:


FIB (Forwarding Information Base) has been very standard for years and
is often confused with foo and bar;-)


Microsoft use this logical separation of routing and forwarding 
functions in their implementation of IP routing, although they don't 
call their "routing table" a FIB, they call it a "forwarding table", and 
the entries in this are called "forwarding table entries".


XORP adopted the RIB/FIB split from the start as a design decision, in 
doing so the functions of routing protocols can be kept logically 
separate from the forwarding plane, which could be hardware, software, 
or even a combination thereof (e.g. Cisco CEF).


The way this has played out follows the traditional BSD way, where 
routing protocols (e.g. routed) live in userland, whilst forwarding 
(e.g. ip_forward()) lives in the kernel.
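
The RIB/FIB split can be pictured with a small Python sketch: several
protocols offer candidate routes to a RIB, and only the winner for each
prefix is installed in the FIB that forwarding consults. The preference
values, addresses and function names are assumptions made for the
example, and a linear scan stands in for the trie a real FIB would use.

```python
import ipaddress

def build_fib(rib):
    """rib: iterable of (prefix, next_hop, preference); the lowest
    preference wins. Returns {network: next_hop} holding only winners."""
    best = {}
    for prefix, next_hop, preference in rib:
        net = ipaddress.ip_network(prefix)
        if net not in best or preference < best[net][1]:
            best[net] = (next_hop, preference)
    return {net: next_hop for net, (next_hop, _) in best.items()}

def forward(fib, dst):
    """Longest-prefix match against the FIB only; the RIB is never
    consulted on the forwarding path."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]

# Hypothetical candidates learnt from different sources.
rib = [
    ("10.0.0.0/8", "192.0.2.1", 120),
    ("10.0.0.0/8", "192.0.2.2", 20),
    ("10.1.0.0/16", "192.0.2.3", 110),
    ("0.0.0.0/0", "192.0.2.254", 1),
]
fib = build_fib(rib)
```

The losing 10.0.0.0/8 candidate stays in the RIB, where the routing
protocols can still see it, but never reaches the forwarding table.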


cheers
BMS


Re: bikeshed for all!

2007-12-13 Thread Bruce M. Simpson

Julian Elischer wrote:

What I'm implementing is, as Qing said, a form of policy based forwarding
i.e. you can use a broad set of criteria to select a "FIB" (to use the 
terms here) dependent on a number of criteria.

Criteria include source socket (for local connections) which
is derived from process information at socket creation time, or a
socket option. Firewalls such as pf or ipfw can also select a FIB for 
a particular incoming packet to be forwarded.


Thanks. This is exactly how I believe it should play out -- pf/ipfw/ipf 
can be used as packet classifier engines for stuff like this, as well as 
MPLS in future.


cheers
BMS


Re: is carp on if_bridge possible?

2007-12-14 Thread Bruce M. Simpson

Niki Denev wrote:

Hello,

Is this possible?
I've tried adding IFT_BRIDGE next to IFT_ETHER and IFT_L2VLAN in ip_carp.c
but this probably is not enough. Any ideas?
  


CARP is 'special' in that it needs to add its own MAC addresses to your 
interface, needs a bit of special cooperation between the IP layer and 
the MAC layer, and it's more than likely that this doesn't work with 
if_bridge.


Like Max says, this is an unusual configuration -- what are you trying 
to do?


BMS


Re: initial call for review.. initial multi-fib (routing table) support

2007-12-14 Thread Bruce M. Simpson

Julian,

First of all, thank you very much for starting this work in a much 
needed area.


Julian Elischer wrote:

This is a call for review for a change that is part of a
longer term project.

This implements multiple routing tables. Eventually the implementation 
will be much cleaner but the first implementation is designed to be
backported to 6.x and thus must be ABI compatible. It need not be
particularly 'clean' as the version in 8.x will be..  First it needs to
be committed to -current in its 6.x form so an MFC can occur, then the
cleaner version can be committed over the top of it to clean it up.


Few comments:
Allocating multiple radix trie heads is one way of doing this, but it 
would be nice to be able to clean up the memory management in the radix 
trie in general.


I've seen implementations which do this by assigning index numbers or 
bit sets to the radix trie entries. That way, you don't need to keep 
multiple redundant copies of the same data around -- this IS the kernel 
FIB after all, and if you're running a router in the Default Free Zone, 
or with a considerable BGP topology, this kind of redundancy in the 
forwarding plane is not an OK use of memory resources.


It's been a few months, but I believe this is how OpenBSD does it; ipfw 
also does something similar deep in its innards: the rules are tagged 
with bitsets to specify which sets they are present in.
[I see similar memory management issues with C++ STL containers, which 
irritates me; Boost's multi_index_container is an analogous idiom.]
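
The bitset idea can be sketched in a few lines of Python. This is not
the OpenBSD or ipfw code; the class and method names are invented, a
dict with a linear scan stands in for the radix trie, and the addresses
are from documentation ranges.

```python
import ipaddress

class SharedFib:
    """One set of entries shared by all FIBs; each entry carries a
    bitmask saying which FIB numbers it belongs to."""

    def __init__(self):
        self.entries = {}  # (network, next_hop) -> bitmask of FIBs

    def add(self, prefix, next_hop, fib):
        key = (ipaddress.ip_network(prefix), next_hop)
        # Adding an existing route to another FIB only sets one more bit.
        self.entries[key] = self.entries.get(key, 0) | (1 << fib)

    def lookup(self, dst, fib):
        """Longest-prefix match restricted to entries tagged for `fib`."""
        addr = ipaddress.ip_address(dst)
        best = None
        for (net, next_hop), mask in self.entries.items():
            if mask & (1 << fib) and addr in net:
                if best is None or net.prefixlen > best[0].prefixlen:
                    best = (net, next_hop)
        return best[1] if best else None

table = SharedFib()
table.add("0.0.0.0/0", "192.0.2.1", fib=0)
table.add("0.0.0.0/0", "192.0.2.1", fib=1)   # same entry, second bit
table.add("10.0.0.0/8", "192.0.2.9", fib=1)  # only present in FIB 1
```

The default route is stored once even though both FIBs use it, which is
the memory argument made above; with a full Internet table the saving
per additional FIB would be correspondingly larger.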


One of the big strengths of the BSD radix trie, as implemented by Keith 
Sklower, was that it could be regression tested independently of the 
kernel. I'd very much like to see this capability retained, and perhaps 
expanded upon, as this is a sensitive area of work.


I'd encourage you to take a look at the OpenBSD changes. They are much 
less invasive than this patch, and whilst they don't provide the 
setfib() syscall functionality, that could be easily grafted on top. I 
understand your folk's requirements for multiple tables, I'm sure there 
is a possible fit here given the idioms described herein.


As I say it's been months since I last had a chance to look at this, and 
I am busy finishing up the first phase of another project, so I don't 
have all of these changes to hand -- however -- here's a good date and 
starting position:


   
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/radix.c.diff?r1=1.20&r2=1.21


I know there is an element of Not-Invented-Here which creeps in, but, 
when all is said and done, OpenBSD's approach is viable, compact, and 
simple, and addresses folk's immediate requirements for multi-path support.


They don't address SMP, multicast, or source address selection, but 
those are future development stories.


cheers
BMS


Re: Spurious error from i[pf]_carp

2007-12-14 Thread Bruce M. Simpson

Tom Judge wrote:
I guess that there will be more than one VRRP implementation that does 
not generate packets with a header the same size as the carp header.


I will look into generating a patch for this over the weekend,  
however any thoughts/suggestions would be appreciated before I start 
working on it.

Sounds fine to me, thanks for doing this.

It is regrettable that CARP had to come into existence in the first 
place because of the VRRP intellectual property situation, and I guess 
this is one of the turds which end up floating in everyone's midst as a 
result, if you'll pardon the analogy.


regards,
BMS


Re: Spurious error from i[pf]_carp

2007-12-15 Thread Bruce M. Simpson

Max Laier wrote:
Alternatively you could change IPPROTO_CARP in netinet/in.h to another 
unused protocol number.  This is really the preferred way of dealing with 
mixed CARP and VRRP environments as the CARP packets might in turn 
irritate the VRRP routers, too.
  
This sounds like a common use case. Perhaps there is motivation for 
making the protocol number used by CARP a loader tunable?


[I'd really like it if we had a kernel API for adding the virtual MAC 
addresses to ifnet too, then again I'd like the cheat for infinite 
chocolate fudge sundaes in life, bed and breakfast at The Savoy with my 
choice of actress, etc]

/* no comment */
  
No disrespect to anyone intended, just that CARP does duplicate the 
functionality of VRRP.


It's worth reiterating that this is what happens when software patents 
are allowed to creep in to the nuts and bolts of the operational 
Internet -- and thus, CARP was born, and thus Tom runs into the issue he 
has seen.


later
BMS



Re: Routing SMP benefit

2008-01-02 Thread Bruce M. Simpson

Andre Oppermann wrote:

> So far the PPS rate limit has primarily been the cache miss penalties
> on the packet access.  Multiple CPUs can help here of course for bi-
> directional traffic.  Hardware based packet header cache prefetching as
> done by some embedded MIPS based network processors at least doubles the
> performance.  Intel has something like this for a couple of chipset and
> network chip combinations.  We don't support that feature yet though.


What sort of work is needed in order to support header prefetch?



> Many of the things you mention here are planned for FreeBSD 8.0 in the
> same or different form.  Work in progress is the separation of the ARP
> table from kernel routing table.  If we can prevent references to radix
> nodes generally almost all locking can be done away with.  Instead only
> a global rmlock (read-mostly) could govern the entire routing table.
> Obtaining the rmlock for reading is essentially free.


This is exactly what I'm thinking, this feels like the right way forward.

A single rwlock should be fine, route table updates should generally 
only be happening from one process, and thus a single thread, at any 
given time.



> Table changes
> are very infrequent compared to lookups (like 700,000 to 300-400) in
> default free Internet routing.  The radix trie nodes are rather big
> and could use some more trimming to make them fit a single cache line.
> I've already removed some stuff a couple of years ago and more can be
> done.

It's very important to keep this in mind: "profile, don't speculate".

Beware, though, that functionality isn't sacrificed for the sake of this.

For example it would be very, very useful to be able to merge the 
multicast routing implementation with the unicast -- with the proviso of 
course that mBGP requires that RPF can be performed with a separate set 
of FIB entries from the unicast FIB.


Of course if next-hops themselves are held in a container separately
referenced from the radix node, such as a simple linked list as per the
OpenBSD code.

If we ensure the parent radix trie node object fits in a cache line, 
then that's fine.


[I am looking at some stuff in the dynamic/ad-hoc/mesh space which is 
really going to need support for multipath similar to this.]


later
BMS


Re: Outstanding multipath related PR; Floating statics.

2008-01-03 Thread Bruce M. Simpson

Bruce M. Simpson wrote:


> The problem which caused Thomas to raise the PR, is that of allowing
> the prefix route for 192.168.1.0/24 to float over to the other
> interface which *can* reach that prefix.


I should also point out that the problem exists at Layer 2 and Layer 3.

To be sure, if a route already exists for 192.168.1.0/24 which transits 
the downed interface, it will be chosen first. However if ARP entries 
are still in the stack, they represent the most specific match and their 
/32 entries will be chosen *before* any Layer 3 route, regardless of 
interface status.


In other words, the removal of ARP entries from the routing table would 
solve only half of the problem that Thomas has observed.
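
To make the "most specific match" point concrete, here is a minimal Python sketch of longest-prefix matching over a combined table. The table contents and the interface names em0/em1 are made up for illustration; this is not the kernel's radix code, only the selection rule it implements.

```python
import ipaddress

def lookup(table, dst):
    """Return the entry whose prefix is the longest match for dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, entry in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    return best[1] if best else None

# Hypothetical state: the /24 has floated to a working interface (em1),
# but a stale ARP-derived /32 still points at the downed one (em0).
table = {
    "192.168.1.0/24": ("connected", "em1"),
    "192.168.1.10/32": ("arp", "em0"),
}
```

With this table, a lookup for 192.168.1.10 still lands on the stale /32 entry, while any other host in the /24 follows the floated prefix route.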


So I guess the real question is: should the lookup of the next-hop for a 
"connected" route, that is, the network prefix route associated with an 
interface, be allowed to "float" to the next interface which is UP? I 
think it should. Interface routes are NOT STATIC routes.


I'm in favour of retaining the existing meaning of STATIC routes. But I 
also think we should allow the IFP resolution for STATIC routes to 
"float" if they are configured as FLOATING STATICs as this is what most 
folk really want the forwarding code to do.
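
As a rough sketch of what "floating" could mean for interface resolution, the Python below picks the most preferred candidate whose interface is up. The Candidate type, the distance field (borrowed from the router vendors' notion of administrative distance) and the interface names are all assumptions for illustration, not an existing FreeBSD interface.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    ifname: str    # hypothetical outbound interface
    distance: int  # lower is preferred; a floating static gets a higher value

def resolve(candidates, link_up):
    """Pick the most preferred candidate whose interface is currently up."""
    usable = [c for c in candidates if link_up.get(c.ifname, False)]
    return min(usable, key=lambda c: c.distance).ifname if usable else None

# Hypothetical default route: wired first, wireless as the floating static.
default_route = [Candidate("em0", 1), Candidate("wlan0", 10)]
```

When em0 is up it wins; when it goes down the lookup floats to wlan0 without the route being deleted and re-added.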


regards
BMS


Outstanding multipath related PR; Floating statics.

2008-01-03 Thread Bruce M. Simpson

Hi there,

I have been cleaning up my PRs.

The last PR I have remaining in my queue is directly related to the 
multipath work you are doing:

   http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/71474

The problem which caused Thomas to raise the PR, is that of allowing the 
prefix route for 192.168.1.0/24 to float over to the other interface 
which *can* reach that prefix.


Follow the code in in_rmx.c for in_ifadown() to understand why the 
problem with downed interfaces happens.


Traditionally, BSD has not skipped ifnets which are marked as being down 
during outbound interface selection. I'm not entirely sure why we should 
even be looking at downed interfaces for that last part of next-hop 
selection in the first place. Ruslan (Cc'd) suggests there are cases 
where this is necessary, I can think of none.


The mere fact that the first match may be used, without taking account of
the ifnet referenced by rt_ifp being down, would clobber correct path
selection behaviour for multipath.


To be sure, recursive resolution of the next-hop could solve the problem 
described in the PR, at the expense of introducing unnecessary 
complexity into the kernel FIB.


However, at the end of the PR I suggest a more general approach which 
could be used to deal with the situation where the destination is *not* 
covered by interface routes -- floating statics.


Both Cisco and Juniper implement floating statics in their FIBs, and 
they would actually allow folks who unplug their cable and expect to be 
able to seamlessly fail over to wireless in the most general case, by 
making 0.0.0.0/0 a floating static.


cheers
BMS



Re: Routing SMP benefit

2008-01-03 Thread Bruce M. Simpson

Andre Oppermann wrote:

> Haven't looked at the multicast code so I can't comment.  The other
> stuff is just talk so far.  No work in progress, at least from my side.


Insofar as rmlocks and cache line size vs rtentry size applies to multicast:

I know there are implementations out there which use the unicast BSD 
routing code to do multicast. This is preferable as the MROUTING 
implementation in the main tree has a 32 vif limitation. Moving this 
into the main radix trie code allows us to overcome these limitations.


Recall that a multipath FIB holds multiple next-hops for each route. 
Multicast routes need the same, but they also need to send traffic to 
all of the next-hops. This is basically what the MROUTING code does, but 
it does so completely separately from the unicast forwarding code. The 
reasons for this are mostly historical -- folks wanted to develop it 
separately from unicast IPv4.
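
The difference can be sketched in a few lines of Python, assuming a route is just a list of next-hops. The hash-based choice in the unicast case is only one possible multipath policy, picked here for illustration; the function names are hypothetical.

```python
import zlib

def unicast_forward(next_hops, flow_key):
    """Multipath unicast: pick exactly one next-hop, stable for a given flow."""
    return [next_hops[zlib.crc32(flow_key) % len(next_hops)]]

def multicast_forward(next_hops):
    """Multicast: replicate the packet to every next-hop on the route."""
    return list(next_hops)

hops = ["em0", "em1", "wlan0"]  # hypothetical next-hops for one route
```

Both cases can share the same per-route next-hop container; only the final step differs, which is the argument for merging the two forwarding paths.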


For IETF MANET, i.e. tactical mobile IP networks, we need to be able to 
address multicast next-hops by their unicast address -- most of the time 
we can't reliably use link layer multicast or even IGMP to reach all 
subscribers, or use PIM shared trees, so flooding is necessary -- as 
well as being able to disable the existing RPF checks on inbound traffic 
from MANET interfaces. In situations like this, 32 next-hop


I'm aware this is only marginally related to the DFZ/tier-1 router 
scenario, but, it's something I want FreeBSD to support as it allows IP 
networks to be deployed in novel situations i.e. where no existing 
infrastructure exists, and centralized/hierarchical network 
infrastructure isn't suitable (think International Rescue).


So it's something to think about for folks doing multipath work -- the 
same performance constraints which affect struct rtentry *now* for SMP 
and multipath work will potentially affect multicast forwarding in future.


regards
BMS


Re: Text for IPv6 Scope

2008-01-05 Thread Bruce M. Simpson

Crist J. Clark wrote:

> Anyone up for adding text to the scopeid field in the ifconfig(8)
> output for IPv6 addresses? Other OSes do. To avoid too much
> disruption to the current format, the text is appended after the
> currently printed hexadecimal field.
>
> Example:
>
> fxp0: flags=8843 mtu 1500
>    options=8
>    inet6 fe80::290:27ff:fe13:2540%fxp0 prefixlen 64 scopeid 0x5(site-local)
>
> While we're at it, update the in6.h file to include all scopes
> in RFC4291.
>
> Look OK? Anyone up for applying these?
  


This kind of output might be cooler?

fxp0: flags=8843 mtu 1500
   options=8
   inet6 fe80::290:27ff:fe13:2540%fxp0 prefixlen 64 scope site


just my 2c
BMS


Re: resend: multiple routing table roadmap (format fix)

2008-01-06 Thread Bruce M. Simpson

Vadim Goncharov wrote:


> Is multicast and multipath routing the same?


No. They are currently orthogonal.

However it makes sense to merge the multicast and unicast forwarding 
code as currently MROUTING is limited to a fan-out of 32 next-hops only. 
In multicast, next-hops are normally just interfaces.


Also the IETF MANET ad-hoc IP is going to need hooks there; multicast in 
MANET needs to address its next-hops by their unicast address, and 
encapsulate the traffic with a header. This is not true link layer 
multicast -- although it might use link layer multicast to leverage the 
hash filters in 802.11 MACs.


As regards getting ARP out of forwarding tables, this should have 
happened a long time ago...


BMS


Re: resend: multiple routing table roadmap (format fix)

2008-01-06 Thread Bruce M. Simpson

Julian Elischer wrote:


> > OK, but we should think about it in the future. In theory, routing
> > socket's messages are easily extendable with FIB number in uint16_t,
> > as message keeps its length...
>
> I will do that with the advice of people who know that protocol better
> than I do.


I'm afraid Linux is still ahead of the game here. They adopted a 
tag-length-value protocol called NETLINK which solves many of the 
problems inherent in PF_ROUTE. It even has an RFC.
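
A small Python sketch shows why a tag-length-value encoding is easier to extend than a fixed-layout message. The layout (16-bit length, 16-bit type, payload padded to 4 bytes) is loosely modelled on netlink attributes, and the type numbers, including ATTR_FIB, are hypothetical.

```python
import struct

HDR = struct.Struct("=HH")  # length (header + payload), then type

def pack_attr(atype, payload):
    """Encode one attribute, padding the payload to a 4-byte boundary."""
    length = HDR.size + len(payload)
    return HDR.pack(length, atype) + payload + b"\0" * (-length % 4)

def unpack_attrs(buf):
    """Decode a buffer of attributes into {type: payload}."""
    attrs, off = {}, 0
    while off + HDR.size <= len(buf):
        length, atype = HDR.unpack_from(buf, off)
        if length < HDR.size or off + length > len(buf):
            raise ValueError("truncated attribute")
        attrs[atype] = buf[off + HDR.size:off + length]
        off += (length + 3) & ~3  # skip the padding as well
    return attrs

ATTR_DST, ATTR_GATEWAY, ATTR_FIB = 1, 2, 99  # hypothetical type numbers

def old_parser(buf):
    """A parser written before ATTR_FIB existed: unknown types are ignored."""
    known = (ATTR_DST, ATTR_GATEWAY)
    return {t: v for t, v in unpack_attrs(buf).items() if t in known}
```

A message carrying the new FIB attribute still parses cleanly in the old parser, which simply skips what it does not understand; no fixed offsets move.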


BMS


Re: resend: multiple routing table roadmap (format fix)

2008-01-08 Thread Bruce M. Simpson

Julian Elischer wrote:

> Andre Oppermann wrote:
> > People with the ultimate need for speed have to maintain their own
> > trees anyway (Bluecoat, Juniper, Sandvine, Isilon,...) and can afford
> > to cut some more corners anyway.
>
> We are trying to get away from that. We are trying to get more BACK
> from those companies.



I know I keep rattling my sabre about co-operative development in "that" 
IRC channel.


The IP stack stuff everyone is looking at right now is just one example 
of the kind of development which organisations are normally not prepared 
to sponsor other than in the context of their own projects -- which is 
fair enough, they are, after all, acting in their own interests, even 
though we all stand to gain more from mutualism.


The weevil is eating away at the apple from the inside, the question is, 
who's going to tell it like it is -- and who's actually going to do 
something about it?


Hint: The grass is not necessarily greener on the Linux side of the fence.

cheers
BMS


Re: Text for IPv6 Scope

2008-01-08 Thread Bruce M. Simpson

Bjoern A. Zeeb wrote:

> I'd go with something number+space+text as that is least likely to
> break existing scripts (unless they match on line end;).
>
> Another thing I am worried about is the output getting more likely to
> exceed 80 chars which is a bit of a pain.


Drop 'prefixlen' and use / instead:
   inet6 fe80::290:27ff:fe13:2540%fxp0/64 scope site




Re: resend: multiple routing table roadmap (format fix)

2008-01-08 Thread Bruce M. Simpson

Vadim Goncharov wrote:
> Compare it with a scheme where for EVERY forwarded packet, there is a
> need for DOUBLE lookup - after a routing one, do another in L2 table.


ARP lookups will generally use a cheap hash once split. What's the problem?

The PATRICIA lookups are more expensive, to be sure. Don't forget, 
though, that with moving L2 info out of PATRICIA, those host routes 
disappear from the table too, and thus their overhead during the tree walk.
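
A minimal Python sketch of the split design, assuming a prefix-only L3 table and a separate hash-keyed neighbour cache. All addresses, MAC values and interface names below are invented; the linear scan stands in for the PATRICIA walk.

```python
import ipaddress

# Hypothetical L3 FIB: prefix -> (gateway, or None if on-link, interface).
FIB = {
    "0.0.0.0/0": ("192.168.1.1", "em0"),
    "192.168.1.0/24": (None, "em0"),
}
# Hypothetical L2 neighbour cache, keyed by (interface, IPv4 address).
NEIGHBOURS = {
    ("em0", "192.168.1.1"): "00:11:22:33:44:55",
    ("em0", "192.168.1.10"): "00:11:22:33:44:66",
}

def l3_lookup(dst):
    """Longest-prefix match; returns (next-hop address, interface) or None."""
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in FIB if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda n: n.prefixlen)
    gateway, ifname = FIB[str(best)]
    return (gateway or dst, ifname)

def resolve(dst):
    """L3 lookup first, then one hash lookup for the link-layer address."""
    hop = l3_lookup(dst)
    if hop is None:
        return None
    next_hop, ifname = hop
    return ifname, NEIGHBOURS.get((ifname, next_hop))
```

The second step is a single dictionary probe, and the L3 table holds two prefixes here rather than two prefixes plus one host route per neighbour.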


rmlocks for L2 and L3 are probably going to be cheaper compared to a 
global mutex.




> The current routing table implementation, with all the disadvantages of
> combining L2 and L3, gets one HUGE benefit from that same combining -
> performance. And never, ever, ever, ever even try to split L2 from L3
> at the cost of that performance - then it should still never be split,
> despite all the disadvantages, and you'll become an enemy of many, many
> users. Especially while caching allows things to be done reasonably fast.




I disagree. The architectural benefits of taking ARP cache entries out 
of the routing table seem quite clear to me.


Other implementations have done this and seen it bear fruit, and your 
argument here sounds like hyperbole rather than cogent and reasoned 
argument about why this shouldn't be done.


If you have grave doubts about this which the rest of us aren't seeing, 
publish benchmarks?


One place to start might be to take Qing's code, run with it, and look 
seriously at it in a profiler such as Valgrind. But I'm preaching to the 
choir here...


Cheers
BMS


Re: the socket can't bind?

2008-01-11 Thread Bruce M. Simpson

SnaiX wrote:

> So it would run 1-4, but it reports EADDRINUSE after bind.
>
> Why?


The stack assumes that SO_REUSEADDR is never cleared on a socket after 
it gets set.



> How to resolve it? Should I dup() the fd?
  

Did you close the affected socket in (5) ?

Presumably you are still trying to use the same port; is it a 
non-ephemeral port? You might want to consider SO_REUSEPORT although I 
believe FreeBSD doesn't fully support it for non-multicast.
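
For reference, a minimal Python sketch of the usual pattern: set SO_REUSEADDR once, before bind(), and leave it set for the lifetime of the socket. The helper name is hypothetical, and this does not reproduce the poster's numbered steps, which are not shown here.

```python
import socket

def bound_listener(host, port):
    """Create a TCP listener with SO_REUSEADDR set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Set the option before bind() and never clear it afterwards.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, port))
        s.listen(5)
    except OSError:
        s.close()
        raise
    return s
```

Closing the old socket before creating a replacement for the same port avoids relying on any toggling of the option.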


BMS


Re: Unexpected multicast IPv4 socket behavior

2008-01-12 Thread Bruce M. Simpson

Hi,

This is ironic because I've been up against a similar problem with 
255.255.255.255 on my current project, which also requires a 'bump in 
the stack', and the same code you've posted the patch for, I found 
myself reading yesterday to answer another chap's query.


Fredrik Lindberg wrote:

> Hi
>
> I find the following socket behavior a bit unexpected. Multicast from
> an IPv4 socket (with IP_MULTICAST_IF set) with its source address bound
> to INADDR_ANY only works if there is a default route defined, otherwise
> send() returns ENETUNREACH.
>
> Default route set, src INADDR_ANY : Works
> Default route set, src bind() to interface address : Works
> No default route, src INADDR_ANY : Returns ENETUNREACH
> No default route, src bind() to interface address : Works


Totally expected behaviour. There's no way for the stack to know which 
interface to originate the traffic from in the case where there is no 
default route, and no IP layer source information elsewhere in the stack.


It could be argued that case 3 is in fact an abuse of the APIs. In IPv6, 
the use of multicast requires that you create a socket and bind to the 
interface where you wish to send and receive the channel. This is 
reasonable because both IGMP and MLD require that their group state 
traffic is bound to a specific address. Thus the glaring holes in IGMP 
due to the lack of IPv4 link-local addressing.


The newer multicast APIs in fact require you to do this, precisely to 
avoid this ambiguity. As such IP_MULTICAST_IF should be considered 
legacy -- however -- as we've seen, there's a lack of knowledge out 
there about exactly how this stuff is supposed to work.
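
A minimal Python sketch of the unambiguous case from the table above: bind the socket to the interface's own address and also name that interface with IP_MULTICAST_IF. The helper name is hypothetical, and the address passed in is assumed to be configured on a local interface.

```python
import socket

def multicast_sender(ifaddr):
    """UDP socket whose source address and outgoing interface are both explicit."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Source address: the stack no longer has to guess from the routing table.
        s.bind((ifaddr, 0))
        # Outgoing interface for multicast sends, identified by its address.
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                     socket.inet_aton(ifaddr))
    except OSError:
        s.close()
        raise
    return s
```

With the source address bound, the stack has IP layer source information even when no default route exists.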




> In all cases IP_MULTICAST_IF was set to the outgoing interface and
> IP_ADD_MEMBERSHIP was properly called. IGMP membership reports
> were seen on the link in all cases.


Now, if you are explicitly telling the stack which interface to use with 
IP_MULTICAST_IF, and you are seeing the regression in case 3 above, THAT 
looks like a bug.




> I believe the cause of this (unless this is the expected behavior?)
> is in in_pcbconnect_setup() (netinet/in_pcb.c) [1].
> The check for a multicast destination address is run after the attempt
> to get the source address by finding a directly connected interface,
> this check also returns ENETUNREACH if it fails (which it does for the
> destination 224.0.0.0/24 if no default route is set).


But but but. Sends to 224.0.0.0/*24* should never fail as it is strictly 
scoped to a link, and does not require any IPv4 route information. This 
is the lonesome kicker -- IP needs to know where to source the send 
from, however, you've told it to already with IP_MULTICAST_IF, so there 
is definitely a bug.


See the IN_LOCAL_GROUP() macro in -CURRENT's netinet/in.h for how to 
check for 224.0.0.0/24 in code.
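
For readers without the header to hand, the check amounts to a mask-and-compare on the host-order address. The Python below is a sketch of that arithmetic, not a copy of the macro; the function names are made up.

```python
import ipaddress

def in_local_group(addr):
    """True if addr is in 224.0.0.0/24, the link-scoped multicast block."""
    return (int(ipaddress.IPv4Address(addr)) & 0xFFFFFF00) == 0xE0000000

def in_multicast(addr):
    """True if addr is anywhere in 224.0.0.0/4."""
    return (int(ipaddress.IPv4Address(addr)) & 0xF0000000) == 0xE0000000
```

Every address that passes the /24 test also passes the /4 test, so the link-scoped check can be applied as a special case before any route lookup is attempted.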


In fact we should probably disallow multicast sends to this address when 
the socket HAS NOT been bound, except of course for the case where the 
interface is unnumbered -- but we still need a means of telling the 
stack about this case. The answer might be something called IP_SENDIF... 
Linux uses SO_BINDTODEVICE for this. It's a case of sitting down and 
doing it.


It's reasonable to assume that multicast applications should know that 
they need to walk the system's interface tree and be aware of interfaces 
and their addresses. Apps which don't do this are legacy and need to be 
updated to reflect how IP stacks actually behave now.




> Moving the multicast check before the directly connected check solves
> this (or any other combinations that makes sure that the
> IN_MULTICAST() check is executed).


You are quite right that the imo_multicast_ifp check needs to happen 
further up.


This is probably OK as a workaround -- but -- bigger changes need to 
happen in that code as currently source selection is mostly based on 
destination. This isn't always the case, and in multicast it certainly 
ISN'T the case as you have seen.


SO_DONTROUTE is something of a misnomer anyway. Routes still need to be 
present in the forwarding table for certain lookups, and the source 
interface selection is almost wholly based on the destination faddr in 
the inpcb, in both the cases of connect() and temporary connect during a 
sendto().


Your patch should be OK to go in. Regardless of whether there are routes 
for the multicast channel you're using or not, IP_MULTICAST_IF is a 
sledgehammer which says 'I use THIS interface for multicast', and until 
our IPv4 stack has link scope addresses, it will be needed.


Thanks again...
BMS


Re: Unexpected multicast IPv4 socket behavior

2008-01-12 Thread Bruce M. Simpson

Fredrik Lindberg wrote:


> I would expect this _without_ IP_MULTICAST_IF set, however as I said
> the interface had been explicitly set with IP_MULTICAST_IF in all 4
> cases, so there indeed is enough information in the stack to send
> the packet.


Correct. You found a bug. Well done.



> If IP_MULTICAST_IF should be considered legacy, I'll move away from it.
> But, as you said, there is probably a lack of knowledge on how the
> APIs should be used and I have never seen anyone or any document
> (maybe I haven't looked hard enough) that suggests that this usage is
> deprecated.


The fact that IPv4 multicast sends appear to work using the default 
route is a historical quirk. It is not multicast forwarding.


For a host/endstation, the mere fact that the group was joined on a 
given socket, on a given interface, should be enough IP layer 
reachability information for the inpcb layer to figure out where to send 
the packets. From that point on, it's the problem of the multicast 
routers on the path between the end-station and the other members of the 
channel, which are normally speaking PIM-SM.


If one follows how IGMP works, then the problem with multicast joins 
which are not scoped to an interface is readily obvious. IGMP/MLD is 
necessary to inform upstream routers that the channel is being opened -- 
otherwise, you will not receive traffic for the group, as the state 
about the end-station's participation in the channel is never 
communicated to routers.


The endpoint address used by the local end of the path in MLD is the 
link-scope IPv6 address. In IGMP, it's the first IPv4 address configured 
on the interface. Both IGMP and MLD are always scoped to the local link 
-- they deal with multicast forwarding and membership state ONLY in the 
domain of the link they are used on.


IPv4 has historically not had link-scope addresses, which are one 
possible answer to the problem. Ergo there is a problem if the interface 
is unnumbered -- or if the inpcb laddr is 0.0.0.0 -- which you have 
seen. It should be possible to use IP_MULTICAST_IF as a workaround for 
this, however, you found that path is buggy...


I guess the textbooks out there haven't caught up with reality.



> I wouldn't expect anything in 224.0.0.0/4 to fail
> _with_ IP_MULTICAST_IF set.


Correct. This makes the bug even more damaging. It is reasonable for a 
system to be using multicast during early boot when all interfaces are 
unnumbered.


In fact the IGMPv3 RFC suggests no IGMP traffic should be sent for 
groups in 224.0.0.0/24, because upstream IGMP routers should never be 
forwarding these groups between links.


Unfortunately, in practice, this can break layer 2 multicasts for these 
groups which traverse IGMP snooping switches.




> IP_SENDIF/SO_BINDTODEVICE seems to show up from time to time, is
> the only reason that it hasn't been implemented simply that nobody
> has done it?


Yup. Everyone seems to be too worried about unicast traffic and bulk I/O 
performance to bother much with other applications of IP, so, this sort 
of issue gets more airtime elsewhere.


later
BMS


Re: Programming interface MAC filter without enabling PROMISC on an interface from user space.

2008-01-14 Thread Bruce M. Simpson

Tom Judge wrote:

> Hi,
>
> I have just started experimenting with OpenLLDP and come across a
> little bit of a nasty.  When it opens the interface, it puts it into
> PROMISC mode,  which I don't really want to happen.  Is there any way
> to add the LLDP MAC address (01-80-C2-00-00-0E) to the interface mac
> filter from user space, so that the interface does not have to be set
> to PROMISC?


There *is* an API for this but it's not integrated into pcap or bpf; see 
SIOCADDMULTI and SIOCDELMULTI. There are some issues with doing that 
portably, Windows and Linux do things somewhat differently in this space.
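
Before handing an address to SIOCADDMULTI it is worth checking that it really is a link-layer group address. The Python below sketches only that preparation step, not the ioctl itself, whose request layout is platform specific; the helper names are hypothetical.

```python
def parse_mac(text):
    """Parse 'aa-bb-cc-dd-ee-ff' or 'aa:bb:cc:dd:ee:ff' into 6 bytes."""
    parts = text.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError("expected six octets: %r" % text)
    return bytes(int(p, 16) for p in parts)

def is_group_address(mac):
    """True if the I/G bit (least significant bit of the first octet) is set."""
    return bool(mac[0] & 0x01)

LLDP_MAC = parse_mac("01-80-C2-00-00-0E")
```

The LLDP address has the group bit set, so it belongs in the multicast hash filter rather than requiring PROMISC.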


Really we could do with a KPI for this so that the references are 
properly refcounted. If you have other link layer multicast listeners 
it's not guaranteed that the stack will correctly restore the hash 
filters at the driver level if it has to enable ALLMULTI mode.


You almost certainly don't want to set PROMISC if you are ever going to 
do any kind of IP forwarding, although I believe I fixed that historic 
bug whereby the IP layer kept seeing its own packets about a year ago.


later
BMS


Re: Programming interface MAC filter without enabling PROMISC on an interface from user space.

2008-01-14 Thread Bruce M. Simpson

Tom Judge wrote:
> Thanks for the response.  I had a quick grep of the src tree to find
> an example of this being used and only found the following from
> wpa_supplicant and I have a few questions:
>
> * I am presuming that this will do what I want, am I correct?


Yes, it will attempt to add the given link layer multicast group to the 
ifnet's underlying device driver.


> * If I was only ever to add the address to an interface and never
> delete it would this cause any problems?  I.e. when lldpd ends, or is
> restarted and tries to add the address again?


SIOCADDMULTI is very low level, no resource tracking is performed; I 
changed its semantics to only allow one userland opener so that 
in-kernel refcounting would work, as there is no per-process or 
per-client resource tracking -- so it's a really good idea to clean up 
after it.




> * Alternatively is there a way to query the filter to ask what
> addresses it is currently programmed for?


Nope, there is no userland or kernel API for that unless you hack up the 
driver.


cheers
BMS


Re: Programming interface MAC filter without enabling PROMISC on an interface from user space.

2008-01-14 Thread Bruce M. Simpson

Tom Judge wrote:


> Ok, so if I can safely assume that the process sending/receiving the
> LLDP frames should always be running would it be safe to use a helper
> program to add the mac on system startup so it is always registered on
> particular interfaces for the uptime of the system rather than having
> the daemon add/remove the address on startup/shutdown?
>
> If not what problems would this create?


If the daemon doesn't unregister its use of the link layer group, the 
kernel will not clean up after it. It won't prevent kernel entities from 
joining the group -- they will just acquire another reference -- but 
other userland clients will not be able to join the group until 
SIOCDELMULTI is called by at least one client.


By the way, other processes can hijack this, but only if they have 
permission to use SIOCDELMULTI. I believe this requires root privileges.


I believe it should be possible to use mtest to clean up manually.

This is far from ideal and it really does want an API. NDIS, 
incidentally, can do all of what you describe.


> Personally I can't see why this approach would be a problem,  but I am
> not an expert.  The address is defined in IEEE Std 802.1D-2004 as to
> not be forwarded by bridges (which I interpret as it being link local
> in a sense as switches/bridges are not allowed to forward the frame),
> so I can't see it being a problem registered on multiple interfaces.


SIOCADDMULTI memberships are specific to the interface you request them 
on. I can't speak for the bridging code -- I don't think it does any 
special handling of multicast frames, however I'm not sure if it's smart 
enough not to forward this group. Like IN_LOCAL_GROUP() it might need its 
own 'don't forward this' clause.


later
BMS


Re: Relayd (former hoststated) status for freebsd 7.0RC1

2008-01-15 Thread Bruce M. Simpson

Alexandre Vieira wrote:

> Hello all,
>
> I remember that there was a port (net/hoststated) where I could install
> hoststated to use with PF. Anyone can shed a light on what is the status of
> this software implementation on 7.0?
  


Perhaps ports/net/ifstated is the answer?

BMS


Re: help

2008-01-21 Thread Bruce M. Simpson

Enovation Technologies wrote:

> I configured my second NIC with sysinstall, but when I restart my box I get
> this message:
> arp: 10.200.1.1 is on re0 but got reply from 00:50:7f:b0:a0:f8 on re1
>
> My question is how to configure 2 NICs with different IPs on the same box
> in the same subnet.
  


Configure the second with a /32 prefix (netmask 255.255.255.255) instead 
of the usual netmask.


You will always receive the arp warning unless you disable it by setting 
the sysctls net.link.ether.inet.log_arp_wrong_iface and 
net.link.ether.inet.log_arp_movements to 0.


These limitations *may* go away in future releases.

later
BMS



Re: network interface monitoring?

2008-01-23 Thread Bruce M. Simpson

Yousif Hassan wrote:

> ifwatchd has not been ported to FreeBSD - does FreeBSD have anything
> similar?


Try ports/net/ifstated (from OpenBSD).

BMS


Re: tcp-md5 check for incomming connection

2008-01-29 Thread Bruce M. Simpson

Ingo Flaschberger wrote:

> Hi,
>
> linux does already support tcp-md5 checks for incoming connections,
> but freebsd not.
>
> I would like to implement this feature into freebsd.
> Any hints/wishes/considerations that I should consider?


Someone(tm) keeps threatening to do this every 9-12 months, but I've yet 
to see patches.

- Another example of open sorce (What's missing? U!)

Inbound processing for tcp-md5 isn't really that big a deal, I'm amazed 
it hasn't been deprecated and replaced with something less gnarly, but 
that's the inertia of stuff at internet exchanges for you and with good 
reason too.
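
To show how little the inbound check involves, here is a Python sketch of the digest as I read RFC 2385: MD5 over the IPv4 pseudo-header, the TCP header without options and with a zeroed checksum, the segment data, and the shared key. It works on raw bytes only and is not kernel code; the function names are hypothetical.

```python
import hashlib
import hmac
import socket
import struct

def tcp_md5_digest(src, dst, tcp_header, payload, key):
    """Compute the 16-byte TCP-MD5 digest for one IPv4 segment."""
    fixed = bytearray(tcp_header[:20])  # fixed header only, options dropped
    fixed[16:18] = b"\x00\x00"          # checksum field treated as zero
    seg_len = len(tcp_header) + len(payload)
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, 6, seg_len))  # zero, protocol 6, length
    return hashlib.md5(pseudo + bytes(fixed) + payload + key).digest()

def verify_inbound(src, dst, tcp_header, payload, key, received_digest):
    """Recompute the digest and compare it with the one from the TCP option."""
    expected = tcp_md5_digest(src, dst, tcp_header, payload, key)
    return hmac.compare_digest(expected, received_digest)
```

The harder part is deciding which key applies to which peer, which is a policy question rather than a hashing one.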


I don't have free time to do any of this (volunteer work doesn't pay the 
rent, and the costs of living spiral ever upwards), but I can try to 
make time to review patches if Someone(tm) writes the support.


I believe one of the KAME guys took this and ran with it in NetBSD, so 
look there first, pretty sure it checks the inbound.

And of course Kip needs to be in the loop so it works with TOE.

One of the things which I didn't finish was integrating TCP-MD5 with the 
SPD too instead of only the SADB. This meant gnarly syntax for setkey(8).


later
BMS


Re: tcp-md5 check for incomming connection

2008-01-30 Thread Bruce M. Simpson

The bigger issue w/tcp-md5 is getting security policy 'right'.
bz has more IPSEC hacking experience than I, so I defer to his advice in 
this area.


The way the socket option was originally specified was that once it was 
set, all further activity on the socket had to be tcp-md5'd. For an 
outgoing connect() this is pretty much assumed in the beginning. For a 
listen() and bind(), it means all further sessions on that port must use 
tcp-md5 to be accepted.


However this obviously poses problems if you want to be able to accept 
connections on the same port from non tcp-md5 peers. And for BGP, which 
can open the underlying tcp session in either direction ('passive open', 
jittered) it's also important that the tcp-md5 state of the socket is in 
sync with the routing process's notion of policy.


OSPF sidestepped all this by using raw IP datagrams, so there was no 
need to implement authentication in the network transport layer.


So, the SPD seems like the way to go! Trouble is, routing daemons aren't 
IPSEC daemons, nor do they speak the RFC specified protocol for this, 
PF_KEY. I toyed with the idea of rolling one for XORP but there hasn't 
been any demand.


cheers
BMS


Re: modifying permissions in /dev

2008-02-04 Thread Bruce M. Simpson

lysergius2001 wrote:

> Hi
>
> Recently installed AMD64 6.3-stable and I am having a problem with
> devfs.conf and /dev.  I understand the entries in devfs.conf should modify
> the permissions on devices in /dev.  For some reason or other this is not
> happening.  Can anyone shed some light on this?  What am I doing wrong?


Try using devfs.rules -- devfs.conf entries will not be applied after 
boot, unless you force them to be reapplied by running /etc/rc.d/devfs 
start from a superuser shell.


BMS


Re: traceroute AS path patch

2008-02-17 Thread Bruce M. Simpson

Rui Paulo wrote:


> On Feb 17, 2008, at 9:30 PM, Rui Paulo wrote:
>
> > Hi,
> > The attached patch ports a traceroute functionality from FreeBSD
> > called AS path.
>
> I mean, "ported from NetBSD".


AS lookup is already in the NANOG traceroute in ports -- however I like 
the look of this patch better, it looks much cleaner. +1 from me.


cheers,
BMS


Re: Multiple default routes on multihome host

2008-02-19 Thread Bruce M. Simpson

Tom Judge wrote:


> However FreeBSD's routing table does not currently support policy
> routing without some help from the firewall.  The only way to achieve
> your goal is to use one of the firewalls (pf/ipfw/ipf) to do the
> policy routing for you.


If anyone wants to take this on, start looking at inpcb, bind, and 
ip_output(), and try to bug me for help -- "human resources", tcaahh 
I'm getting old :-)


later
BMS


Re: Multiple default routes on multihome host

2008-02-19 Thread Bruce M. Simpson

Nick Barnes wrote:

> I want packets from address A1 to be sent via gateway G1, but packets
> from address A2 to be sent via gateway G2.
>
> How do I do this?  Can I just have more than one default route?  I'm
> remote from the machine in question, so I don't want to tinker with
> the default route until I'm sure of the answer.


Others have chimed in saying that having redundant routes constitutes 
poor network design: it really depends where one draws the distinction 
between router and host. In ad-hoc and peer-to-peer networks, there is 
no such distinction.


The forwarding code doesn't support multiple routes to the same 
destination, largely out of development inertia. People are looking at 
this now.


The forwarding code doesn't support load balancing yet; it's being 
considered for the future. Load balancing causes problems for TCP because 
it can result in loss of the original packet ordering. Of course this is 
something which stuff like SACK *begins* to address; it is a scenario 
more common in satellite networks.
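
The ordering problem is easy to show with a toy simulation. The Python 
sketch below is an assumption-laden model, not forwarding code: two paths 
with made-up delays, one packet sent per time unit. Spraying packets 
per-packet across both paths reorders a single flow. Pinning each flow to 
one path by hashing (a common mitigation, not something proposed in this 
thread) keeps the flow in order.

```python
import zlib

# Hypothetical one-way delays for two paths, in arbitrary time units.
PATH_DELAY = [1.0, 5.0]


def arrival_order(packets, pick_path):
    """Return packets in the order they arrive.

    packets is a list of (flow, seq); packet i is sent at time i.
    pick_path(send_time, flow) returns an index into PATH_DELAY.
    """
    arrivals = []
    for send_time, (flow, seq) in enumerate(packets):
        path = pick_path(send_time, flow)
        arrivals.append((send_time + PATH_DELAY[path], send_time, flow, seq))
    arrivals.sort()
    return [(flow, seq) for _, _, flow, seq in arrivals]


def per_packet(send_time, flow):
    """Round-robin: alternate paths on every packet."""
    return send_time % len(PATH_DELAY)


def per_flow(send_time, flow):
    """Hash the flow identifier so one flow always uses one path."""
    return zlib.crc32(flow.encode()) % len(PATH_DELAY)


packets = [("A", seq) for seq in range(6)]
reordered = arrival_order(packets, per_packet)
in_order = arrival_order(packets, per_flow)
```

Here `reordered` no longer matches the send order while `in_order` does, 
at the cost of a single flow never using more than one path.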


However you want next-hop selection based on the "laddr" for a socket 
which is a different thing. The stack doesn't do this on its own, it 
needs help from packet filtering code.
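
The difference between the two kinds of selection can be sketched in a few 
lines of Python. Everything here is hypothetical (the addresses, the 
tables, the function names). It models the decision only, not how the 
stack or a packet filter implements it: the routing table picks a next hop 
from the destination alone, and the policy step overrides that choice 
based on the socket's local address.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, gateway).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),       # default via G1
    (ipaddress.ip_network("10.0.0.0/8"), "192.0.2.254"),
]

# Hypothetical source policy: local address -> gateway (A1->G1, A2->G2).
SOURCE_POLICY = {
    "192.0.2.10": "192.0.2.1",
    "198.51.100.10": "198.51.100.1",
}


def next_hop_by_destination(dst):
    """Longest-prefix match on the destination address only."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, gw) for net, gw in ROUTES if addr in net]
    return max(matches)[1]


def next_hop(laddr, dst):
    """Prefer a per-source override; otherwise fall back to the table."""
    return SOURCE_POLICY.get(laddr) or next_hop_by_destination(dst)
```

Traffic from the second local address leaves via the second gateway even 
though the routing table has only one default route.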


You should be able to achieve this using "route-to" rules in PF or "fwd" 
rules in IPFW; there are tutorials out there on the subject.


cheers
BMS


Re: panic in 6.3-RELEASE when multi-cast client exits

2008-02-19 Thread Bruce M. Simpson

Rob Watt wrote:

> Hi.
>
> We recently upgraded some of our machines to 6.3-RELEASE and we have been
> plagued by repeatable panics when our multi-cast client applications exit.
> Our machines have Intel X5365 processors, LSI MegaSAS 1064R cards, and Intel
> Pro 1000 MF nic cards (although we have seen this problem with the onboard
> Intel copper nics as well). We have seen this panic with machines that have
> Tyan boards as well as Super Micro. I have seen a few postings that seem to
> refer to related panics, and bug
> http://www.freebsd.org/cgi/query-pr.cgi?pr=116077 contains a patch that
> seems like it should address the problem, but our patched system still
> panics. I have attached the output from 3 of the dumps/backtraces. Dump #1
> is probably the most useful. I am happy to provide more info if necessary.


Some folk reported that they didn't see this problem occur with the code 
in 7.x, which jibes as I rewrote some of the logic in that branch. It's 
been nearly a year since I last had time to look at anything related to 
this.


My understanding is that 7.0 is getting closer to release status so you 
may wish to try reproducing the problem there.


The human resource situation hasn't changed much on my end, though I am 
getting closer to having time to finish IGMPv3 (it's needed for other 
stuff in the future). I haven't been able to reproduce the bug in the 
PR, which makes suggesting other courses of action difficult.


cheers
BMS


Re: Two interfaces sharing the same IP address: how to change default route's interface on link change?

2008-02-19 Thread Bruce M. Simpson

Jeremie Le Hen wrote:

> In summary, favor wired connectivity over the wireless one, at any time:
> could this be at boot time or not.
>
> I'm pretty sure I'm not the only one who wants this kind of setup.  So
> how did you achieve this setup?


The forwarding code needs to be changed to support the notion of a 
floating static, regardless.


Recall that in BSD, default routes configured statically, whether 
manually or by DHCP, have the RTF_STATIC flag set.


Currently, the BSD behaviour is NOT to update the rt_ifp for an 
RTF_STATIC route when ifadown is called. I believe this to be correct, 
and it honours the original API contract of RTF_STATIC. It is not what 
you desire in your use case however.


Configuring ifstated to manually replumb addresses and routes is 
probably an easier place to start. Seamless migration is not possible 
yet; generally sockets are tied to the interface where they were 
implicitly bound, also nexthop selection happens purely on the basis of 
destination address.
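
The decision such a script has to make is small. The Python sketch below 
is a hypothetical model of it, not ifstated configuration: the interface 
names and the preference order are assumptions, and the actual replumbing 
of addresses and routes is left to the caller.

```python
# Hypothetical interfaces, most preferred first (wired before wireless).
PREFERENCE = ["em0", "ath0"]


def pick_default_interface(link_up):
    """Return the most preferred interface whose link is up, else None.

    link_up maps interface name -> bool.
    """
    for ifname in PREFERENCE:
        if link_up.get(ifname, False):
            return ifname
    return None


def on_link_event(current, link_up):
    """Return (interface to use, whether the default route must move)."""
    wanted = pick_default_interface(link_up)
    return wanted, wanted != current
```

Run on every link-state change, this moves the default route back to the 
wired interface as soon as its link returns, which covers the boot-time 
case as well.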


BMS

