On Thu, Dec 08, 2005 at 02:14:45PM -0700, andrew fresh wrote:
> On Fri, Dec 02, 2005 at 04:08:13PM -0700, andrew fresh wrote:
> > I am getting 3 different DDB's. Mostly "kernel: page fault trap,
> > code=0" and "Panic: rtfree 2". I have also gotten some "Panic: sbdrop",
> > but not since I got the serial console attached. When I got the sbdrop,
> > trace showed calls to pf_* but I did not write it down as I thought I
> > would see it again with the serial console.
> >
> > It seems to DDB anywhere from 5 minutes to 90 minutes after a reboot.
> > Once I got 6.5 hours, but mostly closer to 10 minutes. The only thing
> > that seems to make a difference is disabling pf, I am up 17.5 hours now
> > with pf disabled.
> >
> > DMESG and the trace/ps from the DDBs are below.
>
> They are actually available in the archives so as not to waste
> bandwidth.
> http://marc.theaimsgroup.com/?l=openbsd-misc&m=113356535818065&w=2
the whole thread is here:
http://marc.theaimsgroup.com/?t=113333257900001&r=1&w=2

> >
> > or something with 'route-to' in pf?
>
> It appears that it is the route-to that is causing it to crash.

I believe my router has been crashing because the way I was using
route-to generated routing loops. It appears that after a route-to, the
packet gets re-evaluated by additional rules, including additional
route-to rules (as it probably should).

If I have this rule:

    pass out on { san0, san1, san2, san3 } \
        route-to { (san0, 10.0.0.1), (san1, 10.1.1.1), \
                   (san2, 10.2.2.1), (san3, 10.3.3.1) } round-robin

and san0 is the default route that the kernel picks (no kernel
multipath), I think it does something like this:

  First packet hits san0 and gets routed out san0.
  Second packet hits san0 and gets routed to san1, then san0, then
    san2, then san0, then san3, then san0, and out san0.
  Third packet hits san0 and gets routed to san1, and out san1.
  Fourth packet hits san0 and gets routed to san2, then san1, then
    san2, and out san2.
  Fifth packet hits san0 and gets routed to san3, then san2, then
    san3, and out san3.
  Sixth packet hits san0 and gets routed out san0.
  Seventh packet hits san0 and gets routed to san1, then san2, then
    san1, then san3, then san0, then san2, and out san2.

At some point, the loop becomes long enough to cause DDBs. With
multiple packets at once, the round-robining may be able to make the
loops even longer.

I don't know what the proper fix for this would be, if anything, but an
error message that says "Rule X has already rerouted this packet, there
may be a loop somewhere" would be nicer than a page fault or an
rtfree 2 DDB.

I could also be completely wrong as to the cause of the crashes, but
this seems to be a fairly good guess.

I resolved the crashing by adding some tagging smarts to the rule:

    pass out on { san0, san1, san2, san3 } \
        route-to { (san0, 10.0.0.1), (san1, 10.1.1.1), \
                   (san2, 10.2.2.1), (san3, 10.3.3.1) } round-robin \
        tag ROUTED ! tagged ROUTED

This has so far made the load balancing work very well, and it has gone
for over 48 hours without a DDB.

l8rZ,
--
andrew - ICQ# 253198 - JID: [EMAIL PROTECTED]

Proud member: http://www.mad-techies.org

BOFH excuse of the day: Dyslexics retyping hosts file on servers