Doug,

:> :ping another device with interpacket delay of 0 and a count
...
:> Define what you mean by "interpacket delay".  Are you referring to an
...
:cisco router. extending ping. 0 delay.
:I was speaking of cisco ping.
:I should have said 'timeout'. mea culpa.

Ahhhh.... between your using the term "interpacket" and your saying
that Sun was talking about "jabber", I had assumed you were talking
about the ethernet IPG / IFG.  Ignore my "don't complain about your
ethernet being DoS-ed if it's out-of-spec" remarks.  :)

:> For that matter, define "ping".  You're certainly not talking about
:running ping on the cisco to another device (preferably a fast
:cisco as the source and a nice fast interface like a gige or
:a IP/sonet)

Cisco extended ping where you answer the prompts in a way that
produces a flood ping...  gotcha, makes somewhat more sense now.

:dedicated, switched Ethernet here.
:it seems to mostly overwhelm the sun's interrupt processing, but
:that's just a theory since Sun has decided that the solution is to
:unplug the machine on the other end.
:
:We're only talking about 14000 packets per second to kill a netra
:T1. I've been able to drive one faster than that via other means
:without causing a 'jabber effect'.

How are you concluding that this is "jabber"?  Does "netstat -k" show
the jabber and relevant physical-layer counters incrementing on the
ethernet interface in question, or is Sun just labelling that kind of
traffic flow as jabber in some generic sense?  The term "jabber" has a
specific meaning in some network contexts, and you should be able to
tell whether it really is physical-layer jabber by checking whether the
relevant netstat -k counters are incrementing.
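
One way to check this is to save the counter output before and after a
flood ping and diff the two.  The sketch below is rough and makes
assumptions: it treats the text as whitespace-separated "name value"
pairs, and the counter names and numbers in the sample are made up for
illustration (they are not taken from your system).  Adjust the parsing
to whatever your interface actually prints.

```python
def parse_counters(text):
    """Parse whitespace-separated 'name value' pairs into a dict of ints.

    Tokens that are not followed by an integer (for example an
    interface label such as 'hme0:') are skipped.
    """
    counters = {}
    tokens = text.split()
    i = 0
    while i + 1 < len(tokens):
        name, value = tokens[i], tokens[i + 1]
        if value.isdigit():
            counters[name] = int(value)
            i += 2
        else:
            i += 1
    return counters


def incremented(before_text, after_text):
    """Return {counter: delta} for every counter that went up."""
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


if __name__ == "__main__":
    # Hypothetical snapshots; real output will have many more counters.
    before = "hme0:\nipackets 1000 ierrors 0 opackets 900\njabber 0 runt 0"
    after = "hme0:\nipackets 15000 ierrors 0 opackets 14900\njabber 0 runt 0"
    for name, delta in sorted(incremented(before, after).items()):
        print(name, delta)
```

If the jabber counter stays flat while only the packet counters climb,
that would suggest the label is being used in the generic sense.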

...
:> Now, that doesn't mean the -console-
:> should go out to lunch (sounds like you're getting a little too much
:> "The Network Is The Computer" :) ), 
...
:indeed. that's my issue, the console should not be hung. The machine
:should not require a hard reset. And, I do not believe there is
:an electrical problem. I'm not doing anything down that low, It's
:just a TCP/IP stream, and, a not outrageous one at that.

Well, it's an IP stream, not necessarily TCP/IP -- more like ICMP/IP.
I don't see an option for Cisco's extended ping to do a "TCP" ping:

http://www.cisco.com/en/US/tech/tk365/technologies_tech_note09186a0080093f22.shtml

(though I haven't had a need to memorize every nook and cranny of IOS
in pursuit of some certification -- "I just make the stuff work...")
Your talking about "disabling Nagle" in your initial post made it sound
like a TCP stream, but if all you meant was setting the timeout to 0 on
a typical ping to do a flood, then TCP and TCP-specific mechanisms like
the Nagle algorithm aren't in play.

:> My -suspicion- here is that it's the interrupts that the "stream of
:> small TCP packets" generates that is leading to the system hang, but
:> it'd take some kernel profiling to understand the specific impact.
:> If the only way to generate the particular concentration of network
:> interrupts along that ethernet interface involves outright breaking
:> the ethernet spec, I can see where Sun rejects this as bogus from a
:> -security- perspective.
:>
:See, that's where I have trouble. From a Security perspective, you'd
:want to avoid the DOS via some kind of drop or disable mechanism
:in the first place... IMHO.

Well, I was talking about out-of-spec stuff when I wrote the above,
but a similar thing would apply to network traffic that totally fills
the pipe.  I can drop/block/disable all the traffic in the world, but
if more and more comes, my network is dead.  Depending on the type of
traffic, there may be absolutely nothing you can do to prevent the 
traffic from filling your network pipe and DoSing the interface.  Of
course, it shouldn't cause your console to silently hang up.
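
For a sense of scale on the 14000 packets per second figure, here is a
back-of-the-envelope sketch of how much of the pipe that rate would
occupy.  The datagram size (100 bytes) and link speed (100 Mbit/s) are
assumptions on my part, since neither was stated; the framing overheads
are the standard Ethernet ones (14-byte header, 4-byte FCS, 8-byte
preamble/SFD, 12-byte interframe gap, 64-byte minimum frame).

```python
ETH_HEADER = 14      # destination MAC, source MAC, EtherType
ETH_FCS = 4          # frame check sequence
PREAMBLE_SFD = 8     # preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12  # minimum idle time between frames, in byte times
MIN_FRAME = 64       # minimum Ethernet frame size, header through FCS


def wire_bits(ip_datagram_bytes):
    """Bits of wire time consumed by one frame carrying the datagram."""
    frame = max(ETH_HEADER + ip_datagram_bytes + ETH_FCS, MIN_FRAME)
    return (frame + PREAMBLE_SFD + INTERFRAME_GAP) * 8


def load_mbps(pps, ip_datagram_bytes):
    """Link load in Mbit/s for a given packet rate and datagram size."""
    return pps * wire_bits(ip_datagram_bytes) / 1e6


def max_pps(link_bps, ip_datagram_bytes):
    """Highest packet rate the link can carry at that datagram size."""
    return link_bps // wire_bits(ip_datagram_bytes)


if __name__ == "__main__":
    size = 100               # assumed datagram size in bytes
    link = 100_000_000       # assumed 100 Mbit/s link
    print(load_mbps(14000, size), "Mbit/s at 14000 pps")
    print(max_pps(link, size), "pps would fill the link")
```

Under those assumptions, 14000 packets per second is a small fraction
of what the link can carry, so this looks less like a full pipe and
more like a per-packet cost on the receiving host.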

In this particular case, if you're talking about ICMP, and there 
really isn't a "jabber"/physical layer issue afoot, the idea is for
some combination of you and|or Sun to:

a) Find out if it's really a function of interrupt load.
 
   Someone with expertise with lockstat or other Sun kernel
   profiling tools should be able to discern that.  Look at
   the lockstat output for your system during a flood ping 
   vs. lockstat output from the system sitting there under 
   a "normal" network load and see if something stands out.

   It could be the case that the problem isn't necessarily
   interrupts, but the system being stuck in a particular 
   lock or codepath related to the traffic.  lockstat being
   driven by someone with clue should shake that out.
   
   Depending on the level of interrupt starvation, it may 
   be the case that lockstat won't log anything useful during
   the flood ping.  But, there are a whole lot of options and
   I'm not any particular expert in Solaris kernel internals.

b) If so, mitigate the interrupt load.  Assuming you isolate it
   as an interrupt problem, the first step is to see if you
   can make the Sun do less in the face of the interrupts.
   There are basically two big things that generate interrupts:

   1) the interrupts generated from the inbound ping traffic
      coming to your Sun from the Cisco.
   2) the interrupts generated by your Sun's attempts to    
      do outbound 'replies' to the Cisco.

   There may be some ndd tunables or kernel packet filtering
   that can be enabled or tweaked to limit the interrupt load 
   that the inbound and|or outbound ICMP traffic generates and 
   that may be sufficient to get the console to stay working.
   
   Alternatively, you will want to pursue "how do I mitigate
   interrupt load" with Sun.  Perhaps they can monitor counters
   over time and not go into a console-stunning tizzy in the
   face of network interrupt floods.  I dunno.  Selectively 
   throttling interrupts isn't the easiest dance in the world
   to do if you actually care about performance, and the last
   thing you want to do is introduce a DoS vector that might
   not have existed before.  You may want to present this as
   more of a "network" problem than a "security" problem.
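
To make the "do less in the face of the flood" idea in (b) concrete,
here is a minimal token-bucket sketch that caps how many echo requests
get answered per second.  This is only an illustration of the general
technique; it is not how Solaris or any particular packet filter does
it, and the class name, rate, and burst values are all made up.

```python
class TokenBucket:
    """Allow at most `rate` events per second, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # maximum tokens the bucket holds
        self.tokens = float(burst)   # start with a full bucket
        self.last = 0.0              # time of the previous call

    def allow(self, now):
        """Return True if an event at time `now` (seconds) may proceed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def simulate(pps, seconds, rate, burst):
    """Count how many evenly spaced requests would be answered."""
    bucket = TokenBucket(rate, burst)
    total = pps * seconds
    return sum(1 for i in range(total) if bucket.allow(i / pps))


if __name__ == "__main__":
    # 14000 requests/s for one second, replies capped near 1000/s.
    answered = simulate(pps=14000, seconds=1, rate=1000, burst=100)
    print("answered", answered, "of", 14000)
```

Note that this only trims the outbound reply work, item 2) in the list;
the inbound packets still arrive and still have to be looked at before
they can be dropped.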

Hope this helps...  good luck!

-- 
 Mail: [EMAIL PROTECTED]  WWW: http://dojo.mi.org/~mjo/  Phone: +1 650 933 9487
 =--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--==--=
"Reality has a well-known liberal bias."                     -Stephen Colbert
