Any chance this is not actually my fault?

Once I bring down the interface, any packets which were going out
myri10ge0 should start going out the default route, right?
On this box, that's bge. What happens if you send bge a frame
which is much larger than its MTU?  Will it blindly copy the
packet regardless of size and smash the heap?

Look at this stack from my latest panic.  Notice how
tcp_lsosend_data() is sending an LSO packet to bge:


ffffff0007c7b5c0 bcopy+0xa()
ffffff0007c7b600 bge_m_tx+0x60(ffffff01d09eb000, ffffff01f0cf9080)
ffffff0007c7b620 dls_tx+0x1d(ffffff01d3273eb0, ffffff01f0cf9080)
ffffff0007c7b650 dld_tx_single+0x2a(ffffff01d327ad48, ffffff01f0cf9080)
ffffff0007c7b680 str_mdata_fastpath_put+0x7f(ffffff01d327ad48, ffffff01f0cf9080)
ffffff0007c7b770 tcp_lsosend_data+0x581(ffffff01dbbabe40, ffffff01f0cf9080, ffffff01e78cc960, ffffff01d133a7e8, 22f4, 7)
ffffff0007c7b8a0 tcp_send+0xaa2(ffffff01ddddfe08, ffffff01dbbabe40, 22f4, 34, 20, 0, ffffff0007c7b95c, ffffff0007c7b960, ffffff0007c7b964, ffffff0007c7b918, 278b9, 7fffffff)
ffffff0007c7b980 tcp_wput_data+0x774(ffffff01dbbabe40, 0, 0)
ffffff0007c7bae0 tcp_rput_data+0x2cfb(ffffff01dbbabc40, ffffff01eb2dcca0, ffffff01cedc2d00)
ffffff0007c7bb70 squeue_drain+0x1e0(ffffff01cedc2d00, 10, 18546fc9471)
ffffff0007c7bbf0 squeue_enter+0x437(ffffff01cedc2d00, ffffff01ec7ad4a0, fffffffff7ab0030, ffffff01d34dc780, 7)
ffffff0007c7bc40 tcp_wput+0xc4(ffffff01dddce510, ffffff01ec7ad4a0)
ffffff0007c7bcc0 sostream_direct+0x113(ffffff01dbbdb880, ffffff0007c7be10, 0, ffffff01d5d51028)
ffffff0007c7bd50 sotpi_sendmsg+0x3a7(ffffff01dbbdb880, ffffff0007c7be40, ffffff0007c7be10)
ffffff0007c7bdf0 sendit+0x160(5, ffffff0007c7be40, ffffff0007c7be10, 8)
ffffff0007c7be90 send+0x7d(5, 80b24a8, 3c, 0)
ffffff0007c7bec0 send32+0x22(5, 80b24a8, 3c, 0)
ffffff0007c7bf10 sys_syscall32+0x101()

It looks like this interface is not LSO capable:

 > ffffff01d133a7e8::print ill_t ill_capabilities
ill_capabilities = 0x38
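
For anyone checking my reading of that value, here is a rough sketch of the
bit test in Python. The ILL_CAPAB_LSO and ILL_CAPAB_HCKSUM values are
assumptions from my memory of ip.h (0x80 and 0x08); please check them against
the headers in your own tree before trusting the result. The has_capability()
helper is made up for illustration.

```python
# Assumed bit values, recalled from ip.h; verify against your source tree.
ILL_CAPAB_HCKSUM = 0x08  # assumption: hardware checksum capability
ILL_CAPAB_LSO = 0x80     # assumption: large segment offload capability


def has_capability(ill_capabilities: int, bit: int) -> bool:
    """Return True if the given capability bit is set in ill_capabilities."""
    return (ill_capabilities & bit) != 0


caps = 0x38  # value printed by ::print ill_t ill_capabilities
print("LSO capable:", has_capability(caps, ILL_CAPAB_LSO))
```

If the assumed value for ILL_CAPAB_LSO is right, 0x38 does not include it,
which is why I read the interface as not LSO capable.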


And if I look at the mblk which is being sent, it is from one of my
netperfs:

 > ffffff01f0cf9080::walk b_cont |::mblk
             ADDR FL TYPE    LEN   BLEN              RPTR             DBLK
ffffff01f0cf9080 0  data    82    208   ffffff01e7c575d2 ffffff01e7c57500
ffffff01f0d05620 0  data    60    208   ffffff01e7f78568 ffffff01e7f78480
ffffff01e9370620 0  data    60    208   ffffff01e80115e8 ffffff01e8011500
ffffff01ea78f1a0 0  data    60    208   ffffff01e84d0a68 ffffff01e84d0980
ffffff01eb234600 0  data    60    208   ffffff01e747cd28 ffffff01e747cc40
ffffff01f28d5300 0  data    60    208   ffffff01f2949ba8 ffffff01f2949ac0
ffffff01ebba0320 0  data    60    208   ffffff01f1b41ea8 ffffff01f1b41dc0
ffffff01f08439a0 0  data    60    208   ffffff01f2f98ba8 ffffff01f2f98ac0

<goes on for pages and pages and pages, how do I sum up the length??>
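
One way to get the total would be to save the ::mblk output to a file and add
up the LEN column outside of mdb. A minimal sketch in Python follows. It
assumes the LEN column is decimal, which seems likely because the send() call
in the stack passes a length of 0x3c and the rows show 60. The function name
sum_mblk_len() is just something made up for this example.

```python
def sum_mblk_len(mdb_output: str) -> int:
    """Sum the LEN column of pasted ::mblk output (LEN assumed decimal)."""
    total = 0
    for line in mdb_output.splitlines():
        fields = line.split()
        # Data rows have 7 fields and begin with a 16-digit hex address.
        # The header row, mdb prompts, and blank lines are skipped.
        if len(fields) != 7 or len(fields[0]) != 16:
            continue
        try:
            int(fields[0], 16)       # confirm the ADDR field is hex
            total += int(fields[3])  # LEN column, assumed decimal
        except ValueError:
            continue
    return total


SAMPLE = """\
             ADDR FL TYPE    LEN   BLEN              RPTR             DBLK
ffffff01f0cf9080 0  data    82    208   ffffff01e7c575d2 ffffff01e7c57500
ffffff01f0d05620 0  data    60    208   ffffff01e7f78568 ffffff01e7f78480
ffffff01e9370620 0  data    60    208   ffffff01e80115e8 ffffff01e8011500
"""

print(sum_mblk_len(SAMPLE))  # 82 + 60 + 60 = 202
```

Comparing the total for the full chain against the bge MTU would show how
oversized the frame handed to bge_m_tx() really is.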

It looks like this dblk has LSO enabled with an MSS of 8948:

 > ffffff01e7c57500::print dblk_t db_struioun
{
     db_struioun.enforce_alignment = +2.6242139e-140
     db_struioun.data = [ 0xfe, 0xca, 0xdd, 0xba, 0x15, 0, 0xf4, 0x22 ]
     db_struioun.cksum = {
         cksum_val = {
             u32 = 0xbaddcafe
             u16 = 0xcafe
         }
         flags = 0x15
         pad = 0x22f4
     }
}
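
To double-check the reading of those fields, the eight raw bytes from
db_struioun.data can be unpacked as a little-endian 32-bit value followed by
two 16-bit values, which matches the cksum layout printed above. This is only
a sketch in Python. The HW_LSO value of 0x10 is an assumption from my memory
of sys/pattr.h, and decode_struioun() is a made-up helper name.

```python
import struct

HW_LSO = 0x10  # assumption: LSO flag bit, recalled from sys/pattr.h


def decode_struioun(raw: bytes) -> tuple:
    """Unpack db_struioun.data as (cksum_val.u32, flags, pad), little-endian."""
    return struct.unpack("<IHH", raw)


raw = bytes([0xfe, 0xca, 0xdd, 0xba, 0x15, 0x00, 0xf4, 0x22])
cksum_val, flags, pad = decode_struioun(raw)

print(hex(cksum_val))          # 0xbaddcafe
print(hex(flags))              # 0x15
print(pad)                     # 8948 (0x22f4)
print(bool(flags & HW_LSO))    # True if the assumed HW_LSO bit is set
```

The pad value of 0x22f4 is 8948 decimal, and the same 22f4 shows up as an
argument to tcp_send() and tcp_lsosend_data() in the stack above.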



I've tried to duplicate this bug on boxes using e1000g and bnx
interfaces for their default route, but I haven't been able to
trigger the problem there.

Drew
_______________________________________________
networking-discuss mailing list