Re: [Ntop-misc] pf_ring possible memory corruption (packet contents appear to change over time)

Oren Mon, 24 Mar 2014 07:28:35 -0700

Hi,
There are at least two (possibly related) issues here:
 1. Corruption when handle->ring->slots_info->remove_off == 0
 2. Corruption when pf_ring is loaded with quick_mode=1 enable_tx_capture=0
When we apply both workarounds, everything seems to work as expected (note that 
the original workaround assumes libpcap).
Best,
Oren
 
From: [email protected]
To: [email protected]
Date: Mon, 24 Mar 2014 13:11:30 +0000
Subject: Re: [Ntop-misc] pf_ring possible memory corruption (packet contents 
appear to change over time)

For the record, quick_mode=1 was NOT enabled when I encountered the corruption.
Seth
From:  Alfredo Cardigliano <[email protected]>
Reply-To:  "[email protected]" <[email protected]>
Date:  Monday, March 24, 2014 at 6:31 AM
To:  "[email protected]" <[email protected]>
Subject:  Re: [Ntop-misc] pf_ring possible memory corruption (packet contents   
appear to change over time)

Hi Oren,
so this only happens with quick_mode=1 enable_tx_capture=0? I will try 
with those settings and your patch.
Thank you
Alfredo
On 24 Mar 2014, at 11:23, Oren <[email protected]> wrote:
Hi,

I re-tested with 

PF_RING-5.6.2

ixgbe 3.19.1-PF-RING-AWARE + libpcap-1.1.1

kernel 2.6.32-220.4.2.el6.i686

Loading pf_ring with quick_mode=1 enable_tx_capture=0 seems to demonstrate 
similar issues.

The symptoms you may see are corrupt headers/data, negative or zero packet 
length/time, etc.

As a workaround: avoid using these flags together.
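One way to enforce that workaround is for the capture application to refuse to start while the combination is active. This is only a sketch: it assumes the pf_ring module exposes quick_mode and enable_tx_capture as integer files under /sys/module/pf_ring/parameters/, which this thread does not confirm, and the function names are hypothetical.

```c
#include <stdio.h>

/* Read one integer module parameter from <dir>/<name>.
   Returns -1 if the file is missing or does not start with an integer. */
int read_module_param(const char *dir, const char *name)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    FILE *f;
    int value;

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Returns 1 when the combination reported on this thread
   (quick_mode=1 together with enable_tx_capture=0) is active, 0 otherwise,
   including when either value cannot be read. */
int pf_ring_risky_config(const char *param_dir)
{
    return read_module_param(param_dir, "quick_mode") == 1 &&
           read_module_param(param_dir, "enable_tx_capture") == 0;
}
```

A caller would pass the (assumed) parameter directory, e.g. pf_ring_risky_config("/sys/module/pf_ring/parameters"), and exit with an error message if it returns 1.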

Attached is a user-space assertion patch that catches the faults. 
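The attached patch is not reproduced here. As an illustration only, a user-space check for the symptoms listed above (zero or "negative" length, inconsistent caplen, zero timestamp) might look like the following; struct pkt_hdr_view is a hypothetical mirror of the ts/caplen/len fields of struct pfring_pkthdr used in the libpcap code, and the return codes are made up for this sketch.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical mirror of the fields of struct pfring_pkthdr used here
   (ts, caplen, len); the real struct has further members. */
struct pkt_hdr_view {
    struct timeval ts;
    uint32_t caplen;
    uint32_t len;
};

/* Classify a received header:
   0 = looks sane, 1 = zero or "negative" len,
   2 = caplen inconsistent with len/snaplen, 3 = zero or invalid timestamp. */
int check_pkt_hdr(const struct pkt_hdr_view *h, uint32_t snaplen)
{
    if (h->len == 0 || h->len > INT32_MAX)   /* would be negative if read as signed */
        return 1;
    if (h->caplen == 0 || h->caplen > h->len || h->caplen > snaplen)
        return 2;
    if (h->ts.tv_sec <= 0 || h->ts.tv_usec < 0 || h->ts.tv_usec >= 1000000)
        return 3;
    return 0;
}
```

Calling this right after each receive and logging or aborting on a nonzero result makes the fault visible at the point it is first observed, rather than later in the processing chain.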

Best,

Oren

P.S.

The ring packet headers seem to hold the original caplen/len values, while the 
packet header offsets in the ring structure are uncorrelated (denser), causing the faults.

From: [email protected]

To: [email protected]

Date: Mon, 17 Mar 2014 21:29:00 +0000

Subject: Re: [Ntop-misc] pf_ring possible memory corruption (packet contents 
appear to change over time)

Hi Seth,

I believe we share the same sentiment and hope it will be identified and fixed 
soon.

The key point is to add the right memory assertion to catch the event, e.g., 
memcpy the packet right after it is received and memcmp it against the ring later.
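A minimal sketch of such a memcpy+memcmp assertion, assuming the packet pointer refers directly into the ring (as it does when pfring_recv() is called with a buffer_len of 0). The pkt_snapshot helpers are hypothetical names for this illustration, not part of PF_RING.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Private copy of a received packet, taken right after the receive call. */
struct pkt_snapshot {
    unsigned char *copy;
    size_t len;
};

/* memcpy step: copy the packet bytes out of the ring. Returns 0 on success. */
int snapshot_take(struct pkt_snapshot *s, const unsigned char *pkt, size_t len)
{
    s->copy = malloc(len ? len : 1);
    if (s->copy == NULL)
        return -1;
    memcpy(s->copy, pkt, len);
    s->len = len;
    return 0;
}

/* memcmp step: returns 1 if the bytes in the ring still match the copy. */
int snapshot_matches(const struct pkt_snapshot *s, const unsigned char *pkt)
{
    return memcmp(s->copy, pkt, s->len) == 0;
}

void snapshot_free(struct pkt_snapshot *s)
{
    free(s->copy);
    s->copy = NULL;
    s->len = 0;
}

/* The assertion itself: abort() (leaving a core file, if enabled) as soon
   as the packet in the ring no longer matches what was received. */
void snapshot_assert(const struct pkt_snapshot *s, const unsigned char *pkt)
{
    if (!snapshot_matches(s, pkt)) {
        fprintf(stderr, "packet buffer changed after receive\n");
        abort();
    }
}
```

abort() is used instead of assert() so the check stays active even when the application is built with NDEBUG.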

You can troubleshoot either by generating an application core file or by running 
gdb live on the application, then waiting until the assertion fires.
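For the core-file route, the application can raise its own core size limit at startup so that an abort() from a failed assertion actually leaves a core to open with `gdb <program> <core>`. A sketch using the POSIX getrlimit/setrlimit calls; enable_core_dumps is a hypothetical helper name, and where the file ends up is still decided by the system's core_pattern setting.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
   Returns 0 on success, -1 on failure (errno is set). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```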

Best,

Oren

From: [email protected]

To: [email protected]

Date: Mon, 17 Mar 2014 17:55:02 +0000

Subject: Re: [Ntop-misc] pf_ring possible memory corruption (packet contents 
appear to change over time)

Oren,
Thanks for sharing your experience, it's good to know this problem isn't 
limited to me!
I appreciate the workaround; perhaps I can extend the approach to my situation 
(I'm not using libpcap) although I'd prefer not to lose any more packets than I 
have to!
A few things are different: I've had this happen on at least two machines, and 
while both NICs are Intel, they are not the same chipset:
driver: igb
version: 5.0.6
firmware-version: 1.2.1
bus-info:
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

driver: ixgbe
version: 3.18.7
firmware-version: 0x80000309
bus-info:
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
For me this has happened on an even later version of ixgbe than the one you 
reported.
Were you using gdb on the kernel or the app to get the buffer info? How did you 
stop it right as the problem occurred? I'd be interested to confirm your 
condition, as I expect it is the same.
Regards,
Seth

From: Oren <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday, March 16, 2014 at 10:49 AM
To: "[email protected]" <[email protected]>
Subject: Re: [Ntop-misc] pf_ring possible memory corruption (packet contents 
appear to change over time)

Hi Seth,

I reported an analogous issue a while ago (using libpcap+libpfring), see 
http://www.gossamer-threads.com/lists/ntop/misc/32872

A solution would be welcome. 

Anyway, here is a workaround for libpcap, which may be adapted. It is based on 
the fact that the issue happens only when the remove offset is 0.

userland/libpcap-1.1.1-ring/pcap-linux.c

static int
pcap_read_packet(pcap_t *handle, pcap_handler callback, u_char *userdata)
{
...
   myhdr.ts.tv_sec = pcap_header.ts.tv_sec, myhdr.ts.tv_usec = pcap_header.ts.tv_usec;
   myhdr.caplen = pcap_header.caplen, myhdr.len = pcap_header.len;
   myhdr.ns = pcap_header.extended_hdr.timestamp_ns;

   // Oren: better minimal packet loss than memory corruption:
   // deliver the packet only when the ring's remove offset is nonzero
   if ( handle->ring->slots_info->remove_off )
     callback(userdata, (struct pcap_pkthdr*)&myhdr, bp);
...
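For applications that call pfring_recv() directly instead of going through libpcap, the same rule can be wrapped around the packet handler. This is a sketch under assumptions: the caller is expected to read ring->slots_info->remove_off after the receive call, as the libpcap code above does, and deliver_if_safe, drop_stats and count_bytes are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*pkt_handler)(void *user, const unsigned char *pkt, size_t len);

struct drop_stats {
    uint64_t delivered;
    uint64_t skipped;   /* packets dropped by the workaround */
};

/* Deliver the packet only when the ring's remove offset is nonzero,
   mirroring the libpcap workaround, and count what is skipped. */
void deliver_if_safe(uint64_t remove_off, pkt_handler cb, void *user,
                     const unsigned char *pkt, size_t len,
                     struct drop_stats *st)
{
    if (remove_off == 0) {
        st->skipped++;   /* better minimal packet loss than memory corruption */
        return;
    }
    cb(user, pkt, len);
    st->delivered++;
}

/* Example handler: accumulate the number of delivered bytes. */
void count_bytes(void *user, const unsigned char *pkt, size_t len)
{
    (void)pkt;
    *(size_t *)user += len;
}
```

Keeping a skipped counter makes it possible to see how many packets the workaround costs, which addresses the concern about losing more packets than necessary.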

Best,

Oren

From: [email protected]

To: [email protected]

Date: Fri, 14 Mar 2014 18:48:07 +0000

Subject: [Ntop-misc] pf_ring possible memory corruption (packet contents appear 
to change over time)

I've been observing what I can only describe as some kind of possible
memory corruption involving the packet that's currently being processed.

The symptom: if you were to decode the packet at T[0] and check the
ethernet type or IP protocol, and then check it again at T[1], it would
appear to be different.

Here's the scenario I used to reproduce this:

I've got a loop which receives packets from a pfring. It then does various
kinds of processing to the packet. Sometimes, by the time I am at the end
of the loop, it seems as though the content has changed.

To catch this, I wrote a program that basically does:
  Open pfring
    Forever:
       pfring_recv(...)
       d1=digest(packet)
       ( do work; I just count a random amount )
       d2=digest(packet)
       assert(d1 == d2);

It hashes the value immediately after the packet is received. It is hashed
again after some "arbitrary work" has been performed. An assertion is
raised if the hashes differ.
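The loop above could be written in C roughly as follows. This is a sketch: the message does not say which digest was used, so 64-bit FNV-1a is swapped in for brevity, and busy_count stands in for the "arbitrary work". In the real loop, pkt and len would come from pfring_recv().

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a digest over the packet bytes. */
uint64_t pkt_digest(const unsigned char *pkt, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= pkt[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Stand-in for "do work": count up to *arg. */
void busy_count(void *arg)
{
    unsigned long n = *(const unsigned long *)arg;
    volatile unsigned long i;
    for (i = 0; i < n; i++)
        ;
}

/* One iteration of the check: digest, work, digest again.
   Returns 1 if the packet was unchanged, 0 if it changed underneath us. */
int packet_stable(const unsigned char *pkt, size_t len,
                  void (*work)(void *), void *work_arg)
{
    uint64_t d1 = pkt_digest(pkt, len);
    work(work_arg);
    uint64_t d2 = pkt_digest(pkt, len);
    return d1 == d2;
}
```

A digest avoids keeping a full copy of each packet, at the cost of not showing which bytes changed; the memcpy+memcmp approach suggested elsewhere in the thread gives the exact bytes.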

In my observation, the assert is triggered after a relatively short amount
of time (15-30 minutes) when exposed to relatively high volumes of traffic.

The traffic that the program is receiving is random, and may consist of
valid and invalid packets.

I've tested this on kernel 2.6.32-431 (CentOS) with pfring-5.5.2 and have
observed the same behavior with pfring-5.6.2.

I've also observed this behavior in a separate program that performs similar
tasks.

What I've got:
- A 400 MB debug log (from 5.5.2) that gets increasingly corrupt (possibly
due to a concurrency problem?). In this case, the kernel eventually crashed
with debug on.
- Captured pfring statistics showing the ring offsets, slots, and memory
(for 5.6.2)

I've attached abbreviated versions of this information; if you think
you'd find the complete information useful, please let me know.

Any help you can offer with this issue would be appreciated.

Regards,

Seth

_______________________________________________ Ntop-misc mailing list 
[email protected] 
http://listgateway.unipi.it/mailman/listinfo/ntop-misc

<PF_RING-5.6.2-pfring_mod.c.memassert.patch>
