Re: [casper] Dropped packets during HASHPIPE data acquisition

Mark Ruzindana Thu, 01 Oct 2020 23:23:46 -0700

Hi David,

Sorry it's been a while. I've been working on other tasks besides the
packet socket implementation and have only now had the opportunity to come
back to it. I know you have access to the previous emails, but here is a
summary of the issue I ran into while implementing packet sockets:


I was able to install hashpipe with the suid bit set as you suggested
previously. So far, I have been able to capture data only for the first pass
through the frames of the circular buffer, i.e. if I have 160 frames, I am
able to capture packets in frames 0 to 159, at which point I get a
segmentation fault right at the memcpy() in the process_packet() function of
the net thread.

The suggestions that you provided were very helpful for diagnosis, but
the problem hasn't been resolved yet.
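
For reference, one way to find out which side of the memcpy() is at fault is
to route the copy through a bounds-checked helper that logs the offsets and
length before copying. The following is only a minimal sketch: checked_copy()
and all of its parameters are hypothetical names, not part of hashpipe, and
the caller is assumed to know the size of both the destination block and the
source frame.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of hashpipe): copy len bytes from
 * src_base + src_off to dst_base + dst_off only if both ranges fit inside
 * their buffers. Returns 0 on success, -1 if the copy was refused. */
static int checked_copy(uint8_t *dst_base, size_t dst_size, size_t dst_off,
                        const uint8_t *src_base, size_t src_size,
                        size_t src_off, size_t len)
{
    /* Comparisons are arranged so that none of the size_t arithmetic
     * can wrap around. */
    if (dst_off > dst_size || len > dst_size - dst_off ||
        src_off > src_size || len > src_size - src_off) {
        fprintf(stderr,
                "refusing copy: dst_off=%zu/%zu src_off=%zu/%zu len=%zu\n",
                dst_off, dst_size, src_off, src_size, len);
        return -1;
    }
    memcpy(dst_base + dst_off, src_base + src_off, len);
    return 0;
}
```

A refused copy points at the offset or length calculation. If every copy is
accepted and the segfault persists, the base pointers or the assumed buffer
sizes are the next things to check.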

I'm currently using gdb to debug, and one of two things happens. Either gdb
reports a segmentation fault at the memcpy() in process_packet(), or
something very strange happens: the starting mcnt of a block greatly exceeds
the mcnt of the packet being processed, so the mcnt distance becomes
negative, the memcpy() is skipped, and there is no segmentation fault.
Hopefully that wasn't too hard to follow. This second behavior only occurs
under gdb and not when I run hashpipe without it. Without gdb, I get the same
segmentation fault at the end of the circular buffer as mentioned above.
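
To make the "negative mcnt distance" case concrete, here is a small sketch of
a signed distance check. It is not the plug-in's actual code: mcnt_distance(),
mcnt_in_window() and mcnts_per_window are hypothetical names, and the cast
from the unsigned difference to int64_t assumes the usual two's complement
behavior.

```c
#include <stdint.h>

/* Signed distance from the start of a block to a packet's mcnt.
 * Negative means the packet is older than the block start. */
static int64_t mcnt_distance(uint64_t pkt_mcnt, uint64_t block_start_mcnt)
{
    return (int64_t)(pkt_mcnt - block_start_mcnt);
}

/* Returns 1 if the packet falls inside the window of mcnts_per_window
 * counts starting at block_start_mcnt, 0 otherwise. */
static int mcnt_in_window(uint64_t pkt_mcnt, uint64_t block_start_mcnt,
                          uint64_t mcnts_per_window)
{
    int64_t d = mcnt_distance(pkt_mcnt, block_start_mcnt);
    return d >= 0 && (uint64_t)d < mcnts_per_window;
}
```

With a check like this, a block start mcnt that is far ahead of the packet
mcnt makes every packet look "too old", so the copy is skipped. That matches
the behavior seen under gdb and suggests looking at how the block start mcnt
gets initialized or updated.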

I also omitted the "+ input_databuf_idx(...)" term to rule out a buffer
overflow, and got the same result (segmentation fault).

I checked to make sure that the blocks are large enough for the number of
frames. Right now, I have 480 total frames and 60 blocks, so 8 frames per
block. My frame size (8192) is a multiple of the kernel page size (4096).
I've also tried frame sizes of 4096 and 16384 with the same results.
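
The geometry above can be sanity-checked with a few lines of arithmetic. This
is a sketch under stated assumptions: struct ring_geometry and the functions
below are hypothetical and do not come from hashpipe or the kernel headers.
The rules it encodes (block size a multiple of the page size, frame size a
multiple of the 16-byte TPACKET_ALIGNMENT, total frames equal to frames per
block times number of blocks) are my reading of the kernel's packet_mmap
documentation.

```c
#include <stddef.h>

/* Hypothetical description of a packet socket ring. */
struct ring_geometry {
    size_t frame_size;       /* bytes per frame */
    size_t frames_per_block; /* frames in each block */
    size_t nblocks;          /* number of blocks */
    size_t page_size;        /* kernel page size in bytes */
};

/* Returns 0 if the geometry looks consistent, a negative code otherwise. */
static int check_ring_geometry(const struct ring_geometry *g)
{
    size_t block_size;

    if (g->frame_size == 0 || g->frames_per_block == 0 ||
        g->nblocks == 0 || g->page_size == 0)
        return -1;
    if (g->frame_size % 16 != 0) /* 16 assumed for TPACKET_ALIGNMENT */
        return -2;
    block_size = g->frame_size * g->frames_per_block;
    if (block_size % g->page_size != 0)
        return -3;
    return 0;
}

/* Total number of frames in the ring. */
static size_t ring_total_frames(const struct ring_geometry *g)
{
    return g->frames_per_block * g->nblocks;
}

/* Total size of the mapped ring in bytes. */
static size_t ring_total_bytes(const struct ring_geometry *g)
{
    return g->frame_size * ring_total_frames(g);
}
```

For the values in this message (8192-byte frames, 8 frames per block, 60
blocks, 4096-byte pages) each block is 65536 bytes and the whole ring is
3932160 bytes. Any pointer handed to memcpy() as a source should fall inside
that many bytes from the start of the mapping.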

I tried using 'hashpipe_dump_databuf -b "block number"' and I see binary
symbols on stdout regardless of what value I pass to memset(). So that part
wasn't as helpful for diagnosis as I'd hoped.
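
As an alternative to reading the raw dump, the fill pattern can be verified
in code. The sketch below uses the 1-based packet counter suggested earlier
in the thread; fill_payload_pattern() and first_pattern_mismatch() are
hypothetical helpers, not hashpipe functions. Note that small fill values
such as 0x01 are non-printable, so they will look like binary symbols when
written straight to a terminal.

```c
#include <stdint.h>
#include <string.h>

/* Fill len bytes at dest with the low byte of the 1-based packet number. */
static void fill_payload_pattern(uint8_t *dest, size_t len, unsigned pkt_num)
{
    memset(dest, (int)((pkt_num + 1u) & 0xffu), len);
}

/* Returns the index of the first byte that does not hold the pattern for
 * pkt_num, or len if every byte matches. */
static size_t first_pattern_mismatch(const uint8_t *buf, size_t len,
                                     unsigned pkt_num)
{
    const uint8_t expected = (uint8_t)((pkt_num + 1u) & 0xffu);
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] != expected)
            return i;
    }
    return len;
}
```

Because the pattern is a single byte, it repeats every 256 packets (packet
number 255 maps to 0). With 160 frames in the first pass through the ring,
that is enough to tell the packets apart.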

I should also mention that data is being received on the same interface
from other ports. As far as I can tell, the code ignores that data and only
captures/processes data from the user-specified port. But maybe it is
somehow causing these issues and I'm not able to see how.

As a test, I also tried removing the release_frame() call after
process_packet() and I got the same segmentation fault. So I still suspect
that either I'm missing something in how I use release_frame() or it isn't
actually releasing the frame. I'm not sure.
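
To reason about what releasing a frame does, here is a toy model of the ring
ownership handshake. It is a simulation only, with hypothetical names
throughout (sim_frame, sim_ring, sim_ring_recv, sim_release_frame), and it
does not touch a real packet socket. The two status values mirror the
kernel's TP_STATUS_KERNEL (0) and TP_STATUS_USER (1), and the consumer index
wraps modulo the number of frames, which is how I understand the real ring
to behave.

```c
#include <stddef.h>

#define FRAME_OWNED_BY_KERNEL 0ul /* mirrors TP_STATUS_KERNEL */
#define FRAME_OWNED_BY_USER   1ul /* mirrors TP_STATUS_USER */

/* Simulated frame: only the ownership word is modeled. */
struct sim_frame {
    volatile unsigned long status;
};

/* Simulated ring with a consumer index that wraps around. */
struct sim_ring {
    struct sim_frame *frames;
    size_t nframes;
    size_t next;
};

/* Returns the next frame if it is owned by user space and advances the
 * consumer index with wrap-around; returns NULL if no frame is ready. */
static struct sim_frame *sim_ring_recv(struct sim_ring *r)
{
    struct sim_frame *f = &r->frames[r->next];

    if (f->status != FRAME_OWNED_BY_USER)
        return NULL;
    r->next = (r->next + 1) % r->nframes;
    return f;
}

/* Hands the frame back to the (simulated) kernel. */
static void sim_release_frame(struct sim_frame *f)
{
    f->status = FRAME_OWNED_BY_KERNEL;
}
```

In this model, skipping the release does not move any pointer out of bounds.
The consumer simply sees the same stale frame again after wrapping, and a
real kernel would have no free frame to fill and would drop packets. That is
consistent with the segfault being unchanged when release_frame() is removed,
and it suggests checking how the frame index or pointer wraps after the last
frame.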

I appreciate any feedback. I'll respond ASAP if you have any questions.

Thanks,

Mark Ruzindana




On Mon, May 25, 2020 at 6:14 PM Mark Ruzindana <ruziem...@gmail.com> wrote:

> Thanks for the additional suggestions. I will try those and let you know
> what happens.
>
> Mark
>
> On Mon, May 25, 2020 at 6:07 PM David MacMahon <dav...@berkeley.edu>
> wrote:
>
>> A few more suggestions:
>>
>> 1) Enable core dumps.  Usually you have to run "ulimit -c unlimited" and
>> for suid executables there's an extra step related to
>> /proc/sys/fs/suid_dumpable.  See "man 5 core" and "man 5 proc" for
>> details.  Once you have a core file, you can use gdb to examine the state
>> of things when the segfault happened.  You might want to recompile your
>> plug-in with debugging enabled and fewer optimizations to get the most out
>> of this approach: "gdb /path/to/hashpipe /path/to/core".  (Gotta love how
>> it's still called "core"!).  gdb can be a bit cryptic, but it's also very
>> powerful.
>>
>> 2) Another idea, just for diagnostic purposes, is to omit the "+
>> input_databuf_idx(...)" part of the dest_p assignment.  That will write all
>> payloads to the first part of the data block, so no buffer overflow for
>> sure (assuming idx is in range :)).  It's just a way to eliminate a
>> variable.
>>
>> 3) Make sure the packet socket blocks are large enough for the packet
>> frames.  I agree it looks like you're not reading past the end of the
>> packet payload size, but maybe the payload itself goes beyond the end of
>> the packet socket blocks?  The kernel might silently truncate the packets
>> in that case.
>>
>> 4) If you're using tagged VLANs the PKT_UDP_xxx macros won't work right.
>> It sounds like that's not happening because you're seeing the expected
>> size, but it's worth mentioning for mail archive completeness.
>>
>> 5) You can use hashpipe_dump_databuf to examine the 159 payloads you were
>> able to copy before the segfault to see whether every byte is properly
>> positioned and has believable values.  You could change memcpy(..) to
>> memset(dest_p, 'X', PKT_UDP_SIZE(frame)-16) so you'll know the exact value
>> that every byte should have. Instead of 'X' you could use pkt_num+1 (i.e. a
>> 1-based packet counter) so you'll know which bytes correspond to which
>> packets.  Using memset() would also eliminate reading from the packet
>> socket blocks (another variable gone).
>>
>> Happy hunting,
>> Dave
>>
>> On May 25, 2020, at 16:33, Mark Ruzindana <ruziem...@gmail.com> wrote:
>>
>> Thanks for the suggestions. I neglected to mention that I'm printing out
>> PKT_UDP_SIZE() and PKT_UDP_DST() right before the memcpy(). I take into
>> account the 8 byte UDP header, and the size and port are correct. When
>> performing the memcpy(), I am taking into account that PKT_UDP_DATA()
>> returns a pointer to the payload and excludes the UDP header. However, I
>> also have an 8 byte packet header within that payload (this gives me the
>> mcnt, f-engine, and x-engine indices) and I exclude it when performing the
>> memcpy(). This is what it looks like:
>>
>> // This macro index shifts every mcnt and f-engine index
>> uint8_t * dest_p = db->block[idx].data + input_databuf_idx(m, f, 0,0,0);
>> // Ignore packet header (8 bytes)
>> const uint8_t * payload = (const uint8_t *)(PKT_UDP_DATA(frame)+8);
>>
>> fprintf(...); // prints PKT_UDP_SIZE() and PKT_UDP_DST()
>> // Ignore both UDP header (8 bytes) and packet header (8 bytes)
>> memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16);
>>
>> I will look into the other possible issues that you suggested, but as far
>> as I can tell, it doesn't seem like there should be a segfault given what
>> I'm doing before that memcpy(). I will let you know what else I find.
>>
>> Thanks again, I really appreciate the help.
>>
>> Mark
>>
>> On Mon, May 25, 2020 at 4:30 PM David MacMahon <dav...@berkeley.edu>
>> wrote:
>>
>>> Hi, Mark,
>>>
>>> Sounds like progress!
>>>
>>> On May 25, 2020, at 13:56, Mark Ruzindana <ruziem...@gmail.com> wrote:
>>>
>>> I have been able to capture data with the first round of frames of the
>>> circular buffer i.e. if I have 160 frames, I am able to capture packets of
>>> frames 0 to 159 at which point right at the memcpy() in the
>>> process_packet() function of the net thread, I get a segmentation fault.
>>>
>>>
>>> The fact that you get the segfault right at the memcpy of the final
>>> frame of the ring buffer suggests that there is a problem with the parameters
>>> passed to memcpy.  Most likely src+length-1 exceeds the end of the frame so
>>> you get a segfault when memcpy tries to read from beyond the allocated
>>> memory.  This would explain why it segfaults on the final frame and not the
>>> previous frames because reading beyond a previous frame still reads from
>>> "legal" (though incorrect) memory locations.  It's also possible that the
>>> segfault happens due to a bad address on the destination side of the
>>> memcpy(), but unless the destination buffer is also 160 frames in size that
>>> seems less likely.
>>>
>>> The release_frame function is not likely to be a culprit here unless the
>>> pointer you are passing it differs from the pointer that the pktsock_recv
>>> function returned.
>>>
>>> For debugging, I suggest logging dst, src, len before calling memcpy.
>>> Normally you wouldn't generate a log message for every packet because that
>>> would ruin your throughput, but since you know it's going to crash after
>>> the first 160 packets there's not much throughput to ruin. :)
>>>
>>> One thing to remember is that PKT_UDP_DATA() evaluates to a pointer to
>>> the UDP payload of the packet, but PKT_UDP_SIZE() evaluates to the total
>>> UDP size (i.e. 8 bytes for the UDP header plus the length of the UDP
>>> payload).  Passing PKT_UDP_SIZE() as "len" to memcpy without subtracting 8
>>> for the header bytes is not correct and could potentially cause this
>>> problem.
>>>
>>> HTH,
>>> Dave
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "casper@lists.berkeley.edu" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to casper+unsubscr...@lists.berkeley.edu.
>>> To view this discussion on the web visit
>>> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/297C1709-AE9C-488D-9110-FD0832BF5951%40berkeley.edu
>>> <https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/297C1709-AE9C-488D-9110-FD0832BF5951%40berkeley.edu?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>>
>

