Re: [casper] Dropped packets during HASHPIPE data acquisition

David MacMahon Mon, 25 May 2020 17:07:50 -0700

A few more suggestions:

1) Enable core dumps.  Usually you have to run "ulimit -c unlimited" and for 
suid executables there's an extra step related to /proc/sys/fs/suid_dumpable.  
See "man 5 core" and "man 5 proc" for details.  Once you have a core file, you 
can use gdb to examine the state of things when the segfault happened: "gdb 
/path/to/hashpipe /path/to/core".  (Gotta love how it's still called "core"!)  
You might want to recompile your plug-in with debugging enabled and fewer 
optimizations to get the most out of this approach.  gdb can be a bit cryptic, 
but it's also very powerful.


2) Another idea, just for diagnostic purposes, is to omit the "+ 
input_databuf_idx(...)" part of the dest_p assignment.  That will write all 
payloads to the first part of the data block, so there can be no buffer 
overflow for sure (assuming idx is in range :)).  It's just a way to eliminate 
a variable.

3) Make sure the packet socket blocks are large enough for the packet frames.  
I agree it looks like you're not reading past the end of the packet payload 
size, but maybe the payload itself goes beyond the end of the packet socket 
blocks?  The kernel might silently truncate the packets in that case.

4) If you're using tagged VLANs the PKT_UDP_xxx macros won't work right.  It 
sounds like that's not happening because you're seeing the expected size, but 
it's worth mentioning for mail archive completeness.

5) You can use hashpipe_dump_databuf to examine the 159 payloads you were able 
to copy before the segfault to see whether every byte is properly positioned 
and has believable values.  You could change memcpy(..) to memset(dest_p, 'X', 
PKT_UDP_SIZE(frame)-16) so you'll know the exact value that every byte should 
have.  Instead of 'X' you could use pkt_num+1 (i.e. a 1-based packet counter) 
so you'll know which bytes correspond to which packets.  Using memset() would 
also eliminate reading from the packet socket blocks (another variable gone).
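
To make suggestion 5 concrete, here is a minimal sketch of the fill-and-verify 
idea on a plain local buffer.  N_PKTS, PAYLOAD_BYTES, fill_slot() and 
first_bad_byte() are hypothetical names for illustration only; they are not 
part of HASHPIPE, and a real data block would be indexed with your own 
input_databuf_idx() macro rather than the simple multiplication used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_PKTS        4  /* hypothetical: packets per block */
#define PAYLOAD_BYTES 8  /* hypothetical: bytes written per packet */

/* Fill the slot for packet pkt_num with its 1-based packet counter.
 * The counter is stored in one byte, so it wraps after 255 packets. */
static void fill_slot(uint8_t *block, unsigned pkt_num)
{
    uint8_t *dest_p = block + (size_t)pkt_num * PAYLOAD_BYTES;
    memset(dest_p, (int)(pkt_num + 1), PAYLOAD_BYTES);
}

/* Return the offset of the first byte that does not hold the value
 * expected for its slot, or -1 if every byte is as expected. */
static long first_bad_byte(const uint8_t *block)
{
    for (size_t i = 0; i < (size_t)N_PKTS * PAYLOAD_BYTES; i++) {
        if (block[i] != (uint8_t)(i / PAYLOAD_BYTES + 1))
            return (long)i;
    }
    return -1;
}
```

Call fill_slot() once per packet where the memcpy() used to be, then run 
first_bad_byte() over the block (or eyeball the same pattern in the 
hashpipe_dump_databuf output) to see where the layout first goes wrong.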

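For suggestion 1, if setting the limit in the shell is awkward (for example 
when the process is started by a script you don't control), the same limit can 
be raised from inside the process with the standard getrlimit()/setrlimit() 
calls.  This is only a sketch: enable_core_dumps() is a hypothetical helper, 
not a HASHPIPE function, and it raises the soft limit only as far as the hard 
limit, which matches "ulimit -c unlimited" only if the hard limit is already 
unlimited.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* An unprivileged process may always raise soft up to hard. */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Call it from some early initialization code you control.  The suid_dumpable 
step for suid executables still applies.
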
Happy hunting,
Dave

> On May 25, 2020, at 16:33, Mark Ruzindana <ruziem...@gmail.com> wrote:
> 
> Thanks for the suggestions. I neglected to mention that I'm printing out the 
> PKT_UDP_SIZE() and PKT_UDP_DST() right before the memcpy(). I take into 
> account the 8 byte UDP header, and the size and port are correct. When 
> performing the memcpy(), I am taking into account that PKT_UDP_DATA() returns 
> a pointer to the payload and excludes the UDP header. However, I also have an 
> 8 byte packet header within that payload (this gives me the mcnt, f-engine, 
> and x-engine indices) and I exclude it when performing the memcpy(). This is 
> what it looks like:
> 
> // This macro index shifts every mcnt and f-engine index
> uint8_t * dest_p = db->block[idx].data + input_databuf_idx(m, f, 0,0,0);
> // Ignore packet header
> const uint8_t * payload = (uint8_t *)(PKT_UDP_DATA(frame)+8);
> 
> fprintf(...); // prints PKT_UDP_SIZE() and PKT_UDP_DST()
> // Ignore both UDP (8 bytes) and packet header (8 bytes)
> memcpy(dest_p, payload, PKT_UDP_SIZE(frame) - 16);
> 
> I will look into the other possible issues that you suggested, but as far as 
> I can tell, it doesn't seem like there should be a segfault given what I'm 
> doing before that memcpy(). I will let you know what else I find.
> 
> Thanks again, I really appreciate the help.
> 
> Mark
> 
> On Mon, May 25, 2020 at 4:30 PM David MacMahon <dav...@berkeley.edu> wrote:
> Hi, Mark,
> 
> Sounds like progress!
> 
>> On May 25, 2020, at 13:56, Mark Ruzindana <ruziem...@gmail.com> wrote:
>> 
>> I have been able to capture data with the first round of frames of the 
>> circular buffer i.e. if I have 160 frames, I am able to capture packets of 
>> frames 0 to 159 at which point right at the memcpy() in the process_packet() 
>> function of the net thread, I get a segmentation fault.
> 
> The fact that you get the segfault right at the memcpy of the final frame 
> of the ring buffer suggests that there is a problem with the parameters passed 
> to memcpy.  Most likely src+length-1 exceeds the end of the frame so you get 
> a segfault when memcpy tries to read from beyond the allocated memory.  This 
> would explain why it segfaults on the final frame and not the previous frames 
> because reading beyond a previous frame still reads from "legal" (though 
> incorrect) memory locations.  It's also possible that the segfault happens 
> due to a bad address on the destination side of the memcpy(), but unless the 
> destination buffer is also 160 frames in size that seems less likely.
> 
> The release_frame function is not likely to be a culprit here unless the 
> pointer you are passing it differs from the pointer that the pktsock_recv 
> function returned.
> 
> For debugging, I suggest logging dst, src, len before calling memcpy.  
> Normally you wouldn't generate a log message for every packet because that 
> would ruin your throughput, but since you know it's going to crash after the 
> first 160 packets there's not much throughput to ruin. :)
> 
> One thing to remember is that PKT_UDP_DATA() evaluates to a pointer to the 
> UDP payload of the packet, but PKT_UDP_SIZE() evaluates to the total UDP size 
> (i.e. 8 bytes for the UDP header plus the length of the UDP payload).  
> Passing PKT_UDP_SIZE() as "len" to memcpy without subtracting 8 for the 
> header bytes is not correct and could potentially cause this problem.
> 
> HTH,
> Dave
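
The logging and length arithmetic described above can be sketched as a small 
wrapper around memcpy().  This is a sketch under stated assumptions, not 
HASHPIPE code: checked_copy() and its parameters are hypothetical, udp_payload 
stands in for what PKT_UDP_DATA() evaluates to, udp_size stands in for what 
PKT_UDP_SIZE() evaluates to (header included), and the 8 byte application 
header is the one described in this thread.  It only bounds-checks the 
destination; the source side can be checked the same way against the frame 
size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define UDP_HDR_BYTES 8  /* UDP header, counted in the total UDP size */
#define APP_HDR_BYTES 8  /* application header at the start of the payload */

/* Log dst, src and len, refuse anything that would overrun the
 * destination, then copy.  Returns bytes copied, or -1 if rejected. */
static long checked_copy(uint8_t *dst_base, size_t dst_bytes, size_t dst_off,
                         const uint8_t *udp_payload, size_t udp_size)
{
    /* The total size must at least cover both headers. */
    if (udp_size < UDP_HDR_BYTES + APP_HDR_BYTES)
        return -1;

    size_t len = udp_size - UDP_HDR_BYTES - APP_HDR_BYTES;
    const uint8_t *src = udp_payload + APP_HDR_BYTES;

    fprintf(stderr, "memcpy: dst_base=%p dst_off=%zu dst_bytes=%zu "
            "src=%p len=%zu\n",
            (void *)dst_base, dst_off, dst_bytes, (const void *)src, len);

    /* Written this way so the comparison cannot overflow. */
    if (dst_off > dst_bytes || len > dst_bytes - dst_off)
        return -1;

    memcpy(dst_base + dst_off, src, len);
    return (long)len;
}
```

A -1 return together with the logged values shows which of dst, src, or len 
went out of range on the packet that would otherwise have crashed.
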
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "casper@lists.berkeley.edu <mailto:casper@lists.berkeley.edu>" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to casper+unsubscr...@lists.berkeley.edu 
> <mailto:casper+unsubscr...@lists.berkeley.edu>.
> To view this discussion on the web visit 
> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/297C1709-AE9C-488D-9110-FD0832BF5951%40berkeley.edu
>  
> <https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/297C1709-AE9C-488D-9110-FD0832BF5951%40berkeley.edu?utm_medium=email&utm_source=footer>.
> 

