Hi all,

While running hashpipe with the intention of debugging it with gdb as suggested, I failed to reproduce my segfault. On one hand, it should have been working given what I understand about the packet socket implementation and the way that I wrote the code. On the other, I don't know why it works now and not before, because I didn't make any changes between runs. It's a stretch, but there were a few reboots and some improvements in cable organization within the rack, and that's about it.

I'm noting the following change for documentation purposes. It's not the reason for my issue, so feel free to ignore or comment on it. This change was made before, and remained in place after, I observed the segfault. To flush the packets in the port before the thread is run, I am using "p_frame = hashpipe_pktsock_recv_udp_frame_nonblock(p_ps, bindport)" instead of "p_frame = hashpipe_pktsock_recv_frame_nonblock(p_ps, bindport)" in the while loop. Otherwise there's an infinite loop, because packets with other protocols are constantly being captured by the port.

I'm hoping I figure out what changed as I debug the rest of this, but for now the specific segfault that I was having is no longer an issue. It's unsatisfying, and I'll come back to it if I don't figure it out as I go, but for now I'm moving on.

Okay, so now I'm still experiencing dropped packets. Given a kernel page size of 4096 bytes and a frame size of 16384 bytes, I have tried buffer parameters ranging from 480 to 128000 total frames and from 60 to 1000 blocks, respectively. Throughput improved in one instance, but not in the other three that I have running. The one instance that improved, at the upper end of that range, exceeds the number of packets expected in a hashpipe shared memory buffer block (the ring buffers in between threads), but only for about four or so of them at the very beginning of a scan. It drops no packets for the rest of the scan. The other instances, with no recognizable improvement, drop packets throughout the scan, with one of them dropping significantly more than the other two.

I'm currently trying a few things to debug this, but I figured that I would ask sooner rather than later. Is there a configuration or step that I may have missed in the implementation of packet sockets? My understanding is that it should handle my current data rates with no problem.

So with multiple instances running (four in my case), I should be able to capture data with 0 dropped packets (100% data throughput).

Just a note: with a packet size of 8168 bytes and a frame size of 8192 bytes, hashpipe was crashing, but in a way completely unrelated to how it did before. It was *not* a segfault after capturing the exact number of packets that corresponds to the number of frames in the packet socket ring buffer, as I described in previous emails. The crashes were more inconsistent, and I think it's because the frame size needs to be considerably larger than the packet size. A factor of 2 seemed to be enough. I currently have the frame size set to 16384 (also a multiple of the kernel page size), and do not have an issue with hashpipe crashing.

Let me know if you have any thoughts and suggestions. I really appreciate the help.

Thanks,

Mark Ruzindana

On Thu, Dec 3, 2020 at 11:16 AM Mark Ruzindana <ruziem...@gmail.com> wrote:

> Thanks for the suggestion David!
>
> I was starting hashpipe in the debugger. I'll use gdb and the core file,
> and let you know what I find. If I still can't figure out the problem, I
> will send you a minimum non-working example. I definitely think it's some
> sort of pointer arithmetic error as well, I just can't see it yet. I really
> appreciate the help.
>
> Thanks again,
>
> Mark
>
> On Thu, Dec 3, 2020 at 1:30 AM David MacMahon <dav...@berkeley.edu> wrote:
>
>> Hi, Mark,
>>
>> Sorry to hear you're still getting a segfault. It sounds like you made
>> some progress with gdb, but the fact that you ended up with a different
>> sort of error suggests that you were starting hashpipe in the debugger. To
>> debug your initial segfault problem, you can run hashpipe without the
>> debugger, let it segfault and generate a core file, then use gdb and the
>> core file (and hashpipe) to examine the state of the program when the
>> segfault occurred. The tricky part is getting the core file to be
>> generated on a segfault.
>> You typically have to increase the core file size
>> limit using "ulimit -c unlimited" and (because hashpipe is typically
>> installed with the suid bit set) you have to let the kernel know it's OK to
>> dump core files for suid programs using "sudo sysctl -w fs.suid_dumpable=1"
>> (or maybe 2 if 1 doesn't quite do it). You can read more about these steps
>> with "help ulimit" (ulimit is a bash builtin) and "man 5 proc".
>>
>> Once you have the core file (typically named "core" but it may have a
>> numeric extension from the PID of the crashing process) you can debug
>> things with "gdb /path/to/hashpipe /path/to/core/file". Note that the core
>> file may be created with permissions that only let root read it, so you
>> might have to "sudo chmod a+r core" or similar to get read access to it.
>> This starts the debugger in a sort of forensic mode using the core file
>> as a snapshot of the process and its memory space at the time of the
>> segfault. You can use "info threads" to see which threads existed, "thread
>> N" to switch between threads (N is a thread number as shown by "info
>> threads"), "bt" to see the function call backtrace of the current thread,
>> and "frame N" to switch to a specific frame in the function call
>> backtrace. Once you zero in on which part of your code was executing when
>> the segfault occurred you can examine variables to see what exactly caused
>> the segfault to occur. You might find that the "interesting" or "relevant"
>> variables have been optimized away, so you may want/need to recompile with
>> a lower optimization level (e.g. -O1 or maybe even -O0?) to prevent that
>> from happening.
>>
>> Because this happens when you reach the end of your data buffer, I have
>> to think it's a pointer arithmetic error of some sort.
>> If you can't figure
>> out the problem from the core file, please create a "minimum working
>> example" (well, in this case I guess a minimum non-working example),
>> including a dummy packet generator script that creates suitable packets,
>> and I'll see if I can recreate the problem.
>>
>> HTH,
>> Dave
>>
>> On Nov 30, 2020, at 14:45, Mark Ruzindana <ruziem...@gmail.com> wrote:
>>
>> I'm currently using gdb to debug and it either tells me that I have a
>> segmentation fault at the memcpy() in process_packet() or something very
>> strange happens where the starting mcnt of a block greatly exceeds the mcnt
>> corresponding to the packet being processed and there's no segmentation
>> fault because the mcnt distance becomes negative so the memcpy() is
>> skipped. Hopefully that wasn't too hard to track. Very strange problem that
>> only occurs with gdb and not when I run hashpipe without it. Without gdb, I
>> get the same segmentation fault at the end of the circular buffer as
>> mentioned above.
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "casper@lists.berkeley.edu" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to casper+unsubscr...@lists.berkeley.edu.
>> To view this discussion on the web visit
>> https://groups.google.com/a/lists.berkeley.edu/d/msgid/casper/AC9534AD-390F-44D8-ABFE-8AE76F059957%40berkeley.edu
>> .
>>
>