Re: [casper] Multi-instance hashpipe "pktsock" on single interface

2020-12-03 Thread Wael Farah
(Email from Ross that didn't make it through because he's not subscribed to
the casper list yet)

Hello all, Ross here,

I have been working on the development Wael mentioned and witnessed the
issues first-hand, so I'd like to expand on the original message. To address
the points raised in the two responses:

1. The two instances were pinned, via their CPU affinity masks, to distinct
cores on the same CPU socket.
2. The two instances of hashpipe each reported a lower rate than was being
sent to it, and asymmetrically so: with 1 GB/s directed at each instance, the
two reported a fairly stable ~0.24 GB/s and ~0.64 GB/s.

The two instances were bound to the same interface, listening for packets
addressed to different ports. The reported packet rate (GB/s) is computed
from accepted packets only; `nload` reported the correct aggregate GB/s for
the interface.
When the packets directed to one instance were stopped, the rate reported by
the other instance increased to 0.94 GB/s, still below the 1 GB/s directed at
it. It was also noted that `ksoftirqd` was saturating a core at 100%. A single
hashpipe instance has no trouble receiving 2 GB/s (and then some), so we
expected two parallel instances (on separate cores) to be able to process
1 GB/s each.

The suspicion these observations raised is that two distinct processes each
binding a packet socket to the same interface introduces contention or
per-packet overhead that makes the approach less feasible than expected.

Kind regards,
Ross

On Thu, 3 Dec 2020 at 01:45, David MacMahon  wrote:

> Hi, Wael,
>
> I think I know what's going on here.  You don't say how the reported data
> rate differed from expected, but I suspect the reported data rate was
> higher than expected.  Packet sockets are a low-level packet delivery
> mechanism supported by the kernel; they allow the kernel to copy packets
> directly into memory that is mapped into the address space of a user process
> (e.g. hashpipe).  The kernel does no filtering (by default) on the incoming
> packets before delivering them to the user process(es) that have requested
> them.  The selection by port happens (by default) at the application
> layer.  This means that two hashpipe instances using packet sockets to
> listen to the same network interface will each receive copies of all
> packets, regardless of the destination UDP port, even if they only want a
> specific UDP destination port.  This is very similar to how two tcpdump
> instances will get copies of all packets.
>
> Alessio Magro has done some work to use the "Berkeley Packet Filter" (
> https://www.kernel.org/doc/html/latest/networking/filter.html) to perform
> low-level packet filtering in the kernel with packet sockets in hashpipe.
> I think that approach could allow you to achieve the packet filtering that
> you want, but it's somewhat non-trivial to implement.
>
> As for 100% CPU utilization, that could be due to using "busywait"
> versions of the status buffer locking and/or data buffer access functions
> or it could just be due to the net threads being very busy processing
> packets.
>
> HTH,
> Dave
>
>
> On Dec 2, 2020, at 19:06, Wael Farah  wrote:
>
> Hi Folks,
>
> Hope everyone's doing well.
>
> I have an application I am trying to develop using hashpipe, and one of
> the solutions might be using multiple instances of hashpipe on a single 40
> GbE interface.
>
> When I tried running 2 instances of hashpipe I faced a problem: the data
> rate reported by the instances does not match that expected from the TXs.
> No issues were seen if I reconfigured the TXs to send data to a single port,
> rather than 2, and initialised a single hashpipe thread. Can the 2
> netthreads compete for resources on the NIC even if they are bound to
> different ports? I've also noticed that the CPU usage for the 2
> netthreads is always 100%.
> I am using "hashpipe_pktsock_recv_udp_frame" for the acquisition.
>
> Has anyone seen this/similar issue before?
>
> Thanks!
> Wael
>

Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-03 Thread Mark Ruzindana
Thanks for the suggestion, David!

I was starting hashpipe in the debugger. I'll use gdb and the core file,
and let you know what I find. If I still can't figure out the problem, I
will send you a minimum non-working example. I definitely think it's some
sort of pointer arithmetic error as well; I just can't see it yet. I really
appreciate the help.

Thanks again,

Mark

On Thu, Dec 3, 2020 at 1:30 AM David MacMahon  wrote:

> Hi, Mark,
>
> Sorry to hear you're still getting a segfault.  It sounds like you made
> some progress with gdb, but the fact that you ended up with a different
> sort of error suggests that you were starting hashpipe in the debugger.  To
> debug your initial segfault problem, you can run hashpipe without the
> debugger, let it segfault and generate a core file, then use gdb and the
> core file (and hashpipe) to examine the state of the program when the
> segfault occurred.  The tricky part is getting the core file to be
> generated on a segfault.  You typically have to increase the core file size
> limit using "ulimit -c unlimited" and (because hashpipe is typically
> installed with the suid bit set) you have to let the kernel know it's OK to
> dump core files for suid programs using "sudo sysctl -w fs.suid_dumpable=1"
> (or maybe 2 if 1 doesn't quite do it).  You can read more about these steps
> with "help ulimit" (ulimit is a bash builtin) and "man 5 proc".
>
> Once you have the core file (typically named "core" but it may have a
> numeric extension from the PID of the crashing process) you can debug
> things with "gdb /path/to/hashpipe /path/to/core/file".  Note that the core
> file may be created with permissions that only let root read it, so you
> might have to "sudo chmod a+r core" or similar to get read access to it.
> This starts the debugger in a sort of forensic mode using the core file
> as a snapshot of the process and its memory space at the time of the
> segfault.  You can use "info threads" to see which threads existed, "thread
> N" to switch between threads (N is a thread number as shown by "info
> threads"), "bt" to see the function call backtrace of the current thread,
> and "frame N" to switch to a specific frame in the function call
> backtrace.  Once you zero in on which part of your code was executing when
> the segfault occurred you can examine variables to see what exactly caused
> the segfault to occur.  You might find that the "interesting" or "relevant"
> variables have been optimized away, so you may want/need to recompile with
> a lower optimization level (e.g. -O1 or maybe even -O0?) to prevent that
> from happening.
>
> Because this happens when you reach the end of your data buffer, I have to
> think it's a pointer arithmetic error of some sort.  If you can't figure
> out the problem from the core file, please create a "minimum working
> example" (well, in this case I guess a minimum non-working example),
> including a dummy packet generator script that creates suitable packets,
> and I'll see if I can recreate the problem.
>
> HTH,
> Dave
>
> On Nov 30, 2020, at 14:45, Mark Ruzindana  wrote:
>
> I'm currently using gdb to debug and it either tells me that I have a
> segmentation fault at the memcpy() in process_packet() or something very
> strange happens where the starting mcnt of a block greatly exceeds the mcnt
> corresponding to the packet being processed and there's no segmentation
> fault because the mcnt distance becomes negative so the memcpy() is
> skipped. Hopefully that wasn't too hard to track. Very strange problem that
> only occurs with gdb and not when I run hashpipe without it. Without gdb, I
> get the same segmentation fault at the end of the circular buffer as
> mentioned above.
>



Re: [casper] Multi-instance hashpipe "pktsock" on single interface

2020-12-03 Thread David MacMahon
Hi, Wael,

I think I know what's going on here.  You don't say how the reported data rate 
differed from expected, but I suspect the reported data rate was higher than 
expected.  Packet sockets are a low-level packet delivery mechanism supported
by the kernel; they allow the kernel to copy packets directly into memory that
is mapped into the address space of a user process (e.g. hashpipe).  The kernel
does no filtering (by default) on the incoming packets before delivering them 
to the user process(es) that have requested them.  The selection by port 
happens (by default) at the application layer.  This means that two hashpipe 
instances using packet sockets to listen to the same network interface will 
each receive copies of all packets, regardless of the destination UDP port, 
even if they only want a specific UDP destination port.  This is very similar 
to how two tcpdump instances will get copies of all packets.
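
To make that concrete, the application-layer selection amounts to a check
like the following, which every instance has to run on every frame it
receives (a sketch for untagged IPv4/UDP Ethernet frames; the function name
and layout assumptions are illustrative, not hashpipe's actual internals):

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>

/* Return nonzero if this raw Ethernet frame is a UDP packet addressed
 * to the given destination port.  Each pktsock instance sees *every*
 * frame on the interface and must discard the ones it doesn't want. */
static int frame_matches_port(const uint8_t *frame, size_t len, uint16_t port)
{
    if (len < 14 + 20 + 8) return 0;                       /* runt frame */
    if (frame[12] != 0x08 || frame[13] != 0x00) return 0;  /* not IPv4 */
    if (frame[23] != 17) return 0;                         /* not UDP */
    size_t iphl = (size_t)(frame[14] & 0x0f) * 4;          /* IP header len */
    if (len < 14 + iphl + 8) return 0;                     /* truncated UDP */
    uint16_t dport;
    memcpy(&dport, frame + 14 + iphl + 2, 2);              /* UDP dest port */
    return ntohs(dport) == port;
}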

Alessio Magro has done some work to use the "Berkeley Packet Filter" 
(https://www.kernel.org/doc/html/latest/networking/filter.html) to perform 
low-level packet filtering in the kernel with packet sockets in hashpipe.  I 
think that approach could allow you to achieve the packet filtering that you 
want, but it's somewhat non-trivial to implement.
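
As a rough illustration of that approach, a classic BPF program can be
attached to a packet socket with SO_ATTACH_FILTER so the kernel drops other
ports' packets before they reach user space.  The sketch below is untested,
handles untagged IPv4 only, and uses a placeholder port and interface name;
it is not necessarily how Alessio wired this into hashpipe.

#include <stdio.h>
#include <arpa/inet.h>
#include <linux/filter.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <sys/socket.h>

#define UDP_DST_PORT 50000   /* placeholder: this instance's port */

/* Classic BPF, along the lines of `tcpdump -dd "ip and udp dst port 50000"`:
 * accept untagged IPv4/UDP frames for our port, drop everything else. */
static struct sock_filter insns[] = {
    { 0x28, 0, 0, 12 },           /* ldh [12]         ; EtherType          */
    { 0x15, 0, 8, 0x0800 },       /* if not IPv4 goto drop                 */
    { 0x30, 0, 0, 23 },           /* ldb [23]         ; IP protocol        */
    { 0x15, 0, 6, 17 },           /* if not UDP goto drop                  */
    { 0x28, 0, 0, 20 },           /* ldh [20]         ; flags+frag offset  */
    { 0x45, 4, 0, 0x1fff },       /* if fragment goto drop                 */
    { 0xb1, 0, 0, 14 },           /* ldx 4*([14]&0xf) ; IP header length   */
    { 0x48, 0, 0, 16 },           /* ldh [x+16]       ; UDP dest port      */
    { 0x15, 0, 1, UDP_DST_PORT }, /* if port matches goto accept           */
    { 0x06, 0, 0, 0x00040000 },   /* accept: return snap length            */
    { 0x06, 0, 0, 0x00000000 },   /* drop:   return 0                      */
};

int main(void)
{
    struct sock_fprog prog = { sizeof(insns) / sizeof(insns[0]), insns };

    /* Needs root or CAP_NET_RAW. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
    if (fd < 0) { perror("socket"); return 1; }

    /* Attach the filter before binding so unwanted packets never
     * queue on this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)) < 0) {
        perror("SO_ATTACH_FILTER"); return 1;
    }

    struct sockaddr_ll sll = {
        .sll_family   = AF_PACKET,
        .sll_protocol = htons(ETH_P_IP),
        .sll_ifindex  = if_nametoindex("eth4"),   /* placeholder NIC */
    };
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        perror("bind"); return 1;
    }

    /* ...set up the packet ring / recv loop on fd as usual... */
    return 0;
}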

As for 100% CPU utilization, that could be due to using "busywait" versions of 
the status buffer locking and/or data buffer access functions or it could just 
be due to the net threads being very busy processing packets.
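
For concreteness, here is a minimal sketch of the non-spinning alternative,
assuming the stock hashpipe databuf API (the names hashpipe_databuf_wait_free,
HASHPIPE_OK and HASHPIPE_TIMEOUT are my reading of hashpipe.h, so treat them
as assumptions):

#include "hashpipe.h"   /* assumed: databuf API and status codes */

/* Wait for a free output block without spinning.  The plain "wait"
 * variant sleeps in the kernel while blocked; substituting
 * hashpipe_databuf_busywait_free() here would spin a core at 100%
 * even when no packets are arriving. */
static int wait_for_free_block(hashpipe_databuf_t *db, int block_idx)
{
    int rv;
    while ((rv = hashpipe_databuf_wait_free(db, block_idx)) != HASHPIPE_OK) {
        if (rv != HASHPIPE_TIMEOUT)
            return rv;   /* real error: let the caller bail out */
        /* HASHPIPE_TIMEOUT: no free block yet; try again */
    }
    return HASHPIPE_OK;
}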

HTH,
Dave


> On Dec 2, 2020, at 19:06, Wael Farah  wrote:
> 
> Hi Folks,
> 
> Hope everyone's doing well.
> 
> I have an application I am trying to develop using hashpipe, and one of the 
> solutions might be using multiple instances of hashpipe on a single 40 GbE 
> interface.
> 
> When I tried running 2 instances of hashpipe I faced a problem: the data rate
> reported by the instances does not match that expected from the TXs. No
> issues were seen if I reconfigured the TXs to send data to a single port,
> rather than 2, and initialised a single hashpipe thread. Can the 2
> netthreads compete for resources on the NIC even if they are bound to 
> different ports? I've also noticed that the CPU usage for the 2 netthreads is 
> always 100%.
> I am using "hashpipe_pktsock_recv_udp_frame" for the acquisition.
> 
> Has anyone seen this/similar issue before?
> 
> Thanks!
> Wael
> 



Re: [casper] Dropped packets during HASHPIPE data acquisition

2020-12-03 Thread David MacMahon
Hi, Mark,

Sorry to hear you're still getting a segfault.  It sounds like you made some 
progress with gdb, but the fact that you ended up with a different sort of 
error suggests that you were starting hashpipe in the debugger.  To debug your 
initial segfault problem, you can run hashpipe without the debugger, let it 
segfault and generate a core file, then use gdb and the core file (and 
hashpipe) to examine the state of the program when the segfault occurred.  The 
tricky part is getting the core file to be generated on a segfault.  You 
typically have to increase the core file size limit using "ulimit -c unlimited" 
and (because hashpipe is typically installed with the suid bit set) you have to 
let the kernel know it's OK to dump core files for suid programs using "sudo 
sysctl -w fs.suid_dumpable=1" (or maybe 2 if 1 doesn't quite do it).  You can 
read more about these steps with "help ulimit" (ulimit is a bash builtin) and 
"man 5 proc".

Once you have the core file (typically named "core" but it may have a numeric 
extension from the PID of the crashing process) you can debug things with "gdb
/path/to/hashpipe /path/to/core/file".  Note that the core file may be created
with permissions that only let root read it, so you might have to "sudo chmod
a+r core" or similar to get read access to it.  This starts the debugger in a
sort of forensic mode using the core file as a snapshot of the process and its 
memory space at the time of the segfault.  You can use "info threads" to see 
which threads existed, "thread N" to switch between threads (N is a thread 
number as shown by "info threads"), "bt" to see the function call backtrace of
the current thread, and "frame N" to switch to a specific frame in the function 
call backtrace.  Once you zero in on which part of your code was executing when 
the segfault occurred you can examine variables to see what exactly caused the 
segfault to occur.  You might find that the "interesting" or "relevant" 
variables have been optimized away, so you may want/need to recompile with a 
lower optimization level (e.g. -O1 or maybe even -O0?) to prevent that from 
happening.
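
Incidentally, a quick way to confirm that the ulimit/sysctl settings above
took effect is to crash a trivial program on purpose and check that a core
file appears.  A minimal example (mine, not part of hashpipe), built with
"gcc -g -O0 -o crashme crashme.c":

/* crashme.c: deliberately segfault so the kernel writes a core file.
 * After running it, "gdb ./crashme core" should land on the faulting line. */
int main(void)
{
    char *p = 0;   /* null pointer */
    *p = 42;       /* SIGSEGV here; a core file should appear */
    return 0;
}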

Because this happens when you reach the end of your data buffer, I have to 
think it's a pointer arithmetic error of some sort.  If you can't figure out 
the problem from the core file, please create a "minimum working example" 
(well, in this case I guess a minimum non-working example), including a dummy 
packet generator script that creates suitable packets, and I'll see if I can 
recreate the problem.
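
For illustration, the kind of end-of-buffer slip that produces exactly this
symptom looks something like the sketch below (illustrative, not Mark's
actual code; it assumes the payload size divides the block size):

#include <stdint.h>
#include <string.h>

#define NBLOCKS    4
#define BLOCK_SIZE (1024 * 1024)

static uint8_t ring[NBLOCKS * BLOCK_SIZE];

static void store_packet(uint64_t mcnt, const uint8_t *payload, size_t n)
{
    /* BUG: an offset derived from the ever-growing packet counter
     * eventually indexes past the end of ring[], so the memcpy()
     * segfaults when the last block is reached:
     *     uint8_t *dst = ring + mcnt * n;
     * FIX: wrap the byte offset back into the ring. */
    uint8_t *dst = ring + (mcnt * n) % sizeof(ring);
    memcpy(dst, payload, n);
}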

HTH,
Dave

> On Nov 30, 2020, at 14:45, Mark Ruzindana  wrote:
> 
> I'm currently using gdb to debug and it either tells me that I have a
> segmentation fault at the memcpy() in process_packet() or something very 
> strange happens where the starting mcnt of a block greatly exceeds the mcnt 
> corresponding to the packet being processed and there's no segmentation fault 
> because the mcnt distance becomes negative so the memcpy() is skipped. 
> Hopefully that wasn't too hard to track. Very strange problem that only 
> occurs with gdb and not when I run hashpipe without it. Without gdb, I get 
> the same segmentation fault at the end of the circular buffer as mentioned 
> above.
> 
