Re: ixgbe and fast interrupts
The comments haven't kept up with the code. You are correct; in the
legacy interrupt case ixgbe is using an ITHREAD, not a fast handler.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: ixgbe and fast interrupts
On Thursday, November 17, 2011 6:38:21 am Matteo Landi wrote:
> Hi everybody,
>
> trying to measure the interrupt latency of a 10G Intel network
> adapter, I found out that the driver in use (ixgbe) can be
> configured to work with both fast and standard interrupts. From my
> understanding of the BUS_SETUP_INTR(9) man page and the
> sys/kern/kern_intr.c file, it seems that drivers in need of
> registering fast interrupts should call bus_setup_intr() specifying a
> filter function instead of a handler.
>
> My question is: why does ixgbe_allocate_legacy() say it is allocating a
> fast interrupt (comments and error messages say so) but then pass
> a handler instead of a filter to pci_setup_intr()? Is there anything I
> am not taking into account?

It is not a fast handler and is not using the fast handler stuff. OTOH,
you probably want to be using MSI-X for a 10G NIC instead of INTx anyway.

-- 
John Baldwin
Re: ixgbe and fast interrupts
> you probably want to be using MSI-X for a 10G NIC instead of INTx anyway.

Why do you say that? Is MSI-X faster than INTx in terms of interrupt
latency? When should I use MSI-X, instead of fast filter interrupts,
instead of ithread interrupts? Thanks in advance.

Regards,
Matteo

-- 
http://www.matteolandi.net/
Re: ixgbe and fast interrupts
On Thu, Nov 17, 2011 at 3:56 PM, Ryan Stone wrote:
> The comments haven't kept up with the code. You are correct; in the
> legacy interrupt case ixgbe is using an ITHREAD, not a fast handler.

Should I send an email to the maintainer of the ixgbe driver and ask him
to update the comments, or could I send you a patch for that?

Regards,
Matteo

-- 
http://www.matteolandi.net/
Re: ixgbe and fast interrupts
On Friday, November 18, 2011 3:46:02 am Matteo Landi wrote:
> > you probably want to be using MSI-X for a 10G NIC instead of INTx anyway.
>
> Why do you say that? Is MSI-X faster than INTx in terms of interrupt
> latency? When should I use MSI-X, instead of fast filter interrupts,
> instead of ithread interrupts? Thanks in advance.

With MSI-X you can have more than one interrupt, and those interrupts can be
distributed across CPUs. This means you can (somewhat) tie each queue on your
NIC to a different CPU.

MSI-X vs INTx is orthogonal to fast vs filter, but in general MSI and MSI-X
interrupts are not shared and require no interrupt masking in hardware (they
are effectively edge-triggered), so using a filter for MSI is rather pointless
and only adds needless complexity. For MSI I would just use a threaded
interrupt handler. For INTx, I would only use a fast interrupt handler if
there is a really good reason to do so (e.g. em(4) does so to work around
broken Intel Host-PCI bridges).

-- 
John Baldwin
Re: ixgbe and fast interrupts
On Fri, Nov 18, 2011 at 08:00:06AM -0500, John Baldwin wrote:
> On Friday, November 18, 2011 3:46:02 am Matteo Landi wrote:
> > > you probably want to be using MSI-X for a 10G NIC instead of INTx anyway.
> >
> > Why do you say that? Is MSI-X faster than INTx in terms of interrupt
> > latency? When should I use MSI-X, instead of fast filter interrupts,
> > instead of ithread interrupts? Thanks in advance.
>
> With MSI-X you can have more than one interrupt and those interrupts can be
> distributed across CPUs. This means you can (somewhat) tie each queue on
> your NIC to a different CPU.
>
> MSI-X vs INTx is orthogonal to fast vs filter, but in general MSI and MSI-X
> interrupts are not shared, and require no interrupt masking in hardware
> (they are effectively edge-triggered), so using a filter for MSI is rather
> pointless and only adds needless complexity. For MSI I would just use a
> threaded interrupt handler. For INTx, I would only use a fast interrupt
> handler if there is a really good reason to do so (e.g. em(4) does so to
> work around broken Intel Host-PCI bridges).

A bit more context: Matteo is looking at the latency of RPCs across
the network involving userspace processes, possibly using the
netmap API. As we understand it:

if you are not using a filter, the interrupt calls a "predefined"
filter (kern_intr.c::intr_event_schedule_thread() ?) which wakes
up the handler thread, which in turn wakes up the user process. This
means two scheduler invocations on each side.

In the case of netmap, all the handler needs to do is a selwakeup()
of the user thread blocked on the file descriptor, so if this
can be done in the filter we can save an extra step through the
scheduler.

cheers
luigi
Re: ixgbe and fast interrupts
On Friday, November 18, 2011 12:06:15 pm Luigi Rizzo wrote:
> On Fri, Nov 18, 2011 at 08:00:06AM -0500, John Baldwin wrote:
> > ...
> > MSI-X vs INTx is orthogonal to fast vs filter, but in general MSI and
> > MSI-X interrupts are not shared, and require no interrupt masking in
> > hardware (they are effectively edge-triggered), so using a filter for
> > MSI is rather pointless and only adds needless complexity. For MSI I
> > would just use a threaded interrupt handler. For INTx, I would only use
> > a fast interrupt handler if there is a really good reason to do so
> > (e.g. em(4) does so to work around broken Intel Host-PCI bridges).
>
> A bit more context: Matteo is looking at the latency of RPCs across
> the network involving userspace processes, possibly using the
> netmap API. As we understand it:
>
> if you are not using a filter, the interrupt calls a "predefined"
> filter (kern_intr.c::intr_event_schedule_thread() ?) which wakes
> up the handler thread, which in turn wakes up the user process. This
> means two scheduler invocations on each side.

Yes, but if you use a filter you still have to do that, as your filter would
just be queueing a task on a taskqueue, which would then do the actual
selwakeup() from a taskqueue thread. Filters are typically used to avoid
masking the interrupt in the PIC, or to limit the handlers executed on a
shared interrupt.

> In the case of netmap, all the handler needs to do is a selwakeup()
> of the user thread blocked on the file descriptor, so if this
> can be done in the filter we can save an extra step through the
> scheduler.

You can't call selwakeup() from a filter; it is not safe since it uses
mutexes, etc. There are only a few things you can do from a filter.
You could do a plain wakeup() if you let userland use a custom ioctl to
block on the filter, but not selwakeup().

-- 
John Baldwin
Re: ixgbe and fast interrupts
On Fri, Nov 18, 2011 at 12:20:04PM -0500, John Baldwin wrote:
> On Friday, November 18, 2011 12:06:15 pm Luigi Rizzo wrote:
...
> > A bit more context: Matteo is looking at the latency of RPCs across
> > the network involving userspace processes, possibly using the
> > netmap API. As we understand it:
> >
> > if you are not using a filter, the interrupt calls a "predefined"
> > filter (kern_intr.c::intr_event_schedule_thread() ?) which wakes
> > up the handler thread, which in turn wakes up the user process. This
> > means two scheduler invocations on each side.
>
> Yes, but if you use a filter you still have to do that, as your filter
> would just be queueing a task on a taskqueue, which would then do the
> actual selwakeup() from a taskqueue thread. Filters are typically used
> to avoid masking the interrupt in the PIC, or to limit the handlers
> executed on a shared interrupt.
>
> > In the case of netmap, all the handler needs to do is a selwakeup()
> > of the user thread blocked on the file descriptor, so if this
> > can be done in the filter we can save an extra step through the
> > scheduler.
>
> You can't call selwakeup() from a filter; it is not safe since it uses
> mutexes, etc. There are only a few things you can do from a filter.
> You could do a plain wakeup() if you let userland use a custom ioctl to
> block on the filter, but not selwakeup().

ok, this is good to know - i wasn't sure if selwakeup() could block
(and i am a bit unclear why). Will look at the selrecord/selwakeup
pair, thanks for the suggestion.

One more thing (i am mentioning it here for archival purposes, as i
keep forgetting to test it). Is entropy harvesting expensive?
I see it is on by default:

> sysctl -a | grep rando
kern.randompid: 0
kern.random.yarrow.gengateinterval: 10
kern.random.yarrow.bins: 10
kern.random.yarrow.fastthresh: 192
kern.random.yarrow.slowthresh: 256
kern.random.yarrow.slowoverthresh: 2
kern.random.sys.seeded: 1
kern.random.sys.harvest.ethernet: 1
kern.random.sys.harvest.point_to_point: 1
kern.random.sys.harvest.interrupt: 1
kern.random.sys.harvest.swi: 0
...

and there seems to be a call to random_harvest() in the default filter
that wakes up the threaded handler.

cheers
luigi
Re: ixgbe and fast interrupts
On 11/18/2011 09:54, Luigi Rizzo wrote:
> One more thing (i am mentioning it here for archival purposes,
> as i keep forgetting to test it). Is entropy harvesting expensive ?

No. It was designed to be inexpensive on purpose. :)

-- 
	"We could put the whole Internet into a book."
		"Too practical."

	Breadth of IT experience, and depth of knowledge in the DNS.
	Yours for the right price.  :)  http://SupersetSolutions.com/
Re: ixgbe and fast interrupts
On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> On 11/18/2011 09:54, Luigi Rizzo wrote:
> > One more thing (i am mentioning it here for archival purposes,
> > as i keep forgetting to test it). Is entropy harvesting expensive ?
>
> No. It was designed to be inexpensive on purpose. :)

hmmm
unfortunately I don't have a chance to test it until monday
(probably one could see if the ping times change by modifying
the value of kern.random.sys.harvest.*).

But in the code i see the following:

- the harvest routine is this:

	void
	random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
	    enum esource origin)
	{
		if (reap_func)
			(*reap_func)(get_cyclecount(), entropy, count,
			    bits, frac, origin);
	}

- the reap_func seems to be bound to

	dev/random/randomdev_soft.c::random_harvest_internal()

  which internally uses a spinlock and then moves entries between
  two lists.

I am concerned that the get_cyclecount() might end up querying an
expensive device (is it using kern.timecounter.hardware ?)

> sysctl -a | grep timecounter
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(-100) HPET(900) ACPI-fast(1000) i8254(0) dummy(-100)
kern.timecounter.hardware: ACPI-fast

So between the indirect function call, spinlock, list manipulation
and the cyclecounter i wouldn't be surprised if the whole thing takes
a microsecond or so. Anyways, on monday i'll know better; in the
meantime, if someone wants to give it a try... in our tests between
two machines with ixgbe (10G) interfaces, an unmodified 9.0 kernel
has a median ping time of 30us with "slow" pings (say -i 0.01 or
larger) and 17us with a ping -f. BTW the reason for the difference is
totally unclear to me (ping -f uses a non-blocking select() but i
don't think it can explain such a large delta).

cheers
luigi
Re: ixgbe and fast interrupts
On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> > On 11/18/2011 09:54, Luigi Rizzo wrote:
> > > One more thing (i am mentioning it here for archival purposes,
> > > as i keep forgetting to test it). Is entropy harvesting expensive ?
> >
> > No. It was designed to be inexpensive on purpose. :)
>
> hmmm
> unfortunately I don't have a chance to test it until monday
> (probably one could see if the ping times change by modifying
> the value of kern.random.sys.harvest.*).
>
> But in the code i see the following:
>
> - the harvest routine is this:
>
> 	void
> 	random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
> 	    enum esource origin)
> 	{
> 		if (reap_func)
> 			(*reap_func)(get_cyclecount(), entropy, count,
> 			    bits, frac, origin);
> 	}
>
> - the reap_func seems to be bound to
>
> 	dev/random/randomdev_soft.c::random_harvest_internal()
>
>   which internally uses a spinlock and then moves entries between
>   two lists.
>
> I am concerned that the get_cyclecount() might end up querying an
> expensive device (is it using kern.timecounter.hardware ?)

On modern x86 it just does rdtsc().

> So between the indirect function call, spinlock, list manipulation
> and the cyclecounter i wouldn't be surprised if the whole thing
> takes a microsecond or so.

I suspect it is not quite that expensive.

> Anyways, on monday i'll know better. in the meantime, if someone
> wants to give it a try... in our tests between two machines with
> ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> a ping -f.

Did you time it with harvest.interrupt disabled?

-- 
John Baldwin
Re: ixgbe and fast interrupts
On Mon, Nov 21, 2011 at 11:29:29AM -0500, John Baldwin wrote:
> On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> > ...
> > I am concerned that the get_cyclecount() might end up querying an
> > expensive device (is it using kern.timecounter.hardware ?)
>
> On modern x86 it just does rdtsc().
>
> > So between the indirect function call, spinlock, list manipulation
> > and the cyclecounter i wouldn't be surprised if the whole thing
> > takes a microsecond or so.
>
> I suspect it is not quite that expensive.
>
> > Anyways, on monday i'll know better. in the meantime, if someone
> > wants to give it a try... in our tests between two machines with
> > ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> > time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> > a ping -f.
>
> Did you time it with harvest.interrupt disabled?

yes, thanks for reminding me to post the results.

Using unmodified ping (which has 1us resolution on the reports),
there is no measurable difference irrespective of the setting of
kern.random.sys.harvest.ethernet, kern.random.sys.harvest.interrupt
and kern.timecounter.hardware. I have tried to set hw mitigation to 0
on the NIC (ixgbe on both sides) but there is no visible effect either.

However I don't trust my measurements because i cannot explain them.
Response times have a min of 20us (about 50 out of 5000 samples)
and a median of 27us, and i really don't understand if the low
readings are real or the result of some races. Ping does a
gettimeofday() for the initial timestamp, and relies on an in-kernel
timestamp for the response.

cheers
luigi
Re: ixgbe and fast interrupts
On Monday, November 21, 2011 12:36:15 pm Luigi Rizzo wrote:
> On Mon, Nov 21, 2011 at 11:29:29AM -0500, John Baldwin wrote:
> > ...
> > Did you time it with harvest.interrupt disabled?
>
> yes, thanks for reminding me to post the results.
>
> Using unmodified ping (which has 1us resolution on the reports),
> there is no measurable difference irrespective of the setting of
> kern.random.sys.harvest.ethernet, kern.random.sys.harvest.interrupt
> and kern.timecounter.hardware. I have tried to set hw mitigation to 0
> on the NIC (ixgbe on both sides) but there is no visible effect either.

I had forgotten that kern.random.sys.harvest.interrupt only matters if the
interrupt handlers pass the INTR_ENTROPY flag to bus_setup_intr(). I suspect
your drivers probably aren't doing that anyway.

> However I don't trust my measurements because i cannot explain them.
> Response times have a min of 20us (about 50 out of 5000 samples)
> and a median of 27us, and i really don't understand if the low
> readings are real or the result of some races.

Hmm, 7 us does seem a bit much for a spread.

> Ping does a gettimeofday() for the initial timestamp, and relies
> on an in-kernel timestamp for the response.

Hmm, gettimeofday() isn't super cheap. What I do for measuring RTT is to use
an optimized echo server (not the one in inetd) on the remote host and
reflect packets off of that. The sender/receiver puts a TSC timestamp into
the packet payload and computes a TSC delta when it receives the reflected
response. I then run ministat over the TSC deltas to get RTT in TSC counts
and use machdep.tsc_freq of the sending machine to convert the TSC delta
values to microseconds.

-- 
John Baldwin