Re: ixgbe and fast interrupts

2011-11-17 Thread Ryan Stone
The comments haven't kept up with the code.  You are correct; in the
legacy interrupt case ixgbe is using an ITHREAD, not a fast handler.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ixgbe and fast interrupts

2011-11-17 Thread John Baldwin
On Thursday, November 17, 2011 6:38:21 am Matteo Landi wrote:
> Hi everybody,
> 
> trying to measure the interrupt latency of a 10G Intel network
> adapter, I found out that the driver used (ixgbe) can be
> configured to work with both fast and standard interrupts. From my
> understanding of the BUS_SETUP_INTR(9) man page and
> sys/kern/kern_intr.c file, it seems that drivers in need of
> registering fast interrupts should call bus_setup_intr() specifying a
> filter function instead of a handler.
> 
> My question is: why ixgbe_allocate_legacy() says it is allocating a
> fast interrupt (comments and error messages say so) but passes a
> handler rather than a filter to pci_setup_intr()? Is there anything I
> am not taking into account?

It is not a fast handler and is not using the fast handler stuff.  OTOH,
you probably want to be using MSI-X for a 10G NIC instead of INTx anyway.

-- 
John Baldwin


Re: ixgbe and fast interrupts

2011-11-18 Thread Matteo Landi
> you probably want to be using MSI-X for a 10G NIC instead of INTx anyway.

Why do you say that? Is MSI-X faster than INTx in terms of interrupt
latency? When should I use MSI-X, as opposed to fast filter interrupts
(fast interrupts?) or ithread interrupts? Thanks in advance.


Regards,
Matteo

-- 
http://www.matteolandi.net/


Re: ixgbe and fast interrupts

2011-11-18 Thread Matteo Landi
On Thu, Nov 17, 2011 at 3:56 PM, Ryan Stone  wrote:
> The comments haven't kept up with the code.  You are correct; in the
> legacy interrupt case ixgbe is using an ITHREAD, not a fast handler.

Do I have to send an email to the maintainer of the ixgbe driver and
ask him to update the comments, or could I send you a patch for that?


Regards,
Matteo

-- 
http://www.matteolandi.net/


Re: ixgbe and fast interrupts

2011-11-18 Thread John Baldwin
On Friday, November 18, 2011 3:46:02 am Matteo Landi wrote:
> > you probably want to be using MSI-X for a 10G NIC instead of INTx anyway.
> 
> Why do you say that? Is MSI-X faster than INTx in terms of interrupt
> latency? When should I use MSI-X, as opposed to fast filter interrupts
> (fast interrupts?) or ithread interrupts? Thanks in advance.

With MSI-X you can have more than one interrupt and those interrupts can be 
distributed across CPUs.  This means you can (somewhat) tie each queue on your 
NIC to a different CPU.

MSI-X vs INTx is orthogonal to fast vs filter, but in general MSI and MSI-X 
interrupts are not shared, and require no interrupt masking in hardware (they 
are effectively edge-triggered), so using a filter for MSI is rather pointless 
and only adds needless complexity.  For MSI I would just use a threaded 
interrupt handler.  For INTx, I would only use a fast interrupt handler if 
there is a really good reason to do so (e.g. em(4) does so to work around 
broken Intel Host-PCI bridges).

-- 
John Baldwin


Re: ixgbe and fast interrupts

2011-11-18 Thread Luigi Rizzo
On Fri, Nov 18, 2011 at 08:00:06AM -0500, John Baldwin wrote:
> On Friday, November 18, 2011 3:46:02 am Matteo Landi wrote:
> > > you probably want to be using MSI-X for a 10G NIC instead of INTx anyway.
> > 
> > Why do you say that? Is MSI-X faster than INTx in terms of interrupt
> > latency? When should I use MSI-X, as opposed to fast filter interrupts
> > (fast interrupts?) or ithread interrupts? Thanks in advance.
> 
> With MSI-X you can have more than one interrupt and those interrupts can be 
> distributed across CPUs.  This means you can (somewhat) tie each queue on
> your NIC to a different CPU.
> 
> MSI-X vs INTx is orthogonal to fast vs filter, but in general MSI and MSI-X
> interrupts are not shared, and require no interrupt masking in hardware (they
> are effectively edge-triggered), so using a filter for MSI is rather pointless
> and only adds needless complexity.  For MSI I would just use a threaded
> interrupt handler.  For INTx, I would only use a fast interrupt handler if
> there is a really good reason to do so (e.g. em(4) does so to work around
> broken Intel Host-PCI bridges).

A bit more context: Matteo is looking at the latency of RPCs across
the network involving userspace processes, and possibly using the
netmap API. As we understand it:

if you are not using a filter, the interrupt calls a "predefined"
filter (kern_intr.c::intr_event_schedule_thread() ? ) which wakes
up the handler thread which in turn wakes up the user process.  This
means two scheduler invocations on each side.

In the case of netmap, all the handler needs to do is a selwakeup()
of the user thread blocked on the file descriptor, so if this
can be done in the filter we can save an extra step through the
scheduler.

cheers
luigi


Re: ixgbe and fast interrupts

2011-11-18 Thread John Baldwin
On Friday, November 18, 2011 12:06:15 pm Luigi Rizzo wrote:
> On Fri, Nov 18, 2011 at 08:00:06AM -0500, John Baldwin wrote:
> > On Friday, November 18, 2011 3:46:02 am Matteo Landi wrote:
> > > > you probably want to be using MSI-X for a 10G NIC instead of INTx anyway.
> > > 
> > > Why do you say that? Is MSI-X faster than INTx in terms of interrupt
> > > latency? When should I use MSI-X, as opposed to fast filter interrupts
> > > (fast interrupts?) or ithread interrupts? Thanks in advance.
> > 
> > With MSI-X you can have more than one interrupt and those interrupts can be 
> > distributed across CPUs.  This means you can (somewhat) tie each queue on
> > your NIC to a different CPU.
> > 
> > MSI-X vs INTx is orthogonal to fast vs filter, but in general MSI and MSI-X
> > interrupts are not shared, and require no interrupt masking in hardware (they
> > are effectively edge-triggered), so using a filter for MSI is rather pointless
> > and only adds needless complexity.  For MSI I would just use a threaded
> > interrupt handler.  For INTx, I would only use a fast interrupt handler if
> > there is a really good reason to do so (e.g. em(4) does so to work around
> > broken Intel Host-PCI bridges).
> 
> A bit more context: Matteo is looking at the latency of RPCs across
> the network involving userspace processes, and possibly using the
> netmap API. As we understand it:
> 
> if you are not using a filter, the interrupt calls a "predefined"
> filter (kern_intr.c::intr_event_schedule_thread() ? ) which wakes
> up the handler thread which in turn wakes up the user process.  This
> means two scheduler invocations on each side.

Yes, but if you use a filter you still have to do that as your filter would
just be queueing a task on a taskqueue, which would then do the actual
selwakeup() from a taskqueue thread.  Filters are typically used to avoid
masking the interrupt in the PIC, or to limit the handlers executed on a
shared interrupt.

> In the case of netmap, all the handler needs to do is a selwakeup()
> of the user thread blocked on the file descriptor, so if this
> can be done in the filter we can save an extra step through the
> scheduler.

You can't call selwakeup() from a filter, it is not safe since it uses
mutexes, etc.  There are only a few things you can do from a filter.
You could do a plain wakeup() if you let userland use a custom ioctl to
block on the filter, but not selwakeup().

-- 
John Baldwin


Re: ixgbe and fast interrupts

2011-11-18 Thread Luigi Rizzo
On Fri, Nov 18, 2011 at 12:20:04PM -0500, John Baldwin wrote:
> On Friday, November 18, 2011 12:06:15 pm Luigi Rizzo wrote:
...
> > A bit more context: Matteo is looking at the latency of RPCs across
> > the network involving userspace processes, and possibly using the
> > netmap API. As we understand it:
> > 
> > if you are not using a filter, the interrupt calls a "predefined"
> > filter (kern_intr.c::intr_event_schedule_thread() ? ) which wakes
> > up the handler thread which in turn wakes up the user process.  This
> > means two scheduler invocations on each side.
> 
> Yes, but if you use a filter you still have to do that as your filter would
> just be queueing a task on a taskqueue, which would then do the actual
> selwakeup() from a taskqueue thread.  Filters are typically used to avoid
> masking the interrupt in the PIC, or to limit the handlers executed on a
> shared interrupt.
> 
> > In the case of netmap, all the handler needs to do is a selwakeup()
> > of the user thread blocked on the file descriptor, so if this
> > can be done in the filter we can save an extra step through the
> > scheduler.
> 
> You can't call selwakeup() from a filter, it is not safe since it uses
> mutexes, etc.  There are only a few things you can do from a filter.
> You could do a plain wakeup() if you let userland use a custom ioctl to
> block on the filter, but not selwakeup().

ok, this is good to know - i wasn't sure if selwakeup() could block
(and i am a bit unclear why). Will look at the selrecord/selwakeup
pair, thanks for the suggestion.

One more thing (i am mentioning it here for archival purposes,
as i keep forgetting to test it). Is entropy harvesting expensive ?
I see it is on by default

> sysctl -a | grep rando
kern.randompid: 0
kern.random.yarrow.gengateinterval: 10
kern.random.yarrow.bins: 10
kern.random.yarrow.fastthresh: 192
kern.random.yarrow.slowthresh: 256
kern.random.yarrow.slowoverthresh: 2
kern.random.sys.seeded: 1
kern.random.sys.harvest.ethernet: 1
kern.random.sys.harvest.point_to_point: 1
kern.random.sys.harvest.interrupt: 1
kern.random.sys.harvest.swi: 0
...

and there seems to be a call to random_harvest() in the default
filter that wakes up the threaded handler.

cheers
luigi


Re: ixgbe and fast interrupts

2011-11-18 Thread Doug Barton
On 11/18/2011 09:54, Luigi Rizzo wrote:
> One more thing (i am mentioning it here for archival purposes,
> as i keep forgetting to test it). Is entropy harvesting expensive ?

No. It was designed to be inexpensive on purpose. :)


-- 

"We could put the whole Internet into a book."
"Too practical."

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/



Re: ixgbe and fast interrupts

2011-11-18 Thread Luigi Rizzo
On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> On 11/18/2011 09:54, Luigi Rizzo wrote:
> > One more thing (i am mentioning it here for archival purposes,
> > as i keep forgetting to test it). Is entropy harvesting expensive ?
> 
> No. It was designed to be inexpensive on purpose. :)

hmmm
unfortunately I don't have a chance to test it until monday
(probably one could see if the ping times change by modifying
the value of kern.random.sys.harvest.* ).

But in the code i see the following:

- the harvest routine is this:

void
random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
    enum esource origin)
{
	if (reap_func)
		(*reap_func)(get_cyclecount(), entropy, count, bits, frac,
		    origin);
}

- the reap_func seems to be bound to

dev/random/randomdev_soft.c::random_harvest_internal()

  which internally uses a spinlock and then moves entries between
  two lists.

I am concerned that the get_cyclecount() might end up querying an
expensive device (is it using kern.timecounter.hardware ?)

> sysctl -a | grep timecounter
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(-100) HPET(900) ACPI-fast(1000) i8254(0) dummy(-100)
kern.timecounter.hardware: ACPI-fast

So between the indirect function call, spinlock, list manipulation
and the cyclecounter i wouldn't be surprised if the whole thing
takes a microsecond or so.

Anyways, on monday i'll know better. in the meantime, if someone
wants to give it a try... in our tests between two machines and
ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
a ping -f.
BTW the reason for the difference is totally unclear to me (ping
-f uses a non-blocking select() but i don't think it can explain
such a large delta).

cheers
luigi


Re: ixgbe and fast interrupts

2011-11-21 Thread John Baldwin
On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> > On 11/18/2011 09:54, Luigi Rizzo wrote:
> > > One more thing (i am mentioning it here for archival purposes,
> > > as i keep forgetting to test it). Is entropy harvesting expensive ?
> > 
> > No. It was designed to be inexpensive on purpose. :)
> 
> hmmm
> unfortunately I don't have a chance to test it until monday
> (probably one could see if the ping times change by modifying
> the value of kern.random.sys.harvest.* ).
> 
> But in the code i see the following:
> 
> - the harvest routine is this:
> 
> void
> random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
>   enum esource origin)
> {
> if (reap_func)
> (*reap_func)(get_cyclecount(), entropy, count, bits, frac,
> origin);
> }
> 
> - the reap_func seems to be bound to
> 
> dev/random/randomdev_soft.c::random_harvest_internal()
> 
>   which internally uses a spinlock and then moves entries between
>   two lists.
> 
> I am concerned that the get_cyclecount() might end up querying an
> expensive device (is it using kern.timecounter.hardware ?)

On modern x86 it just does rdtsc().

> So between the indirect function call, spinlock, list manipulation
> and the cyclecounter i wouldn't be surprised it the whole thing
> takes a microsecond or so.

I suspect it is not quite that expensive.

> Anyways, on monday i'll know better. in the meantime, if someone
> wants to give it a try... in our tests between two machines and
> ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> a ping -f .

Did you time it with harvest.interrupt disabled?

-- 
John Baldwin


Re: ixgbe and fast interrupts

2011-11-21 Thread Luigi Rizzo
On Mon, Nov 21, 2011 at 11:29:29AM -0500, John Baldwin wrote:
> On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> > On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> > > On 11/18/2011 09:54, Luigi Rizzo wrote:
> > > > One more thing (i am mentioning it here for archival purposes,
> > > > as i keep forgetting to test it). Is entropy harvesting expensive ?
> > > 
> > > No. It was designed to be inexpensive on purpose. :)
> > 
> > hmmm
> > unfortunately I don't have a chance to test it until monday
> > (probably one could see if the ping times change by modifying
> > the value of kern.random.sys.harvest.* ).
> > 
> > But in the code i see the following:
> > 
> > - the harvest routine is this:
> > 
> > void
> > random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
> > enum esource origin)
> > {
> > if (reap_func)
> > (*reap_func)(get_cyclecount(), entropy, count, bits, frac,
> > origin);
> > }
> > 
> > - the reap_func seems to be bound to
> > 
> > dev/random/randomdev_soft.c::random_harvest_internal()
> > 
> >   which internally uses a spinlock and then moves entries between
> >   two lists.
> > 
> > I am concerned that the get_cyclecount() might end up querying an
> > expensive device (is it using kern.timecounter.hardware ?)
> 
> On modern x86 it just does rdtsc().
> 
> > So between the indirect function call, spinlock, list manipulation
> > and the cyclecounter i wouldn't be surprised if the whole thing
> > takes a microsecond or so.
> 
> I suspect it is not quite that expensive.
> 
> > Anyways, on monday i'll know better. in the meantime, if someone
> > wants to give it a try... in our tests between two machines and
> > ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> > time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> > a ping -f .
> 
> Did you time it with harvest.interrupt disabled?

yes, thanks for reminding me to post the results.

Using unmodified ping (which has 1us resolution on the reports),
there is no measurable difference irrespective
of the setting of kern.random.sys.harvest.ethernet,
kern.random.sys.harvest.interrupt and kern.timecounter.hardware.
Have tried to set hw mitigation to 0 on the NIC (ixgbe on both
sides) but there is no visible effect either.

However I don't trust my measurements because i cannot explain them.
Response times have a min of 20us (about 50 out of 5000 samples)
and a median of 27us, and i really don't understand if the low
readings are real or the result of some races.
Ping does a gettimeofday() for the initial timestamp, and relies
on in-kernel timestamp for the response.

cheers
luigi


Re: ixgbe and fast interrupts

2011-11-22 Thread John Baldwin
On Monday, November 21, 2011 12:36:15 pm Luigi Rizzo wrote:
> On Mon, Nov 21, 2011 at 11:29:29AM -0500, John Baldwin wrote:
> > On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> > > On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> > > > On 11/18/2011 09:54, Luigi Rizzo wrote:
> > > > > One more thing (i am mentioning it here for archival purposes,
> > > > > as i keep forgetting to test it). Is entropy harvesting expensive ?
> > > > 
> > > > No. It was designed to be inexpensive on purpose. :)
> > > 
> > > hmmm
> > > unfortunately I don't have a chance to test it until monday
> > > (probably one could see if the ping times change by modifying
> > > the value of kern.random.sys.harvest.* ).
> > > 
> > > But in the code i see the following:
> > > 
> > > - the harvest routine is this:
> > > 
> > > void
> > > random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
> > >   enum esource origin)
> > > {
> > > if (reap_func)
> > > (*reap_func)(get_cyclecount(), entropy, count, bits, frac,
> > > origin);
> > > }
> > > 
> > > - the reap_func seems to be bound to
> > > 
> > > dev/random/randomdev_soft.c::random_harvest_internal()
> > > 
> > >   which internally uses a spinlock and then moves entries between
> > >   two lists.
> > > 
> > > I am concerned that the get_cyclecount() might end up querying an
> > > expensive device (is it using kern.timecounter.hardware ?)
> > 
> > On modern x86 it just does rdtsc().
> > 
> > > So between the indirect function call, spinlock, list manipulation
> > > and the cyclecounter i wouldn't be surprised if the whole thing
> > > takes a microsecond or so.
> > 
> > I suspect it is not quite that expensive.
> > 
> > > Anyways, on monday i'll know better. in the meantime, if someone
> > > wants to give it a try... in our tests between two machines and
> > > ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> > > time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> > > a ping -f .
> > 
> > Did you time it with harvest.interrupt disabled?
> 
> yes, thanks for reminding me to post the results.
> 
> Using unmodified ping (which has 1us resolution on the reports),
> there is no measurable difference irrespective
> of the setting of kern.random.sys.harvest.ethernet,
> kern.random.sys.harvest.interrupt and kern.timecounter.hardware.
> Have tried to set hw mitigation to 0 on the NIC (ixgbe on both
> sides) but there is no visible effect either.

I had forgotten that kern.random.sys.harvest.interrupt only matters if the
interrupt handlers pass the INTR_ENTROPY flag to bus_setup_intr().  I
suspect your drivers probably aren't doing that anyway.

> However I don't trust my measurements because i cannot explain them.
> Response times have a min of 20us (about 50 out of 5000 samples)
> and a median of 27us, and i really don't understand if the low
> readings are real or the result of some races.

Hmm, 7 us does seem a bit much for a spread.

> Ping does a gettimeofday() for the initial timestamp, and relies
> on in-kernel timestamp for the response.

Hmm, gettimeofday() isn't super cheap.  What I do for measuring RTT is to
use an optimized echo server (not the one in inetd) on the remote host and
reflect packets off of that.  The sender/receiver puts a TSC timestamp into
the packet payload and computes a TSC delta when it receives the reflected
response.  I then run ministat over the TSC deltas to get RTT in TSC counts
and use machdep.tsc_freq of the sending machine to convert the TSC delta
values to microseconds.

-- 
John Baldwin