Dear Linux folks,
A very interesting issue.
On 27.08.25 at 14:57, Kurt Kanzenbach wrote:
On Tue Aug 26 2025, Jacob Keller wrote:
On 8/26/2025 5:59 AM, Sebastian Andrzej Siewior wrote:
On 2025-08-25 16:28:38 [-0700], Jacob Keller wrote:
Ya, I don't think we fully understand either. Miroslav said he tested on
I350 which is a different MAC from the I210, so it could be something
there. Theoretically we could handle just I210 directly in the interrupt
and leave the other variants to the kworker.. but I don't know how much
benefit we get from that. The data sheet for the I350 appears to have
more or less the same logic for Tx timestamps. It is significantly
different for Rx timestamps though.
From a logical point of view it makes sense to retrieve the HW timestamp
immediately when it becomes available and feed it to the stack. I can't
imagine how delaying it to yet another thread improves the situation.
The benchmark is about > 1k packets/second, while in reality you have
less than 20 packets a second. With multiple applications you usually
need a "second timestamp register" or you may lose packets.
Delaying it to the AUX worker makes sense for hardware which can't fire
an interrupt and polling is the only option left. This is sane in this
case but I don't like this solution as some kind of compromise for
everyone. Simply because it adds overhead and requires additional
configuration.
I agree. It's just frustrating that doing so appears to cause a
regression in at least one test setup on hardware which uses this method.
Also I couldn't really see a performance degradation with ntpperf. In my
tests the IRQ variant reached an equal or higher rate. But sometimes I
get 'Could not send requests at rate X'. No idea what that means.
Anyway, this patch is basically a compromise. It works for Miroslav and
my use case.
This is also what the igc does and the performance improved
afa141583d827 ("igc: Retrieve TX timestamp during interrupt handling")
igc supports several hardware variations which are all a lot more similar to
i210 than i350 is to i210 in igb. I could see this working fine for i210
if it works fine in igb.. I honestly am at a loss currently why i350 is
much worse.
and here it causes the opposite?
As said above, I'm out of ideas here.
Same. It may be one of those things where the effort to dig up precisely
what has gone wrong is so large that it becomes not feasible relative to
the gain :(
Could we please use the direct retrieval/ submission for HW which
supports it and fall back to the AUX worker (instead of the kworker) for
HW which does not have an interrupt for it?
I have no objection. Perhaps we could assume the high end of the ntpperf
benchmark is not reflective of normal use case? We *are* limited to only
one timestamp register, which the igb driver does protect by bitlock.
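To make the single-register constraint concrete, here is a rough userspace sketch of the idea, not the igb code. The names `tx_ts_slot`, `tx_ts_claim` and `tx_ts_complete` are made up for illustration, and a C11 `atomic_bool` stands in for the state bit that the driver takes as a bitlock. The assumption is that a packet which finds the register busy simply goes out without a hardware timestamp.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-adapter Tx timestamp state. */
struct tx_ts_slot {
	atomic_bool in_progress;   /* plays the role of the bitlock */
	int pending_id;            /* packet currently waiting for its timestamp */
	unsigned long skipped;     /* not atomic: good enough for a single-threaded demo */
};

/*
 * Try to claim the one timestamp register for a packet.
 * Returns false if another packet already owns it; that packet
 * is then sent without a hardware timestamp.
 */
static bool tx_ts_claim(struct tx_ts_slot *s, int packet_id)
{
	if (atomic_exchange_explicit(&s->in_progress, true,
				     memory_order_acquire)) {
		s->skipped++;
		return false;
	}
	s->pending_id = packet_id;
	return true;
}

/*
 * Called once the timestamp has been read (from the interrupt or from
 * a worker). Releases the register and returns the packet it belonged to.
 */
static int tx_ts_complete(struct tx_ts_slot *s)
{
	int id = s->pending_id;

	assert(atomic_load_explicit(&s->in_progress, memory_order_relaxed));
	atomic_store_explicit(&s->in_progress, false, memory_order_release);
	return id;
}
```

With a zero-initialised slot, a first `tx_ts_claim()` succeeds and a second one fails until `tx_ts_complete()` has run. The longer the delay between the hardware latching the timestamp and the completion call, the more requests land in the `skipped` counter, which is why the retrieval latency matters at high packet rates.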
Does that mean we're going back to v1 + the AUX worker for 82576? Let me
prepare v3 then.
Good question. Personally, I’d interpret Linux’s no-regression policy to
mean that, if a possible regression is known, even in a synthetic
benchmark, it must not be introduced, no matter how upsetting this is.
So the current approach needs to be taken.
Kind regards,
Paul