Hi Hillf, Cong, Paolo,

Sorry for the late reply; I was tied up with another urgent task.

I tried Hillf's patch (shown below as applied to my tree) and it does not
help; the jitter shows up very quickly.

--- ./include/net/sch_generic.h.orig 2020-08-21 15:13:51.787952710 +0800
+++ ./include/net/sch_generic.h 2020-09-04 10:48:32.081217156 +0800
@@ -108,6 +108,7 @@

  spinlock_t busylock ____cacheline_aligned_in_smp;
  spinlock_t seqlock;
+ int run, seq;
 };

 static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
@@ -127,8 +128,11 @@
 static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 {
  if (qdisc->flags & TCQ_F_NOLOCK) {
+ qdisc->run++;
+ smp_wmb();
  if (!spin_trylock(&qdisc->seqlock))
  return false;
+ qdisc->seq = qdisc->run;
  } else if (qdisc_is_running(qdisc)) {
  return false;
  }
@@ -143,8 +147,15 @@
 static inline void qdisc_run_end(struct Qdisc *qdisc)
 {
  write_seqcount_end(&qdisc->running);
- if (qdisc->flags & TCQ_F_NOLOCK)
+ if (qdisc->flags & TCQ_F_NOLOCK) {
+ int seq = qdisc->seq;
+
  spin_unlock(&qdisc->seqlock);
+ smp_rmb();
+ if (seq != qdisc->run)
+ __netif_schedule(qdisc);
+
+ }
 }
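
If I read the patch right, the idea is a generation counter: every sender
bumps ->run before the trylock, the lock owner records the value it saw in
->seq, and after dropping seqlock it re-checks ->run and calls
__netif_schedule() if somebody bumped it in the meantime (i.e. a sender
failed the trylock while we were flushing). Below is a minimal user-space
sketch of that re-check pattern, just to make sure I understand it; the
names (sketch_run_begin/sketch_run_end) and the pthread spinlock are mine,
not kernel code, and as noted above the pattern does not actually stop the
jitter in my test.

#include <stdatomic.h>
#include <pthread.h>
#include <stdbool.h>

static atomic_int run;              /* bumped by every sender */
static pthread_spinlock_t seqlock;  /* stands in for qdisc->seqlock;
                                     * pthread_spin_init() it once */
static int seq;                     /* value of 'run' seen when locked */

static bool sketch_run_begin(void)
{
	atomic_fetch_add(&run, 1);          /* qdisc->run++ */
	if (pthread_spin_trylock(&seqlock)) /* non-zero means busy */
		return false;               /* current owner will flush */
	seq = atomic_load(&run);            /* qdisc->seq = qdisc->run */
	return true;
}

/* Returns true when the caller should reschedule the queue. */
static bool sketch_run_end(void)
{
	int s = seq;

	pthread_spin_unlock(&seqlock);
	/* A sender bumped 'run' after we sampled it: reschedule. */
	return s != atomic_load(&run);
}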


I also tried Cong's patch (shown below as applied to my tree) and it does
avoid the issue (three stress runs of 30 minutes each, with no jitter
observed).

--- ./include/net/sch_generic.h.orig 2020-08-21 15:13:51.787952710 +0800
+++ ./include/net/sch_generic.h 2020-09-03 21:36:11.468383738 +0800
@@ -127,8 +127,7 @@
 static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 {
  if (qdisc->flags & TCQ_F_NOLOCK) {
- if (!spin_trylock(&qdisc->seqlock))
- return false;
+ spin_lock(&qdisc->seqlock);
  } else if (qdisc_is_running(qdisc)) {
  return false;
  }

I do not really follow what you are discussing above. It seems to me
that Cong's patch is similar to disabling the lockless feature.
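
My reading, for what it is worth: with spin_trylock() a contending CPU
gives up immediately and relies on the current seqlock owner to transmit
the skb it just enqueued, while with spin_lock() every contending CPU
busy-waits until it owns seqlock, so senders end up serialized on that one
lock much like they are on the normal qdisc spinlock in the non-NOLOCK
case. A rough user-space sketch of the two behaviours (the names and the
pthread spinlock are mine, not kernel code):

#include <pthread.h>
#include <stdbool.h>

static pthread_spinlock_t seqlock;  /* stands in for qdisc->seqlock */

/* Current NOLOCK behaviour: back off on contention. */
static bool begin_trylock(void)
{
	return pthread_spin_trylock(&seqlock) == 0;
}

/* Cong's change, roughly: busy-wait until the lock is ours. */
static bool begin_lock(void)
{
	pthread_spin_lock(&seqlock);    /* spins, never gives up */
	return true;
}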

Anyway, we are going to switch to fq_codel instead, since CentOS 8 (kernel
4.18) also uses fq_codel as the default qdisc; I am not sure whether they
found something related to this.

Thanks,
Kehuan

Hillf Danton <hdan...@sina.com> wrote on Thu, Sep 3, 2020 at 6:20 PM:
>
>
> On Thu, 03 Sep 2020 10:39:54 +0200 Paolo Abeni wrote:
> > On Wed, 2020-09-02 at 22:01 -0700, Cong Wang wrote:
> > > Can you test the attached one-line fix? I think we are overthinking;
> > > probably all we need here is a busy wait.
> >
> > I think that will solve the issue, but I also think it will kill NOLOCK
> > performance due to really increased contention.
> >
> > At this point I fear we could consider reverting the NOLOCK stuff.
> > I personally would hate doing so, but it looks like NOLOCK benefits are
> > outweighed by its issues.
> >
> > Any other opinion more than welcome!
>
> Hi Paolo,
>
> I suspect it's too late to fix the -27% below.
> Surgery to cut NOLOCK seems too early before the fix.
>
> Hillf
>
> >pktgen threads vanilla         patched[II]     delta
> >nr             kpps            kpps            %
> >1              3240            3240            0
> >2              3910            2830            -27%
> >4              5140            5140            0
>
