Re: reduce iwm's RTS retry limit

2016-11-20 Thread Stefan Sperling
On Sun, Nov 20, 2016 at 11:43:54PM +0100, Stefan Sperling wrote:
> Linux uses 60 retries for Block Ack Requests, which makes some sense (they
> are not sent often, but usually periodically, and are used to negotiate an
> agreement to set up and use block ack).

Oops, sorry, I confused Block Ack Requests (BAR) with ADDBA Requests here!

So the reason 60 retries make sense for a BAR frame is that this frame is
used to elicit a Block Ack (a list of ACKs) from the peer when Tx cannot
advance any further because frames in the current window are still
unacknowledged. Without those ACKs, the sender's only options are to keep
retrying or, eventually, discard frames in the current window.
So getting this list of ACKs ASAP is important.
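
To illustrate why a stalled window is so costly (a minimal sketch with
made-up names, not actual iwm or net80211 code):

/*
 * Illustrative sketch only.  Tx may only advance within the block ack
 * window; once the window is full of unacknowledged subframes, nothing
 * more can be sent until a Block Ack arrives (solicited by a BAR) or
 * the oldest frames are dropped to slide the window forward.
 */
#include <stdint.h>

struct ba_tx_window {
	uint16_t winstart;	/* lowest unacknowledged sequence number */
	uint16_t winsize;	/* e.g. 64 subframes */
	uint64_t ackmap;	/* ACKs received within the window */
};

static int
ba_tx_can_send(const struct ba_tx_window *w, uint16_t seq)
{
	if ((uint16_t)(seq - w->winstart) >= w->winsize)
		return 0;	/* window exhausted: send a BAR or drop frames */
	return 1;
}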



Re: reduce iwm's RTS retry limit

2016-11-20 Thread Stefan Sperling
On Sun, Nov 20, 2016 at 04:54:58PM +0100, Mark Kettenis wrote:
> > Date: Sat, 19 Nov 2016 18:11:23 +0100
> > From: Stefan Sperling 
> > 
> > The RTS retry limit we inherited from Linux seems insanely high.
> > 
> > It seems to be the cause for "bursty" pings and high latency for
> > smaller packets while larger packets from TCP streams are stuck
> > in the Tx queue:
> > 
> >  64 bytes from 192.168.1.12: icmp_seq=84 ttl=251 time=380.203 ms
> >  64 bytes from 192.168.1.12: icmp_seq=85 ttl=251 time=710.714 ms
> >  64 bytes from 192.168.1.12: icmp_seq=86 ttl=251 time=279.594 ms
> >  64 bytes from 192.168.1.12: icmp_seq=87 ttl=251 time=893.879 ms
> >  64 bytes from 192.168.1.12: icmp_seq=88 ttl=251 time=34800.236 ms
> >  64 bytes from 192.168.1.12: icmp_seq=89 ttl=251 time=33815.364 ms
> >  64 bytes from 192.168.1.12: icmp_seq=90 ttl=251 time=32824.247 ms
> >  64 bytes from 192.168.1.12: icmp_seq=91 ttl=251 time=31822.355 ms
> >  64 bytes from 192.168.1.12: icmp_seq=92 ttl=251 time=30817.395 ms
> >  64 bytes from 192.168.1.12: icmp_seq=93 ttl=251 time=29822.478 ms
> >  64 bytes from 192.168.1.12: icmp_seq=94 ttl=251 time=28817.508 ms
> > 
> > With this diff, while in bad channel conditions, instead of the above
> > I am seeing 22% lost ping packets, and reasonable latency for those
> > packets which make it through.
> > 
> > SSH into the machine is even possible (yet rather unusable), whereas
> > before it didn't work at all.
> > 
> > ok?
> 
> Is it really a good idea to deviate from what Linux does here?
> Perhaps for TCP this is slightly better, but how about other
> protocols?  Anything that does broadcasts/multicasts will suffer from
> higher packet loss.

I respect your intuition and doubt, but I think your reasoning is
based on wrong assumptions.

I believe your assumption that this can hurt broadcast traffic isn't right.
RTS is only used for unicast frames, so the loss probability of broadcast
frames is unaffected. Unless the magic firmware has a special queuing
policy for broadcasts anyway, this change may actually help broadcasts,
which would otherwise get stuck in the queue behind large unicast frames.
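
To make the unicast point concrete, the RTS decision in a driver's Tx path
typically looks roughly like the sketch below (names approximated, not a
verbatim copy of if_iwm.c):

/*
 * Simplified sketch: RTS/CTS protection is only ever considered for
 * unicast frames, so broadcast and multicast loss probability does not
 * depend on the RTS retry limit.  PROT_REQUIRED stands in for whatever
 * protection flag the real Tx command uses.
 */
#include <stdint.h>

#define IS_MCAST(addr1)		((addr1)[0] & 0x01)	/* cf. IEEE80211_IS_MULTICAST() */
#define PROT_REQUIRED		0x00000001		/* hypothetical flag */

static uint32_t
tx_protection_flags(const uint8_t *addr1, int totlen, int rtsthreshold)
{
	uint32_t flags = 0;

	if (!IS_MCAST(addr1) && totlen > rtsthreshold)
		flags |= PROT_REQUIRED;	/* protect this unicast frame with RTS/CTS */
	return flags;
}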
 
The assumption that this is merely a "slight improvement" for TCP doesn't do
justice to my experience while testing this fix. It makes a world of
difference when the signal is weak: the difference between no traffic at all
and an average of 15% packet loss. That matters a great deal when people
can't fix the setup, e.g. in hotel rooms with a weak wifi signal.

A 34.8 second round-trip time for a ping packet is simply not acceptable,
especially if the trigger for that behaviour is under our control.
(For some time, I suspected the AP was buffering the echo reply, so I tried
to fix the block ack receive path, but this assumption turned out to be wrong.)
Such latencies hurt other protocols, too. Given the choice between unreasonable
latency and real packet loss, I would assume most protocols are better
equipped to deal with real loss.

We can certainly argue about which retry threshold to use.
I don't see the point of sending up to 60 RTS frames per large data frame,
and I suppose you'll agree that 60 is definitely too high. In some packet
traces I checked, RTS frames made up the vast majority of frames sent
by a laptop far away from the AP, and very little useful data was sent.
I am happy to allow more than 3 retries if you have a better suggestion.

The reason I chose 3 is that I suspect this may be a copy-pasto in the Linux
driver code, based on the numbers shown in the context lines of the diff.
It's not unreasonable to treat control frames such as RTS similarly to
management frames, and the retry limit for the latter is also 3.
Linux uses 60 retries for Block Ack Requests, which makes some sense (they
are not sent often, but usually periodically, and are used to negotiate an
agreement to set up and use block ack).
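
For context, these limits end up in the per-frame Tx command handed to the
firmware. Roughly (an illustrative sketch; the exact per-frame-type selection
in iwm_tx() differs in detail):

/*
 * Illustrative sketch only -- see iwm_tx() for the real logic.
 * The Tx command carries separate retry budgets for the frame itself
 * and for any RTS frames sent to protect it.
 */
if ((wh->i_fc[0] & IEEE80211_FC0_TYPE_MASK) == IEEE80211_FC0_TYPE_MGT)
	tx->data_retry_limit = IWM_MGMT_DFAULT_RETRY_LIMIT;	/* 3 */
else
	tx->data_retry_limit = IWM_DEFAULT_TX_RETRY;		/* 15 */
tx->rts_retry_limit = IWM_RTS_DFAULT_RETRY_LIMIT;	/* 60 before, 3 with this diff */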

Also, Linux does not see this problem because they don't enable RTS!
This makes sense for them: with Tx aggregation, RTS adds more overhead
than it's worth, since losing some subframes is OK (block ack will compensate).
The Intel vendor drivers seem to use RTS only when the chip runs too hot (an
Intel engineer told me that pauses caused by RTS will allow the chip to cool)
or when the threshold is changed manually from userspace. As far as I can
tell that's all -- perhaps someone more familiar with the relevant Linux
sources will find more.

In the long term, I'd like us to move to a dynamic RTS threshold, which would
be adjusted by the rate scaling algorithm (the algorithm would also be in
charge of setting Tx aggregation limits). We would then use RTS a lot less,
and this problem would become much less relevant.
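
As a purely illustrative sketch of what I have in mind (hypothetical names
throughout; nothing like this exists in the tree yet):

/*
 * Hypothetical rate-scaling hook: only lower the RTS threshold when
 * sustained loss is actually observed at the current Tx rate, and
 * raise it again once the link recovers, instead of protecting every
 * large data frame all the time.
 */
#define RS_RTS_OFF	2346	/* >= max frame size: effectively disables RTS */

struct rs_node {			/* hypothetical per-station state */
	unsigned int txcnt;		/* frames sent at the current rate */
	unsigned int txfail;		/* frames that exhausted their retries */
	unsigned int rts_threshold;	/* frames larger than this use RTS */
};

static void
rs_update_rts_threshold(struct rs_node *rn)
{
	unsigned int loss = rn->txfail * 100 / (rn->txcnt ? rn->txcnt : 1);

	if (loss > 25)
		rn->rts_threshold = 512;	/* protect most data frames */
	else if (loss < 5)
		rn->rts_threshold = RS_RTS_OFF;	/* good link: stop sending RTS */
}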

> > Index: if_iwmreg.h
> > ===
> > RCS file: /cvs/src/sys/dev/pci/if_iwmreg.h,v
> > retrieving revision 1.19
> > diff -u -p -r1.19 if_iwmreg.h
> > --- if_iwmreg.h 20 Sep 2016 11:46:09 -  1.19
> > +++ if_iwmreg.h 19 Nov 2016 16:36:21 -

Re: reduce iwm's RTS retry limit

2016-11-20 Thread Mark Kettenis
> Date: Sat, 19 Nov 2016 18:11:23 +0100
> From: Stefan Sperling 
> 
> The RTS retry limit we inherited from Linux seems insanely high.
> 
> It seems to be the cause for "bursty" pings and high latency for
> smaller packets while larger packets from TCP streams are stuck
> in the Tx queue:
> 
>  64 bytes from 192.168.1.12: icmp_seq=84 ttl=251 time=380.203 ms
>  64 bytes from 192.168.1.12: icmp_seq=85 ttl=251 time=710.714 ms
>  64 bytes from 192.168.1.12: icmp_seq=86 ttl=251 time=279.594 ms
>  64 bytes from 192.168.1.12: icmp_seq=87 ttl=251 time=893.879 ms
>  64 bytes from 192.168.1.12: icmp_seq=88 ttl=251 time=34800.236 ms
>  64 bytes from 192.168.1.12: icmp_seq=89 ttl=251 time=33815.364 ms
>  64 bytes from 192.168.1.12: icmp_seq=90 ttl=251 time=32824.247 ms
>  64 bytes from 192.168.1.12: icmp_seq=91 ttl=251 time=31822.355 ms
>  64 bytes from 192.168.1.12: icmp_seq=92 ttl=251 time=30817.395 ms
>  64 bytes from 192.168.1.12: icmp_seq=93 ttl=251 time=29822.478 ms
>  64 bytes from 192.168.1.12: icmp_seq=94 ttl=251 time=28817.508 ms
> 
> With this diff, while in bad channel conditions, instead of the above
> I am seeing 22% lost ping packets, and reasonable latency for those
> packets which make it through.
> 
> SSH into the machine is even possible (yet rather unusable), whereas
> before it didn't work at all.
> 
> ok?

Is it really a good idea to deviate from what Linux does here?
Perhaps for TCP this is slightly better, but how about other
protocols?  Anything that does broadcasts/multicasts will suffer from
higher packet loss.

In the end, if you end up in a situation where the wireless connection
is so unreliable, you'll have to fix the wireless setup...


> Index: if_iwmreg.h
> ===
> RCS file: /cvs/src/sys/dev/pci/if_iwmreg.h,v
> retrieving revision 1.19
> diff -u -p -r1.19 if_iwmreg.h
> --- if_iwmreg.h   20 Sep 2016 11:46:09 -  1.19
> +++ if_iwmreg.h   19 Nov 2016 16:36:21 -
> @@ -4268,7 +4268,7 @@ struct iwm_lq_cmd {
>   */
>  #define IWM_DEFAULT_TX_RETRY 15
>  #define IWM_MGMT_DFAULT_RETRY_LIMIT  3
> -#define IWM_RTS_DFAULT_RETRY_LIMIT   60
> +#define IWM_RTS_DFAULT_RETRY_LIMIT   3
>  #define IWM_BAR_DFAULT_RETRY_LIMIT   60
>  #define IWM_LOW_RETRY_LIMIT  7
>  
> 
> 



reduce iwm's RTS retry limit

2016-11-19 Thread Stefan Sperling
The RTS retry limit we inherited from Linux seems insanely high.

It seems to be the cause for "bursty" pings and high latency for
smaller packets while larger packets from TCP streams are stuck
in the Tx queue:

 64 bytes from 192.168.1.12: icmp_seq=84 ttl=251 time=380.203 ms
 64 bytes from 192.168.1.12: icmp_seq=85 ttl=251 time=710.714 ms
 64 bytes from 192.168.1.12: icmp_seq=86 ttl=251 time=279.594 ms
 64 bytes from 192.168.1.12: icmp_seq=87 ttl=251 time=893.879 ms
 64 bytes from 192.168.1.12: icmp_seq=88 ttl=251 time=34800.236 ms
 64 bytes from 192.168.1.12: icmp_seq=89 ttl=251 time=33815.364 ms
 64 bytes from 192.168.1.12: icmp_seq=90 ttl=251 time=32824.247 ms
 64 bytes from 192.168.1.12: icmp_seq=91 ttl=251 time=31822.355 ms
 64 bytes from 192.168.1.12: icmp_seq=92 ttl=251 time=30817.395 ms
 64 bytes from 192.168.1.12: icmp_seq=93 ttl=251 time=29822.478 ms
 64 bytes from 192.168.1.12: icmp_seq=94 ttl=251 time=28817.508 ms

With this diff, while in bad channel conditions, instead of the above
I am seeing 22% lost ping packets, and reasonable latency for those
packets which make it through.

SSH into the machine is even possible (yet rather unusable), whereas
before it didn't work at all.

ok?

Index: if_iwmreg.h
===
RCS file: /cvs/src/sys/dev/pci/if_iwmreg.h,v
retrieving revision 1.19
diff -u -p -r1.19 if_iwmreg.h
--- if_iwmreg.h 20 Sep 2016 11:46:09 -  1.19
+++ if_iwmreg.h 19 Nov 2016 16:36:21 -
@@ -4268,7 +4268,7 @@ struct iwm_lq_cmd {
  */
 #define IWM_DEFAULT_TX_RETRY   15
 #define IWM_MGMT_DFAULT_RETRY_LIMIT3
-#define IWM_RTS_DFAULT_RETRY_LIMIT 60
+#define IWM_RTS_DFAULT_RETRY_LIMIT 3
 #define IWM_BAR_DFAULT_RETRY_LIMIT 60
 #define IWM_LOW_RETRY_LIMIT7