Re: SACK compression patch causing performance drop

2018-11-10 Thread Eric Dumazet



On 11/08/2018 07:14 AM, Eric Dumazet wrote:
> 
> 
> On 11/08/2018 12:23 AM, Jean-Louis Dupond wrote:
>> Hi,
>>
>> Was somebody able to check this?
>> Really think this should be fixed :)
>>
>> Thanks
>> Jean-Louis
>>
> 
> I somehow missed this email.
> 
> Packet captures might help; please send me
> 
> tcpdump -s 128 -i ethX -w sack.pcap
> 
> of some samples with and without SACK compression enabled.

Thanks for the traces.

It seems one of your hosts (A in the following traces) is advertising SACK support
but does not really react to incoming SACK information (it only reacts to the
ACK number).

Only receiving consecutive DUPACKs triggers a fast retransmit.
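
A toy Python model can show why fewer ACKs can turn a fast retransmit into an RTO for a sender like A. It is a sketch under stated assumptions, not kernel code: the sender counts every ACK that repeats `snd_una` as a duplicate (assuming it already saw the original ACK for that number) and retransmits after three, the RFC 5681 threshold. The hypothetical `batch` parameter stands in for SACK compression by sending one ACK per `batch` out-of-order segments; the kernel's actual heuristic is different, so this is only an approximation.

```python
from __future__ import annotations

from dataclasses import dataclass

DUPTHRESH = 3  # classic duplicate-ACK threshold (RFC 5681)


@dataclass
class Ack:
    ack: int               # cumulative ACK number (stuck at the hole)
    sack: tuple[int, int]  # single SACK block covering the out-of-order data


def receiver_acks(hole: int, ooo: list[tuple[int, int]], batch: int) -> list[Ack]:
    """Toy receiver. Data starting at `hole` was lost, then the contiguous
    out-of-order segments in `ooo` arrived. batch=1 ACKs every segment;
    batch > 1 is a hypothetical stand-in for SACK compression: one ACK per
    `batch` segments, plus one for the last segment."""
    acks = []
    left = ooo[0][0]
    for i, (_, right) in enumerate(ooo, start=1):
        if i % batch == 0 or i == len(ooo):
            acks.append(Ack(hole, (left, right)))
    return acks


def sack_blind_sender(acks: list[Ack], snd_una: int) -> str:
    """Toy sender that, like host A, ignores the SACK blocks and only
    counts ACKs repeating snd_una (each treated as a duplicate)."""
    dupacks = 0
    for a in acks:
        if a.ack == snd_una:
            dupacks += 1
            if dupacks >= DUPTHRESH:
                return "fast retransmit"
    return "RTO"


segs = [(100, 200), (200, 300), (300, 400), (400, 500), (500, 600)]
print(sack_blind_sender(receiver_acks(0, segs, batch=1), snd_una=0))  # fast retransmit
print(sack_blind_sender(receiver_acks(0, segs, batch=4), snd_una=0))  # RTO
```

With five out-of-order segments, per-segment ACKs reach the threshold, while batching leaves only two duplicates, which is the same count of SACK-bearing ACKs that B sends for the hole in the trace before A falls back to its retransmission timer.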

With SACK compression enabled, we can see the sender resort to an RTO for recovery:

01:04:20.609980 IP A > B: Flags [.], seq 133901705:133946445, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 44740
01:04:20.610016 IP A > B: Flags [.], seq 133946445:133963069, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 16624
01:04:20.610027 IP A > B: Flags [.], seq 133963069:133977845, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 14776
01:04:20.610035 IP A > B: Flags [.], seq 133977845:133986793, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 8948
01:04:20.610055 IP B > A: Flags [.], ack 133812253, win 51011, options 
[nop,nop,TS val 2206848846 ecr 1112736761,nop,nop,sack 1 
{133946445:133986793}], length 0
01:04:20.610074 IP A > B: Flags [.], seq 133986793:134016297, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 29504
01:04:20.610118 IP B > A: Flags [.], ack 133812253, win 51011, options 
[nop,nop,TS val 2206848846 ecr 1112736761,nop,nop,sack 1 
{133946445:134016297}], length 0
01:04:20.819665 IP A > B: Flags [.], seq 133812253:133821201, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 8948
01:04:20.819767 IP B > A: Flags [.], ack 133821201, win 48774, options 
[nop,nop,TS val 2206849056 ecr 1112736761,nop,nop,sack 1 
{133946445:134016297}], length 0
01:04:20.819924 IP A > B: Flags [.], seq 133821201:133839097, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 17896
01:04:20.819966 IP B > A: Flags [.], ack 133839097, win 44300, options 
[nop,nop,TS val 2206849056 ecr 1112736761,nop,nop,sack 1 
{133946445:134016297}], length 0
01:04:20.820134 IP A > B: Flags [.], seq 133839097:133874889, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 35792
01:04:20.820185 IP B > A: Flags [.], ack 133874889, win 40896, options 
[nop,nop,TS val 2206849056 ecr 1112736761,nop,nop,sack 1 
{133946445:134016297}], length 0
01:04:20.820296 IP A > B: Flags [.], seq 133874889:133901733, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 26844
01:04:20.820327 IP B > A: Flags [.], ack 133901733, win 40896, options 
[nop,nop,TS val 2206849057 ecr 1112736761,nop,nop,sack 1 
{133946445:134016297}], length 0
01:04:20.820346 IP A > B: Flags [.], seq 133901733:133946473, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 44740
01:04:20.820430 IP B > A: Flags [.], ack 134016297, win 51011, options 
[nop,nop,TS val 2206849057 ecr 1112736761,nop,nop,sack 1 
{133946445:133946473}], length 0
01:04:20.820452 IP A > B: Flags [.], seq 133901733:133946473, ack 7728, win 
32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 44740
01:04:20.820462 IP B > A: Flags [.], ack 134016297, win 51011, options 
[nop,nop,TS val 2206849057 ecr 1112736761,nop,nop,sack 1 
{133901733:133946473}], length 0
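
When reading captures like the one above, it can help to extract the cumulative ACK and SACK blocks programmatically and check what a SACK-honoring sender could have concluded. A minimal Python sketch, assuming tcpdump's default text output with one packet per line (the capture above is wrapped by the mail client) and an SMSS of 8948 bytes, inferred (not confirmed) from the 8948-byte segments in the trace. `is_lost` implements only the byte-count half of the IsLost() rule from RFC 6675.

```python
import re

# Pure ACKs have no "seq" field: "IP B > A: Flags [.], ack N, win W, ..."
PURE_ACK_RE = re.compile(r"IP (\S+) > (\S+): Flags \[[^\]]*\], ack (\d+),")
SACK_BLOCK_RE = re.compile(r"\{(\d+):(\d+)\}")  # "{left:right}" SACK blocks


def pure_acks(lines, src):
    """Yield (ack, [(left, right), ...]) for each pure ACK sent by `src`."""
    for line in lines:
        m = PURE_ACK_RE.search(line)
        if m and m.group(1) == src:
            blocks = [(int(left), int(right))
                      for left, right in SACK_BLOCK_RE.findall(line)]
            yield int(m.group(3)), blocks


def is_lost(seq, sack_blocks, smss, dupthresh=3):
    """Byte-count half of RFC 6675 IsLost(): the segment at `seq` is deemed
    lost once more than (dupthresh - 1) * smss bytes above it are SACKed.
    Assumes non-overlapping blocks; the discontiguous-blocks half is omitted."""
    sacked_above = sum(right - max(left, seq)
                       for left, right in sack_blocks if right > seq)
    return sacked_above > (dupthresh - 1) * smss


# Three packets from the capture above, one packet per line.
SAMPLE = [
    "01:04:20.610035 IP A > B: Flags [.], seq 133977845:133986793, ack 7728, "
    "win 32768, options [nop,nop,TS val 1112736761 ecr 2206848846], length 8948",
    "01:04:20.610055 IP B > A: Flags [.], ack 133812253, win 51011, options "
    "[nop,nop,TS val 2206848846 ecr 1112736761,nop,nop,sack 1 "
    "{133946445:133986793}], length 0",
    "01:04:20.610118 IP B > A: Flags [.], ack 133812253, win 51011, options "
    "[nop,nop,TS val 2206848846 ecr 1112736761,nop,nop,sack 1 "
    "{133946445:134016297}], length 0",
]

for ack, blocks in pure_acks(SAMPLE, "B"):
    # SMSS = 8948 is an assumption (9000-byte MTU minus IP/TCP/timestamp headers)
    print(ack, blocks, is_lost(ack, blocks, smss=8948))  # both lines end in True
```

Under these assumptions, the first SACK (at 01:04:20.610055) already covers 40348 bytes above the hole, more than 2 × 8948 = 17896, so a sender honoring SACK could have retransmitted right away instead of waiting roughly 210 ms until the retransmission at 01:04:20.819665.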


With SACK compression disabled, no RTO is triggered and fast retransmits work "properly":

00:52:22.329658 IP A > B: Flags [.], seq 2360149:2369069, ack 1680, win 32768, 
options [nop,nop,TS val 1112018058 ecr 1636504300], length 8920
00:52:22.329666 IP A > B: Flags [.], seq 2369069:2378045, ack 1680, win 32768, 
options [nop,nop,TS val 1112018058 ecr 1636504300], length 8976
00:52:22.329694 IP B > A: Flags [.], ack 2280125, win 51011, options 
[nop,nop,TS val 1636504300 ecr 1112018058], length 0
00:52:22.329716 IP A > B: Flags [.], seq 2378045:2427265, ack 1680, win 32768, 
options [nop,nop,TS val 1112018058 ecr 1636504300], length 49220
00:52:22.329723 IP A > B: Flags [P.], seq 2427265:2433601, ack 1680, win 32768, 
options [nop,nop,TS val 1112018058 ecr 1636504300], length 6336
00:52:22.329728 IP A > B: Flags [.], seq 2433601:2442549, ack 1680, win 32768, 
options [nop,nop,TS val 1112018058 ecr 1636504300], length 8948
00:52:22.329735 IP B > A: Flags [P.], seq 1680:1728, ack 2280125, win 51011, 
options [nop,nop,TS val 1636504300 ecr 1112018058], length 48
00:52:22.329749 IP B > A: Flags [.], ack 2280125, win 51011, options 
[nop,nop,TS val 1636504300 ecr 1112018058,nop,nop,sack 1 {2378045:2427265}], 
length 0
00:52:22.329757 IP B > A: Flags [.], ack 2280125, win 51011, options 
[nop,nop,TS 

Re: SACK compression patch causing performance drop

2018-11-08 Thread Eric Dumazet



On 11/08/2018 12:23 AM, Jean-Louis Dupond wrote:
> Hi,
> 
> Was somebody able to check this?
> Really think this should be fixed :)
> 
> Thanks
> Jean-Louis
> 

I somehow missed this email.

Packet captures might help; please send me

tcpdump -s 128 -i ethX -w sack.pcap

of some samples with and without SACK compression enabled.

> On 3/11/18 16:59, Jean-Louis Dupond wrote:
>> Hi All,
>>
>> On recent kernels we noticed much lower throughput to our SAN system than
>> before.
>> While on pre-4.18 kernels we had 400-700 MB/sec read speed, on 4.18+ we only
>> had 70-120 MB/sec.
>>
>> The SAN is connected via iSCSI over a 10G network (ixgbe/X520 NICs, if it
>> matters).
>>
>> After some debugging, I tried to bisect between 4.17 and 4.18 to see which
>> commit caused the slowdown.
>> It showed that the addition of SACK compression
>> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e)
>> was the cause.
>>
>> And indeed, if I set net.ipv4.tcp_comp_sack_nr to 0 on 4.19 for example, the 
>> throughput is (almost) back to normal again.
>> So it seems this change causes significant performance issues.
>>
>> Any ideas?
>>
>> Thanks
>> Jean-Louis
>>


Re: SACK compression patch causing performance drop

2018-11-08 Thread Jean-Louis Dupond

Hi,

Was somebody able to check this?
Really think this should be fixed :)

Thanks
Jean-Louis

On 3/11/18 16:59, Jean-Louis Dupond wrote:

Hi All,

On recent kernels we noticed much lower throughput to our SAN system
than before.
While on pre-4.18 kernels we had 400-700 MB/sec read speed, on 4.18+ we
only had 70-120 MB/sec.


The SAN is connected via iSCSI over a 10G network (ixgbe/X520 NICs, if
it matters).


After some debugging, I tried to bisect between 4.17 and 4.18 to see
which commit caused the slowdown.
It showed that the addition of SACK compression
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e)
was the cause.


And indeed, if I set net.ipv4.tcp_comp_sack_nr to 0 on 4.19 for 
example, the throughput is (almost) back to normal again.

So it seems this change causes significant performance issues.

Any ideas?

Thanks
Jean-Louis



SACK compression patch causing performance drop

2018-11-03 Thread Jean-Louis Dupond

Hi All,

On recent kernels we noticed much lower throughput to our SAN system
than before.
While on pre-4.18 kernels we had 400-700 MB/sec read speed, on 4.18+ we
only had 70-120 MB/sec.


The SAN is connected via iSCSI over a 10G network (ixgbe/X520 NICs, if it
matters).
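
For scale, the reported read rates can be converted to link utilization. A quick Python sketch, assuming decimal megabytes (1 MB = 10^6 bytes), a nominal 10 Gbit/s line rate, and no accounting for Ethernet/IP/TCP/iSCSI overhead:

```python
LINK_GBIT_S = 10.0  # nominal 10G line rate; protocol overhead ignored


def mb_s_to_gbit_s(mb_per_s: float) -> float:
    """Convert decimal MB/s (10**6 bytes/s) to Gbit/s."""
    return mb_per_s * 8 / 1000


reported = {"pre-4.18": (400, 700), "4.18+": (70, 120)}  # MB/sec, from the report
for kernel, (low, high) in reported.items():
    g_low, g_high = mb_s_to_gbit_s(low), mb_s_to_gbit_s(high)
    print(f"{kernel}: {g_low:.2f}-{g_high:.2f} Gbit/s "
          f"({g_low / LINK_GBIT_S:.0%}-{g_high / LINK_GBIT_S:.0%} of line rate)")
# pre-4.18: 3.20-5.60 Gbit/s (32%-56% of line rate)
# 4.18+: 0.56-0.96 Gbit/s (6%-10% of line rate)

# Slowdown, pairing the ends of the two ranges (old low / new high, old high / new low)
print(f"ratio: {400 / 120:.1f}x to {700 / 70:.1f}x")  # ratio: 3.3x to 10.0x
```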


After some debugging, I tried to bisect between 4.17 and 4.18 to see
which commit caused the slowdown.
It showed that the addition of SACK compression
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e)
was the cause.
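
`git bisect` narrows a good..bad range (here 4.17..4.18) by binary search, so a range of N commits needs only about log2(N) test builds. The core idea can be sketched in Python, simplified to a linear history; the commit names and the `is_slow` predicate are hypothetical stand-ins for "build this commit, boot it, and measure iSCSI read throughput".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes a linear history where commits[0] is known good, commits[-1]
    is known bad, and once a commit is bad every later one is too."""
    good, bad = 0, len(commits) - 1  # invariant: commits[good] ok, commits[bad] slow
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical example: 1000 commits, the slowdown starts at c637.
history = [f"c{i}" for i in range(1000)]
tested = []


def is_slow(commit: str) -> bool:
    tested.append(commit)  # stands in for "build, boot, measure throughput"
    return int(commit[1:]) >= 637


print(first_bad(history, is_slow))  # c637
print(len(tested))                  # at most 10 builds for 1000 commits
```

Real `git bisect` also handles merge-heavy histories and lets you mark commits as untestable, which this sketch does not attempt.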


And indeed, if I set net.ipv4.tcp_comp_sack_nr to 0 on 4.19 for example, 
the throughput is (almost) back to normal again.
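
Sysctls under net.ipv4 are exposed as files under /proc/sys/net/ipv4/, so the setting mentioned above can be inspected (and, as root, changed) from a script as well as with `sysctl -w net.ipv4.tcp_comp_sack_nr=0`. A minimal Python sketch; the helper names are hypothetical, and a value written this way does not persist across reboots.

```python
from pathlib import Path


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its /proc/sys file, e.g.
    net.ipv4.tcp_comp_sack_nr -> /proc/sys/net/ipv4/tcp_comp_sack_nr."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> str:
    return sysctl_path(name).read_text().strip()


def write_sysctl(name: str, value: str) -> None:
    # Requires root; like `sysctl -w`, the change is lost on reboot.
    sysctl_path(name).write_text(f"{value}\n")


if __name__ == "__main__":
    name = "net.ipv4.tcp_comp_sack_nr"
    if sysctl_path(name).exists():
        print(name, "=", read_sysctl(name))
    else:
        print(name, "is not available on this kernel")
```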

So it seems this change causes significant performance issues.

Any ideas?

Thanks
Jean-Louis