On 2019/6/14 20:27, Eric Dumazet wrote:
> On Fri, Jun 14, 2019 at 2:35 AM maowenan <maowe...@huawei.com> wrote:
>>
>>
>>
>> On 2019/6/14 12:28, Eric Dumazet wrote:
>>>
>>>
>>> On 6/13/19 9:19 PM, maowenan wrote:
>>>>
>>>>
>>>> @Eric, for this issue I only want to check TCP_NEW_SYN_RECV sk, is it OK 
>>>> like below?
>>>>  +       if (!osk && sk->sk_state == TCP_NEW_SYN_RECV)
>>>>  +               reqsk = __inet_lookup_established(sock_net(sk), 
>>>> &tcp_hashinfo,
>>>>  +                                                       sk->sk_daddr, 
>>>> sk->sk_dport,
>>>>  +                                                       sk->sk_rcv_saddr, 
>>>> sk->sk_num,
>>>>  +                                                       
>>>> sk->sk_bound_dev_if, sk->sk_bound_dev_if);
>>>>  +       if (unlikely(reqsk)) {
>>>>
>>>
>>> Not enough.
>>>
>>> If we have many cpus here, there is a chance another cpu has inserted a 
>>> request socket, then
>>> replaced it by an ESTABLISH socket for the same 4-tuple.
>>
>> I try to get more clear about the scene you mentioned. And I have do some 
>> testing about this, it can work well
>> when I use multiple cpus.
>>
>> The ESTABLISH socket would be from 
>> tcp_check_req->tcp_v4_syn_recv_sock->tcp_create_openreq_child,
>> and for this path, inet_ehash_nolisten pass osk(NOT NULL), my patch won't 
>> call __inet_lookup_established in inet_ehash_insert().
>>
>> When TCP_NEW_SYN_RECV socket try to inset to hash table, it will pass osk 
>> with NULL, my patch will check whether reqsk existed
>> in hash table or not. If reqsk is existed, it just removes this reqsk and 
>> dose not insert to hash table. Then the synack for this
>> reqsk can't be sent to client, and there is no chance to receive the ack 
>> from client, so ESTABLISH socket can't be replaced in hash table.
>>
>> So I don't see the race when there are many cpus. Can you show me some clue?
> 
> This is a bit silly.
> You focus on some crash you got on a given system, but do not see the real 
> bug.
> 
> 
> CPU A
> 
> SYN packet
>  lookup finds nothing.
>  Create a NEW_SYN_RECV
>  <long delay, like hardware interrupts calling some buggy driver or something>

I agree that this is a special case.
I propose one point about the sequence of synack, if two synack with two 
different
sequence since the time elapse 64ns, this issue disappear.

tcp_conn_request->tcp_v4_init_seq->secure_tcp_seq->seq_scale
static u32 seq_scale(u32 seq)
{
        /*
         *      As close as possible to RFC 793, which
         *      suggests using a 250 kHz clock.
         *      Further reading shows this assumes 2 Mb/s networks.
         *      For 10 Mb/s Ethernet, a 1 MHz clock is appropriate.
         *      For 10 Gb/s Ethernet, a 1 GHz clock should be ok, but
         *      we also need to limit the resolution so that the u32 seq
         *      overlaps less than one time per MSL (2 minutes).
         *      Choosing a clock of 64 ns period is OK. (period of 274 s)
         */
        return seq + (ktime_get_real_ns() >> 6);
}

So if the long delay larger than 64ns, the seq is difference.


> 
>              CPU B
>              SYN packet
>                -> inserts a NEW_SYN_RECV  sends a SYNACK
>              ACK packet
>              -> replaces the NEW_SYN_RECV by ESTABLISH socket
> 
> CPU A resumes.
>     Basically a lookup (after taking the bucket spinlock) could either find :
>    - Nothing (typical case where there was no race)
>    -  A NEW_SYN_RECV
>    -  A ESTABLISHED socket
>   - A TIME_WAIT socket.
> 
> You can not simply fix the "NEW_SYN_RECV" state case, and possibly add
> hard crashes (instead of current situation leading to RST packets)
> 
> .
> 

Reply via email to