Re: [PATCH net-next] rds: avoid lock hierarchy violation between m_rs_lock and rs_recv_lock

Santosh Shilimkar Wed, 08 Aug 2018 14:52:15 -0700

On 8/8/2018 1:57 PM, Sowmini Varadhan wrote:

The following deadlock, reported by syzbot, can occur if CPU0 is in
rds_send_remove_from_sock() while CPU1 is in rds_clear_recv_queue():


        CPU0                    CPU1
        ----                    ----
   lock(&(&rm->m_rs_lock)->rlock);
                                lock(&rs->rs_recv_lock);
                                lock(&(&rm->m_rs_lock)->rlock);
   lock(&rs->rs_recv_lock);

The deadlock should be avoided by moving the messages from the
rs_recv_queue into a tmp_list in rds_clear_recv_queue() under
the rs_recv_lock, and then dropping the refcnt on the messages
in the tmp_list (potentially resulting in rds_message_purge())
after dropping the rs_recv_lock.

The same lock hierarchy violation also exists in rds_still_queued()
and should be avoided in a similar manner.
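
The tmp_list approach described above for rds_clear_recv_queue() can be
illustrated with a small userspace sketch. This is not the patch itself:
struct msg, struct sock, msg_alloc(), msg_put(), sock_enqueue() and the
purged counter are hypothetical stand-ins, and pthread mutexes stand in
for the kernel spinlocks. It only assumes that the final reference drop
may need the per-message lock, which is why it must not happen while
the receive-queue lock is held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel RDS structures. */
struct msg {
    pthread_mutex_t m_rs_lock;    /* plays the role of rm->m_rs_lock */
    int refcount;
    struct msg *next;
};

struct sock {
    pthread_mutex_t rs_recv_lock; /* plays the role of rs->rs_recv_lock */
    struct msg *recv_queue;
};

static int purged; /* number of messages freed, for demonstration only */

static struct msg *msg_alloc(void)
{
    struct msg *m = malloc(sizeof(*m));

    if (m == NULL)
        abort();
    pthread_mutex_init(&m->m_rs_lock, NULL);
    m->refcount = 1;
    m->next = NULL;
    return m;
}

/* Drop one reference; the drop takes the per-message lock. */
static void msg_put(struct msg *m)
{
    int last;

    pthread_mutex_lock(&m->m_rs_lock);
    assert(m->refcount > 0);
    last = (--m->refcount == 0);
    pthread_mutex_unlock(&m->m_rs_lock);

    if (last) {
        pthread_mutex_destroy(&m->m_rs_lock);
        free(m);
        purged++;
    }
}

static void sock_enqueue(struct sock *s, struct msg *m)
{
    pthread_mutex_lock(&s->rs_recv_lock);
    m->next = s->recv_queue;
    s->recv_queue = m;
    pthread_mutex_unlock(&s->rs_recv_lock);
}

static void clear_recv_queue(struct sock *s)
{
    struct msg *tmp_list, *m;

    /* Step 1: detach the whole queue while holding only rs_recv_lock. */
    pthread_mutex_lock(&s->rs_recv_lock);
    tmp_list = s->recv_queue;
    s->recv_queue = NULL;
    pthread_mutex_unlock(&s->rs_recv_lock);

    /* Step 2: drop references with rs_recv_lock released, so the
     * per-message lock is never acquired inside rs_recv_lock. */
    while ((m = tmp_list) != NULL) {
        tmp_list = m->next;
        msg_put(m);
    }
}
```

In this sketch no code path holds rs_recv_lock while acquiring
m_rs_lock, so the circular wait shown in the diagram cannot form here.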

Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Reported-by: syzbot+52140d69ac6dc6b92...@syzkaller.appspotmail.com
---

This bug doesn't make sense, since it would mean two different transports
(Loop and rds_tcp) are using the same socket and running together.
For the same transport, such a race can't happen with the MSG_ON_SOCK flag.
CPU1 -> rds_loop_inc_free
CPU0 -> rds_tcp_cork ...

I need to understand this test better.

Regards,
Santosh
