On 10/15/2015 11:15 AM, Matan Barak wrote:
> 
> 
> On 10/12/2015 7:37 PM, Hefty, Sean wrote:
>>> ib_send_cm_sidr_rep could sometimes erase the node from the sidr
>>> (depending on errors in the process). Since ib_send_cm_sidr_rep is
>>> called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
>>
>> This should clarify that it is the app calling from the callback, and
>> not a direct call from the cm_sidr_req_handler.
>>
> 
> Consider the following error flows:
> 
> Double free:
> cm_sidr_req_handler:3156->cm_reject_sidr_req:663->ib_send_cm_sidr_rep:3233->erase_rb
> 
> cm_sidr_req_handler:3173->ib_destroy_cm_id->cm_destroy_id:846->ib_send_cm_sidr_rep:3233->erase_rb
> 
> 
> RB contains free node:
> cm_sidr_req_handler:3156->cm_reject_sidr_req:663->ib_send_cm_sidr_rep->returns
> error(for example cm_alloc_msg,3219)
> 
> cm_sidr_req_handler:3173->ib_destroy_cm_id->cm_destroy_id:846->cm_reject_sidr_req->cm_reject_sidr_req:663->returns
> 
> error(for example cm_alloc_msg,3219)->RB wasn't erased but memory is
> freed :910 kfree(cm_id_priv)
> 
> 
>>> could be either erased from the rb_tree twice or not erased at all.
>>
>> In an error case, I can see why it would be left in the rbtree, but I
>> don't see how it can be removed twice.
>>
>>
>>> Fixing that by making sure it's erased only once before freeing
>>> cm_id_priv.
>>>
>>> Fixes: a977049dacde ('[PATCH] IB: Add the kernel CM implementation')
>>> Signed-off-by: Doron Tsur <dor...@mellanox.com>
>>> Signed-off-by: Matan Barak <mat...@mellanox.com>
>>> ---
>>>
>>> Hi Doug,
>>> This patch fixes a bug in the CM. In some flow, rb-tree could be
>>> freed twice or used after it was freed. This bug was picked by
>>> our regression tests and this fix was verified.
>>>
>>> Thanks,
>>> Doron and Matan
>>>
>>>   drivers/infiniband/core/cm.c | 10 +++++++++-
>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
>>> index f5cf1c4..56ff0f3 100644
>>> --- a/drivers/infiniband/core/cm.c
>>> +++ b/drivers/infiniband/core/cm.c
>>> @@ -844,6 +844,11 @@ retest:
>>>       case IB_CM_SIDR_REQ_RCVD:
>>>           spin_unlock_irq(&cm_id_priv->lock);
>>>           cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
>>> +        spin_lock_irq(&cm.lock);
>>> +        if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node))
>>> +            rb_erase(&cm_id_priv->sidr_id_node,
>>> +                 &cm.remote_sidr_table);
>>> +        spin_unlock_irq(&cm.lock);
> 
> This change seeks to remove the about to be freed node from the rb tree,
> while verifying it has not been freed already
> 
>>
>> We should be able to use a return value from cm_reject_sidr_req() --
>> passed through from ib_send_cm_sidr_rep() to determine if the id was
>> removed from the tree.
>>
> 
> But this won't protect from double free in ib_send_cm_sidr_rep, unless
> we pass this parameter to the cm destroy function, but this alternative
> is cumbersome.
> 
>>>           break;
>>>       case IB_CM_REQ_SENT:
>>>       case IB_CM_MRA_REQ_RCVD:
>>> @@ -3210,7 +3215,10 @@ int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id,
>>>       spin_unlock_irqrestore(&cm_id_priv->lock, flags);
>>>
>>>       spin_lock_irqsave(&cm.lock, flags);
>>> -    rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
>>> +    if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node)) {
>>> +        rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
>>> +        RB_CLEAR_NODE(&cm_id_priv->sidr_id_node);
>>> +    }
> 
> This change protects against double free
> 
>>>       spin_unlock_irqrestore(&cm.lock, flags);
>>
>> Something is very wrong in this function if the id is not in the tree
>> at this point.
>>
> 
> We agree, but there's an error flow that triggers this behavior.

Sean, I need to close on this patch.  What is your position after
Matan's explanation?

-- 
Doug Ledford <dledf...@redhat.com>
              GPG KeyID: 0E572FDD


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to