Re: [PATCH] function dispatch should return if mds session does not exist

2019-10-14 Thread Jeff Layton
On Mon, 2019-10-14 at 17:00 +0800, Yanhu Cao wrote:
> we shouldn't call ceph_msg_put, otherwise libceph will pass
> invalid pointer to mm.
> 
> kernel panic - not syncing: fatal exception
> [5452201.213885] [ cut here ]
> [5452201.213889] kernel BUG at mm/slub.c:3901!
> [5452201.213938] invalid opcode:  [#1] SMP PTI
> [5452201.213971] CPU: 35 PID: 3037447 Comm: kworker/35:1 Kdump: loaded 
> Not tainted 4.19.15 #1
> [5452201.214020] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 
> Gen9, BIOS P89 01/22/2018
> [5452201.214088] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> [5452201.214129] RIP: 0010:kfree+0x15b/0x170
> [5452201.214156] Code: 8b 02 f6 c4 80 75 08 49 8b 42 08 a8 01 74 1b 49 8b 
> 02 31 f6 f6 c4 80 74 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 95 03 f9 ff 
> <0f> 0b 48 83 e8 01 e9 01 ff ff ff 49 83 ea 01 e9 e9 fe ff ff 90 0f
> [5452201.214262] RSP: 0018:b8c3a0607cb0 EFLAGS: 00010246
> [5452201.214296] RAX: eee84008 RBX: 9130c000 RCX: 
> 80200016
> [5452201.214339] RDX: 6f0ec000 RSI:  RDI: 
> 9130c000
> [5452201.214383] RBP: 91107f823970 R08: 0001 R09: 
> 
> [5452201.214426] R10: eee84000 R11: 0001 R12: 
> c076c45d
> [5452201.214469] R13: 91107f823970 R14: 91107f8239e0 R15: 
> 91107f823900
> [5452201.214513] FS:  () GS:9110bfbc() 
> knlGS:
> [5452201.214562] CS:  0010 DS:  ES:  CR0: 80050033
> [5452201.214598] CR2: 55993ab29620 CR3: 003a1e00a003 CR4: 
> 003606e0
> [5452201.214641] DR0:  DR1:  DR2: 
> 
> [5452201.214685] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [5452201.214728] Call Trace:
> [5452201.214759]  ceph_msg_release+0x15d/0x190 [libceph]
> [5452201.214811]  dispatch+0x66/0xa50 [ceph]
> [5452201.214846]  try_read+0x7f3/0x11d0 [libceph]
> [5452201.214878]  ? dequeue_entity+0x37e/0x7e0
> [5452201.214907]  ? pick_next_task_fair+0x291/0x610
> [5452201.214937]  ? dequeue_task_fair+0x5d/0x700
> [5452201.214966]  ? __switch_to+0x8c/0x470
> [5452201.214999]  ceph_con_workfn+0xa2/0x5b0 [libceph]
> [5452201.215033]  process_one_work+0x16b/0x370
> [5452201.215062]  worker_thread+0x49/0x3f0
> [5452201.215089]  kthread+0xf5/0x130
> [5452201.215112]  ? max_active_store+0x80/0x80
> [5452201.215139]  ? kthread_bind+0x10/0x10
> [5452201.215167]  ret_from_fork+0x1f/0x30
> 
> Link: https://tracker.ceph.com/issues/42288
> 
> Signed-off-by: Yanhu Cao 
> ---
>  fs/ceph/mds_client.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a8a8f84f3bbf..066358fea347 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4635,7 +4635,7 @@ static void dispatch(struct ceph_connection *con, 
> struct ceph_msg *msg)
>   mutex_lock(>mutex);
>   if (__verify_registered_session(mdsc, s) < 0) {
>   mutex_unlock(>mutex);
> - goto out;
> + return;
>   }
>   mutex_unlock(>mutex);
>  
> @@ -4672,7 +4672,6 @@ static void dispatch(struct ceph_connection *con, 
> struct ceph_msg *msg)
>   pr_err("received unknown message type %d %s\n", type,
>  ceph_msg_type_name(type));
>   }
> -out:
>   ceph_msg_put(msg);
>  }
>  

This doesn't look quite right. We hold a msg reference here. If we don't
call ceph_msg_put on it then that reference will leak.

Do you know which pointer it was trying to kfree at the time of the
crash? What's your rationale for thinking that this is the problem?

Thanks,
-- 
Jeff Layton 



Re: [PATCH] function dispatch should return if mds session does not exist

2019-10-14 Thread Ilya Dryomov
On Mon, Oct 14, 2019 at 11:01 AM Yanhu Cao  wrote:
>
> we shouldn't call ceph_msg_put, otherwise libceph will pass
> invalid pointer to mm.
>
> kernel panic - not syncing: fatal exception
> [5452201.213885] [ cut here ]
> [5452201.213889] kernel BUG at mm/slub.c:3901!
> [5452201.213938] invalid opcode:  [#1] SMP PTI
> [5452201.213971] CPU: 35 PID: 3037447 Comm: kworker/35:1 Kdump: loaded 
> Not tainted 4.19.15 #1
> [5452201.214020] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 
> Gen9, BIOS P89 01/22/2018
> [5452201.214088] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> [5452201.214129] RIP: 0010:kfree+0x15b/0x170
> [5452201.214156] Code: 8b 02 f6 c4 80 75 08 49 8b 42 08 a8 01 74 1b 49 8b 
> 02 31 f6 f6 c4 80 74 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 95 03 f9 ff 
> <0f> 0b 48 83 e8 01 e9 01 ff ff ff 49 83 ea 01 e9 e9 fe ff ff 90 0f
> [5452201.214262] RSP: 0018:b8c3a0607cb0 EFLAGS: 00010246
> [5452201.214296] RAX: eee84008 RBX: 9130c000 RCX: 
> 80200016
> [5452201.214339] RDX: 6f0ec000 RSI:  RDI: 
> 9130c000
> [5452201.214383] RBP: 91107f823970 R08: 0001 R09: 
> 
> [5452201.214426] R10: eee84000 R11: 0001 R12: 
> c076c45d
> [5452201.214469] R13: 91107f823970 R14: 91107f8239e0 R15: 
> 91107f823900
> [5452201.214513] FS:  () GS:9110bfbc() 
> knlGS:
> [5452201.214562] CS:  0010 DS:  ES:  CR0: 80050033
> [5452201.214598] CR2: 55993ab29620 CR3: 003a1e00a003 CR4: 
> 003606e0
> [5452201.214641] DR0:  DR1:  DR2: 
> 
> [5452201.214685] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [5452201.214728] Call Trace:
> [5452201.214759]  ceph_msg_release+0x15d/0x190 [libceph]
> [5452201.214811]  dispatch+0x66/0xa50 [ceph]
> [5452201.214846]  try_read+0x7f3/0x11d0 [libceph]
> [5452201.214878]  ? dequeue_entity+0x37e/0x7e0
> [5452201.214907]  ? pick_next_task_fair+0x291/0x610
> [5452201.214937]  ? dequeue_task_fair+0x5d/0x700
> [5452201.214966]  ? __switch_to+0x8c/0x470
> [5452201.214999]  ceph_con_workfn+0xa2/0x5b0 [libceph]
> [5452201.215033]  process_one_work+0x16b/0x370
> [5452201.215062]  worker_thread+0x49/0x3f0
> [5452201.215089]  kthread+0xf5/0x130
> [5452201.215112]  ? max_active_store+0x80/0x80
> [5452201.215139]  ? kthread_bind+0x10/0x10
> [5452201.215167]  ret_from_fork+0x1f/0x30
>
> Link: https://tracker.ceph.com/issues/42288
>
> Signed-off-by: Yanhu Cao 
> ---
>  fs/ceph/mds_client.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a8a8f84f3bbf..066358fea347 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -4635,7 +4635,7 @@ static void dispatch(struct ceph_connection *con, 
> struct ceph_msg *msg)
> mutex_lock(>mutex);
> if (__verify_registered_session(mdsc, s) < 0) {
> mutex_unlock(>mutex);
> -   goto out;
> +   return;
> }
> mutex_unlock(>mutex);
>
> @@ -4672,7 +4672,6 @@ static void dispatch(struct ceph_connection *con, 
> struct ceph_msg *msg)
> pr_err("received unknown message type %d %s\n", type,
>ceph_msg_type_name(type));
> }
> -out:
> ceph_msg_put(msg);
>  }
>

Hi Yanhu,

This doesn't look right to me.  The messenger hands its reference to
the dispatch function, the dispatch function is responsible for putting
it.  Even if the session isn't registered, the message should still be
valid and should still be freed.  The bug is somewhere else...

Thanks,

Ilya