Re: [ceph-users] Panic in kernel CephFS client after kernel update

2019-10-05 Thread Kenneth Van Alstyne
Thanks!  I’ll remove my patch from my local build of the 4.19 kernel and 
upgrade to 4.19.77.  Appreciate the quick fix.

Thanks,

--
Kenneth Van Alstyne
Systems Architect
M: 228.547.8045
15052 Conference Center Dr, Chantilly, VA 20151
perspecta

On Oct 5, 2019, at 7:29 AM, Ilya Dryomov <idryo...@gmail.com> wrote:

On Tue, Oct 1, 2019 at 9:12 PM Jeff Layton <jlay...@kernel.org> wrote:

On Tue, 2019-10-01 at 15:04 -0400, Sasha Levin wrote:
On Tue, Oct 01, 2019 at 01:54:45PM -0400, Jeff Layton wrote:
On Tue, 2019-10-01 at 19:03 +0200, Ilya Dryomov wrote:
On Tue, Oct 1, 2019 at 6:41 PM Kenneth Van Alstyne <kvanalst...@knightpoint.com> wrote:
All:
I’m not sure this should go to LKML or here, but I’ll start here.  After 
upgrading from Linux kernel 4.19.60 to 4.19.75 (or 76), I started running into 
kernel panics in the “ceph” module.  Based on the call trace, I believe I was 
able to narrow it down to the following commit in the Linux kernel 4.19 source 
tree:

commit 81281039a673d30f9d04d38659030a28051a
Author: Yan, Zheng <z...@redhat.com>
Date:   Sun Jun 2 09:45:38 2019 +0800

   ceph: use ceph_evict_inode to cleanup inode's resource

   [ Upstream commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15 ]

   remove_session_caps() relies on __wait_on_freeing_inode(), to wait for
   freeing inode to remove its caps. But VFS wakes freeing inode waiters
   before calling destroy_inode().

   Cc: sta...@vger.kernel.org
   Link: https://tracker.ceph.com/issues/40102
   Signed-off-by: "Yan, Zheng" <z...@redhat.com>
   Reviewed-by: Jeff Layton <jlay...@redhat.com>
   Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
   Signed-off-by: Sasha Levin <sas...@kernel.org>


Backing this patch out and recompiling my kernel has since resolved my issues
(as far as I can tell thus far).  The issue was fairly easy to trigger by simply
creating and deleting files.  I tested using ‘dd’ and was pretty consistently
able to reproduce the issue.  Since the issue occurred in a VM, I do have
screenshots of the crashed machine, and to avoid attaching images, I’ll link to
where they are:  http://kvanals.kvanals.org/.ceph_kernel_panic_images/

Am I way off base or has anyone else run into this issue?

Hi Kenneth,

This might be a botched backport.  The first version of this patch had
a conflict with Al's change that introduced ceph_free_inode(), and Zheng
had to adjust it for that.  However, it looks like it has been taken into
4.19 verbatim, even though 4.19 does not have ceph_free_inode().

Zheng, Jeff, please take a look ASAP.


(Sorry for the resend -- I got Sasha's old addr)

Thanks Ilya,

I think you're right -- this patch should not have been merged into any
pre-5.2 kernels. We should go ahead and revert this for now, and do a
one-off backport for v4.19.

Sasha, what do we need to do to make that happen?

I think the easiest would be to just revert the broken one and apply a
clean backport which you'll send me?


Thanks, Sasha. You can revert the old patch as soon as you're ready.
It'll take me a bit to put together and test a proper backport, but
I'll try to have something ready within the next day or so.

Kenneth, this is now fixed in 4.19.77.  Thanks for the report!

   Ilya



Re: [ceph-users] Panic in kernel CephFS client after kernel update

2019-10-05 Thread Ilya Dryomov
On Tue, Oct 1, 2019 at 9:12 PM Jeff Layton wrote:
>
> On Tue, 2019-10-01 at 15:04 -0400, Sasha Levin wrote:
> > On Tue, Oct 01, 2019 at 01:54:45PM -0400, Jeff Layton wrote:
> > > On Tue, 2019-10-01 at 19:03 +0200, Ilya Dryomov wrote:
> > > > On Tue, Oct 1, 2019 at 6:41 PM Kenneth Van Alstyne wrote:
> > > > > All:
> > > > > I’m not sure this should go to LKML or here, but I’ll start here.  
> > > > > After upgrading from Linux kernel 4.19.60 to 4.19.75 (or 76), I 
> > > > > started running into kernel panics in the “ceph” module.  Based on 
> > > > > the call trace, I believe I was able to narrow it down to the 
> > > > > following commit in the Linux kernel 4.19 source tree:
> > > > >
> > > > > commit 81281039a673d30f9d04d38659030a28051a
> > > > > Author: Yan, Zheng 
> > > > > Date:   Sun Jun 2 09:45:38 2019 +0800
> > > > >
> > > > > ceph: use ceph_evict_inode to cleanup inode's resource
> > > > >
> > > > > [ Upstream commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15 ]
> > > > >
> > > > > remove_session_caps() relies on __wait_on_freeing_inode(), to 
> > > > > wait for
> > > > > freeing inode to remove its caps. But VFS wakes freeing inode 
> > > > > waiters
> > > > > before calling destroy_inode().
> > > > >
> > > > > Cc: sta...@vger.kernel.org
> > > > > Link: https://tracker.ceph.com/issues/40102
> > > > > Signed-off-by: "Yan, Zheng" 
> > > > > Reviewed-by: Jeff Layton 
> > > > > Signed-off-by: Ilya Dryomov 
> > > > > Signed-off-by: Sasha Levin 
> > > > >
> > > > >
> > > > > Backing this patch out and recompiling my kernel has since resolved
> > > > > my issues (as far as I can tell thus far).  The issue was fairly easy
> > > > > to trigger by simply creating and deleting files.  I tested using ‘dd’
> > > > > and was pretty consistently able to reproduce the issue.  Since the
> > > > > issue occurred in a VM, I do have screenshots of the crashed machine,
> > > > > and to avoid attaching images, I’ll link to where they are:
> > > > > http://kvanals.kvanals.org/.ceph_kernel_panic_images/
> > > > >
> > > > > Am I way off base or has anyone else run into this issue?
> > > >
> > > > Hi Kenneth,
> > > >
> > > > This might be a botched backport.  The first version of this patch had
> > > > a conflict with Al's change that introduced ceph_free_inode(), and Zheng
> > > > had to adjust it for that.  However, it looks like it has been taken into
> > > > 4.19 verbatim, even though 4.19 does not have ceph_free_inode().
> > > >
> > > > Zheng, Jeff, please take a look ASAP.
> > > >
> > >
> > > (Sorry for the resend -- I got Sasha's old addr)
> > >
> > > Thanks Ilya,
> > >
> > > I think you're right -- this patch should not have been merged into any
> > > pre-5.2 kernels. We should go ahead and revert this for now, and do a
> > > one-off backport for v4.19.
> > >
> > > Sasha, what do we need to do to make that happen?
> >
> > I think the easiest would be to just revert the broken one and apply a
> > clean backport which you'll send me?
> >
>
> Thanks, Sasha. You can revert the old patch as soon as you're ready.
> It'll take me a bit to put together and test a proper backport, but
> I'll try to have something ready within the next day or so.

Kenneth, this is now fixed in 4.19.77.  Thanks for the report!

Ilya


Re: [ceph-users] Panic in kernel CephFS client after kernel update

2019-10-01 Thread Ilya Dryomov
On Tue, Oct 1, 2019 at 6:41 PM Kenneth Van Alstyne wrote:
>
> All:
> I’m not sure this should go to LKML or here, but I’ll start here.  After 
> upgrading from Linux kernel 4.19.60 to 4.19.75 (or 76), I started running 
> into kernel panics in the “ceph” module.  Based on the call trace, I believe 
> I was able to narrow it down to the following commit in the Linux kernel 4.19 
> source tree:
>
> commit 81281039a673d30f9d04d38659030a28051a
> Author: Yan, Zheng 
> Date:   Sun Jun 2 09:45:38 2019 +0800
>
> ceph: use ceph_evict_inode to cleanup inode's resource
>
> [ Upstream commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15 ]
>
> remove_session_caps() relies on __wait_on_freeing_inode(), to wait for
> freeing inode to remove its caps. But VFS wakes freeing inode waiters
> before calling destroy_inode().
>
> Cc: sta...@vger.kernel.org
> Link: https://tracker.ceph.com/issues/40102
> Signed-off-by: "Yan, Zheng" 
> Reviewed-by: Jeff Layton 
> Signed-off-by: Ilya Dryomov 
> Signed-off-by: Sasha Levin 
>
>
> Backing this patch out and recompiling my kernel has since resolved my issues
> (as far as I can tell thus far).  The issue was fairly easy to trigger by
> simply creating and deleting files.  I tested using ‘dd’ and was pretty
> consistently able to reproduce the issue.  Since the issue occurred in a VM,
> I do have screenshots of the crashed machine, and to avoid attaching images,
> I’ll link to where they are:
> http://kvanals.kvanals.org/.ceph_kernel_panic_images/
>
> Am I way off base or has anyone else run into this issue?

Hi Kenneth,

This might be a botched backport.  The first version of this patch had
a conflict with Al's change that introduced ceph_free_inode(), and Zheng
had to adjust it for that.  However, it looks like it has been taken into
4.19 verbatim, even though 4.19 does not have ceph_free_inode().
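
For context, the ordering the commit message is talking about lives in
the VFS itself.  A simplified, illustrative sketch of fs/inode.c:evict()
as of 4.19 (not the literal source) shows why the cap cleanup has to
happen in ->evict_inode() rather than ->destroy_inode():

static void evict(struct inode *inode)
{
        const struct super_operations *op = inode->i_sb->s_op;

        /* 1. Filesystem-specific teardown runs first.  With the upstream
         *    fix, ceph_evict_inode() releases the inode's caps here. */
        if (op->evict_inode)
                op->evict_inode(inode);

        remove_inode_hash(inode);

        /* 2. Waiters in __wait_on_freeing_inode() -- which is what
         *    remove_session_caps() relies on -- are woken up here... */
        wake_up_bit(&inode->i_state, __I_NEW);

        /* 3. ...but ->destroy_inode() only runs afterwards.  If the caps
         *    are still attached at this point (the pre-fix behaviour,
         *    where ceph_destroy_inode() did the cleanup), a woken waiter
         *    can race with the teardown and touch freed state. */
        destroy_inode(inode);
}

Upstream the freeing side moved into ceph_free_inode(), which is exactly
what 4.19 lacks -- hence a verbatim apply cannot be right.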

Zheng, Jeff, please take a look ASAP.

Thanks,

Ilya


[ceph-users] Panic in kernel CephFS client after kernel update

2019-10-01 Thread Kenneth Van Alstyne
All:
I’m not sure this should go to LKML or here, but I’ll start here.  After 
upgrading from Linux kernel 4.19.60 to 4.19.75 (or 76), I started running into 
kernel panics in the “ceph” module.  Based on the call trace, I believe I was 
able to narrow it down to the following commit in the Linux kernel 4.19 source 
tree:

commit 81281039a673d30f9d04d38659030a28051a
Author: Yan, Zheng <z...@redhat.com>
Date:   Sun Jun 2 09:45:38 2019 +0800

ceph: use ceph_evict_inode to cleanup inode's resource

[ Upstream commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15 ]

remove_session_caps() relies on __wait_on_freeing_inode(), to wait for
freeing inode to remove its caps. But VFS wakes freeing inode waiters
before calling destroy_inode().

Cc: sta...@vger.kernel.org
Link: https://tracker.ceph.com/issues/40102
Signed-off-by: "Yan, Zheng" <z...@redhat.com>
Reviewed-by: Jeff Layton <jlay...@redhat.com>
Signed-off-by: Ilya Dryomov <idryo...@gmail.com>
Signed-off-by: Sasha Levin <sas...@kernel.org>


Backing this patch out and recompiling my kernel has since resolved my issues
(as far as I can tell thus far).  The issue was fairly easy to trigger by simply
creating and deleting files.  I tested using ‘dd’ and was pretty consistently
able to reproduce the issue.  Since the issue occurred in a VM, I do have
screenshots of the crashed machine, and to avoid attaching images, I’ll link to
where they are:  http://kvanals.kvanals.org/.ceph_kernel_panic_images/
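
To make that concrete, a reproducer along these lines can be as small as
the sketch below.  The mount point, file size, and loop bounds here are
illustrative assumptions, not my exact ‘dd’ invocation:

/* Repeatedly create, write, and delete a file on a CephFS mount.
 * Build with: cc -O2 -o cephfs-churn cephfs-churn.c */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/mnt/cephfs/panic-test";  /* hypothetical mount */
        static char buf[1 << 20];                     /* 1 MiB, zero-filled */

        for (;;) {
                int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
                if (fd < 0) {
                        perror("open");
                        return EXIT_FAILURE;
                }
                /* Roughly what 'dd if=/dev/zero bs=1M count=64' does. */
                for (int i = 0; i < 64; i++) {
                        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
                                perror("write");
                                close(fd);
                                return EXIT_FAILURE;
                        }
                }
                close(fd);
                /* Deleting the file drives inode eviction in the client,
                 * which is where the bad backport falls over. */
                unlink(path);
        }
}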

Am I way off base or has anyone else run into this issue?

Thanks,

--
Kenneth Van Alstyne
Systems Architect
M: 228.547.8045
15052 Conference Center Dr, Chantilly, VA 20151
perspecta

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com