Re: [ceph-users] Crashed MDS (segfault)

2019-10-25 Thread Gustavo Tonini
Well, I couldn't identify which object I need to "rmomapkey" as instructed
in https://tracker.ceph.com/issues/38452#note-12.
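
In case it helps anyone retrace what I tried: the omap inspection can be done
with plain rados commands, roughly like this (a sketch only; the pool name and
object ID below are placeholders, since identifying the right object is exactly
the part I'm stuck on):

  # list the omap keys of a candidate object in the CephFS metadata pool
  rados -p <metadata_pool> listomapkeys <object_id>

  # once the offending key is known, it could be removed as in the tracker note
  rados -p <metadata_pool> rmomapkey <object_id> <key_name>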

This is the log around the crash: https://pastebin.com/muw34Qdc
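
For anyone following along, the debug level can be raised before reproducing
the crash roughly like this (a sketch; mds1 is the daemon name seen in the MDS
log further down the thread, and it assumes the default admin socket on the
MDS host):

  # raise MDS debug logging via the admin socket on the MDS host
  ceph daemon mds.mds1 config set debug_mds 10

  # or inject it from another node with a suitable keyring
  ceph tell mds.mds1 injectargs '--debug_mds=10'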

On Fri, Oct 25, 2019 at 11:27 AM Yan, Zheng  wrote:

> On Fri, Oct 25, 2019 at 9:42 PM Gustavo Tonini 
> wrote:
> >
> > Running "cephfs-data-scan init  --force-init" solved the problem.
> >
> > Then I had to run "cephfs-journal-tool event recover_dentries summary"
> and truncate the journal to fix the corrupted journal.
> >
> > CephFS worked well for approximately 3 hours and then our MDS crashed
> again, apparently due to the bug described at
> https://tracker.ceph.com/issues/38452
> >
>
> Does the method in issue #38452 work for you? If not, please set
> debug_mds to 10 and send us the log around the crash.
>
>
> Yan, Zheng
>
> > On Wed, Oct 23, 2019, 02:24 Yan, Zheng  wrote:
> >>
> >> On Tue, Oct 22, 2019 at 1:49 AM Gustavo Tonini 
> wrote:
> >> >
> >> > Is there a possibility to lose data if I use "cephfs-data-scan init
> --force-init"?
> >> >
> >>
> >> It only causes an incorrect stat on the root inode; it can't cause data
> >> loss.
> >>
> >> Running 'ceph daemon mds.a scrub_path / force repair' after an MDS
> >> restart can fix the incorrect stat.
> >>
> >> > On Mon, Oct 21, 2019 at 4:36 AM Yan, Zheng  wrote:
> >> >>
> >> >> On Fri, Oct 18, 2019 at 9:10 AM Gustavo Tonini <
> gustavoton...@gmail.com> wrote:
> >> >> >
> >> >> > Hi Zheng,
> >> >> > the cluster is running ceph mimic. This warning about network only
> appears when using nautilus' cephfs-journal-tool.
> >> >> >
> >> >> > "cephfs-data-scan scan_links" does not report any issue.
> >> >> >
> >> >> > How could variable "newparent" be NULL at
> https://github.com/ceph/ceph/blob/master/src/mds/SnapRealm.cc#L599 ? Is
> there a way to fix this?
> >> >> >
> >> >>
> >> >>
> >> >> Try 'cephfs-data-scan init'. It will set up the root inode's snaprealm.
> >> >>
> >> >> > On Thu, Oct 17, 2019 at 9:58 PM Yan, Zheng 
> wrote:
> >> >> >>
> >> >> >> On Thu, Oct 17, 2019 at 10:19 PM Gustavo Tonini <
> gustavoton...@gmail.com> wrote:
> >> >> >> >
> >> >> >> > No. The cluster was just rebalancing.
> >> >> >> >
> >> >> >> > The journal seems damaged:
> >> >> >> >
> >> >> >> > ceph@deployer:~$ cephfs-journal-tool --rank=fs_padrao:0
> journal inspect
> >> >> >> > 2019-10-16 17:46:29.596 7fcd34cbf700 -1 NetHandler
> create_socket couldn't create socket (97) Address family not supported by
> protocol
> >> >> >>
> >> >> >> A corrupted journal shouldn't cause an error like this. This looks
> >> >> >> more like a network issue; please double-check the network config of
> >> >> >> your cluster.
> >> >> >>
> >> >> >> > Overall journal integrity: DAMAGED
> >> >> >> > Corrupt regions:
> >> >> >> > 0x1c5e4d904ab-1c5e4d9ddbc
> >> >> >> > ceph@deployer:~$
> >> >> >> >
> >> >> >> > Could a journal reset help with this?
> >> >> >> >
> >> >> >> > I could snapshot all FS pools and export the journal beforehand to
> >> >> >> > guarantee a rollback to this state if something goes wrong with the
> >> >> >> > journal reset.
> >> >> >> >
> >> >> >> > On Thu, Oct 17, 2019, 09:07 Yan, Zheng 
> wrote:
> >> >> >> >>
> >> >> >> >> On Tue, Oct 15, 2019 at 12:03 PM Gustavo Tonini <
> gustavoton...@gmail.com> wrote:
> >> >> >> >> >
> >> >> >> >> > Dear ceph users,
> >> >> >> >> > we're experiencing a segfault during MDS startup (replay
> process) which is making our FS inaccessible.
> >> >> >> >> >
> >> >> >> >> > MDS log messages:
> >> >> >> >> >
> >> >> >> >> > Oct 15 03:41:39.894584 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c08f49700  1 -- 192.168.8.195:6800/3181891717 <== osd.26
> 192.168.8.209:6821/2

Re: [ceph-users] Crashed MDS (segfault)

2019-10-25 Thread Gustavo Tonini
Running "cephfs-data-scan init  --force-init" solved the problem.

Then I had to run "cephfs-journal-tool event recover_dentries summary" and
truncate the journal to fix the corrupted journal.
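
For anyone hitting the same bug, the whole sequence, pieced together from this
thread, looks roughly like this (a sketch; the rank fs_padrao:0 and the daemon
name mds1 come from messages quoted further down, and the backup filename is
just an example):

  # export the journal first, to allow a rollback if the reset goes wrong
  cephfs-journal-tool --rank=fs_padrao:0 journal export /root/mds0-journal.bin

  # recover whatever dentries are still readable from the damaged journal
  cephfs-journal-tool --rank=fs_padrao:0 event recover_dentries summary

  # truncate (reset) the corrupted journal
  cephfs-journal-tool --rank=fs_padrao:0 journal reset

  # re-initialize the root inode's snaprealm metadata
  cephfs-data-scan init --force-init

  # after restarting the MDS, repair the incorrect stat this leaves behind,
  # as Yan suggests below
  ceph daemon mds.mds1 scrub_path / force repair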

CephFS worked well for approximately 3 hours and then our MDS crashed
again, apparently due to the bug described at
https://tracker.ceph.com/issues/38452

On Wed, Oct 23, 2019, 02:24 Yan, Zheng  wrote:

> On Tue, Oct 22, 2019 at 1:49 AM Gustavo Tonini 
> wrote:
> >
> > Is there a possibility to lose data if I use "cephfs-data-scan init
> --force-init"?
> >
>
> It only causes an incorrect stat on the root inode; it can't cause data loss.
>
> Running 'ceph daemon mds.a scrub_path / force repair' after an MDS
> restart can fix the incorrect stat.
>
> > On Mon, Oct 21, 2019 at 4:36 AM Yan, Zheng  wrote:
> >>
> >> On Fri, Oct 18, 2019 at 9:10 AM Gustavo Tonini 
> wrote:
> >> >
> >> > Hi Zheng,
> >> > the cluster is running ceph mimic. This warning about network only
> appears when using nautilus' cephfs-journal-tool.
> >> >
> >> > "cephfs-data-scan scan_links" does not report any issue.
> >> >
> >> > How could variable "newparent" be NULL at
> https://github.com/ceph/ceph/blob/master/src/mds/SnapRealm.cc#L599 ? Is
> there a way to fix this?
> >> >
> >>
> >>
> >> Try 'cephfs-data-scan init'. It will set up the root inode's snaprealm.
> >>
> >> > On Thu, Oct 17, 2019 at 9:58 PM Yan, Zheng  wrote:
> >> >>
> >> >> On Thu, Oct 17, 2019 at 10:19 PM Gustavo Tonini <
> gustavoton...@gmail.com> wrote:
> >> >> >
> >> >> > No. The cluster was just rebalancing.
> >> >> >
> >> >> > The journal seems damaged:
> >> >> >
> >> >> > ceph@deployer:~$ cephfs-journal-tool --rank=fs_padrao:0 journal
> inspect
> >> >> > 2019-10-16 17:46:29.596 7fcd34cbf700 -1 NetHandler create_socket
> couldn't create socket (97) Address family not supported by protocol
> >> >>
> >> >> A corrupted journal shouldn't cause an error like this. This looks more
> >> >> like a network issue; please double-check the network config of your
> >> >> cluster.
> >> >>
> >> >> > Overall journal integrity: DAMAGED
> >> >> > Corrupt regions:
> >> >> > 0x1c5e4d904ab-1c5e4d9ddbc
> >> >> > ceph@deployer:~$
> >> >> >
> >> >> > Could a journal reset help with this?
> >> >> >
> >> >> > I could snapshot all FS pools and export the journal beforehand to
> >> >> > guarantee a rollback to this state if something goes wrong with the
> >> >> > journal reset.
> >> >> >
> >> >> > On Thu, Oct 17, 2019, 09:07 Yan, Zheng  wrote:
> >> >> >>
> >> >> >> On Tue, Oct 15, 2019 at 12:03 PM Gustavo Tonini <
> gustavoton...@gmail.com> wrote:
> >> >> >> >
> >> >> >> > Dear ceph users,
> >> >> >> > we're experiencing a segfault during MDS startup (replay
> process) which is making our FS inaccessible.
> >> >> >> >
> >> >> >> > MDS log messages:
> >> >> >> >
> >> >> >> > Oct 15 03:41:39.894584 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c08f49700  1 -- 192.168.8.195:6800/3181891717 <== osd.26
> 192.168.8.209:6821/2419345 3  osd_op_reply(21 1. [getxattr]
> v0'0 uv0 ondisk = -61 ((61) No data available)) v8  154+0+0 (3715233608
> 0 0) 0x2776340 con 0x18bd500
> >> >> >> > Oct 15 03:41:39.894584 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 MDSIOContextBase::complete:
> 18C_IO_Inode_Fetched
> >> >> >> > Oct 15 03:41:39.894658 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x100) _fetched got 0 and 544
> >> >> >> > Oct 15 03:41:39.894658 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x100)  magic is 'ceph fs
> volume v011' (expecting 'ceph fs volume v011')
> >> >> >> > Oct 15 03:41:39.894735 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10  mds.0.cache.snaprealm(0x100 seq 1 0x1799c00)
> open_parents [1,head]
> >> >> >> > Oct 15 03:41:39.894735 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x100) _fetched [inode 0x100
> [...2,head] ~mds0/

Re: [ceph-users] Crashed MDS (segfault)

2019-10-21 Thread Gustavo Tonini
Is there a possibility to lose data if I use "cephfs-data-scan init
--force-init"?

On Mon, Oct 21, 2019 at 4:36 AM Yan, Zheng  wrote:

> On Fri, Oct 18, 2019 at 9:10 AM Gustavo Tonini 
> wrote:
> >
> > Hi Zheng,
> > the cluster is running ceph mimic. This warning about network only
> appears when using nautilus' cephfs-journal-tool.
> >
> > "cephfs-data-scan scan_links" does not report any issue.
> >
> > How could variable "newparent" be NULL at
> https://github.com/ceph/ceph/blob/master/src/mds/SnapRealm.cc#L599 ? Is
> there a way to fix this?
> >
>
>
> Try 'cephfs-data-scan init'. It will set up the root inode's snaprealm.
>
> > On Thu, Oct 17, 2019 at 9:58 PM Yan, Zheng  wrote:
> >>
> >> On Thu, Oct 17, 2019 at 10:19 PM Gustavo Tonini <
> gustavoton...@gmail.com> wrote:
> >> >
> >> > No. The cluster was just rebalancing.
> >> >
> >> > The journal seems damaged:
> >> >
> >> > ceph@deployer:~$ cephfs-journal-tool --rank=fs_padrao:0 journal
> inspect
> >> > 2019-10-16 17:46:29.596 7fcd34cbf700 -1 NetHandler create_socket
> couldn't create socket (97) Address family not supported by protocol
> >>
> >> A corrupted journal shouldn't cause an error like this. This looks more
> >> like a network issue; please double-check the network config of your
> >> cluster.
> >>
> >> > Overall journal integrity: DAMAGED
> >> > Corrupt regions:
> >> > 0x1c5e4d904ab-1c5e4d9ddbc
> >> > ceph@deployer:~$
> >> >
> >> > Could a journal reset help with this?
> >> >
> >> > I could snapshot all FS pools and export the journal beforehand to
> >> > guarantee a rollback to this state if something goes wrong with the
> >> > journal reset.
> >> >
> >> > On Thu, Oct 17, 2019, 09:07 Yan, Zheng  wrote:
> >> >>
> >> >> On Tue, Oct 15, 2019 at 12:03 PM Gustavo Tonini <
> gustavoton...@gmail.com> wrote:
> >> >> >
> >> >> > Dear ceph users,
> >> >> > we're experiencing a segfault during MDS startup (replay process)
> which is making our FS inaccessible.
> >> >> >
> >> >> > MDS log messages:
> >> >> >
> >> >> > Oct 15 03:41:39.894584 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c08f49700  1 -- 192.168.8.195:6800/3181891717 <== osd.26
> 192.168.8.209:6821/2419345 3  osd_op_reply(21 1. [getxattr]
> v0'0 uv0 ondisk = -61 ((61) No data available)) v8  154+0+0 (3715233608
> 0 0) 0x2776340 con 0x18bd500
> >> >> > Oct 15 03:41:39.894584 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 MDSIOContextBase::complete:
> 18C_IO_Inode_Fetched
> >> >> > Oct 15 03:41:39.894658 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x100) _fetched got 0 and 544
> >> >> > Oct 15 03:41:39.894658 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x100)  magic is 'ceph fs
> volume v011' (expecting 'ceph fs volume v011')
> >> >> > Oct 15 03:41:39.894735 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10  mds.0.cache.snaprealm(0x100 seq 1 0x1799c00)
> open_parents [1,head]
> >> >> > Oct 15 03:41:39.894735 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x100) _fetched [inode 0x100
> [...2,head] ~mds0/ auth v275131 snaprealm=0x1799c00 f(v0 1=1+0) n(v76166
> rc2020-07-17 15:29:27.00 b41838692297 -3184=-3168+-16)/n() (iversion
> lock) 0x18bf800]
> >> >> > Oct 15 03:41:39.894821 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 MDSIOContextBase::complete:
> 18C_IO_Inode_Fetched
> >> >> > Oct 15 03:41:39.894821 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x1) _fetched got 0 and 482
> >> >> > Oct 15 03:41:39.894891 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.201 7f3c00589700 10 mds.0.cache.ino(0x1)  magic is 'ceph fs volume
> v011' (expecting 'ceph fs volume v011')
> >> >> > Oct 15 03:41:39.894958 mds1 ceph-mds:   -472> 2019-10-15
> 00:40:30.205 7f3c00589700 -1 *** Caught signal (Segmentation fault) **#012
> in thread 7f3c00589700 thread_name:fn_anonymous#012#012 ceph version 13.2.6
> (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)#012 1:
> (()+0x11390) [0x7f3c0e48a390]#012 2: (operator<<(std::ostream&, SnapRealm
> const&)+0x42) [0x72cb92]#012 3: (SnapRealm::merge_to(SnapRealm*)+0x3

Re: [ceph-users] Crashed MDS (segfault)

2019-10-17 Thread Gustavo Tonini
Hi Zheng,
the cluster is running ceph mimic. This warning about network only appears
when using nautilus' cephfs-journal-tool.

"cephfs-data-scan scan_links" does not report any issue.

How could variable "newparent" be NULL at
https://github.com/ceph/ceph/blob/master/src/mds/SnapRealm.cc#L599 ? Is
there a way to fix this?

On Thu, Oct 17, 2019 at 9:58 PM Yan, Zheng  wrote:

> On Thu, Oct 17, 2019 at 10:19 PM Gustavo Tonini 
> wrote:
> >
> > No. The cluster was just rebalancing.
> >
> > The journal seems damaged:
> >
> > ceph@deployer:~$ cephfs-journal-tool --rank=fs_padrao:0 journal inspect
> > 2019-10-16 17:46:29.596 7fcd34cbf700 -1 NetHandler create_socket
> couldn't create socket (97) Address family not supported by protocol
>
> A corrupted journal shouldn't cause an error like this. This looks more like
> a network issue; please double-check the network config of your cluster.
>
> > Overall journal integrity: DAMAGED
> > Corrupt regions:
> > 0x1c5e4d904ab-1c5e4d9ddbc
> > ceph@deployer:~$
> >
> > Could a journal reset help with this?
> >
> > I could snapshot all FS pools and export the journal beforehand to guarantee
> > a rollback to this state if something goes wrong with the journal reset.
> >
> > On Thu, Oct 17, 2019, 09:07 Yan, Zheng  wrote:
> >>
> >> On Tue, Oct 15, 2019 at 12:03 PM Gustavo Tonini <
> gustavoton...@gmail.com> wrote:
> >> >
> >> > Dear ceph users,
> >> > we're experiencing a segfault during MDS startup (replay process)
> which is making our FS inaccessible.
> >> >
> >> > MDS log messages:
> >> >
> >> > Oct 15 03:41:39.894584 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
> 7f3c08f49700  1 -- 192.168.8.195:6800/3181891717 <== osd.26
> 192.168.8.209:6821/2419345 3  osd_op_reply(21 1. [getxattr]
> v0'0 uv0 ondisk = -61 ((61) No data available)) v8  154+0+0 (3715233608
> 0 0) 0x2776340 con 0x18bd500
> >> > Oct 15 03:41:39.894584 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
> 7f3c00589700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched
> >> > Oct 15 03:41:39.894658 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
> 7f3c00589700 10 mds.0.cache.ino(0x100) _fetched got 0 and 544
> >> > Oct 15 03:41:39.894658 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
> 7f3c00589700 10 mds.0.cache.ino(0x100)  magic is 'ceph fs volume v011'
> (expecting 'ceph fs volume v011')
> >> > Oct 15 03:41:39.894735 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
> 7f3c00589700 10  mds.0.cache.snaprealm(0x100 seq 1 0x1799c00) open_parents
> [1,head]
> >> > Oct 15 03:41:39.894735 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
> 7f3c00589700 10 mds.0.cache.ino(0x100) _fetched [inode 0x100 [...2,head]
> ~mds0/ auth v275131 snaprealm=0x1799c00 f(v0 1=1+0) n(v76166 rc2020-07-17
> 15:29:27.00 b41838692297 -3184=-3168+-16)/n() (iversion lock) 0x18bf800]
> >> > Oct 15 03:41:39.894821 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
> 7f3c00589700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched
> >> > Oct 15 03:41:39.894821 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
> 7f3c00589700 10 mds.0.cache.ino(0x1) _fetched got 0 and 482
> >> > Oct 15 03:41:39.894891 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
> 7f3c00589700 10 mds.0.cache.ino(0x1)  magic is 'ceph fs volume v011'
> (expecting 'ceph fs volume v011')
> >> > Oct 15 03:41:39.894958 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.205
> 7f3c00589700 -1 *** Caught signal (Segmentation fault) **#012 in thread
> 7f3c00589700 thread_name:fn_anonymous#012#012 ceph version 13.2.6
> (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)#012 1:
> (()+0x11390) [0x7f3c0e48a390]#012 2: (operator<<(std::ostream&, SnapRealm
> const&)+0x42) [0x72cb92]#012 3: (SnapRealm::merge_to(SnapRealm*)+0x308)
> [0x72f488]#012 4: (CInode::decode_snap_blob(ceph::buffer::list&)+0x53)
> [0x6e1f63]#012 5:
> (CInode::decode_store(ceph::buffer::list::iterator&)+0x76) [0x702b86]#012
> 6: (CInode::_fetched(ceph::buffer::list&, ceph::buffer::list&,
> Context*)+0x1b2) [0x702da2]#012 7: (MDSIOContextBase::complete(int)+0x119)
> [0x74fcc9]#012 8: (Finisher::finisher_thread_entry()+0x12e)
> [0x7f3c0ebffece]#012 9: (()+0x76ba) [0x7f3c0e4806ba]#012 10: (clone()+0x6d)
> [0x7f3c0dca941d]#012 NOTE: a copy of the executable, or `objdump -rdS
> ` is needed to interpret this.
> >> > Oct 15 03:41:39.895400 mds1 ceph-mds: --- logging levels ---
> >> > Oct 15 03:41:39.895473 mds1 ceph-mds:0/ 5 none
> >> > Oct 15 03:41:39.895473 mds1 ceph-mds:0/ 1 lockdep
>

[ceph-users] Crashed MDS (segfault)

2019-10-14 Thread Gustavo Tonini
Dear ceph users,
we're experiencing a segfault during MDS startup (replay process) which is
making our FS inaccessible.

MDS log messages:

Oct 15 03:41:39.894584 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
7f3c08f49700  1 -- 192.168.8.195:6800/3181891717 <== osd.26
192.168.8.209:6821/2419345 3  osd_op_reply(21 1. [getxattr]
v0'0 uv0 ondisk = -61 ((61) No data available)) v8  154+0+0 (3715233608
0 0) 0x2776340 con 0x18bd500
Oct 15 03:41:39.894584 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
7f3c00589700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched
Oct 15 03:41:39.894658 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
7f3c00589700 10 mds.0.cache.ino(0x100) _fetched got 0 and 544
Oct 15 03:41:39.894658 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
7f3c00589700 10 mds.0.cache.ino(0x100)  magic is 'ceph fs volume v011'
(expecting 'ceph fs volume v011')
Oct 15 03:41:39.894735 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
7f3c00589700 10  mds.0.cache.snaprealm(0x100 seq 1 0x1799c00) open_parents
[1,head]
Oct 15 03:41:39.894735 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
7f3c00589700 10 mds.0.cache.ino(0x100) _fetched [inode 0x100 [...2,head]
~mds0/ auth v275131 snaprealm=0x1799c00 f(v0 1=1+0) n(v76166 rc2020-07-17
15:29:27.00 b41838692297 -3184=-3168+-16)/n() (iversion lock) 0x18bf800]
Oct 15 03:41:39.894821 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
7f3c00589700 10 MDSIOContextBase::complete: 18C_IO_Inode_Fetched
Oct 15 03:41:39.894821 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
7f3c00589700 10 mds.0.cache.ino(0x1) _fetched got 0 and 482
Oct 15 03:41:39.894891 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.201
7f3c00589700 10 mds.0.cache.ino(0x1)  magic is 'ceph fs volume v011'
(expecting 'ceph fs volume v011')
Oct 15 03:41:39.894958 mds1 ceph-mds:   -472> 2019-10-15 00:40:30.205
7f3c00589700 -1 *** Caught signal (Segmentation fault) **#012 in thread
7f3c00589700 thread_name:fn_anonymous#012#012 ceph version 13.2.6
(7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)#012 1:
(()+0x11390) [0x7f3c0e48a390]#012 2: (operator<<(std::ostream&, SnapRealm
const&)+0x42) [0x72cb92]#012 3: (SnapRealm::merge_to(SnapRealm*)+0x308)
[0x72f488]#012 4: (CInode::decode_snap_blob(ceph::buffer::list&)+0x53)
[0x6e1f63]#012 5:
(CInode::decode_store(ceph::buffer::list::iterator&)+0x76) [0x702b86]#012
6: (CInode::_fetched(ceph::buffer::list&, ceph::buffer::list&,
Context*)+0x1b2) [0x702da2]#012 7: (MDSIOContextBase::complete(int)+0x119)
[0x74fcc9]#012 8: (Finisher::finisher_thread_entry()+0x12e)
[0x7f3c0ebffece]#012 9: (()+0x76ba) [0x7f3c0e4806ba]#012 10: (clone()+0x6d)
[0x7f3c0dca941d]#012 NOTE: a copy of the executable, or `objdump -rdS
` is needed to interpret this.
Oct 15 03:41:39.895400 mds1 ceph-mds: --- logging levels ---
Oct 15 03:41:39.895473 mds1 ceph-mds:0/ 5 none
Oct 15 03:41:39.895473 mds1 ceph-mds:0/ 1 lockdep
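
In case it helps with diagnosis, the raw offsets in the backtrace above can be
resolved against the matching binary, roughly like this (a sketch; it assumes
the ceph 13.2.6 debug symbols are installed and a non-PIE /usr/bin/ceph-mds,
so the addresses map directly):

  # resolve the frame that prints the SnapRealm; offset taken from the trace
  addr2line -Cfe /usr/bin/ceph-mds 0x72cb92

  # or disassemble around it, as the NOTE in the log suggests
  objdump -rdS /usr/bin/ceph-mds | less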


Cluster status information:

  cluster:
id: b8205875-e56f-4280-9e52-6aab9c758586
health: HEALTH_WARN
1 filesystem is degraded
1 nearfull osd(s)
11 pool(s) nearfull

  services:
mon: 3 daemons, quorum mon1,mon2,mon3
mgr: mon1(active), standbys: mon2, mon3
mds: fs_padrao-1/1/1 up  {0=mds1=up:replay(laggy or crashed)}
osd: 90 osds: 90 up, 90 in

  data:
pools:   11 pools, 1984 pgs
objects: 75.99 M objects, 285 TiB
usage:   457 TiB used, 181 TiB / 639 TiB avail
pgs: 1896 active+clean
 87   active+clean+scrubbing+deep+repair
 1    active+clean+scrubbing

  io:
client:   89 KiB/s wr, 0 op/s rd, 3 op/s wr

Has anyone seen anything like this?

Regards,
Gustavo.