Re: [ceph-users] Recovery from 12.2.5 (corruption) -> 12.2.6 (hair on fire) -> 13.2.0 (some objects inaccessible and CephFS damaged)
>>
>> I'm on IRC (as MooingLemur) if more real-time communication would help :)
>
> Sure, I'll try to contact you there. In the meantime could you open up
> a tracker showing the crash stack trace above and a brief description
> of the current situation and the events leading up to it? Could you
> also get a debug log of one of these crashes with
> "debug bluestore = 20" and, ideally, a coredump?
>

https://tracker.ceph.com/issues/25001

As mentioned in the bug, I was mistaken when I mentioned here that these
were SSDs. They're SATA, so the crashing ones aren't hosting a cache.

Thanks!

-Troy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
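One way to get the debug log requested above is to raise BlueStore logging on a single OSD at runtime. The sketch below only assembles the command line and runs nothing. The helper name `debug_bluestore_argv` and the OSD id are hypothetical, and it assumes the `ceph tell osd.<id> injectargs` form suits this cluster. An injected value does not survive an OSD restart, so for an OSD that keeps crashing, putting `debug bluestore = 20` in the `[osd]` section of ceph.conf may be the better fit.

```python
import shlex
from typing import List


def debug_bluestore_argv(osd_id: int, level: int = 20) -> List[str]:
    """Build the argv that raises BlueStore debug logging on one OSD.

    Hypothetical helper for illustration: it assembles the command only,
    it does not execute anything against the cluster.
    """
    if osd_id < 0:
        raise ValueError("osd_id must be non-negative")
    return [
        "ceph", "tell", f"osd.{osd_id}",
        "injectargs", f"--debug-bluestore {level}",
    ]


if __name__ == "__main__":
    # osd.3 is an arbitrary example id, not one taken from the thread.
    print(shlex.join(debug_bluestore_argv(3)))
```

The printed line can be pasted into a shell on a node with an admin keyring; remember to lower the level again afterwards, since level 20 logs grow quickly.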
Re: [ceph-users] Recovery from 12.2.5 (corruption) -> 12.2.6 (hair on fire) -> 13.2.0 (some objects inaccessible and CephFS damaged)
On 07/17/2018 11:14 PM, Brad Hubbard wrote:
> On Wed, Jul 18, 2018 at 2:57 AM, Troy Ablan wrote:
>> I was on 12.2.5 for a couple weeks and started randomly seeing
>> corruption, moved to 12.2.6 via yum update on Sunday, and all hell broke
>> loose. I panicked and moved to Mimic, and when that didn't solve the
>> problem, only then did I start to root around in mailing lists archives.
>>
>> It appears I can't downgrade OSDs back to Luminous now that 12.2.7 is
>> out, but I'm unsure how to proceed now that the damaged cluster is
>> running under Mimic. Is there anything I can do to get the cluster back
>> online and objects readable?
>
> That depends on what the specific problem is. Can you provide some
> data that fills in the blanks around "randomly seeing corruption"?

Thanks for the reply, Brad. I have a feeling that almost all of this stems
from the time the cluster spent running 12.2.6. When booting VMs that use
rbd as a backing store, they typically get I/O errors during boot and cannot
read critical parts of the image. I also get similar errors if I try to rbd
export most of the images. Also, CephFS is not started as ceph -s indicates
damage.

Many of the OSDs have been crashing and restarting as I've tried to rbd
export good versions of images (from older snapshots). Here's one
particular crash:

2018-07-18 15:52:15.809 7fcbaab77700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.0/rpm/el7/BUILD/ceph-13.2.0/src/os/bluestore/BlueStore.h: In function 'void BlueStore::SharedBlobSet::remove_last(BlueStore::SharedBlob*)' thread 7fcbaab77700 time 2018-07-18 15:52:15.750916
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.0/rpm/el7/BUILD/ceph-13.2.0/src/os/bluestore/BlueStore.h: 455: FAILED assert(sb->nref == 0)

 ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7fcbc197a53f]
 2: (()+0x286727) [0x7fcbc197a727]
 3: (BlueStore::SharedBlob::put()+0x1da) [0x5641f39181ca]
 4: (std::_Rb_tree, boost::intrusive_ptr, std::_Identity >, std::less >, std::allocator > >::_M_erase(std::_Rb_tree_node lueStore::SharedBlob> >*)+0x2d) [0x5641f3977cfd]
 5: (std::_Rb_tree, boost::intrusive_ptr, std::_Identity >, std::less >, std::allocator > >::_M_erase(std::_Rb_tree_node lueStore::SharedBlob> >*)+0x1b) [0x5641f3977ceb]
 6: (std::_Rb_tree, boost::intrusive_ptr, std::_Identity >, std::less >, std::allocator > >::_M_erase(std::_Rb_tree_node lueStore::SharedBlob> >*)+0x1b) [0x5641f3977ceb]
 7: (std::_Rb_tree, boost::intrusive_ptr, std::_Identity >, std::less >, std::allocator > >::_M_erase(std::_Rb_tree_node lueStore::SharedBlob> >*)+0x1b) [0x5641f3977ceb]
 8: (BlueStore::TransContext::~TransContext()+0xf7) [0x5641f3979297]
 9: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x610) [0x5641f391c9b0]
 10: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x9a) [0x5641f392a38a]
 11: (BlueStore::_kv_finalize_thread()+0x41e) [0x5641f392b3be]
 12: (BlueStore::KVFinalizeThread::entry()+0xd) [0x5641f397d85d]
 13: (()+0x7e25) [0x7fcbbe4d2e25]
 14: (clone()+0x6d) [0x7fcbbd5c3bad]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

Here's the output of ceph -s that might fill in some configuration
questions. Since osds are continually restarting if I try to put load on
it, the cluster seems to be churning a bit. That's why I set nodown for
now.

  cluster:
    id:     b2873c9a-5539-4c76-ac4a-a6c9829bfed2
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            nodown,noscrub,nodeep-scrub flag(s) set
            9 scrub errors
            Reduced data availability: 61 pgs inactive, 56 pgs peering, 4 pgs stale
            Possible data damage: 3 pgs inconsistent
            16 slow requests are blocked > 32 sec
            26 stuck requests are blocked > 4096 sec

  services:
    mon: 5 daemons, quorum a,b,c,d,e
    mgr: a(active), standbys: b, d, e, c
    mds: lcs-0/1/1 up , 2 up:standby, 1 damaged
    osd: 34 osds: 34 up, 34 in
         flags nodown,noscrub,nodeep-scrub

  data:
    pools:   15 pools, 640 pgs
    objects: 9.73 M objects, 13 TiB
    usage:   24 TiB used, 55 TiB / 79 TiB avail
    pgs:     23.438% pgs not active
             487 active+clean
             73  peering
             70  activating
             5   stale+peering
             3   active+clean+inconsistent
             2   stale+activating

  io:
    client:   1.3 KiB/s wr, 0 op/s rd, 0 op/s wr

If there's any other information I can provide that can help point to
the problem, I'd be glad to share.

Thanks

-Troy
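The "23.438% pgs not active" figure in the `ceph -s` output can be reproduced from the per-state PG counts listed under it. The following is a minimal sketch of that arithmetic, assuming a PG counts as active only when "active" is one of its `+`-separated state flags; the dictionary and the `is_active` helper are illustrative names, not part of any Ceph tooling.

```python
# PG state counts copied from the `ceph -s` output in the message above.
pg_states = {
    "active+clean": 487,
    "peering": 73,
    "activating": 70,
    "stale+peering": 5,
    "active+clean+inconsistent": 3,
    "stale+activating": 2,
}


def is_active(state: str) -> bool:
    """Treat a PG as active if 'active' is one of its '+'-separated flags.

    Note that 'activating' is a different flag and does not count.
    """
    return "active" in state.split("+")


total = sum(pg_states.values())
inactive = sum(n for state, n in pg_states.items() if not is_active(state))
pct_inactive = 100.0 * inactive / total

print(f"{inactive} of {total} PGs not active ({pct_inactive:.3f}%)")
```

The counts sum to the 640 PGs reported for the 15 pools, and 150 of them fall outside an active state. The health summary reports lower numbers (61 inactive, 56 peering), which may simply reflect that the two sections were sampled at slightly different moments on a churning cluster.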
Re: [ceph-users] Recovery from 12.2.5 (corruption) -> 12.2.6 (hair on fire) -> 13.2.0 (some objects inaccessible and CephFS damaged)
On Wed, Jul 18, 2018 at 2:57 AM, Troy Ablan wrote:
> I was on 12.2.5 for a couple weeks and started randomly seeing
> corruption, moved to 12.2.6 via yum update on Sunday, and all hell broke
> loose. I panicked and moved to Mimic, and when that didn't solve the
> problem, only then did I start to root around in mailing lists archives.
>
> It appears I can't downgrade OSDs back to Luminous now that 12.2.7 is
> out, but I'm unsure how to proceed now that the damaged cluster is
> running under Mimic. Is there anything I can do to get the cluster back
> online and objects readable?

That depends on what the specific problem is. Can you provide some
data that fills in the blanks around "randomly seeing corruption"?

>
> Everything is BlueStore and most of it is EC.
>
> Thanks.
>
> -Troy

--
Cheers,
Brad