On 29-8-2017 19:12, Steve Taylor wrote:
> Hong,
>
> Probably your best chance at recovering any data without special,
> expensive, forensic procedures is to perform a dd from /dev/sdb to
> somewhere else large enough to hold a full disk image and attempt to
> repair that. You'll want to use 'conv=noerror' with your dd command
> since your disk is failing. Then you could either re-attach the OSD
> from the new source or attempt to retrieve objects from the filestore
> on it.
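The following is a minimal Python sketch of the copy loop that the 'conv=noerror' advice implies. It is not a replacement for dd or dd_rescue on /dev/sdb. The loop reads the source one block at a time. When a read fails, it records the block and writes zeros in its place, so later data keeps its original offset in the image. In GNU dd, that zero-padding comes from adding `sync` (`conv=noerror,sync`); `noerror` on its own only makes dd continue after the error. The names `copy_with_noerror` and `FlakyDisk` (a fake disk with simulated bad sectors) are hypothetical and exist only for illustration.

```python
import errno
import io


def copy_with_noerror(src, dst, block_size=512):
    """Copy src to dst one block at a time, continuing past read errors.

    An unreadable block is replaced with zeros so every later block keeps
    its original offset in the image (the job `sync` does next to
    `noerror` in dd). Returns the indices of the blocks that failed.
    """
    bad_blocks = []
    index = 0
    while True:
        src.seek(index * block_size)  # always read from the block's own offset
        try:
            chunk = src.read(block_size)
        except OSError:
            bad_blocks.append(index)
            chunk = bytes(block_size)  # zero-fill the unreadable block
        if not chunk:
            break  # end of the source device
        dst.write(chunk)
        index += 1
    return bad_blocks


class FlakyDisk:
    """Hypothetical stand-in for a failing disk: reads starting at a bad offset fail."""

    def __init__(self, data, bad_offsets):
        self._buf = io.BytesIO(data)
        self._bad = set(bad_offsets)

    def seek(self, offset):
        self._buf.seek(offset)

    def read(self, size):
        if self._buf.tell() in self._bad:
            raise OSError(errno.EIO, "simulated unreadable sector")
        return self._buf.read(size)


if __name__ == "__main__":
    data = bytes(range(256)) * 8  # 2048 bytes = four 512-byte blocks
    disk = FlakyDisk(data, bad_offsets={1024})
    image = io.BytesIO()
    print(copy_with_noerror(disk, image))  # → [2]
    print(len(image.getvalue()) == len(data))  # → True
```

This sketch loses a whole block for every error. Dedicated tools such as dd_rescue go further, for example by falling back to a smaller block size around bad areas, so less of the readable data is thrown away.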
Like somebody else already pointed out: in "problem" cases like this disk,
use dd_rescue. It has a far better chance of restoring a copy of your disk.

--WjW

> I have actually done this before by creating an RBD that matches the
> disk size, performing the dd, running xfs_repair, and eventually
> adding it back to the cluster as an OSD. RBDs as OSDs are certainly a
> temporary arrangement for repair only, but I'm happy to report that it
> worked flawlessly in my case. I was able to weight the OSD to 0,
> offload all of its data, then remove it for a full recovery, at which
> point I just deleted the RBD.
>
> The possibilities afforded by Ceph inception are endless. ☺
>
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 |
>
>
> On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
>> Rule of thumb with batteries is:
>> - the more "proper temperature" you run them at, the more life you get
>> out of them
>> - the more overpowered the battery is for your application, the longer
>> it will survive.
>>
>> Get yourself an LSI 94** controller and use it as an HBA and you will
>> be fine. But get MORE DRIVES !!!!! …
>>
>>> On 28 Aug 2017, at 23:10, hjcho616 <hjcho...@yahoo.com> wrote:
>>>
>>> Thank you Tomasz and Ronny. I'll have to order some HDDs soon and
>>> try these out. Car battery idea is nice! I may try that.. =) Do
>>> they last longer? Ones that fit the UPS original battery spec
>>> didn't last very long... part of the reason why I gave up on them..
>>> =P My wife probably won't like the idea of a car battery hanging out
>>> though ha!
>>>
>>> The OSD1 host (the one with mostly OK OSDs, except for that SMART
>>> failure) has no additional SATA connectors available on its
>>> motherboard. Would it be safe to add another OSD host?
>>>
>>> Regards,
>>> Hong
>>>
>>>
>>> On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz
>>> <tom.kusmierz@gmail.com> wrote:
>>>
>>>
>>> Sorry for being brutal … anyway
>>> 1. Get the battery for the UPS (a car battery will do as well; I've
>>> modded a UPS in the past with a truck battery and it was working like
>>> a charm :D )
>>> 2. Get spare drives and put those in, because your cluster CAN NOT
>>> get out of error due to lack of space.
>>> 3. Follow Ronny Aasen's advice on how to recover data from the hard
>>> drives.
>>> 4. Get cooling to the drives or you will lose more!
>>>
>>>
>>>> On 28 Aug 2017, at 22:39, hjcho616 <hjcho...@yahoo.com> wrote:
>>>>
>>>> Tomasz,
>>>>
>>>> Those machines are behind a surge protector. Doesn't appear to
>>>> be a good one! I do have a UPS... but it is my fault... no
>>>> battery. Power was pretty reliable for a while... and the UPS was
>>>> just beeping every chance it had, disrupting some sleep.. =P So
>>>> it's running on the surge protector only. I am running this in a
>>>> home environment. So far, HDD failures have been very rare in this
>>>> environment. =) It just doesn't get loaded as much! I am not
>>>> sure what to expect; seeing that "unfound" and just a feeling of
>>>> possibility of maybe getting the OSD back made me excited about it.
>>>> =) Thanks for letting me know what the priority should be. I
>>>> just lack experience and knowledge in this. =) Please do continue
>>>> to guide me through this.
>>>>
>>>> Thank you for decoding those SMART messages! I do agree that it
>>>> looks like it is on its way out. I would like to know how to get
>>>> a good portion of it back if possible. =)
>>>>
>>>> I think I just set the size and min_size to 1.
>>>> # ceph osd lspools
>>>> 0 data,1 metadata,2 rbd,
>>>> # ceph osd pool set rbd size 1
>>>> set pool 2 size to 1
>>>> # ceph osd pool set rbd min_size 1
>>>> set pool 2 min_size to 1
>>>>
>>>> Seems to be doing some backfilling work.
>>>>
>>>> # ceph health
>>>> HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2
>>>> pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling;
>>>> 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering;
>>>> 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs
>>>> stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101
>>>> pgs stuck undersized; 101 pgs undersized;
>>>> 1 requests are blocked > 32 sec; recovery 1790657/4502340 objects
>>>> degraded (39.772%); recovery 641906/4502340 objects misplaced
>>>> (14.257%); recovery 147/2251990 unfound (0.007%); 50 scrub errors;
>>>> mds cluster is degraded; no legacy OSD present but 'sortbitwise'
>>>> flag is not set
>>>>
>>>> Regards,
>>>> Hong
>>>>
>>>>
>>>> On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz
>>>> <tom.kusmierz@gmail.com> wrote:
>>>>
>>>>
>>>> So to decode a few things about your disk:
>>>>
>>>> 1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 37
>>>> 37 read errors and only one sector marked as pending - fun disk :/
>>>>
>>>> 181 Program_Fail_Cnt_Total 0x0022 099 099 000 Old_age Always - 35325174
>>>> So the firmware has quite a few bugs, that's nice
>>>>
>>>> 191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 2855
>>>> The disk was thrown around while operational, even nicer.
>>>>
>>>> 194 Temperature_Celsius 0x0002 047 041 000 Old_age Always - 53 (Min/Max 15/59)
>>>> If your disk passes 50 you should not consider using it: high
>>>> temperatures demagnetise the platter surface and you will see more
>>>> errors in the very near future.
>>>>
>>>> 197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 1
>>>> As mentioned before :)
>>>>
>>>> 200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 4222
>>>> Your heads keep missing tracks … bent? I don't even know how to
>>>> comment here.
>>>>
>>>>
>>>> Generally a fun drive you've got there … rescue as much as you can
>>>> and throw it away !!!
>>>>
>>>>
>>>
>>>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
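Tomasz's reading of the SMART table boils down to a few rules of thumb: flag a drive running above 50 °C, plus any nonzero pending-sector, G-sense (shock) or multi-zone error count. The following is a minimal Python sketch that applies those rules to the rows quoted in this thread. It assumes the `smartctl -A` row layout seen above: ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value. The `RULES` table and the `parse_attribute`/`check_drive` helpers are hypothetical. The thresholds are only the ones suggested in the thread, not vendor limits.

```python
import re

# Hypothetical rule table for the rules of thumb in the thread:
# attribute name -> (predicate on the raw value, message).
RULES = {
    "Temperature_Celsius": (lambda raw: raw > 50, "above 50 C"),
    "Current_Pending_Sector": (lambda raw: raw > 0, "sectors pending reallocation"),
    "G-Sense_Error_Rate": (lambda raw: raw > 0, "shock events recorded"),
    "Multi_Zone_Error_Rate": (lambda raw: raw > 0, "nonzero error count"),
}


def parse_attribute(row):
    """Return (name, raw value as int) for one `smartctl -A`-style row."""
    fields = row.split(maxsplit=9)  # the raw value column may contain spaces
    name, raw = fields[1], fields[9]
    match = re.match(r"\d+", raw)  # "53 (Min/Max 15/59)" -> 53
    return name, (int(match.group()) if match else None)


def check_drive(rows):
    """Return one warning string per row that trips a rule."""
    warnings = []
    for row in rows:
        name, raw = parse_attribute(row)
        rule = RULES.get(name)
        if rule is not None and raw is not None and rule[0](raw):
            warnings.append(f"{name}={raw}: {rule[1]}")
    return warnings


# Rows copied from the thread.
ATTRIBUTES = [
    "1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 37",
    "191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 2855",
    "194 Temperature_Celsius 0x0002 047 041 000 Old_age Always - 53 (Min/Max 15/59)",
    "197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 1",
    "200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 4222",
]

if __name__ == "__main__":
    for warning in check_drive(ATTRIBUTES):
        print(warning)
```

Raw_Read_Error_Rate has no rule here because its raw encoding differs between vendors. A nonzero count is hard to judge without knowing the drive model.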