Sorry for being brutal … anyway:
1. Get a battery for the UPS (a car battery will do as well; I've modded a UPS with a truck battery in the past and it worked like a charm :D).
2. Get spare drives and put them in, because your cluster CAN NOT get out of the error state due to lack of space.
3. Follow Ronny Aasen's advice on how to recover data from the hard drives.
4. Get cooling onto the drives or you will lose more!
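For point 3, the safe first move is usually to image the failing disk with GNU ddrescue before touching the filesystem on it, so every further recovery attempt runs against the copy. A rough sketch only — the device name and paths below are placeholders you must adjust, and the commands are echoed as a dry run:

```shell
# Sketch only -- device and paths are placeholders, not your real names.
SRC=/dev/sdX                 # the failing disk
IMG=/mnt/rescue/sdX.img      # image file on a healthy disk with enough space
MAP=/mnt/rescue/sdX.map      # map file lets ddrescue resume if interrupted

# Commands are echoed as a dry run; remove 'echo' to actually execute.
# Pass 1: -n copies the easy sectors first and skips scraping bad areas.
echo ddrescue -n "$SRC" "$IMG" "$MAP"
# Pass 2: -d uses direct disc access and -r3 retries bad sectors 3 times.
echo ddrescue -d -r3 "$SRC" "$IMG" "$MAP"
```

The two-pass order matters on a dying drive: you grab the healthy majority of the data quickly before hammering the bad regions, which is exactly when such a disk tends to die for good.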
> On 28 Aug 2017, at 22:39, hjcho616 <hjcho...@yahoo.com> wrote:
>
> Tomasz,
>
> Those machines are behind a surge protector. Doesn't appear to be a good one! I do have a UPS... but it is my fault... no battery. Power was pretty reliable for a while... and the UPS was just beeping every chance it had, disrupting some sleep.. =P So it is running on the surge protector only. I am running this in a home environment. So far, HDD failures have been very rare in this environment. =) It just doesn't get loaded as much! I am not sure what to expect; seeing that "unfound", and just the feeling that I might possibly get the OSD back, made me excited about it. =) Thanks for letting me know what the priority should be. I just lack experience and knowledge in this. =) Please do continue to guide me through this.
>
> Thank you for the decode of those SMART messages! I do agree it looks like it is on its way out. I would like to know how to get a good portion of it back if possible. =)
>
> I think I just set the size and min_size to 1.
>
> # ceph osd lspools
> 0 data,1 metadata,2 rbd,
> # ceph osd pool set rbd size 1
> set pool 2 size to 1
> # ceph osd pool set rbd min_size 1
> set pool 2 min_size to 1
>
> Seems to be doing some backfilling work.
>
> # ceph health
> HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2 pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling; 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering; 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101 pgs stuck undersized; 101 pgs undersized; 1 requests are blocked > 32 sec; recovery 1790657/4502340 objects degraded (39.772%); recovery 641906/4502340 objects misplaced (14.257%); recovery 147/2251990 unfound (0.007%); 50 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set
>
> Regards,
> Hong
>
> On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz <tom.kusmi...@gmail.com> wrote:
>
> So to decode a few things about your disk:
>
> 1 Raw_Read_Error_Rate     0x002f   100   100   051   Pre-fail  Always   -   37
> 37 read errors and only one sector marked as pending - fun disk :/
>
> 181 Program_Fail_Cnt_Total  0x0022   099   099   000   Old_age   Always   -   35325174
> So the firmware has quite a few bugs, that's nice.
>
> 191 G-Sense_Error_Rate      0x0022   100   100   000   Old_age   Always   -   2855
> The disk was thrown around while operational - even more nice.
>
> 194 Temperature_Celsius     0x0002   047   041   000   Old_age   Always   -   53 (Min/Max 15/59)
> If your disk passes 50 you should not consider using it; high temperatures demagnetise the platter layer and you will see more errors in the very near future.
>
> 197 Current_Pending_Sector  0x0032   100   100   000   Old_age   Always   -   1
> As mentioned before :)
>
> 200 Multi_Zone_Error_Rate   0x002a   100   100   000   Old_age   Always   -   4222
> Your heads keep missing tracks … bent? I don't even know how to comment here.
>
> Generally a fun drive you've got there … rescue as much as you can and throw it away!!!
>
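If you want to watch whether the drive keeps deteriorating while you rescue data off it, you can re-poll the same attributes with smartctl from smartmontools. A sketch with a placeholder device name, echoed as a dry run:

```shell
# Placeholder device -- point it at the suspect drive on your system.
DEV=/dev/sdX

# Dry run; remove 'echo' to actually execute.
# -A prints the vendor attribute table (the one decoded above),
# -H prints the drive's overall health self-assessment.
echo smartctl -A "$DEV"
echo smartctl -H "$DEV"
# The one to watch is attribute 197 (Current_Pending_Sector): a growing
# raw value means more unreadable sectors are turning up.
```

Re-running this every so often during the rescue tells you whether the pending-sector count is stable or climbing, which is a decent proxy for how much time the drive has left.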
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com