I have a small update to this: after an even closer reading of an offending pg's query, I noticed the following:

    "peer": "4",
    "pgid": "19.6e",
    "last_update": "51072'48910307",
    "last_complete": "51072'48910307",
    "log_tail": "50495'48906592",

The log tail seems to have lagged behind last_update/last_complete. I suspect this is what's causing the cluster to reject these pgs.
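In case it's useful, this is roughly how I'm pulling those fields out of the query for each of the stuck pgs (the pg id here is just the one above):

    ceph pg 19.6e query > pg-19.6e.json
    grep -E '"last_update"|"last_complete"|"log_tail"' pg-19.6e.json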
"peer": "4", "pgid": "19.6e", "last_update": "51072'48910307", "last_complete": "51072'48910307", "log_tail": "50495'48906592", The log tail seems to have lagged behind the last_update/last_complete. I suspect this is whats causing the cluster to reject these pgs. Anyone know how i can go about cleaning this up? Aaron > On Dec 1, 2014, at 8:12 PM, Aaron Bassett <aa...@five3genomics.com> wrote: > > Hi all, I have a problem with some incomplete pgs. Here’s the backstory: I > had a pool that I had accidently left with a size of 2. On one of the ods > nodes, the system hdd started to fail and I attempted to rescue it by > sacrificing one of my osd nodes. That went ok and I was able to bring the > node back up minus the one osd. Now I have 11 incomplete osds. I believe > these are mostly from the pool that only had size two, but I cant tell for > sure. I found another thread on here that talked about using > ceph_objectstore_tool to add or remove pg data to get out of an incomplete > state. > > Let’s start with the one pg I’ve been playing with, this is a loose > description of where I’ve been. First I saw that it had the missing osd in > “down_osds_we_would_probe” when I queried it, and some reading around that > told me to recreate the missing osd, so I did that. It (obviously) didnt have > the missing data, but it took the pg from down+incomplete to just incomplete. > Then I tried pg_force_create and that didnt seem to make a difference. Some > more googling then brought me to ceph_objectstore_tool and I started to take > a closer look at the results from pg query. I noticed that the list of > probing osds gets longer and longer till the end of the query has something > like: > > "probing_osds": [ > "0", > "3", > "4", > "16", > "23", > "26", > "35", > "41", > "44", > "51", > "56”], > > So I took a look at those osds and noticed that some of them have data in the > directory for the troublesome pg and others dont. So I tried picking one with > the *most* data and i used ceph_objectstore_tool to export the pg. It was > > 6G so a fair amount of data is still there. I then imported it (after > removing) into all the others in that list. Unfortunately, it is still > incomplete. I’m not sure what my next step should be here. 
> Here's some other stuff from the query on it:
>
>     "info": { "pgid": "0.63b",
>         "last_update": "50495'8246",
>         "last_complete": "50495'8246",
>         "log_tail": "20346'5245",
>         "last_user_version": 8246,
>         "last_backfill": "MAX",
>         "purged_snaps": "[]",
>         "history": { "epoch_created": 1,
>             "last_epoch_started": 51102,
>             "last_epoch_clean": 50495,
>             "last_epoch_split": 0,
>             "same_up_since": 68312,
>             "same_interval_since": 68312,
>             "same_primary_since": 68190,
>             "last_scrub": "28158'8240",
>             "last_scrub_stamp": "2014-11-18 17:08:49.368486",
>             "last_deep_scrub": "28158'8240",
>             "last_deep_scrub_stamp": "2014-11-18 17:08:49.368486",
>             "last_clean_scrub_stamp": "2014-11-18 17:08:49.368486"},
>         "stats": { "version": "50495'8246",
>             "reported_seq": "84279",
>             "reported_epoch": "69394",
>             "state": "down+incomplete",
>             "last_fresh": "2014-12-01 23:23:07.355308",
>             "last_change": "2014-12-01 21:28:52.771807",
>             "last_active": "2014-11-24 13:37:09.784417",
>             "last_clean": "2014-11-22 21:59:49.821836",
>             "last_became_active": "0.000000",
>             "last_unstale": "2014-12-01 23:23:07.355308",
>             "last_undegraded": "2014-12-01 23:23:07.355308",
>             "last_fullsized": "2014-12-01 23:23:07.355308",
>             "mapping_epoch": 68285,
>             "log_start": "20346'5245",
>             "ondisk_log_start": "20346'5245",
>             "created": 1,
>             "last_epoch_clean": 50495,
>             "parent": "0.0",
>             "parent_split_bits": 0,
>             "last_scrub": "28158'8240",
>             "last_scrub_stamp": "2014-11-18 17:08:49.368486",
>             "last_deep_scrub": "28158'8240",
>             "last_deep_scrub_stamp": "2014-11-18 17:08:49.368486",
>             "last_clean_scrub_stamp": "2014-11-18 17:08:49.368486",
>             "log_size": 3001,
>             "ondisk_log_size": 3001,
>
> Also, in the peering section all the peers now have the same last_update, which makes
> me think it should just pick up and take off.
>
> There is another thing I'm having problems with, and I'm not sure if it's related or
> not. I set a crush map manually, as I have a mix of ssd and platter osds, and it seems
> to work when I set it (the cluster starts rebalancing, etc.), but if I do a restart
> ceph-all on all my nodes, the crush map seems to revert to the one I didn't set. I
> don't know if it's being blocked from taking by these incomplete pgs, or if I'm missing
> a step to get it to "stick". It makes me think that when I'm stopping and starting
> these osds to use ceph_objectstore_tool on them, they may be getting out of sync with
> the cluster.
>
> Any insights would be greatly appreciated,
>
> Aaron
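P.S. Re the crush map reverting: I don't know yet whether this is actually what's
happening here, but by default the osds update their own crush location when they start
("osd crush update on start"), which can overwrite a manually injected map on restart.
If that's it, turning it off in ceph.conf on the osd nodes and re-injecting the edited
map should make it stick, roughly:

    [osd]
    osd crush update on start = false

    ceph osd getcrushmap -o crush.bin      # dump the current (compiled) map
    crushtool -d crush.bin -o crush.txt    # decompile, make edits
    crushtool -c crush.txt -o crush.new    # recompile
    ceph osd setcrushmap -i crush.new      # inject the edited map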