Incomplete means "Ceph detects that a placement group is missing a
necessary period of history from its log. If you see this state, report a
bug, and try to start any failed OSDs that may contain the needed
information".

In the PG query, it lists some OSDs that it's trying to probe:

          "probing_osds": [
                "10",
                "13",
                "15",
                "25"],
          "down_osds_we_would_probe": [],


Is one of those the OSD you replaced?  If so, you might try
ceph pg {pg-id} mark_unfound_lost revert|delete.  Be careful: that command
will lose data.  It tells Ceph to give up looking for data that it can't
find, so you might want to wait a bit before running it.
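
If you want to check this across all ~100 PGs rather than by eye, here is a minimal Python sketch. It assumes you have saved the output of the PG query as JSON and that the "probing_osds" and "down_osds_we_would_probe" keys sit inside the "recovery_state" entries, as in the excerpt above. The function name and the replaced OSD id are hypothetical; substitute your own.

```python
import json


def probing_info(pg_query):
    """Collect the OSD ids listed under probing_osds and
    down_osds_we_would_probe in every recovery_state entry."""
    probing, down = set(), set()
    for state in pg_query.get("recovery_state", []):
        # The query excerpt shows ids as strings ("10"); int() accepts both
        # strings and plain integers.
        probing.update(int(o) for o in state.get("probing_osds", []))
        down.update(int(o) for o in state.get("down_osds_we_would_probe", []))
    return sorted(probing), sorted(down)


# Sample data mirroring the excerpt above; a real query has many more fields.
sample = json.loads("""
{"recovery_state": [
  {"name": "Started/Primary/Peering",
   "probing_osds": ["10", "13", "15", "25"],
   "down_osds_we_would_probe": []}
]}
""")

probing, down = probing_info(sample)
replaced_osd = 13  # hypothetical: the id of the OSD whose disk was replaced
print("probing:", probing)
print("down but wanted:", down)
print("replaced OSD is being probed:", replaced_osd in probing)
```

An empty "down but wanted" list, as in your query, suggests the PG is not waiting on an OSD that is currently down.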


There's also the possibility that your CRUSH map is still not correct.  In
the history, I can see that you had bad CRUSH maps in the past.  Stuff like

  "recovery_state": [
        { "name": "Started\/Primary\/Peering",
          "enter_time": "2014-10-21 12:18:48.482666",
          "past_intervals": [
                { "first": 4663,
                  "last": 4685,
                  "maybe_went_rw": 1,
                  "up": [],
                  "acting": [
                        10,
                        25,
                        10,
                        -1]},


shows that CRUSH placed some data on osd.10 twice, which is a sign of a
bad CRUSH map.  You might run through the crushtool testing at
http://ceph.com/docs/master/man/8/crushtool/, just to make sure everything
is kosher.
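
To see how widespread the duplicate mappings are, something like the following Python sketch could scan a saved PG query for past intervals whose acting set names the same OSD more than once. It assumes the "past_intervals" layout shown in the excerpt above (other Ceph versions may format this differently), and it treats negative entries such as -1 as empty slots, which is my assumption. The function name is hypothetical.

```python
from collections import Counter


def duplicate_acting(pg_query):
    """Return (first, last, duplicated_osds) for each past interval whose
    acting set lists the same OSD more than once."""
    results = []
    for state in pg_query.get("recovery_state", []):
        for interval in state.get("past_intervals", []):
            # Assumption: negative ids (e.g. -1) mark an unfilled slot,
            # so they are not counted as duplicates.
            counts = Counter(o for o in interval.get("acting", []) if o >= 0)
            dups = sorted(o for o, n in counts.items() if n > 1)
            if dups:
                results.append((interval.get("first"), interval.get("last"), dups))
    return results


# Sample data mirroring the excerpt above.
sample = {
    "recovery_state": [
        {
            "name": "Started/Primary/Peering",
            "past_intervals": [
                {"first": 4663, "last": 4685, "maybe_went_rw": 1,
                 "up": [], "acting": [10, 25, 10, -1]},
            ],
        }
    ]
}

for first, last, dups in duplicate_acting(sample):
    print("epochs %s-%s: duplicated OSDs %s" % (first, last, dups))
```

If the intervals it reports all predate your last CRUSH map change, the current map may be fine; if recent intervals show up too, the map is worth another look.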



On Tue, Oct 21, 2014 at 7:04 PM, Chris Kitzmiller
<ckitzmil...@hampshire.edu> wrote:

> I've gotten myself into the position of having ~100 incomplete PGs. All of
> my OSDs are up+in (and I've restarted them all one by one).
>
> I was in the process of rebalancing after altering my CRUSH map when I
> lost an OSD backing disk. I replaced that OSD and it seemed to be
> backfilling well. During this time I noticed that I had 2 underperforming
> disks which were holding up the backfilling process. I set them out to try
> and get everything recovered but *I think* this is what caused some of my
> PGs to go incomplete. Since then I set those two underperformers back in
> and they're still backfilling now.
>
> Any help would be appreciated in troubleshooting these PGs. I'm not sure
> why they're incomplete or what to do about it. A query of one of my
> incomplete PGs can be found here: http://pastebin.com/raw.php?i=AJ3RMjz6
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com