Re: [ceph-users] pg stuck in peering while power failure

2017-01-10 Thread Craig Chi
Hi Sam,

Thank you for your precise inspection.

I reviewed the log from that time, and I discovered that the cluster marked an OSD 
as failed just after I shut the first unit down. So, as you said, the PG could not 
finish peering because the second unit was then shut off suddenly.

I much appreciate your advice, but I aim to keep my cluster working even when 2 
storage nodes are down. The OSD that unexpectedly failed was reported down with the 
following log entry just as I shut the first unit down:

2017-01-10 12:30:07.905562 mon.1 172.20.1.3:6789/0 28484 : cluster [INF] 
osd.153 172.20.3.2:6810/26796 failed (2 reporters from different host after 
20.072026 >= grace 20.00)

But that OSD was not actually dead; more likely it was just slow to respond to 
heartbeats. I think increasing osd_heartbeat_grace may mitigate this kind of issue.
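
For reference, a minimal sketch of how the grace could be raised on my setup, 
assuming a stock ceph.conf layout (the value 30 is only an illustrative number, not 
a tested recommendation):

    # ceph.conf -- raise the heartbeat grace from the default 20 s
    [global]
        osd heartbeat grace = 30    # illustrative value, not a tuned one

The same option can usually be injected at runtime for a quick test (again a 
sketch, not something I have verified on this cluster):

    # runtime injection; OSDs and mons both consult this grace
    ceph tell osd.* injectargs '--osd_heartbeat_grace 30'
    ceph tell mon.* injectargs '--osd_heartbeat_grace 30'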

Sincerely,
Craig Chi

On 2017-01-11 00:08, Samuel Just wrote:
> { "name": "Started\/Primary\/Peering", "enter_time": "2017-01-10 
> 13:43:34.933074", "past_intervals": [ { "first": 75858, "last": 75860, 
> "maybe_went_rw": 1, "up": [ 345, 622, 685, 183, 792, 2147483647, 2147483647, 
> 401, 516 ], "acting": [ 345, 622, 685, 183, 792, 2147483647, 2147483647, 401, 
> 516 ], "primary": 345, "up_primary":345 }, Between 75858 and 75860, 345, 622, 
> 685, 183, 792, 2147483647, 2147483647, 401, 516 was the acting set. The 
> current acting set 345, 622, 685, 183, 2147483647, 2147483647, 153, 401, 516 
> needs *all 7* of the osds from epochs 75858 through 75860 to ensure that it 
> has any writes completed during that time. You can make transient situations 
> like that less of a problem by setting min_size to 8 (though it'll prevent 
> writes with 2 failures until backfill completes). A possible enhancement for 
> an EC pool would be to gather the infos from those osds anyway and use that 
> rule outwrites (if they actually happened, you'd still be stuck). -Sam On 
> Tue, Jan 10, 20


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] pg stuck in peering while power failure

2017-01-10 Thread Samuel Just
{
"name": "Started\/Primary\/Peering",
"enter_time": "2017-01-10 13:43:34.933074",
"past_intervals": [
{
"first": 75858,
"last": 75860,
"maybe_went_rw": 1,
"up": [
345,
622,
685,
183,
792,
2147483647,
2147483647,
401,
516
],
"acting": [
345,
622,
685,
183,
792,
2147483647,
2147483647,
401,
516
],
"primary": 345,
"up_primary": 345
},

Between 75858 and 75860,

345,
622,
685,
183,
792,
2147483647,
2147483647,
401,
516

was the acting set.  The current acting set

345,
622,
685,
183,
2147483647,
2147483647,
153,
401,
516

needs *all 7* of the osds from epochs 75858 through 75860 to ensure
that it has any writes completed during that time.  You can make
transient situations like that less of a problem by setting min_size
to 8 (though it'll prevent writes with 2 failures until backfill
completes).  A possible enhancement for an EC pool would be to gather
the infos from those osds anyway and use that to rule out writes (if they
actually happened, you'd still be stuck).
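
For concreteness, a sketch of the per-pool setting (the pool name here is only a
placeholder, not the real pool from this cluster):

    ceph osd pool get <ec-pool-name> min_size    # check the current value
    ceph osd pool set <ec-pool-name> min_size 8  # require 8 of the 9 shards for I/O

With k=7, m=2 the trade-off is as described above: any interval that accepted
writes then had at least 8 shards up, so losing one more OSD afterwards still
leaves the 7 needed to reconstruct.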
-Sam

On Tue, Jan 10, 2017 at 5:36 AM, Craig Chi wrote:
> Hi List,
>
> I am testing the stability of my Ceph cluster with power failure.
>
> I forcibly powered off 2 Ceph units (each with 90 OSDs) while the client
> I/O was continuing.
>
> Since then, some of the PGs in my cluster have been stuck in peering
>
>   pgmap v3261136: 17408 pgs, 4 pools, 176 TB data, 5082 kobjects
> 236 TB used, 5652 TB / 5889 TB avail
> 8563455/38919024 objects degraded (22.003%)
>13526 active+undersized+degraded
> 3769 active+clean
>  104 down+remapped+peering
>9 down+peering
>
> I queried one of the peering PGs (all on an EC pool with 7+2) and got blocked
> information (full query: http://pastebin.com/pRkaMG2h )
>
> "probing_osds": [
> "153(6)",
> "183(3)",
> "345(0)",
> "401(7)",
> "516(8)",
> "622(1)",
> "685(2)"
> ],
> "blocked": "peering is blocked due to down osds",
> "down_osds_we_would_probe": [
> 792
> ],
> "peering_blocked_by": [
> {
> "osd": 792,
> "current_lost_at": 0,
> "comment": "starting or marking this osd lost may let us
> proceed"
> }
> ]
>
>
> osd.792 is exactly on one of the units I powered off. And I think the I/O
> associated with this pg is paused too.
>
> I have checked the troubleshooting page on Ceph website (
> http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/
> ); it says that starting the OSD or marking it lost can make the procedure
> continue.
>
> I am sure that my cluster was healthy before the power outage occurred. I am
> wondering, if a power outage really happens in a production environment, will
> it also freeze my client I/O if I don't do anything? Since I have only lost 2
> redundancies (I have erasure code with 7+2), I think it should still serve
> normal functionality.
>
> Or am I doing something wrong? Please give me some suggestions. Thanks.
>
> Sincerely,
> Craig Chi
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] pg stuck in peering while power failure

2017-01-10 Thread Craig Chi
Hi List,

I am testing the stability of my Ceph cluster with power failure.

I forcibly powered off 2 Ceph units (each with 90 OSDs) while the client 
I/O was continuing.

Since then, some of the PGs in my cluster have been stuck in peering

pgmap v3261136: 17408 pgs, 4 pools, 176 TB data, 5082 kobjects
236 TB used, 5652 TB / 5889 TB avail
8563455/38919024 objects degraded (22.003%)
13526 active+undersized+degraded
3769 active+clean
104 down+remapped+peering
9 down+peering

I queried one of the peering PGs (all on an EC pool with 7+2) and got blocked 
information (full query: http://pastebin.com/pRkaMG2h )

"probing_osds": [
"153(6)",
"183(3)",
"345(0)",
"401(7)",
"516(8)",
"622(1)",
"685(2)"
],
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
792
],
"peering_blocked_by": [
{
"osd": 792,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may let us proceed"
}
]
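
For completeness, the output above came from a PG query along these lines (the PG 
id below is a placeholder; the real one is only in the pastebin):

    ceph health detail        # lists the stuck/down PGs so you can pick an id
    ceph pg <pgid> query      # dumps the peering state shown above; <pgid> is a placeholder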


osd.792 is exactly on one of the units I powered off. And I think the I/O 
associated with this pg is paused too.

I have checked the troubleshooting page on the Ceph website 
(http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/); 
it says that starting the OSD or marking it lost can make the procedure continue.
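
If it comes to that, a rough sketch of the two options, assuming a systemd-based 
deployment and using osd.792 from the query above:

    # option 1: bring the down OSD back (run on the node that hosts osd.792)
    systemctl start ceph-osd@792

    # option 2: declare it permanently lost (destructive; may discard writes)
    ceph osd lost 792 --yes-i-really-mean-it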

I am sure that my cluster was healthy before the power outage occurred. I am 
wondering, if a power outage really happens in a production environment, will it 
also freeze my client I/O if I don't do anything? Since I have only lost 2 
redundancies (I have erasure code with 7+2), I think it should still serve 
normal functionality.
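
For what it's worth, whether I/O continues with 2 shards missing should depend on 
the pool's min_size, which can be checked with something like the following (the 
pool and profile names are placeholders):

    ceph osd pool get <ec-pool-name> min_size
    ceph osd erasure-code-profile get <profile-name>   # shows k and m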

Or am I doing something wrong? Please give me some suggestions. Thanks.

Sincerely,
Craig Chi
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com