Re: [ceph-users] unsubscribe

2019-07-12 Thread Brian Topping
It’s in the mail headers on every email: 
mailto:ceph-users-requ...@lists.ceph.com?subject=unsubscribe

> On Jul 12, 2019, at 5:00 PM, Robert Stanford  wrote:
> 
> unsubscribe
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] unsubscribe

2019-07-12 Thread Robert Stanford
unsubscribe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 14.2.1 OSDs crash and sometimes fail to start back up, workaround

2019-07-12 Thread Edward Kalk
Slight correction: I removed and added back only the OSDs that were crashing.
I noticed that only certain OSDs seemed to be crashing; once they were
rebuilt, they stopped crashing.

Further info: we originally deployed Luminous, upgraded to Mimic, then
upgraded to Nautilus.
Perhaps there were issues with OSDs related to the upgrades? I don't know.
Perhaps a clean install of 14.2.1 would not have done this? I don't know.
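To see which OSDs are the ones crashing, something like the following can help
(a rough sketch, assuming the Nautilus mgr crash module is available and
enabled; the crash id is a placeholder taken from the list output):

# enable the crash-report mgr module if it isn't already
ceph mgr module enable crash
# list recorded daemon crashes; newer releases also print the crashing entity
ceph crash ls
# full metadata and backtrace for one crash id taken from the list above
ceph crash info <crash-id>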

-Ed

> On Jul 12, 2019, at 11:32 AM, Edward Kalk  wrote:
> 
> It seems that I have been able to work around my issues.
> I've attempted to reproduce the problem by rebooting nodes and by stopping all
> OSDs, waiting a bit, and starting them again.
> At this time, no OSDs are crashing like before. OSDs seem to have no problems
> starting either.
> What I did was completely remove the OSDs one at a time and redeploy them,
> allowing Ceph 14.2.1 to rebuild them.
> I have attached the doc I use to accomplish this. *Before I do it, I mark the
> OSD as “out” via the GUI or CLI and allow it to reweight to 0%, monitoring this
> via ceph -s. I do this so that an actual disk failure does not put me into a
> dual-disk failure while I'm rebuilding an OSD.
> 
> -Edward Kalk
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 14.2.1 OSDs crash and sometimes fail to start back up, workaround

2019-07-12 Thread Edward Kalk
It seems that I have been able to work around my issues.
I've attempted to reproduce the problem by rebooting nodes and by stopping all
OSDs, waiting a bit, and starting them again.
At this time, no OSDs are crashing like before. OSDs seem to have no problems
starting either.
What I did was completely remove the OSDs one at a time and redeploy them,
allowing Ceph 14.2.1 to rebuild them.
Remove a disk:
1.) see which OSD is which disk: sudo ceph-volume lvm list

2.) ceph osd out X
EX:
synergy@synergy3:~$ ceph osd out 21
marked out osd.21.

2.a) ceph osd down osd.X
Ex:
ceph osd down osd.21

2.aa) Stop OSD daemon: sudo systemctl stop ceph-osd@X
EX:
sudo systemctl stop ceph-osd@21

2.b) ceph osd rm osd.X
EX:
ceph osd rm osd.21

3.) Check status: ceph -s

4.) Observe data migration: ceph -w

5.) remove from CRUSH: ceph osd crush remove {name}
EX: ceph osd crush remove osd.21
5.b) del auth: ceph auth del osd.21

6.) find info on disk:
sudo hdparm -I /dev/sdd

7.) see sata ports: lsscsi --verbose

8.) Go pull the disk and replace it, or skip that and do the following steps to
re-use it.

Additional steps to wipe and reuse a disk (without ejecting it; physically
ejecting and replacing the disk takes care of this for us).
(Do this last, after following the Ceph docs for removing a disk.)
9.) sudo gdisk /dev/sdX (x,z,Y,Y)
9.a) lsblk
     dmsetup remove ceph--e36dc03d--bf0d--462a--b4e6--8e49819bec0b-osd--block--d5574ac1--f72f--4942--8f4a--ac24891b2ee6

10.) Deploy a /dev/sdX disk: from 216.106.44.209 (ceph-mon0), you must be in
the "my_cluster" folder:
EX: Synergy@Ceph-Mon0:~/my_cluster$ ceph-deploy osd create --data /dev/sdd synergy1
I have attached the doc I use to accomplish this. *Before I do it, I mark the
OSD as “out” via the GUI or CLI and allow it to reweight to 0%, monitoring this
via ceph -s. I do this so that an actual disk failure does not put me into a
dual-disk failure while I'm rebuilding an OSD.
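A condensed sketch of the steps above for a single OSD (osd.21 from the
examples), reordered to match the upstream removal docs, which run 'ceph osd rm'
last; on Luminous and later I believe 'ceph osd purge' collapses the last three
commands into one:

OSD=21
ceph osd out ${OSD}                    # step 2: mark it out, data drains off
ceph -s                                # steps 3/4: repeat until backfill is done
sudo systemctl stop ceph-osd@${OSD}    # step 2.aa: stop the daemon
ceph osd crush remove osd.${OSD}       # step 5: remove it from the CRUSH map
ceph auth del osd.${OSD}               # step 5.b: delete its auth key
ceph osd rm osd.${OSD}                 # step 2.b: remove it from the OSD map
# on Luminous+ the last three lines can likely be replaced by:
#   ceph osd purge ${OSD} --yes-i-really-mean-it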

-Edward Kalk

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "session established", "io error", "session lost, hunting for new mon" solution/fix

2019-07-12 Thread Marc Roos
 
Thanks Ilya for explaining. Am I correct in understanding, from the link [0] 
mentioned in the issue, that because I have had an unhealthy state for 
some time (1 PG on an insignificant pool) I have larger osdmaps, 
triggering this issue? Or is it just random bad luck? (Just a bit curious 
why I have this issue.)

[0]
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg51522.html
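For reference, one rough way to gauge the osdmap sizes involved (a sketch only;
/tmp/osdmap.latest and mon.a are just examples, and 'ceph osd getmap' writes the
latest full map to a file):

ceph osd dump | head -1                           # current osdmap epoch
ceph osd getmap -o /tmp/osdmap.latest             # dump the latest full map
ls -lh /tmp/osdmap.latest                         # size of one full osdmap
# a single message can carry up to osd_map_message_max maps, so the worst
# case is roughly that many full maps in one message
ceph daemon mon.a config get osd_map_message_max  # run on the mon host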

-Original Message-
Subject: Re: [ceph-users] "session established", "io error", "session 
lost, hunting for new mon" solution/fix

On Fri, Jul 12, 2019 at 12:33 PM Paul Emmerich  
wrote:
>
>
>
> On Thu, Jul 11, 2019 at 11:36 PM Marc Roos  
wrote:
>> Anyone know why I would get these? Is it not strange to get them in a 

>> 'standard' setup?
>
> you are probably running on an ancient kernel. this bug has been fixed 
a long time ago.

This is not a kernel bug:

http://tracker.ceph.com/issues/38040

It is possible to hit with few OSDs too.  The actual problem is the size 
of the osdmap message which can contain multiple full osdmaps, not the 
number of OSDs.  The size of a full osdmap is proportional to the number 
of OSDs but it's not the only way to get a big osdmap message.

As you have experienced, these settings used to be expressed in the 
number of osdmaps, and our defaults were too high for a stream of full 
osdmaps (as opposed to incrementals).  The limit is now expressed in bytes; 
the patch should be in 12.2.13.

>
>> -Original Message-
>> Subject: [ceph-users] "session established", "io error", "session 
>> lost, hunting for new mon" solution/fix
>>
>>
>> I have on a cephfs client again (luminous cluster, centos7, only 32 
>> osds!). Wanted to share the 'fix'
>>
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session 
>> established [Thu Jul 11 12:16:09 2019] libceph: mon0 
>> 192.168.10.111:6789 io error [Thu Jul 11 12:16:09 2019] libceph: mon0 

>> 192.168.10.111:6789 session lost, hunting for new mon [Thu Jul 11 
>> 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session established 
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io error 

>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session 
>> lost, hunting for new mon [Thu Jul 11 12:16:09 2019] libceph: mon0 
>> 192.168.10.111:6789 session established [Thu Jul 11 12:16:09 2019] 
>> libceph: mon0 192.168.10.111:6789 io error [Thu Jul 11 12:16:09 2019] 

>> libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon 
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session 
>> established [Thu Jul 11 12:16:09 2019] libceph: mon1 
>> 192.168.10.112:6789 io error [Thu Jul 11 12:16:09 2019] libceph: mon1 

>> 192.168.10.112:6789 session lost, hunting for new mon
>>
>> 1) I blocked client access to the monitors with iptables -I INPUT -p 
>> tcp -s 192.168.10.43 --dport 6789 -j REJECT Resulting in
>>
>> [Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket 
>> closed (con state CONNECTING) [Thu Jul 11 12:34:18 2019] libceph: 
>> mon1 192.168.10.112:6789 socket closed (con state CONNECTING) [Thu 
>> Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket closed 

>> (con state CONNECTING) [Thu Jul 11 12:34:26 2019] libceph: mon2 
>> 192.168.10.113:6789 socket closed (con state CONNECTING) [Thu Jul 11 
>> 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket closed (con 
>> state CONNECTING) [Thu Jul 11 12:34:28 2019] libceph: mon2 
>> 192.168.10.113:6789 socket closed (con state CONNECTING) [Thu Jul 11 
>> 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket closed (con 
>> state CONNECTING) [Thu Jul 11 12:34:30 2019] libceph: mon2 
>> 192.168.10.113:6789 socket closed (con state CONNECTING) [Thu Jul 11 
>> 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket closed (con 
>> state CONNECTING) [Thu Jul 11 12:34:42 2019] libceph: mon2 
>> 192.168.10.113:6789 socket closed (con state CONNECTING) [Thu Jul 11 
>> 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket closed (con 
>> state CONNECTING) [Thu Jul 11 12:34:45 2019] libceph: mon0 
>> 192.168.10.111:6789 socket closed (con state CONNECTING) [Thu Jul 11 
>> 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket closed (con 
>> state CONNECTING)
>>
>> 2) I applied the suggested changes to the osd map message max, 
>> mentioned
>>
>> in early threads[0]
>> ceph tell osd.* injectargs '--osd_map_message_max=10'
>> ceph tell mon.* injectargs '--osd_map_message_max=10'
>> [@c01 ~]# ceph daemon osd.0 config show|grep message_max
>> "osd_map_message_max": "10",
>> [@c01 ~]# ceph daemon mon.a config show|grep message_max
>> "osd_map_message_max": "10",
>>
>> [0]
>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg54419.html
>> http://tracker.ceph.com/issues/38040
>>
>> 3) Allow access to a monitor with
>> iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>>
>> Getting
>> [Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 session 
>> established [Thu Jul 11 12:39:26 2019] 

Re: [ceph-users] Pool stats issue with upgrades to nautilus

2019-07-12 Thread Nathan Fish
Excellent! I have been checking the tracker
(https://tracker.ceph.com/versions/574) every day, and there hadn't
been any movement for weeks.

On Fri, Jul 12, 2019 at 11:29 AM Sage Weil  wrote:
>
> On Fri, 12 Jul 2019, Nathan Fish wrote:
> > Thanks. Speaking of 14.2.2, is there a timeline for it? We really want
> > some of the fixes in it as soon as possible.
>
> I think it's basically ready now... probably Monday?
>
> sage
>
> >
> > On Fri, Jul 12, 2019 at 11:22 AM Sage Weil  wrote:
> > >
> > > Hi everyone,
> > >
> > > All current Nautilus releases have an issue where deploying a single new
> > > (Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was
> > > originally deployed pre-Nautilus) breaks the pool utilization stats
> > > reported by ``ceph df``.  Until all OSDs have been reprovisioned or
> > > updated (via ``ceph-bluestore-tool repair``), the pool stats will show
> > > values that are lower than the true value.  A fix is in the works but will
> > > not appear until 14.2.3.  Users who have upgraded to Nautilus (or are
> > > considering upgrading) may want to delay provisioning new OSDs until the
> > > fix is available in the next release.
> > >
> > > This issue will only affect you if:
> > >
> > > - You started with a pre-nautilus cluster and upgraded
> > > - You then provision one or more new BlueStore OSDs, or run
> > >   'ceph-bluestore-tool repair' on an upgraded OSD.
> > >
> > > The symptom is that the pool stats from 'ceph df' are too small.  For
> > > example, the pre-upgrade stats on our test cluster were
> > >
> > > ...
> > > POOLS:
> > >     POOL    ID    STORED    OBJECTS    USED      %USED    MAX AVAIL
> > >     data     0    63 TiB     44.59M    63 TiB    30.21       48 TiB
> > > ...
> > >
> > > but when one OSD was updated it changed to
> > >
> > > POOLS:
> > >     POOL    ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
> > >     data     0    558 GiB     43.50M    1.7 TiB     1.22       45 TiB
> > >
> > > The root cause is that, starting with Nautilus, BlueStore maintains
> > > per-pool usage stats, but it requires a slight on-disk format change;
> > > upgraded OSDs won't have the new stats until you run a ceph-bluestore-tool
> > > repair.  The problem is that the mon starts using the new stats as soon as
> > > *any* OSDs are reporting per-pool stats (instead of waiting until *all*
> > > OSDs are doing so).
> > >
> > > To avoid the issue, either
> > >
> > >  - do not provision new BlueStore OSDs after the upgrade, or
> > >  - update all OSDs to keep new per-pool stats.  An existing BlueStore
> > >OSD can be converted with
> > >
> > >  systemctl stop ceph-osd@$N
> > >  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$N
> > >  systemctl start ceph-osd@$N
> > >
> > >Note that FileStore does not support the new per-pool stats at all, so
> > >if there are filestore OSDs in your cluster there is no workaround
> > >that doesn't involve replacing the filestore OSDs with bluestore.
> > >
> > > A fix[1] is working its way through QA and will appear in 14.2.3; it
> > > won't quite make the 14.2.2 release.
> > >
> > > sage
> > >
> > >
> > > [1] https://github.com/ceph/ceph/pull/28978
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool stats issue with upgrades to nautilus

2019-07-12 Thread Sage Weil
On Fri, 12 Jul 2019, Nathan Fish wrote:
> Thanks. Speaking of 14.2.2, is there a timeline for it? We really want
> some of the fixes in it as soon as possible.

I think it's basically ready now... probably Monday?

sage

> 
> On Fri, Jul 12, 2019 at 11:22 AM Sage Weil  wrote:
> >
> > Hi everyone,
> >
> > All current Nautilus releases have an issue where deploying a single new
> > (Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was
> > originally deployed pre-Nautilus) breaks the pool utilization stats
> > reported by ``ceph df``.  Until all OSDs have been reprovisioned or
> > updated (via ``ceph-bluestore-tool repair``), the pool stats will show
> > values that are lower than the true value.  A fix is in the works but will
> > not appear until 14.2.3.  Users who have upgraded to Nautilus (or are
> > considering upgrading) may want to delay provisioning new OSDs until the
> > fix is available in the next release.
> >
> > This issue will only affect you if:
> >
> > - You started with a pre-nautilus cluster and upgraded
> > - You then provision one or more new BlueStore OSDs, or run
> >   'ceph-bluestore-tool repair' on an upgraded OSD.
> >
> > The symptom is that the pool stats from 'ceph df' are too small.  For
> > example, the pre-upgrade stats on our test cluster were
> >
> > ...
> > POOLS:
> >     POOL    ID    STORED    OBJECTS    USED      %USED    MAX AVAIL
> >     data     0    63 TiB     44.59M    63 TiB    30.21       48 TiB
> > ...
> >
> > but when one OSD was updated it changed to
> >
> > POOLS:
> >     POOL    ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
> >     data     0    558 GiB     43.50M    1.7 TiB     1.22       45 TiB
> >
> > The root cause is that, starting with Nautilus, BlueStore maintains
> > per-pool usage stats, but it requires a slight on-disk format change;
> > upgraded OSDs won't have the new stats until you run a ceph-bluestore-tool
> > repair.  The problem is that the mon starts using the new stats as soon as
> > *any* OSDs are reporting per-pool stats (instead of waiting until *all*
> > OSDs are doing so).
> >
> > To avoid the issue, either
> >
> >  - do not provision new BlueStore OSDs after the upgrade, or
> >  - update all OSDs to keep new per-pool stats.  An existing BlueStore
> >OSD can be converted with
> >
> >  systemctl stop ceph-osd@$N
> >  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$N
> >  systemctl start ceph-osd@$N
> >
> >Note that FileStore does not support the new per-pool stats at all, so
> >if there are filestore OSDs in your cluster there is no workaround
> >that doesn't involve replacing the filestore OSDs with bluestore.
> >
> > A fix[1] is working its way through QA and will appear in 14.2.3; it
> > won't quite make the 14.2.2 release.
> >
> > sage
> >
> >
> > [1] https://github.com/ceph/ceph/pull/28978
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pool stats issue with upgrades to nautilus

2019-07-12 Thread Nathan Fish
Thanks. Speaking of 14.2.2, is there a timeline for it? We really want
some of the fixes in it as soon as possible.

On Fri, Jul 12, 2019 at 11:22 AM Sage Weil  wrote:
>
> Hi everyone,
>
> All current Nautilus releases have an issue where deploying a single new
> (Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was
> originally deployed pre-Nautilus) breaks the pool utilization stats
> reported by ``ceph df``.  Until all OSDs have been reprovisioned or
> updated (via ``ceph-bluestore-tool repair``), the pool stats will show
> values that are lower than the true value.  A fix is in the works but will
> not appear until 14.2.3.  Users who have upgraded to Nautilus (or are
> considering upgrading) may want to delay provisioning new OSDs until the
> fix is available in the next release.
>
> This issue will only affect you if:
>
> - You started with a pre-nautilus cluster and upgraded
> - You then provision one or more new BlueStore OSDs, or run
>   'ceph-bluestore-tool repair' on an upgraded OSD.
>
> The symptom is that the pool stats from 'ceph df' are too small.  For
> example, the pre-upgrade stats on our test cluster were
>
> ...
> POOLS:
>     POOL    ID    STORED    OBJECTS    USED      %USED    MAX AVAIL
>     data     0    63 TiB     44.59M    63 TiB    30.21       48 TiB
> ...
>
> but when one OSD was updated it changed to
>
> POOLS:
>     POOL    ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
>     data     0    558 GiB     43.50M    1.7 TiB     1.22       45 TiB
>
> The root cause is that, starting with Nautilus, BlueStore maintains
> per-pool usage stats, but it requires a slight on-disk format change;
> upgraded OSDs won't have the new stats until you run a ceph-bluestore-tool
> repair.  The problem is that the mon starts using the new stats as soon as
> *any* OSDs are reporting per-pool stats (instead of waiting until *all*
> OSDs are doing so).
>
> To avoid the issue, either
>
>  - do not provision new BlueStore OSDs after the upgrade, or
>  - update all OSDs to keep new per-pool stats.  An existing BlueStore
>OSD can be converted with
>
>  systemctl stop ceph-osd@$N
>  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$N
>  systemctl start ceph-osd@$N
>
>Note that FileStore does not support the new per-pool stats at all, so
>if there are filestore OSDs in your cluster there is no workaround
>that doesn't involve replacing the filestore OSDs with bluestore.
>
> A fix[1] is working its way through QA and will appear in 14.2.3; it
> won't quite make the 14.2.2 release.
>
> sage
>
>
> [1] https://github.com/ceph/ceph/pull/28978
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Pool stats issue with upgrades to nautilus

2019-07-12 Thread Sage Weil
Hi everyone,

All current Nautilus releases have an issue where deploying a single new 
(Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was 
originally deployed pre-Nautilus) breaks the pool utilization stats 
reported by ``ceph df``.  Until all OSDs have been reprovisioned or 
updated (via ``ceph-bluestore-tool repair``), the pool stats will show 
values that are lower than the true value.  A fix is in the works but will 
not appear until 14.2.3.  Users who have upgraded to Nautilus (or are 
considering upgrading) may want to delay provisioning new OSDs until the 
fix is available in the next release.

This issue will only affect you if:

- You started with a pre-nautilus cluster and upgraded
- You then provision one or more new BlueStore OSDs, or run 
  'ceph-bluestore-tool repair' on an upgraded OSD.

The symptom is that the pool stats from 'ceph df' are too small.  For 
example, the pre-upgrade stats on our test cluster were

...
POOLS:
    POOL    ID    STORED    OBJECTS    USED      %USED    MAX AVAIL
    data     0    63 TiB     44.59M    63 TiB    30.21       48 TiB
...

but when one OSD was updated it changed to

POOLS:
    POOL    ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
    data     0    558 GiB     43.50M    1.7 TiB     1.22       45 TiB

The root cause is that, starting with Nautilus, BlueStore maintains 
per-pool usage stats, but it requires a slight on-disk format change; 
upgraded OSDs won't have the new stats until you run a ceph-bluestore-tool 
repair.  The problem is that the mon starts using the new stats as soon as 
*any* OSDs are reporting per-pool stats (instead of waiting until *all* 
OSDs are doing so).

To avoid the issue, either

 - do not provision new BlueStore OSDs after the upgrade, or
 - update all OSDs to keep new per-pool stats.  An existing BlueStore
   OSD can be converted with

 systemctl stop ceph-osd@$N
 ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$N
 systemctl start ceph-osd@$N

   Note that FileStore does not support the new per-pool stats at all, so 
   if there are filestore OSDs in your cluster there is no workaround  
   that doesn't involve replacing the filestore OSDs with bluestore.
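For hosts with many OSDs, a loop like the one below runs the conversion one
OSD at a time (a rough sketch: it assumes the default cluster name, the
systemd unit names ceph-osd@<id>, and data directories under
/var/lib/ceph/osd/ as shown above; wait for the cluster to return to
HEALTH_OK between OSDs rather than relying on the sleep):

 # run on each OSD host in turn
 for N in $(ls /var/lib/ceph/osd/ | sed -n 's/^ceph-//p'); do
     systemctl stop ceph-osd@$N
     ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$N
     systemctl start ceph-osd@$N
     sleep 60   # give the OSD time to rejoin; check 'ceph -s' before moving on
 done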

A fix[1] is working its way through QA and will appear in 14.2.3; it 
won't quite make the 14.2.2 release.

sage


[1] https://github.com/ceph/ceph/pull/28978
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "session established", "io error", "session lost, hunting for new mon" solution/fix

2019-07-12 Thread Ilya Dryomov
On Fri, Jul 12, 2019 at 12:33 PM Paul Emmerich  wrote:
>
>
>
> On Thu, Jul 11, 2019 at 11:36 PM Marc Roos  wrote:
>> Anyone know why I would get these? Is it not strange to get them in a
>> 'standard' setup?
>
> you are probably running on an ancient kernel. this bug has been fixed a long 
> time ago.

This is not a kernel bug:

http://tracker.ceph.com/issues/38040

It is possible to hit with few OSDs too.  The actual problem is the
size of the osdmap message which can contain multiple full osdmaps, not
the number of OSDs.  The size of a full osdmap is proportional to the
number of OSDs but it's not the only way to get a big osdmap message.

As you have experienced, these settings used to be expressed in the
number of osdmaps, and our defaults were too high for a stream of full
osdmaps (as opposed to incrementals).  The limit is now expressed in bytes;
the patch should be in 12.2.13.
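For anyone checking their own cluster, the relevant knobs can be inspected as
below (a sketch; the pre-patch workaround from earlier in the thread caps the
number of maps per message, and I believe the new byte-based option introduced
by the patch is osd_map_message_max_bytes, though the name may vary by release):

ceph daemon osd.0 config show | grep osd_map_message
ceph daemon mon.a config show | grep osd_map_message
# pre-patch workaround: limit how many osdmaps a single message may carry
ceph tell osd.* injectargs '--osd_map_message_max=10'
ceph tell mon.* injectargs '--osd_map_message_max=10'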

>
> Paul
>
>> -Original Message-
>> Subject: [ceph-users] "session established", "io error", "session lost,
>> hunting for new mon" solution/fix
>>
>>
>> I have on a cephfs client again (luminous cluster, centos7, only 32
>> osds!). Wanted to share the 'fix'
>>
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
>> established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
>> lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session
>> established
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session
>> lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
>> established
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
>> lost, hunting for new mon
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session
>> established
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 io error
>> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session
>> lost, hunting for new mon
>>
>> 1) I blocked client access to the monitors with
>> iptables -I INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>> Resulting in
>>
>> [Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:18 2019] libceph: mon1 192.168.10.112:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:26 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:28 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:30 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:42 2019] libceph: mon2 192.168.10.113:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:45 2019] libceph: mon0 192.168.10.111:6789 socket
>> closed (con state CONNECTING)
>> [Thu Jul 11 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket
>> closed (con state CONNECTING)
>>
>> 2) I applied the suggested changes to the osd map message max, mentioned
>>
>> in early threads[0]
>> ceph tell osd.* injectargs '--osd_map_message_max=10'
>> ceph tell mon.* injectargs '--osd_map_message_max=10'
>> [@c01 ~]# ceph daemon osd.0 config show|grep message_max
>> "osd_map_message_max": "10",
>> [@c01 ~]# ceph daemon mon.a config show|grep message_max
>> "osd_map_message_max": "10",
>>
>> [0]
>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg54419.html
>> http://tracker.ceph.com/issues/38040
>>
>> 3) Allow access to a monitor with
>> iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>>
>> Getting
>> [Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 session
>> established
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 down
>> [Thu Jul 11 12:39:26 2019] libceph: osd0 up
>>
>> Problems solved, in D state hung unmount was released.
>>
>> I am not sure if the prolonged disconnection to the monitors was the
>> solution or the osd_map_message_max=10, or both.

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 3 OSDs stopped and unable to restart

2019-07-12 Thread Igor Fedotov

Left some notes in the ticket..

On 7/11/2019 10:32 PM, Brett Chancellor wrote:
We moved the .rgw.meta pool over to SSDs to try to improve 
performance; during the backfill, the SSDs began dying en masse. Log attached 
to this case:

https://tracker.ceph.com/issues/40741

Right now the SSDs won't come up with either allocator, and the cluster 
is pretty much dead.


What are the consequences of deleting the .rgw.meta pool? Can it be 
recreated?


On Wed, Jul 10, 2019 at 3:31 PM ifedo...@suse.de 
 mailto:ifedo...@suse.de>> 
wrote:


You might want to try manual rocksdb compaction using
ceph-kvstore-tool..

Sent from my Huawei tablet
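(For anyone following along, the manual compaction suggested above looks
roughly like this; a sketch only, assuming the OSD is stopped first and that
$ID stands in for the numeric OSD id:)

systemctl stop ceph-osd@$ID
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$ID compact
systemctl start ceph-osd@$ID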


 Original Message 
Subject: Re: [ceph-users] 3 OSDs stopped and unable to restart
From: Brett Chancellor
To: Igor Fedotov
CC: Ceph Users

Once backfilling finished, the cluster was super slow,
most osd's were filled with heartbeat_map errors.  When an
OSD restarts it causes a cascade of other osd's to follow
suit and restart.. logs like..
  -3> 2019-07-10 18:34:50.046 7f34abf5b700 -1 osd.69
1348581 get_health_metrics reporting 21 slow ops, oldest
is osd_op(client.115295041.0:17575966 15.c37fa482
15.c37fa482 (undecoded)
ack+ondisk+write+known_if_redirected e1348522)
    -2> 2019-07-10 18:34:50.967 7f34acf5d700  1
heartbeat_map is_healthy 'OSD::osd_op_tp thread
0x7f3493f2b700' had timed out after 90
    -1> 2019-07-10 18:34:50.967 7f34acf5d700  1
heartbeat_map is_healthy 'OSD::osd_op_tp thread
0x7f3493f2b700' had suicide timed out after 150
     0> 2019-07-10 18:34:51.025 7f3493f2b700 -1 *** Caught
signal (Aborted) **
 in thread 7f3493f2b700 thread_name:tp_osd_tp

 ceph version 14.2.1
(d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0xf5d0) [0x7f34b57c25d0]
 2: (pread64()+0x33) [0x7f34b57c1f63]
 3: (KernelDevice::read_random(unsigned long, unsigned
long, char*, bool)+0x238) [0x55bfdae5a448]
 4: (BlueFS::_read_random(BlueFS::FileReader*, unsigned
long, unsigned long, char*)+0xca) [0x55bfdae1271a]
 5: (BlueRocksRandomAccessFile::Read(unsigned long,
unsigned long, rocksdb::Slice*, char*) const+0x20)
[0x55bfdae3b440]
 6: (rocksdb::RandomAccessFileReader::Read(unsigned long,
unsigned long, rocksdb::Slice*, char*) const+0x960)
[0x55bfdb466ba0]
 7: (rocksdb::BlockFetcher::ReadBlockContents()+0x3e7)
[0x55bfdb420c27]
 8: (()+0x11146a4) [0x55bfdb40d6a4]
 9:

(rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::FilePrefetchBuffer*,
rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions
const&, rocksdb::BlockHandle const&, rocksdb::Slice,
rocksdb::BlockBasedTable::CachableEntry*,
bool, rocksdb::GetContext*)+0x2cc) [0x55bfdb40f63c]
 10: (rocksdb::DataBlockIter*

rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*,
rocksdb::ReadOptions const&, rocksdb::BlockHandle const&,
rocksdb::DataBlockIter*, bool, bool, bool,
rocksdb::GetContext*, rocksdb::Status,
rocksdb::FilePrefetchBuffer*)+0x169) [0x55bfdb41cb29]
 11:
(rocksdb::BlockBasedTableIterator::InitDataBlock()+0xc8) [0x55bfdb41e588]
 12:
(rocksdb::BlockBasedTableIterator::FindKeyForward()+0x8d) [0x55bfdb41e89d]
 13: (()+0x10adde9) [0x55bfdb3a6de9]
 14: (rocksdb::MergingIterator::Next()+0x44) [0x55bfdb4357c4]
 15: (rocksdb::DBIter::FindNextUserEntryInternal(bool,
bool)+0x762) [0x55bfdb32a092]
 16: (rocksdb::DBIter::Next()+0x1d6) [0x55bfdb32b6c6]
 17:
(RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d)
[0x55bfdad9fa8d]
 18: (BlueStore::_collection_list(BlueStore::Collection*,
ghobject_t const&, ghobject_t const&, int,
std::vector >*,
ghobject_t*)+0xdf6) [0x55bfdad12466]
 19:

(BlueStore::collection_list(boost::intrusive_ptr&,
ghobject_t const&, ghobject_t const&, int,
std::vector >*,
ghobject_t*)+0x9b) [0x55bfdad1393b]
 20: (PG::_delete_some(ObjectStore::Transaction*)+0x1e0)
[0x55bfda984120]
 21: (PG::RecoveryState::Deleting::react(PG::DeleteSome
const&)+0x38) [0x55bfda985598]
 22:
(boost::statechart::simple_state,

(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
const&, void const*)+0x16a) 

Re: [ceph-users] "session established", "io error", "session lost, hunting for new mon" solution/fix

2019-07-12 Thread Marc Roos
 
Paul, this should have been (or already is) backported to this kernel, shouldn't it?


-Original Message-
From: Paul Emmerich [mailto:paul.emmer...@croit.io] 
Cc: ceph-users
Subject: Re: [ceph-users] "session established", "io error", "session 
lost, hunting for new mon" solution/fix

 

Anyone know why I would get these? Is it not strange to get them in a
'standard' setup?

you are probably running on an ancient kernel. this bug has been fixed a
long time ago.

Paul

-Original Message-
Subject: [ceph-users] "session established", "io error", "session 
lost, 
hunting for new mon" solution/fix


I have on a cephfs client again (luminous cluster, centos7, only 32 

osds!). Wanted to share the 'fix'

[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 
session 
established
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io 
error
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 
session 
lost, hunting for new mon
[Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 
session 
established
[Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io 
error
[Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 
session 
lost, hunting for new mon
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 
session 
established
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io 
error
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 
session 
lost, hunting for new mon
[Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 
session 
established
[Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 io 
error
[Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 
session 
lost, hunting for new mon

1) I blocked client access to the monitors with
iptables -I INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
Resulting in 

[Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:18 2019] libceph: mon1 192.168.10.112:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:26 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:28 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:30 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:42 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:45 2019] libceph: mon0 192.168.10.111:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket 

closed (con state CONNECTING)

2) I applied the suggested changes to the osd map message max, 
mentioned 

in early threads[0]
ceph tell osd.* injectargs '--osd_map_message_max=10'
ceph tell mon.* injectargs '--osd_map_message_max=10'
[@c01 ~]# ceph daemon osd.0 config show|grep message_max
"osd_map_message_max": "10",
[@c01 ~]# ceph daemon mon.a config show|grep message_max
"osd_map_message_max": "10",

[0]
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg54419.htm
l
http://tracker.ceph.com/issues/38040

3) Allow access to a monitor with
iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT

Getting 
[Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 
session 
established
[Thu Jul 11 12:39:26 2019] libceph: osd0 down
[Thu Jul 11 12:39:26 2019] libceph: osd0 up

Problems solved, in D state hung unmount was released. 

I am not sure if the prolonged disconnection to the monitors was 
the 
solution or the osd_map_message_max=10, or both. 





___
ceph-users mailing list

Re: [ceph-users] "session established", "io error", "session lost, hunting for new mon" solution/fix

2019-07-12 Thread Marc Roos
 

 
Hi Paul, 

Thanks for your reply. I am running 3.10.0-957.12.2.el7.x86_64, which is 
from May 2019.



-Original Message-
From: Paul Emmerich [mailto:paul.emmer...@croit.io] 
Sent: vrijdag 12 juli 2019 12:34
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] "session established", "io error", "session 
lost, hunting for new mon" solution/fix



On Thu, Jul 11, 2019 at 11:36 PM Marc Roos  
wrote:


 

Anyone know why I would get these? Is it not strange to get them in a
'standard' setup?

you are probably running on an ancient kernel. this bug has been fixed a
long time ago.

Paul

-Original Message-
Subject: [ceph-users] "session established", "io error", "session 
lost, 
hunting for new mon" solution/fix


I have on a cephfs client again (luminous cluster, centos7, only 32 

osds!). Wanted to share the 'fix'

[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 
session 
established
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io 
error
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 
session 
lost, hunting for new mon
[Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 
session 
established
[Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io 
error
[Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 
session 
lost, hunting for new mon
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 
session 
established
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io 
error
[Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 
session 
lost, hunting for new mon
[Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 
session 
established
[Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 io 
error
[Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 
session 
lost, hunting for new mon

1) I blocked client access to the monitors with
iptables -I INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
Resulting in 

[Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:18 2019] libceph: mon1 192.168.10.112:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:26 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:28 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:30 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:42 2019] libceph: mon2 192.168.10.113:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:45 2019] libceph: mon0 192.168.10.111:6789 socket 

closed (con state CONNECTING)
[Thu Jul 11 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket 

closed (con state CONNECTING)

2) I applied the suggested changes to the osd map message max, 
mentioned 

in early threads[0]
ceph tell osd.* injectargs '--osd_map_message_max=10'
ceph tell mon.* injectargs '--osd_map_message_max=10'
[@c01 ~]# ceph daemon osd.0 config show|grep message_max
"osd_map_message_max": "10",
[@c01 ~]# ceph daemon mon.a config show|grep message_max
"osd_map_message_max": "10",

[0]
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg54419.htm
l
http://tracker.ceph.com/issues/38040

3) Allow access to a monitor with
iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT

Getting 
[Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 
session 
established
[Thu Jul 11 12:39:26 2019] libceph: osd0 down
[Thu Jul 11 12:39:26 2019] libceph: osd0 up

Problems solved, in D state hung unmount was released. 

I am not sure if the prolonged disconnection to the monitors was 
the 
solution or the osd_map_message_max=10, or both. 


Re: [ceph-users] "session established", "io error", "session lost, hunting for new mon" solution/fix

2019-07-12 Thread Paul Emmerich
On Thu, Jul 11, 2019 at 11:36 PM Marc Roos  wrote:

>
>
> Anyone know why I would get these? Is it not strange to get them in a
> 'standard' setup?
>

you are probably running on an ancient kernel. this bug has been fixed a
long time ago.


Paul


>
>
>
>
>
> -Original Message-
> Subject: [ceph-users] "session established", "io error", "session lost,
> hunting for new mon" solution/fix
>
>
> I have on a cephfs client again (luminous cluster, centos7, only 32
> osds!). Wanted to share the 'fix'
>
> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
> established
> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
> lost, hunting for new mon
> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session
> established
> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 io error
> [Thu Jul 11 12:16:09 2019] libceph: mon2 192.168.10.113:6789 session
> lost, hunting for new mon
> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
> established
> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 io error
> [Thu Jul 11 12:16:09 2019] libceph: mon0 192.168.10.111:6789 session
> lost, hunting for new mon
> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session
> established
> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 io error
> [Thu Jul 11 12:16:09 2019] libceph: mon1 192.168.10.112:6789 session
> lost, hunting for new mon
>
> 1) I blocked client access to the monitors with
> iptables -I INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
> Resulting in
>
> [Thu Jul 11 12:34:16 2019] libceph: mon1 192.168.10.112:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:18 2019] libceph: mon1 192.168.10.112:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:22 2019] libceph: mon1 192.168.10.112:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:26 2019] libceph: mon2 192.168.10.113:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:27 2019] libceph: mon2 192.168.10.113:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:28 2019] libceph: mon2 192.168.10.113:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:30 2019] libceph: mon1 192.168.10.112:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:30 2019] libceph: mon2 192.168.10.113:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:34 2019] libceph: mon2 192.168.10.113:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:42 2019] libceph: mon2 192.168.10.113:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:44 2019] libceph: mon0 192.168.10.111:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:45 2019] libceph: mon0 192.168.10.111:6789 socket
> closed (con state CONNECTING)
> [Thu Jul 11 12:34:46 2019] libceph: mon0 192.168.10.111:6789 socket
> closed (con state CONNECTING)
>
> 2) I applied the suggested changes to the osd map message max, mentioned
>
> in early threads[0]
> ceph tell osd.* injectargs '--osd_map_message_max=10'
> ceph tell mon.* injectargs '--osd_map_message_max=10'
> [@c01 ~]# ceph daemon osd.0 config show|grep message_max
> "osd_map_message_max": "10",
> [@c01 ~]# ceph daemon mon.a config show|grep message_max
> "osd_map_message_max": "10",
>
> [0]
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg54419.html
> http://tracker.ceph.com/issues/38040
>
> 3) Allow access to a monitor with
> iptables -D INPUT -p tcp -s 192.168.10.43 --dport 6789 -j REJECT
>
> Getting
> [Thu Jul 11 12:39:26 2019] libceph: mon0 192.168.10.111:6789 session
> established
> [Thu Jul 11 12:39:26 2019] libceph: osd0 down
> [Thu Jul 11 12:39:26 2019] libceph: osd0 up
>
> Problems solved, in D state hung unmount was released.
>
> I am not sure if the prolonged disconnection to the monitors was the
> solution or the osd_map_message_max=10, or both.
>
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW: Is 'radosgw-admin reshard stale-instances rm' safe?

2019-07-12 Thread Rudenko Aleksandr
Hi, Casey.

Can you help me with my question?

From: Konstantin Shalygin 
Date: Wednesday, 26 June 2019 at 07:29
To: Rudenko Aleksandr 
Cc: "ceph-users@lists.ceph.com" , Casey Bodley 

Subject: Re: [ceph-users] RGW: Is 'radosgw-admin reshard stale-instances rm' 
safe?



On 6/25/19 12:46 AM, Rudenko Aleksandr wrote:
Hi, Konstantin.

Thanks for the reply.

I know about stale instances and that they remain from a prior version.

I am asking about the bucket's "marker". I have a bucket "clx" and I can see its 
current marker in the stale-instances list.
As far as I know, the stale-instances list should contain only previous marker IDs.
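To illustrate the comparison (a rough sketch, using the bucket name "clx" from
above):

# current marker / bucket_id of the live bucket
radosgw-admin bucket stats --bucket=clx | grep -E '"(id|marker)"'
# instances the reshard cleanup considers stale
radosgw-admin reshard stale-instances list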



Good question! I've CC'ed Casey for an answer...





k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com