Re: [ceph-users] Luminous | PG split causing slow requests

2018-03-15 Thread David Turner
The settings don't completely disable it, they just push it back so that it
won't happen for a very long time.  I settled on doing my offline splitting
every month because after 5-6 weeks I found that our PGs started splitting
on their own even with the increased settings... so I schedule a task to
split them offline every 4 weeks.

filestore_merge_threshold is one of the variables in the equation that
determines when a subfolder will split.  Any negative value stops filestore
from actually merging subfolders, and 16 is the magnitude I wanted to use in
the equation.
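
For reference, here is a rough sketch of how the two values combine, assuming
the usual filestore rule that a subfolder splits once it holds more than
16 * filestore_split_multiple * abs(filestore_merge_threshold) objects (the
"has 5121 objects, starting split" line later in this thread would line up
with a split multiple of 8 and a merge threshold magnitude of 40):

    # Rough sketch only: compute the per-subfolder split point from the two
    # settings.  Values here are the [1] ones quoted below; plug in your own.
    merge_threshold=-16
    split_multiple=256

    abs_merge=${merge_threshold#-}   # the sign only controls merging, not the split point
    split_at=$(( 16 * split_multiple * abs_merge ))
    echo "subfolders split above ${split_at} objects"   # 65536 with these values

With stock defaults the split point is far lower, which is why untouched
clusters run into online splitting so much earlier.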

Note that anything that moves PGs around between OSDs will build the PGs on
the new OSD with whatever settings are currently in the config file, so you
should schedule offline splitting shortly after adding/removing storage.

On Wed, Mar 14, 2018 at 1:43 PM David C  wrote:

> On Mon, Feb 26, 2018 at 6:08 PM, David Turner 
> wrote:
>
>> Slow requests are absolutely expected during filestore subfolder
>> splitting.  You can, however, stop an OSD, split its subfolders offline,
>> and start it back up.  I perform this maintenance once a month.  I changed
>> my settings to [1] these, but I only suggest something this drastic if
>> you're committed to splitting your PGs manually on a regular basis.  In my
>> environment that needs to be once a month.
>>
>
> Hi David, to be honest I've still not completely got my head around the
> filestore splitting, but one thing's for sure: it's causing major IO issues
> on my small cluster. If I understand correctly, your settings in [1]
> completely disable "online" merging and splitting. Have I got that right?
>
> Why is your filestore_merge_threshold -16 as opposed to -1?
>
> You say you need to do your offline splitting on a monthly basis in your
> environment but how are you arriving at that conclusion? What would I need
> to monitor to discover how frequently I would need to do a split?
>
> Thanks for all your help on this
>
>
>>
>> Along with those settings, I use [2] this script to perform the subfolder
>> splitting.  It changes your config file to [3] these settings, performs
>> the subfolder splitting, changes the settings back to what you currently
>> have, and starts your OSDs back up.  Using a negative merge threshold
>> prevents subfolder merging, which is useful in some environments.
>>
>> The script automatically sets noout and unsets it for you afterwards, and
>> it won't start unless the cluster is HEALTH_OK.  Feel free to use it as is
>> or pick out whatever is useful for you.  I highly suggest that anyone
>> feeling the pain of subfolder splitting do some sort of offline splitting
>> to get through it.  If you're using config management like Salt or Puppet,
>> be sure to disable it so that the config isn't overwritten while the
>> subfolders are being split.
>>
>>
>> [1] filestore_merge_threshold = -16
>>  filestore_split_multiple = 256
>>
>> [2] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4
>>
>> [3] filestore_merge_threshold = -1
>>  filestore_split_multiple = 1
>> On Mon, Feb 26, 2018 at 12:18 PM David C  wrote:
>>
>>> Thanks, David. I think I've probably used the wrong terminology here;
>>> I'm not splitting PGs to create more PGs. This is the PG folder splitting
>>> that happens automatically, which I believe is controlled by the
>>> "filestore_split_multiple" setting (it's 8 on my OSDs, which I believe is
>>> the Luminous default...). Increasing heartbeat grace would probably still
>>> be a good idea to prevent the flapping. I'm trying to understand whether
>>> the slow requests are to be expected or whether I need to tune something
>>> or look at hardware.
>>>
>>> On Mon, Feb 26, 2018 at 4:19 PM, David Turner 
>>> wrote:
>>>
 Splitting PGs is one of the most intensive and disruptive things you can,
 and should, do to a cluster.  Tweaking recovery sleep, max backfills, and
 heartbeat grace should help with this.  Heartbeat grace can be set high
 enough to mitigate OSDs flapping (which slows things down through peering
 and additional recovery) while still being able to detect OSDs that
 genuinely fail and go down.  Recovery sleep and max backfills are the
 settings to look at for mitigating slow requests.  I generally tweak those
 while watching iostat on a few OSDs and ceph -s to make sure I'm not giving
 too much priority to the recovery operations so that client IO can still
 happen.

 On Mon, Feb 26, 2018 at 11:10 AM David C 
 wrote:

> Hi All
>
> I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners,
> journals on NVME. Cluster primarily used for CephFS, ~20M objects.
>
> I'm seeing some OSDs getting marked down, it appears to be related to
> PG splitting, e.g:
>
> 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
>> objects, starting split.
>>
>
> Followed by:
>
> 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log
>> [WRN] : 9 slow request

Re: [ceph-users] Luminous | PG split causing slow requests

2018-03-14 Thread David C
On Mon, Feb 26, 2018 at 6:08 PM, David Turner  wrote:

> Slow requests are absolutely expected during filestore subfolder
> splitting.  You can, however, stop an OSD, split its subfolders offline,
> and start it back up.  I perform this maintenance once a month.  I changed
> my settings to [1] these, but I only suggest something this drastic if
> you're committed to splitting your PGs manually on a regular basis.  In my
> environment that needs to be once a month.
>

Hi David, to be honest I've still not completely got my head around the
filestore splitting, but one thing's for sure: it's causing major IO issues
on my small cluster. If I understand correctly, your settings in [1]
completely disable "online" merging and splitting. Have I got that right?

Why is your filestore_merge_threshold -16 as opposed to -1?

You say you need to do your offline splitting on a monthly basis in your
environment but how are you arriving at that conclusion? What would I need
to monitor to discover how frequently I would need to do a split?

Thanks for all your help on this


>
> Along with those settings, I use [2] this script to perform the subfolder
> splitting.  It changes your config file to [3] these settings, performs the
> subfolder splitting, changes the settings back to what you currently have,
> and starts your OSDs back up.  Using a negative merge threshold prevents
> subfolder merging, which is useful in some environments.
>
> The script automatically sets noout and unsets it for you afterwards, and
> it won't start unless the cluster is HEALTH_OK.  Feel free to use it as is
> or pick out whatever is useful for you.  I highly suggest that anyone
> feeling the pain of subfolder splitting do some sort of offline splitting
> to get through it.  If you're using config management like Salt or Puppet,
> be sure to disable it so that the config isn't overwritten while the
> subfolders are being split.
>
>
> [1] filestore_merge_threshold = -16
>  filestore_split_multiple = 256
>
> [2] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4
>
> [3] filestore_merge_threshold = -1
>  filestore_split_multiple = 1
> On Mon, Feb 26, 2018 at 12:18 PM David C  wrote:
>
>> Thanks, David. I think I've probably used the wrong terminology here; I'm
>> not splitting PGs to create more PGs. This is the PG folder splitting that
>> happens automatically, which I believe is controlled by the
>> "filestore_split_multiple" setting (it's 8 on my OSDs, which I believe is
>> the Luminous default...). Increasing heartbeat grace would probably still
>> be a good idea to prevent the flapping. I'm trying to understand whether
>> the slow requests are to be expected or whether I need to tune something
>> or look at hardware.
>>
>> On Mon, Feb 26, 2018 at 4:19 PM, David Turner 
>> wrote:
>>
>>> Splitting PGs is one of the most intensive and disruptive things you
>>> can, and should, do to a cluster.  Tweaking recovery sleep, max backfills,
>>> and heartbeat grace should help with this.  Heartbeat grace can be set high
>>> enough to mitigate OSDs flapping (which slows things down through peering
>>> and additional recovery) while still being able to detect OSDs that
>>> genuinely fail and go down.  Recovery sleep and max backfills are the
>>> settings to look at for mitigating slow requests.  I generally tweak those
>>> while watching iostat on a few OSDs and ceph -s to make sure I'm not giving
>>> too much priority to the recovery operations so that client IO can still
>>> happen.
>>>
>>> On Mon, Feb 26, 2018 at 11:10 AM David C 
>>> wrote:
>>>
 Hi All

 I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners,
 journals on NVME. Cluster primarily used for CephFS, ~20M objects.

 I'm seeing some OSDs getting marked down, it appears to be related to
 PG splitting, e.g:

 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
> objects, starting split.
>

 Followed by:

 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log
> [WRN] : 9 slow requests, 5 included below; oldest blocked for > 30.308128
> secs
> 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log
> [WRN] : slow request 30.151105 seconds old, received at 2018-02-26
> 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
> commit_sent
> 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log
> [WRN] : slow request 30.133441 seconds old, received at 2018-02-26
> 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
> commit_sent
> 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log
> [WRN] : slow request 30.083401 s

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-27 Thread David C
This is super helpful, thanks for sharing, David. I need to do a bit more
reading on this.

On 26 Feb 2018 6:08 p.m., "David Turner"  wrote:

Slow requests are absolutely expected during filestore subfolder
splitting.  You can, however, stop an OSD, split its subfolders offline,
and start it back up.  I perform this maintenance once a month.  I changed
my settings to [1] these, but I only suggest something this drastic if
you're committed to splitting your PGs manually on a regular basis.  In my
environment that needs to be once a month.

Along with those settings, I use [2] this script to perform the subfolder
splitting.  It changes your config file to [3] these settings, performs the
subfolder splitting, changes the settings back to what you currently have,
and starts your OSDs back up.  Using a negative merge threshold prevents
subfolder merging, which is useful in some environments.

The script automatically sets noout and unsets it for you afterwards, and
it won't start unless the cluster is HEALTH_OK.  Feel free to use it as is
or pick out whatever is useful for you.  I highly suggest that anyone
feeling the pain of subfolder splitting do some sort of offline splitting
to get through it.  If you're using config management like Salt or Puppet,
be sure to disable it so that the config isn't overwritten while the
subfolders are being split.


[1] filestore_merge_threshold = -16
 filestore_split_multiple = 256

[2] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4

[3] filestore_merge_threshold = -1
 filestore_split_multiple = 1
On Mon, Feb 26, 2018 at 12:18 PM David C  wrote:

> Thanks, David. I think I've probably used the wrong terminology here; I'm
> not splitting PGs to create more PGs. This is the PG folder splitting that
> happens automatically, which I believe is controlled by the
> "filestore_split_multiple" setting (it's 8 on my OSDs, which I believe is
> the Luminous default...). Increasing heartbeat grace would probably still
> be a good idea to prevent the flapping. I'm trying to understand whether
> the slow requests are to be expected or whether I need to tune something
> or look at hardware.
>
> On Mon, Feb 26, 2018 at 4:19 PM, David Turner 
> wrote:
>
>> Splitting PGs is one of the most intensive and disruptive things you
>> can, and should, do to a cluster.  Tweaking recovery sleep, max backfills,
>> and heartbeat grace should help with this.  Heartbeat grace can be set high
>> enough to mitigate OSDs flapping (which slows things down through peering
>> and additional recovery) while still being able to detect OSDs that
>> genuinely fail and go down.  Recovery sleep and max backfills are the
>> settings to look at for mitigating slow requests.  I generally tweak those
>> while watching iostat on a few OSDs and ceph -s to make sure I'm not giving
>> too much priority to the recovery operations so that client IO can still
>> happen.
>>
>> On Mon, Feb 26, 2018 at 11:10 AM David C  wrote:
>>
>>> Hi All
>>>
>>> I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals
>>> on NVME. Cluster primarily used for CephFS, ~20M objects.
>>>
>>> I'm seeing some OSDs getting marked down, it appears to be related to PG
>>> splitting, e.g:
>>>
>>> 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
 objects, starting split.

>>>
>>> Followed by:
>>>
>>> 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : 9 slow requests, 5 included below; oldest blocked for > 30.308128
 secs
 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : slow request 30.151105 seconds old, received at 2018-02-26
 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
 commit_sent
 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : slow request 30.133441 seconds old, received at 2018-02-26
 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
 commit_sent
 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : slow request 30.083401 seconds old, received at 2018-02-26
 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c
 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[]
 ondisk+read+rwordered+known_if_redirected+full_force e13994) currently
 waiting for rw locks
 2018-02-26 10:27:58.242579 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : slow request 30.072310 seconds old, received at 2018-02-26
 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c
 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc
 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
 waiting for rw locks
 2018-02-

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David Turner
Slow requests are absolutely expected during filestore subfolder
splitting.  You can, however, stop an OSD, split its subfolders offline,
and start it back up.  I perform this maintenance once a month.  I changed
my settings to [1] these, but I only suggest something this drastic if
you're committed to splitting your PGs manually on a regular basis.  In my
environment that needs to be once a month.

Along with those settings, I use [2] this script to perform the subfolder
splitting.  It changes your config file to [3] these settings, performs the
subfolder splitting, changes the settings back to what you currently have,
and starts your OSDs back up.  Using a negative merge threshold prevents
subfolder merging, which is useful in some environments.

The script automatically sets noout and unsets it for you afterwards, and
it won't start unless the cluster is HEALTH_OK.  Feel free to use it as is
or pick out whatever is useful for you.  I highly suggest that anyone
feeling the pain of subfolder splitting do some sort of offline splitting
to get through it.  If you're using config management like Salt or Puppet,
be sure to disable it so that the config isn't overwritten while the
subfolders are being split.


[1] filestore_merge_threshold = -16
 filestore_split_multiple = 256

[2] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4

[3] filestore_merge_threshold = -1
 filestore_split_multiple = 1
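
For anyone who would rather hand-roll this than use the gist, a minimal
sketch of the offline procedure described above might look like the
following.  It assumes ceph-objectstore-tool's apply-layout-settings op and
the stock filestore data path (--journal-path may also be needed on some
setups), and it leaves out the config swapping the real script does around
the split, so treat the linked script as the authoritative version:

    #!/usr/bin/env bash
    # Sketch: offline pre-split of one OSD's filestore subfolders for one pool.
    # Assumes the layout you want is already in ceph.conf when this runs.
    set -euo pipefail
    OSD_ID=$1
    POOL=$2

    ceph health | grep -q HEALTH_OK || { echo "cluster not HEALTH_OK, aborting"; exit 1; }
    ceph osd set noout
    systemctl stop ceph-osd@"${OSD_ID}"

    # Apply the filestore split/merge settings currently in ceph.conf to the
    # on-disk directory layout of every PG in the given pool.
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-"${OSD_ID}" \
        --op apply-layout-settings \
        --pool "${POOL}"

    systemctl start ceph-osd@"${OSD_ID}"
    ceph osd unset noout
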
On Mon, Feb 26, 2018 at 12:18 PM David C  wrote:

> Thanks, David. I think I've probably used the wrong terminology here; I'm
> not splitting PGs to create more PGs. This is the PG folder splitting that
> happens automatically, which I believe is controlled by the
> "filestore_split_multiple" setting (it's 8 on my OSDs, which I believe is
> the Luminous default...). Increasing heartbeat grace would probably still
> be a good idea to prevent the flapping. I'm trying to understand whether
> the slow requests are to be expected or whether I need to tune something
> or look at hardware.
>
> On Mon, Feb 26, 2018 at 4:19 PM, David Turner 
> wrote:
>
>> Splitting PGs is one of the most intensive and disruptive things you
>> can, and should, do to a cluster.  Tweaking recovery sleep, max backfills,
>> and heartbeat grace should help with this.  Heartbeat grace can be set high
>> enough to mitigate OSDs flapping (which slows things down through peering
>> and additional recovery) while still being able to detect OSDs that
>> genuinely fail and go down.  Recovery sleep and max backfills are the
>> settings to look at for mitigating slow requests.  I generally tweak those
>> while watching iostat on a few OSDs and ceph -s to make sure I'm not giving
>> too much priority to the recovery operations so that client IO can still
>> happen.
>>
>> On Mon, Feb 26, 2018 at 11:10 AM David C  wrote:
>>
>>> Hi All
>>>
>>> I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals
>>> on NVME. Cluster primarily used for CephFS, ~20M objects.
>>>
>>> I'm seeing some OSDs getting marked down, it appears to be related to PG
>>> splitting, e.g:
>>>
>>> 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
 objects, starting split.

>>>
>>> Followed by:
>>>
>>> 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : 9 slow requests, 5 included below; oldest blocked for > 30.308128
 secs
 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : slow request 30.151105 seconds old, received at 2018-02-26
 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
 commit_sent
 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : slow request 30.133441 seconds old, received at 2018-02-26
 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
 commit_sent
 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : slow request 30.083401 seconds old, received at 2018-02-26
 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c
 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[]
 ondisk+read+rwordered+known_if_redirected+full_force e13994) currently
 waiting for rw locks
 2018-02-26 10:27:58.242579 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : slow request 30.072310 seconds old, received at 2018-02-26
 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c
 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc
 0=[] ondisk+write+known_if_redirected+full_force e13994) currently waiting
 for rw locks
 2018-02-26 10:27:58.242584 7f141cc3f700  0 log_channel(cluster) log
 [WRN] : slow request 30.308128 seconds old, received at 2018-02-26
 10:2

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David C
Thanks, David. I think I've probably used the wrong terminology here; I'm
not splitting PGs to create more PGs. This is the PG folder splitting that
happens automatically, which I believe is controlled by the
"filestore_split_multiple" setting (it's 8 on my OSDs, which I believe is
the Luminous default...). Increasing heartbeat grace would probably still
be a good idea to prevent the flapping. I'm trying to understand whether
the slow requests are to be expected or whether I need to tune something or
look at hardware.
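
(For anyone wanting to confirm what their OSDs are actually running with,
the live values can be read from an OSD's admin socket; a quick check,
assuming osd.0 runs on the local host, might be:)

    # Read the live filestore layout settings from a local OSD's admin socket.
    ceph daemon osd.0 config get filestore_split_multiple
    ceph daemon osd.0 config get filestore_merge_threshold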

On Mon, Feb 26, 2018 at 4:19 PM, David Turner  wrote:

> Splitting PGs is one of the most intensive and disruptive things you can,
> and should, do to a cluster.  Tweaking recovery sleep, max backfills, and
> heartbeat grace should help with this.  Heartbeat grace can be set high
> enough to mitigate OSDs flapping (which slows things down through peering
> and additional recovery) while still being able to detect OSDs that
> genuinely fail and go down.  Recovery sleep and max backfills are the
> settings to look at for mitigating slow requests.  I generally tweak those
> while watching iostat on a few OSDs and ceph -s to make sure I'm not giving
> too much priority to the recovery operations so that client IO can still
> happen.
>
> On Mon, Feb 26, 2018 at 11:10 AM David C  wrote:
>
>> Hi All
>>
>> I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals
>> on NVME. Cluster primarily used for CephFS, ~20M objects.
>>
>> I'm seeing some OSDs getting marked down, it appears to be related to PG
>> splitting, e.g:
>>
>> 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
>>> objects, starting split.
>>>
>>
>> Followed by:
>>
>> 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log [WRN]
>>> : 9 slow requests, 5 included below; oldest blocked for > 30.308128 secs
>>> 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.151105 seconds old, received at 2018-02-26
>>> 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>> commit_sent
>>> 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.133441 seconds old, received at 2018-02-26
>>> 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>> commit_sent
>>> 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.083401 seconds old, received at 2018-02-26
>>> 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[]
>>> ondisk+read+rwordered+known_if_redirected+full_force e13994) currently
>>> waiting for rw locks
>>> 2018-02-26 10:27:58.242579 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.072310 seconds old, received at 2018-02-26
>>> 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc
>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>> waiting for rw locks
>>> 2018-02-26 10:27:58.242584 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.308128 seconds old, received at 2018-02-26
>>> 10:27:27.934288: osd_op(mds.0.5339:811964 3.5c
>>> 3:3bb9d743:::200.0018c6c4:head [write 0~62535 [fadvise_dontneed]] snapc
>>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>>> commit_sent
>>> 2018-02-26 10:27:59.242768 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : 47 slow requests, 5 included below; oldest blocked for > 31.308410
>>> secs
>>> 2018-02-26 10:27:59.242776 7f141cc3f700  0 log_channel(cluster) log
>>> [WRN] : slow request 30.349575 seconds old, received at 2018-02-26
>>> 10:27:28.893124:
>>
>>
>> I'm also experiencing some MDS crash issues which I think could be
>> related.
>>
>> Is there anything I can do to mitigate the slow requests problem? The
>> rest of the time the cluster is performing pretty well.
>>
>> Thanks,
>> David


Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David Turner
Splitting PGs is one of the most intensive and disruptive things you can,
and should, do to a cluster.  Tweaking recovery sleep, max backfills, and
heartbeat grace should help with this.  Heartbeat grace can be set high
enough to mitigate OSDs flapping (which slows things down through peering
and additional recovery) while still being able to detect OSDs that
genuinely fail and go down.  Recovery sleep and max backfills are the
settings to look at for mitigating slow requests.  I generally tweak those
while watching iostat on a few OSDs and ceph -s to make sure I'm not giving
too much priority to the recovery operations so that client IO can still
happen.
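
A hedged example of applying those knobs at runtime (the numbers are
placeholders, not recommendations; tune them while watching iostat and
ceph -s as described above):

    # Throttle recovery/backfill and loosen the heartbeat grace on all OSDs.
    ceph tell osd.* injectargs '--osd_recovery_sleep 0.1 --osd_max_backfills 1'
    ceph tell osd.* injectargs '--osd_heartbeat_grace 60'

    # To persist across restarts, put the same values in ceph.conf; heartbeat
    # grace is commonly set under [global] so the mons see the same value.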

On Mon, Feb 26, 2018 at 11:10 AM David C  wrote:

> Hi All
>
> I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals
> on NVME. Cluster primarily used for CephFS, ~20M objects.
>
> I'm seeing some OSDs getting marked down, it appears to be related to PG
> splitting, e.g:
>
> 2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121
>> objects, starting split.
>>
>
> Followed by:
>
> 2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log [WRN]
>> : 9 slow requests, 5 included below; oldest blocked for > 30.308128 secs
>> 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log [WRN]
>> : slow request 30.151105 seconds old, received at 2018-02-26
>> 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
>> 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>> commit_sent
>> 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log [WRN]
>> : slow request 30.133441 seconds old, received at 2018-02-26
>> 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
>> 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>> commit_sent
>> 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log [WRN]
>> : slow request 30.083401 seconds old, received at 2018-02-26
>> 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c
>> 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[]
>> ondisk+read+rwordered+known_if_redirected+full_force e13994) currently
>> waiting for rw locks
>> 2018-02-26 10:27:58.242579 7f141cc3f700  0 log_channel(cluster) log [WRN]
>> : slow request 30.072310 seconds old, received at 2018-02-26
>> 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c
>> 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc
>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently waiting
>> for rw locks
>> 2018-02-26 10:27:58.242584 7f141cc3f700  0 log_channel(cluster) log [WRN]
>> : slow request 30.308128 seconds old, received at 2018-02-26
>> 10:27:27.934288: osd_op(mds.0.5339:811964 3.5c
>> 3:3bb9d743:::200.0018c6c4:head [write 0~62535 [fadvise_dontneed]] snapc
>> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
>> commit_sent
>> 2018-02-26 10:27:59.242768 7f141cc3f700  0 log_channel(cluster) log [WRN]
>> : 47 slow requests, 5 included below; oldest blocked for > 31.308410 secs
>> 2018-02-26 10:27:59.242776 7f141cc3f700  0 log_channel(cluster) log [WRN]
>> : slow request 30.349575 seconds old, received at 2018-02-26
>> 10:27:28.893124:
>
>
> I'm also experiencing some MDS crash issues which I think could be related.
>
> Is there anything I can do to mitigate the slow requests problem? The rest
> of the time the cluster is performing pretty well.
>
> Thanks,
> David


[ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David C
Hi All

I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals on
NVME. Cluster primarily used for CephFS, ~20M objects.

I'm seeing some OSDs getting marked down, it appears to be related to PG
splitting, e.g:

2018-02-26 10:27:27.935489 7f140dbe2700  1 _created [C,D] has 5121 objects,
> starting split.
>

Followed by:

2018-02-26 10:27:58.242551 7f141cc3f700  0 log_channel(cluster) log [WRN] :
> 9 slow requests, 5 included below; oldest blocked for > 30.308128 secs
> 2018-02-26 10:27:58.242563 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.151105 seconds old, received at 2018-02-26
> 10:27:28.091312: osd_op(mds.0.5339:811969 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 73416~5897 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
> commit_sent
> 2018-02-26 10:27:58.242569 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.133441 seconds old, received at 2018-02-26
> 10:27:28.108976: osd_op(mds.0.5339:811970 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 79313~4866 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
> commit_sent
> 2018-02-26 10:27:58.242574 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.083401 seconds old, received at 2018-02-26
> 10:27:28.159016: osd_op(mds.9174516.0:444202 3.5c
> 3:3bb9d743:::200.0018c6c4:head [stat] snapc 0=[]
> ondisk+read+rwordered+known_if_redirected+full_force e13994) currently
> waiting for rw locks
> 2018-02-26 10:27:58.242579 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.072310 seconds old, received at 2018-02-26
> 10:27:28.170107: osd_op(mds.0.5339:811971 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 84179~1941 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently waiting
> for rw locks
> 2018-02-26 10:27:58.242584 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.308128 seconds old, received at 2018-02-26
> 10:27:27.934288: osd_op(mds.0.5339:811964 3.5c
> 3:3bb9d743:::200.0018c6c4:head [write 0~62535 [fadvise_dontneed]] snapc
> 0=[] ondisk+write+known_if_redirected+full_force e13994) currently
> commit_sent
> 2018-02-26 10:27:59.242768 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : 47 slow requests, 5 included below; oldest blocked for > 31.308410 secs
> 2018-02-26 10:27:59.242776 7f141cc3f700  0 log_channel(cluster) log [WRN]
> : slow request 30.349575 seconds old, received at 2018-02-26
> 10:27:28.893124:


I'm also experiencing some MDS crash issues which I think could be related.

Is there anything I can do to mitigate the slow requests problem? The rest
of the time the cluster is performing pretty well.

Thanks,
David