[ceph-users] any way to see enabled/disabled status of bucket sync?

2018-12-31 Thread Christian Rice
Is there a command that will show me the current status of bucket sync (enabled 
vs disabled)?

Referring to 
https://github.com/ceph/ceph/blob/b5f33ae3722118ec07112a4fe1bb0bdedb803a60/src/rgw/rgw_admin.cc#L1626
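For context, I'm toggling sync with something along these lines (exact
invocation from memory, and the bucket name is just a placeholder):

radosgw-admin bucket sync enable --bucket=<bucket-name>
radosgw-admin bucket sync disable --bucket=<bucket-name>

What I haven't found is the counterpart that reports whether sync is
currently enabled or disabled for a given bucket.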


Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-31 Thread Mohamad Gebai
On 12/31/18 4:51 AM, Marcus Murwall wrote:
> What you say does make sense though, as I also get the feeling that the
> OSDs are just waiting for something. Something that never happens, and
> the requests finally time out...

So the OSDs are just completely idle? If not, try using strace and/or
perf to get some insights into what they're doing.
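Something along these lines, against one of the OSDs with blocked
requests (the pid is a placeholder):

perf top -p <pid-of-ceph-osd>       # live view of where the OSD spends CPU
strace -c -f -p <pid-of-ceph-osd>   # per-syscall summary; stop with Ctrl-C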

Maybe someone with better knowledge of EC internals will suggest
something. In the meantime, you might want to look at the client side.
Could the client be somehow saturated or blocked on something? (If the
clients aren't blocked you can use 'perf' or Mark's profiler [1] to
profile them).

Try benchmarking with an iodepth of 1 and slowly increase it until you
run into the issue, all while monitoring your resources. You might find
something that causes the tipping point. Are you able to reproduce this
using fio? Maybe this is just a client issue...
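If you go the fio route, a rough sketch using its rados engine (pool and
client names here are just examples, adjust to your setup):

fio --name=ec-probe --ioengine=rados --clientname=admin \
    --pool=bench-ec-hdd --rw=write --bs=4M --iodepth=1 --size=4G

then bump --iodepth step by step and watch where it tips over.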

Sorry for suggesting a bunch of things that are all over the place, I'm
just trying to understand the state of the cluster (and clients). Are
both the OSDs and the clients completely blocked, making no progress?

Let us know what you find.

Mohamad

[1] https://github.com/markhpc/gdbpmp/

>
> I will have one of our network guys take a look and get a second
> pair of eyes on it as well, just to make sure I'm not missing anything.
>
> Thanks for your help so far Mohamad, I really appreciate it. If you
> have some more ideas/suggestions on where to look please let us know.
>
> I wish you all a happy new year.
>
> Regards
> Marcus
>
>> Mohamad Gebai 
>> 28 December 2018 at 16:10
>> Hi Marcus,
>>
>> On 12/27/18 4:21 PM, Marcus Murwall wrote:
>>> Hey Mohamad
>>>
>>> I work with Florian on this issue.
>>> Just reinstalled the ceph cluster and triggered the error again.
>>> Looking at iostat -x 1 there is basically no activity at all against
>>> any of the OSDs.
>>> We get blocked ops all over the place, but here is some output from
>>> one of the OSDs that had blocked requests:
>>> http://paste.openstack.org/show/738721/
>>
>> Looking at the historic_slow_ops, the step in the pipeline that takes
>> the most time is sub_op_applied -> commit_sent. I couldn't say
>> exactly what these steps are from a high level view, but looking at
>> the code, commit_sent indicates that a message has been sent to the
>> OSD's client over the network. Can you look for network congestion
>> (the fact that there's nothing happening on the disks points in that
>> direction too)? Something like iftop might help. Is there anything
>> suspicious in the logs?
>>
>> Also, do you get the same throughput when benchmarking the replicated
>> compared to the EC pool?
>>
>> Mohamad
>>
>>>
>>>
>>> Regards
>>> Marcus
>>>
 Mohamad Gebai 
 26 December 2018 at 18:27
 What is happening on the individual nodes when you reach that point
 (iostat -x 1 on the OSD nodes)? Also, what throughput do you get when
 benchmarking the replicated pool?

 I guess one way to start would be by looking at ongoing operations at
 the OSD level:

 ceph daemon osd.X dump_blocked_ops
 ceph daemon osd.X dump_ops_in_flight
 ceph daemon osd.X dump_historic_slow_ops

 (see "ceph daemon osd.X help" for more commands).

 The first command will show currently blocked operations. The last
 command shows recent slow operations. You can follow the flow of
 individual operations, and you might find that the slow operations are
 all associated with the same few PGs, or that they're spending too much
 time waiting on something.

 Hope that helps.

 Mohamad


 Florian Haas 
 26 December 2018 at 11:20
 Hi everyone,

 We have a Luminous cluster (12.2.10) on Ubuntu Xenial, though we have
 also observed the same behavior on 12.2.7 on Bionic (download.ceph.com
 doesn't build Luminous packages for Bionic, and 12.2.7 is the latest
 distro build).

 The primary use case for this cluster is radosgw. 6 OSD nodes, 22 OSDs
 per node, of which 20 are SAS spinners and 2 are NVMe devices. Cluster
 has been deployed with ceph-ansible stable-3.1, we're using
 "objectstore: bluestore" and "osd_scenario: collocated".

 We're using a "class hdd" replicated CRUSH ruleset for all our pools,
 except:

 - the bucket index pool, which uses a replicated "class nvme" rule, and
 - the bucket data pool, which uses an EC profile (crush-device-class=hdd,
 crush-failure-domain=host, k=3, m=2).

 We also have 3 pools that we have created in order to be able to do
 benchmark runs while leaving the other pools untouched, so we have

 - bench-repl-hdd, replicated, size 3, using a CRUSH rule with "step
 take
 default class hdd"
 - bench-repl-nvme, replicated, size 3, using a CRUSH rule with "step
 take default class nvme"
 - bench-ec-hdd, EC, crush-device-class=hdd, crush-failure-domain=host, k=3, m=2.

Re: [ceph-users] Help with setting device-class rule on pool without causing data to move

2018-12-31 Thread Eric Goirand
Hi David,

CERN has provided a Python script to swap the relevant bucket IDs
(default <-> hdd); you can find it here:
https://github.com/cernceph/ceph-scripts/blob/master/tools/device-class-id-swap.py

The principle is the following:
- extract the CRUSH map
- run the script on it => it creates a new CRUSH file
- edit the new CRUSH map and modify the rule associated with the pool(s) you
want to associate with HDD OSDs only, replacing:
  => "step take default" with "step take default class hdd"
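In the decompiled map, the edited rule then looks something like this
(rule name and id are just examples):

rule replicated_hdd {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}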

Then recompile and reinject the new CRUSH map, and voilà!

Your cluster should then be using only the HDD OSDs, without rebalancing (or
with only a very small amount of it).

In case you have forgotten something, just reapply the former CRUSH map and
start again.
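
For reference, the extract/edit/recompile/reinject cycle is something along
these lines (file names are only examples):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# run the script / edit the rule(s) in crush.txt here, then:
crushtool -c crush.txt -o crush-new.bin
ceph osd setcrushmap -i crush-new.bin

Keep the original crush.bin around so you can reinject it if needed.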

Cheers and Happy new year 2019.

Eric



On Sun, Dec 30, 2018, 21:16 David C  wrote:

> Hi All
>
> I'm trying to set the existing pools in a Luminous cluster to use the hdd
> device-class but without moving data around. If I just create a new rule
> using the hdd class and set my pools to use that new rule it will cause a
> huge amount of data movement even though the pgs are all already on HDDs.
>
> There is a thread on ceph-large [1] which appears to have the solution but
> I can't get my head around what I need to do. I'm not too clear on which
> IDs I need to swap. Could someone give me some pointers on this please?
>
> [1]
> http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000109.html
>


Re: [ceph-users] Huge latency spikes

2018-12-31 Thread Marc Schöchlin
Hi,

Our Dell servers contain "PERC H730P Mini" RAID controllers with 2GB of
battery-backed cache memory.
All of our Ceph OSD disks (typically 12 * 8TB spinners or 16 * 1-2TB SSDs per
node) are used directly, without using the RAID functionality.

We completely deactivated the controller cache for the OSD disks (both
spinners and SSDs) in the Lifecycle Controller:

- writing: to prevent cache access contention (the throughput of the cache
might create latency spikes, because the disks see 90% write load)
- reading: Ceph is used as the storage system for virtualized systems (XEN
with rbd-nbd)
  -> we already have several levels of caching which should prevent re-reading
of data: the page cache of the virtualized systems, the rbd cache of rbd-nbd,
and the BlueStore cache on the OSDs

Do you see scenarios where it might be a good idea to activate the cache anyway?
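
For reference, the rbd and BlueStore cache layers mentioned above are
controlled by ceph.conf settings along these lines (the sizes are only
illustrative, not our production values):

[client]
rbd cache = true                        # client-side cache used by rbd-nbd/librbd
rbd cache size = 33554432               # 32 MiB

[osd]
bluestore cache size hdd = 1073741824   # 1 GiB per HDD OSD
bluestore cache size ssd = 3221225472   # 3 GiB per SSD OSD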

Regards
Marc

Am 20.11.18 um 15:11 schrieb Ashley Merrick:
> Me and quite a few others have had high random latency issues with disk cache 
> enabled.
>
> ,Ash
>



Re: [ceph-users] [Ceph-large] Help with setting device-class rule on pool without causing data to move

2018-12-31 Thread Wido den Hollander
Recently Dan from CERN showcased an interesting way.

Use upmap to map all PGs to the current OSDs and then change the CRUSH topology.

Then enable the balancer module and have it slowly move the PGs.

Yes, you will rebalance, but it can be done over a very long period, with
HEALTH_OK in between.
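
Roughly, the balancer side of that looks like this (assuming all clients are
Luminous or newer; the per-PG pinning before the CRUSH change is done with
"ceph osd pg-upmap-items", one entry per PG that would otherwise move):

ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status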

Wido

> On 30 Dec 2018, at 21:16, David C  wrote:
> 
> Hi All
> 
> I'm trying to set the existing pools in a Luminous cluster to use the hdd 
> device-class but without moving data around. If I just create a new rule 
> using the hdd class and set my pools to use that new rule it will cause a 
> huge amount of data movement even though the pgs are all already on HDDs. 
> 
> There is a thread on ceph-large [1] which appears to have the solution but I 
> can't get my head around what I need to do. I'm not too clear on which IDs I 
> need to swap. Could someone give me some pointers on this please?
> 
> [1] http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000109.html
> 


Re: [ceph-users] EC pools grinding to a screeching halt on Luminous

2018-12-31 Thread Marcus Murwall

Hi Mohamad

The network is a bonded 2x25Gbit interface for the cluster network, and I
see no signs of congestion. Also, if I benchmark against a replicated
pool I can't recreate these issues; I can push a lot more data against a
replicated pool and everything works just fine. If it were network
congestion, I would expect to get the same issue regardless of whether it
was a replicated or an EC pool?
What you say does make sense though, as I also get the feeling that the
OSDs are just waiting for something. Something that never happens, and
the requests finally time out... and it seems to only happen when using
erasure coding.


I will have one of our network guys take a look and get a second pair
of eyes on it as well, just to make sure I'm not missing anything.


Thanks for your help so far Mohamad, I really appreciate it. If you have 
some more ideas/suggestions on where to look please let us know.


I wish you all a happy new year.

Regards
Marcus


Mohamad Gebai 
28 December 2018 at 16:10
Hi Marcus,

On 12/27/18 4:21 PM, Marcus Murwall wrote:

Hey Mohamad

I work with Florian on this issue.
Just reinstalled the ceph cluster and triggered the error again.
Looking at iostat -x 1 there is basically no activity at all against
any of the OSDs.
We get blocked ops all over the place, but here is some output from
one of the OSDs that had blocked requests:
http://paste.openstack.org/show/738721/


Looking at the historic_slow_ops, the step in the pipeline that takes 
the most time is sub_op_applied -> commit_sent. I couldn't say exactly 
what these steps are from a high level view, but looking at the code, 
commit_sent indicates that a message has been sent to the OSD's client 
over the network. Can you look for network congestion (the fact that 
there's nothing happening on the disks points in that direction too)? 
Something like iftop might help. Is there anything suspicious in the logs?


Also, do you get the same throughput when benchmarking the replicated 
compared to the EC pool?


Mohamad




Regards
Marcus


Mohamad Gebai 
26 December 2018 at 18:27
What is happening on the individual nodes when you reach that point
(iostat -x 1 on the OSD nodes)? Also, what throughput do you get when
benchmarking the replicated pool?

I guess one way to start would be by looking at ongoing operations at
the OSD level:

ceph daemon osd.X dump_blocked_ops
ceph daemon osd.X dump_ops_in_flight
ceph daemon osd.X dump_historic_slow_ops

(see "ceph daemon osd.X help" for more commands).

The first command will show currently blocked operations. The last
command shows recent slow operations. You can follow the flow of
individual operations, and you might find that the slow operations are
all associated with the same few PGs, or that they're spending too much
time waiting on something.

Hope that helps.

Mohamad


Florian Haas 
26 December 2018 at 11:20
Hi everyone,

We have a Luminous cluster (12.2.10) on Ubuntu Xenial, though we have
also observed the same behavior on 12.2.7 on Bionic (download.ceph.com
doesn't build Luminous packages for Bionic, and 12.2.7 is the latest
distro build).

The primary use case for this cluster is radosgw. 6 OSD nodes, 22 OSDs
per node, of which 20 are SAS spinners and 2 are NVMe devices. Cluster
has been deployed with ceph-ansible stable-3.1, we're using
"objectstore: bluestore" and "osd_scenario: collocated".

We're using a "class hdd" replicated CRUSH ruleset for all our pools,
except:

- the bucket index pool, which uses a replicated "class nvme" rule, and
- the bucket data pool, which uses an EC profile (crush-device-class=hdd,
crush-failure-domain=host, k=3, m=2).

We also have 3 pools that we have created in order to be able to do
benchmark runs while leaving the other pools untouched, so we have

- bench-repl-hdd, replicated, size 3, using a CRUSH rule with "step take
default class hdd"
- bench-repl-nvme, replicated, size 3, using a CRUSH rule with "step
take default class nvme"
- bench-ec-hdd, EC, crush-device-class=hdd, crush-failure-domain=host,
k=3, m=2.

Baseline benchmarks with "ceph tell osd.* bench" at the default block
size of 4M yield pretty much exactly the throughput you'd expect from the
devices: approx. 185 MB/s from the SAS drives; the NVMe devices
currently pull only 650 MB/s on writes but that may well be due to
pending conditioning — this is new hardware.

Now when we run "rados bench" against the replicated pools, we again get
exactly what we expect for a nominally performing but largely untuned
system.
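
The invocations were along these lines (exact flags are illustrative):

rados bench -p bench-repl-hdd 60 write -b 4M -t 16 --no-cleanup
rados bench -p bench-ec-hdd 60 write -b 4M -t 16 --no-cleanup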

It's when we try running benchmarks against the EC pool that everything
appears to grind to a halt:

http://paste.openstack.org/show/738187/

After 19 seconds, that pool does not accept a single further object. We
simultaneously see slow request warnings creep up in the cluster, and
the only thing we can then do is kill the benchmark, and wait for the
slow requests to clear out.

We've also seen the log mes