Re: [ceph-users] Bluestore increased disk usage

2019-02-10 Thread Jakub Jaszewski
Hi Yenya,

I guess Ceph adds the size of all your block.db devices to the cluster's
total used space.
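If you want to confirm that, the BlueFS counters on each OSD's admin socket show how much space the DB devices account for. A rough sketch (osd.0 is just an example, and the counter names are as I remember them from Luminous/Mimic, so please verify on your version):

# ceph daemon osd.0 perf dump bluefs | grep -E '"(db|wal)_(total|used)_bytes"'

Summing db_total_bytes over all OSDs should come close to the extra ~9 TiB you are seeing, if that is indeed what is going on.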

Regards,
Jakub


On Fri, 8 Feb 2019 at 10:11, Jan Kasprzak wrote:

> Hello, ceph users,
>
> I moved my cluster to bluestore (Ceph Mimic), and now I see the increased
> disk usage. From ceph -s:
>
> pools:   8 pools, 3328 pgs
> objects: 1.23 M objects, 4.6 TiB
> usage:   23 TiB used, 444 TiB / 467 TiB avail
>
> I use 3-way replication of my data, so I would expect the disk usage
> to be around 14 TiB, which was true when I used filestore-based Luminous OSDs
> before. Why is the disk usage now 23 TiB?
>
> If I remember it correctly (a big if!), the disk usage was about the same
> when I originally moved the data to empty bluestore OSDs by changing the
> crush rule, but went up after I have added more bluestore OSDs and the
> cluster
> rebalanced itself.
>
> Could it be some miscalculation of free space in bluestore? Also, could it
> be related to the HEALTH_ERR backfill_toofull problem discussed here in the
> other thread?
>
> Thanks,
>
> -Yenya
>
> --
> | Jan "Yenya" Kasprzak 
> |
> | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5
> |
>  This is the world we live in: the way to deal with computers is to google
>  the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating to a dedicated cluster network

2019-01-23 Thread Jakub Jaszewski
Hi Yenya,

Can I ask what your cluster looks like and why you want to split the
network?

We used to set up clusters of 9-12 OSD nodes (12-16 HDDs each) using 2x10Gb
for the public network and 2x10Gb for the cluster network; however, I see no
reason not to use just one network for the next cluster setup.
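If you do go ahead with the split, as far as I know it comes down to two
options in ceph.conf plus rolling OSD restarts; monitors and clients stay on
the public network. A minimal sketch (the subnets are placeholders, adjust to
your environment):

[global]
public network  = 192.0.2.0/24
cluster network = 198.51.100.0/24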

Thanks
Jakub

On Wed, 23 Jan 2019 at 10:40, Jan Kasprzak wrote:

> Hello, Ceph users,
>
> is it possible to migrate already deployed Ceph cluster, which uses
> public network only, to a split public/dedicated networks? If so,
> can this be done without service disruption? I have now got new
> hardware which makes this possible, but I am not sure how to do it.
>
> Another question is whether the cluster network can be done
> solely on top of IPv6 link-local addresses without any public address
> prefix.
>
> When deploying this cluster (Ceph Firefly, IIRC), I had problems
> with mixed IPv4/IPv6 addressing, and ended up with ms_bind_ipv6 = false
> in my Ceph conf.
>
> Thanks,
>
> -Yenya
>
> --
> | Jan "Yenya" Kasprzak 
> |
> | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5
> |
>  This is the world we live in: the way to deal with computers is to google
>  the symptoms, and hope that you don't have to watch a video. --P. Zaitcev
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH Cluster Usage Discrepancy

2018-10-20 Thread Jakub Jaszewski
Hi Dan,

Did you configure block.wal/block.db as separate devices/partitions
(osd_scenario: non-collocated, or lvm, for clusters installed using the
ceph-ansible playbooks)?

I run Ceph version 13.2.1 with non-collocated block.db and see the same
situation - the sum of the block.db partition sizes is displayed as RAW USED
in ceph df.
Perhaps this is not the case for collocated block.db/wal.
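A quick way to check which layout you have is the OSD metadata, or ceph-volume
on the OSD host. A sketch (osd.0 is an example and the field names are from
memory, so worth verifying on 13.2.x):

# ceph osd metadata 0 | grep -E '"devices"|partition_path'
# ceph-volume lvm list    # separate [db] entries show up per OSD when block.db is non-collocated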

Jakub

On Sat, Oct 20, 2018 at 8:34 PM Waterbly, Dan 
wrote:

> I get that, but isn’t 4TiB to track 2.45M objects excessive? These numbers
> seem very high to me.
>
> Get Outlook for iOS 
>
>
>
> On Sat, Oct 20, 2018 at 10:27 AM -0700, "Serkan Çoban" <
> cobanser...@gmail.com> wrote:
>
> 4.65TiB includes size of wal and db partitions too.
>> On Sat, Oct 20, 2018 at 7:45 PM Waterbly, Dan  wrote:
>> >
>> > Hello,
>> >
>> >
>> >
>> > I have inserted 2.45M 1,000 byte objects into my cluster (radosgw, 3x 
>> > replication).
>> >
>> >
>> >
>> > I am confused by the usage ceph df is reporting and am hoping someone can 
>> > shed some light on this. Here is what I see when I run ceph df
>> >
>> >
>> >
>> > GLOBAL:
>> >     SIZE     AVAIL    RAW USED  %RAW USED
>> >     1.02PiB  1.02PiB  4.65TiB   0.44
>> > POOLS:
>> >     NAME                 ID  USED     %USED  MAX AVAIL  OBJECTS
>> >     .rgw.root            1   3.30KiB  0      330TiB     17
>> >     .rgw.buckets.data    2   22.9GiB  0      330TiB     24550943
>> >     default.rgw.control  3   0B       0      330TiB     8
>> >     default.rgw.meta     4   373B     0      330TiB     3
>> >     default.rgw.log      5   0B       0      330TiB     0
>> >     .rgw.control         6   0B       0      330TiB     8
>> >     .rgw.meta            7   2.18KiB  0      330TiB     12
>> >     .rgw.log             8   0B       0      330TiB     194
>> >     .rgw.buckets.index   9   0B       0      330TiB     2560
>> >
>> >
>> >
>> > Why does my bucket pool report usage of 22.9GiB but my cluster as a whole 
>> > is reporting 4.65TiB? There is nothing else on this cluster as it was just 
>> > installed and configured.
>> >
>> >
>> >
>> > Thank you for your help with this.
>> >
>> >
>> >
>> > -Dan
>> >
>> >
>> >
>> > Dan Waterbly | Senior Application Developer | 509.235.7500 x225 | 
>> > dan.water...@sos.wa.gov
>> >
>> > WASHINGTON STATE ARCHIVES | DIGITAL ARCHIVES
>> >
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] understanding % used in ceph df

2018-10-19 Thread Jakub Jaszewski
Hi, I think your question is really about the MAX AVAIL value; see how Ceph
calculates it:
http://docs.ceph.com/docs/luminous/rados/operations/monitoring/#checking-a-cluster-s-usage-stats

One OSD getting full makes the whole pool full as well, so keep reweighting
the nearfull OSDs.
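A sketch of what I usually run for that (the 110 threshold is just an example,
i.e. only touch OSDs more than 10% above the average utilization):

# ceph osd test-reweight-by-utilization 110   # dry run, shows what would change
# ceph osd reweight-by-utilization 110
# ceph osd reweight osd.23 0.95               # or adjust a single nearfull OSD by hand (osd.23 is an example)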

Jakub


On 19 Oct 2018 at 16:34, "Florian Engelmann"  wrote:

Hi,


Our Ceph cluster is a 6 Node cluster each node having 8 disks. The
cluster is used for object storage only (right now). We do use EC 3+2 on
the buckets.data pool.

We had a problem with RadosGW segfaulting (12.2.5) till we upgraded to
12.2.8. We had almost 30.000 radosgw crashes leading to millions of
unreferenced objects (failed multiuploads?). It filled our cluster so
fast that we are now in danger to run out of space.

As you can see we are reweighting some OSDs right now. But the real
question is how "used" is calculated in ceph df.

Global: %RAW USED = 76.49%

while

x-1.rgw.buckets.data Used = 90.32%

Am I right this is because we should still be "able" to lose one OSD node?

If that's true, reweighting can only help a little to rebalance the
capacity used on each node?

The only chance we have right now to survive until new HDDs arrive is to
delete objects, right?


ceph -s
   cluster:
 id: a146-6561-307e-b032-x
 health: HEALTH_WARN
 3 nearfull osd(s)
 13 pool(s) nearfull
 1 large omap objects
 766760/180478374 objects misplaced (0.425%)

   services:
 mon: 3 daemons, quorum ceph1-mon3,ceph1-mon2,ceph1-mon1
 mgr: ceph1-mon2(active), standbys: ceph1-mon1, ceph1-mon3
 osd: 36 osds: 36 up, 36 in; 24 remapped pgs
 rgw: 3 daemons active
 rgw-nfs: 2 daemons active

   data:
 pools:   13 pools, 1424 pgs
 objects: 36.10M objects, 115TiB
 usage:   200TiB used, 61.6TiB / 262TiB avail
 pgs: 766760/180478374 objects misplaced (0.425%)
  1400 active+clean
  16   active+remapped+backfill_wait
  8active+remapped+backfilling

   io:
 client:   3.05MiB/s rd, 0B/s wr, 1.12kop/s rd, 37op/s wr
 recovery: 306MiB/s, 91objects/s

ceph df
GLOBAL:
    SIZE    AVAIL    RAW USED  %RAW USED
    262TiB  61.6TiB  200TiB    76.49
POOLS:
    NAME                    ID  USED     %USED  MAX AVAIL  OBJECTS
    iscsi-images            1   35B      0      6.87TiB    5
    .rgw.root               2   3.57KiB  0      6.87TiB    18
    x-1.rgw.buckets.data    6   115TiB   90.32  12.4TiB    36090523
    x-1.rgw.control         7   0B       0      6.87TiB    8
    x-1.rgw.meta            8   943KiB   0      6.87TiB    3265
    x-1.rgw.log             9   0B       0      6.87TiB    407
    x-1.rgw.buckets.index   12  0B       0      6.87TiB    3096
    x-1.rgw.buckets.non-ec  13  0B       0      6.87TiB    1623
    default.rgw.meta        14  373B     0      6.87TiB    3
    default.rgw.control     15  0B       0      6.87TiB    8
    default.rgw.log         16  0B       0      6.87TiB    0
    scbench                 17  0B       0      6.87TiB    0
    rbdbench                18  1.00GiB  0.01   6.87TiB    260



Regards,
Flo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fastest way to find raw device from OSD-ID? (osd -> lvm lv -> lvm pv -> disk)

2018-10-08 Thread Jakub Jaszewski
Hi Kevin,
Have you tried ceph osd metadata <osd-id>?
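A rough sketch of how I use it (osd.12 is just an example, and the exact field
names may differ between releases, so please verify on yours):

# ceph osd metadata 12 | grep -E '"devices"|partition_path'

On LVM-based OSDs you can then follow the LV back to the PV/disk with lvs/pvs
if needed.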

Jakub

On Mon, 8 Oct 2018 at 19:32, Alfredo Deza wrote:

> On Mon, Oct 8, 2018 at 6:09 AM Kevin Olbrich  wrote:
> >
> > Hi!
> >
> > Yes, thank you. At least on one node this works, the other node just
> freezes but this might by caused by a bad disk that I try to find.
>
> If it is freezing, you could maybe try running the command where it
> freezes? (ceph-volume will log it to the terminal)
>
>
> >
> > Kevin
> >
> > Am Mo., 8. Okt. 2018 um 12:07 Uhr schrieb Wido den Hollander <
> w...@42on.com>:
> >>
> >> Hi,
> >>
> >> $ ceph-volume lvm list
> >>
> >> Does that work for you?
> >>
> >> Wido
> >>
> >> On 10/08/2018 12:01 PM, Kevin Olbrich wrote:
> >> > Hi!
> >> >
> >> > Is there an easy way to find raw disks (eg. sdd/sdd1) by OSD id?
> >> > Before I migrated from filestore with simple-mode to bluestore with
> lvm,
> >> > I was able to find the raw disk with "df".
> >> > Now, I need to go from LVM LV to PV to disk every time I need to
> >> > check/smartctl a disk.
> >> >
> >> > Kevin
> >> >
> >> >
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] commit_latency equals apply_latency on bluestore

2018-10-02 Thread Jakub Jaszewski
Hi Cephers, Hi Gregory,

I'm looking at the same case as discussed here: commit_latency == apply_latency
in ceph osd perf

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024317.html

What's the meaning of commit_latency and apply_latency in BlueStore OSD
setups? How useful are they when troubleshooting? How do they correspond to
separate block.db and block.wal devices?
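For context, the two places I look at (osd.0 is an example; counter names from
memory, so please verify):

# ceph osd perf                          # fs_commit_latency(ms) / fs_apply_latency(ms) columns
# ceph daemon osd.0 perf dump osd | grep -A 3 '"op_w_latency"'

My understanding is that on BlueStore there is no separate apply stage (that
was a FileStore/journal concept), so ceph osd perf simply reports the same
commit value in both columns - but I'd like to confirm that.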

Thanks
Jakub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Testing cluster throughput - one OSD is always 100% utilized during rados bench write

2018-10-02 Thread Jakub Jaszewski
Hi Cephers,

I'm testing cluster throughput before moving to production. Ceph
version 13.2.1 (I'll update to 13.2.2).

I run rados bench from 10 cluster nodes and 10 clients in parallel.
Just after I start the rados command, the HDDs behind three OSDs are 100%
utilized while the others are < 40%. After a short while only one OSD stays at
100% utilization. I've stopped this OSD to rule out a hardware issue, but then
another OSD on another node starts hitting 100% disk utilization during the
next rados bench write. The same OSD is fully utilized for each bench run.

Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz
avgqu-sz   await r_await w_await  svctm  %util
sdd   0,00 0,000,00  518,00 0,00   129,50   512,00
  87,99  155,120,00  155,12   1,93 100,00

The test pool size is 3 (replicated). (Deep) scrubbing is temporarily off.

Networking, CPU and memory are underutilized during the test.

Particular rados command is
rados bench --name client.rbd_test -p rbd_test 600 write --no-cleanup
--run-name $(hostname)_bench

The same story with
rados --name client.rbd_test -p rbd_test load-gen --min-object-size 4M
--max-object-size 4M --min-op-len 4M --max-op-len 4M --max-ops 16
--read-percent 0 --target-throughput 1000 --run-length 600

Do you see the same behavior? It smells like something related to one
particular PG. Is it an effect of running a number of rados bench tasks in
parallel?

Of course, it may simply be the cluster's limit, but I'm not sure why only
one, and always the same, OSD keeps hitting 100% util. Tomorrow I'm going to
test the cluster using rbd.
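In the meantime, a sketch of how I plan to narrow it down (osd.12 and the
object name are just examples):

# ceph pg ls-by-primary osd.12                                  # PGs this OSD is primary for
# ceph osd map rbd_test benchmark_data_node01_12345_object100   # map a single bench object to its PG/OSDs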

What does your cluster's limit look like? A saturated LACP bond? 100% utilized HDDs?

Thanks,
Jakub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Mimic packages not available for Ubuntu Trusty

2018-09-19 Thread Jakub Jaszewski
Hi Cephers,

Any plans for Ceph Mimic packages for Ubuntu Trusty? I found only
ceph-deploy.
https://download.ceph.com/debian-mimic/dists/trusty/main/binary-amd64/

Thanks
Jakub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] total_used statistic incorrect

2018-09-19 Thread Jakub Jaszewski
Hi, I've recently deployed a fresh cluster via ceph-ansible. I haven't created
any pools yet, but storage shows as used anyway.

[root@ceph01 ~]# ceph version
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic
(stable)
[root@ceph01 ~]# ceph df
GLOBAL:
SIZEAVAIL   RAW USED %RAW USED
269 TiB 262 TiB  7.1 TiB  2.64
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
[root@ceph01 ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED
RD_OPS RD WR_OPS WR

total_objects0
total_used   7.1 TiB
total_avail  262 TiB
total_space  269 TiB
[root@ceph01 ~]#


Regards
Jakub

On Wed, Sep 19, 2018 at 2:09 PM  wrote:

> The cluster needs time to remove those objects in the previous pools. What
> you can do is to wait.
>
>
>
>
>
> From: Mike Cave 
> To: ceph-users 
> Date: 2018/09/19 06:24
> Subject: [ceph-users] total_used statistic incorrect
> Sender: "ceph-users" 
> --
>
>
>
> Greetings,
>
> I’ve recently run into an issue with my new Mimic deploy.
>
> I created some pools and created volumes and did some general testing. In
> total, there was about 21 TiB used. Once testing was completed, I deleted
> the pools and thus thought I deleted the data.
>
> However, the ‘total_used’ statistic given from running ‘ceph  -s’ shows
> that the space is still consumed. I have confirmed that the pools are
> deleted (rados df) but I cannot get the total_used to reflect the actual
> state of usage on the system.
>
> Have I missed a step in deleting a pool? Is there some other step I need
> to perform other than what I found in the docs?
>
> Please let me know if I can provide any additional data.
>
> Cheers,
> Mike
>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> -
> This e-mail and its attachments contain confidential information from
> Uniview, which is intended only for the person or entity whose address is
> listed above. Any use of the information contained herein in any way
> (including, but not limited to, total or partial disclosure, reproduction,
> or dissemination) by persons other than the intended recipient(s) is
> prohibited. If you receive this e-mail in error, please notify the sender
> by phone or email immediately and delete it!
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-20 Thread Jakub Jaszewski
Hi David,

Right, we use this cluster (v12.2.5, fresh installation) for RGW; however,
I don't see a default.rgw.gc pool like we have on another cluster that was
upgraded to Luminous (10.2.9 -> 10.2.10 -> 12.2.2). I believe the
default.rgw.gc pool there dates from the time RGW was set up on the Jewel
version, when the pool was created automatically.

On the impacted cluster we have the pools below:
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 2 object_hash
rjenkins pg_num 8 pgp_num 8 last_change 1499 flags hashpspool stripe_width
0 application rgw
pool 2 'rbd' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins
pg_num 2048 pgp_num 2048 last_change 2230 flags hashpspool stripe_width 0
application rbd
pool 3 'default.rgw.control' replicated size 3 min_size 2 crush_rule 2
object_hash rjenkins pg_num 8 pgp_num 8 last_change 1501 flags hashpspool
stripe_width 0 application rgw
pool 4 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 2
object_hash rjenkins pg_num 8 pgp_num 8 last_change 1491 flags hashpspool
stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 2
object_hash rjenkins pg_num 8 pgp_num 8 last_change 1486 flags hashpspool
stripe_width 0 application rgw
pool 7 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule
2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1483 owner
18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 8 'default.rgw.buckets.data' erasure size 12 min_size 10 crush_rule 3
object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 2228 flags
hashpspool max_bytes 879609302220800 stripe_width 36864 application rgw
pool 9 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule
2 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1506 owner
18446744073709551615 flags hashpspool stripe_width 0 application rgw

The eight noisy OSDs match the primary OSDs of the PGs that make up the
default.rgw.log pool.
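For the record, this is roughly how I matched them (pool id 5 is
default.rgw.log on this cluster):

# ceph pg ls-by-pool default.rgw.log
# ceph pg dump pgs_brief | awk '$1 ~ /^5\./'   # PG id, state, up/acting sets and primaries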

Many thanks
Jakub


On Mon, Aug 20, 2018 at 11:54 AM David Turner  wrote:

> I'm assuming you use RGW and that you have a GC pool for RGW. It also
> might beat assumed that your GC pool only has 8 PGs.  Are any of those
> guesses correct?
>
> On Mon, Aug 20, 2018, 5:13 AM Jakub Jaszewski 
> wrote:
>
>> Issue tracker http://tracker.ceph.com/issues/23801.
>> Still don't know why only particular OSDs write this information to log
>> files.
>>
>> Jakub
>>
>> On Wed, Aug 8, 2018 at 12:02 PM Jakub Jaszewski <
>> jaszewski.ja...@gmail.com> wrote:
>>
>>> Hi All, exactly the same story today, same 8 OSDs and a lot of garbage
>>> collection objects to process
>>>
>>> Below is the number of "cls_rgw.cc:3284: gc_iterate_entries end_key="
>>> entries per OSD log file
>>> hostA:
>>>   /var/log/ceph/ceph-osd.58.log
>>>   1826467
>>> hostB:
>>>   /var/log/ceph/ceph-osd.88.log
>>>   2924241
>>> hostC:
>>>   /var/log/ceph/ceph-osd.153.log
>>>   581002
>>>   /var/log/ceph/ceph-osd.164.log
>>>   3278606
>>> hostD:
>>>   /var/log/ceph/ceph-osd.95.log
>>>   1426963
>>> hostE:
>>>   /var/log/ceph/ceph-osd.4.log
>>>   2716914
>>>   /var/log/ceph/ceph-osd.53.log
>>>   943749
>>> hostF:
>>>   /var/log/ceph/ceph-osd.172.log
>>>   4085334
>>>
>>>
>>> # radosgw-admin gc list --include-all|grep oid |wc -l
>>> 302357
>>> #
>>>
>>> Can anyone please explain what is going on ?
>>>
>>> Thanks!
>>> Jakub
>>>
>>> On Tue, Aug 7, 2018 at 3:03 PM Jakub Jaszewski <
>>> jaszewski.ja...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> 8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of
>>>> records like "cls_rgw.cc:3284: gc_iterate_entries end_key=" to the
>>>> corresponding log files, e.g.
>>>>
>>>> 2018-08-07 04:34:06.000585 7fdd8f012700  0 
>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>> end_key=1_01533616446.000580407
>>>> 2018-08-07 04:34:06.001888 7fdd8f012700  0 
>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>> end_key=1_01533616446.001886318
>>>> 2018-08-07 04:34:06.003395 7fdd8f012700  0 
>>>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>>>> end_key=1_01533616446.003390299
>>>> 2018-08-07 04:34:06.005205 7fdd8f012700  0 
>&g

Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-20 Thread Jakub Jaszewski
Issue tracker http://tracker.ceph.com/issues/23801.
Still don't know why only particular OSDs write this information to log
files.

Jakub

On Wed, Aug 8, 2018 at 12:02 PM Jakub Jaszewski 
wrote:

> Hi All, exactly the same story today, same 8 OSDs and a lot of garbage
> collection objects to process
>
> Below is the number of "cls_rgw.cc:3284: gc_iterate_entries end_key="
> entries per OSD log file
> hostA:
>   /var/log/ceph/ceph-osd.58.log
>   1826467
> hostB:
>   /var/log/ceph/ceph-osd.88.log
>   2924241
> hostC:
>   /var/log/ceph/ceph-osd.153.log
>   581002
>   /var/log/ceph/ceph-osd.164.log
>   3278606
> hostD:
>   /var/log/ceph/ceph-osd.95.log
>   1426963
> hostE:
>   /var/log/ceph/ceph-osd.4.log
>   2716914
>   /var/log/ceph/ceph-osd.53.log
>   943749
> hostF:
>   /var/log/ceph/ceph-osd.172.log
>   4085334
>
>
> # radosgw-admin gc list --include-all|grep oid |wc -l
> 302357
> #
>
> Can anyone please explain what is going on ?
>
> Thanks!
> Jakub
>
> On Tue, Aug 7, 2018 at 3:03 PM Jakub Jaszewski 
> wrote:
>
>> Hi,
>>
>> 8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of records
>> like "cls_rgw.cc:3284: gc_iterate_entries end_key=" to the corresponding
>> log files, e.g.
>>
>> 2018-08-07 04:34:06.000585 7fdd8f012700  0 
>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>> end_key=1_01533616446.000580407
>> 2018-08-07 04:34:06.001888 7fdd8f012700  0 
>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>> end_key=1_01533616446.001886318
>> 2018-08-07 04:34:06.003395 7fdd8f012700  0 
>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>> end_key=1_01533616446.003390299
>> 2018-08-07 04:34:06.005205 7fdd8f012700  0 
>> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
>> end_key=1_01533616446.005200341
>>
>> # grep '2018-08-07 04:34:06' /var/log/ceph/ceph-osd.4.log |wc -l
>> 712
>> #
>>
>> At the same time there were like 500 000 expired garbage collection
>> objects.
>>
>> Log level of OSD subsystem is set to default 1/5 across all OSDs.
>>
>> I wonder why only few OSDs record this information and is it something to
>> be logged in log level = 1 or maybe higher?
>> https://github.com/ceph/ceph/blob/v12.2.5/src/cls/rgw/cls_rgw.cc#L3284
>>
>> Thanks
>> Jakub
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-08 Thread Jakub Jaszewski
Hi All, exactly the same story today, same 8 OSDs and a lot of garbage
collection objects to process

Below is the number of "cls_rgw.cc:3284: gc_iterate_entries end_key="
entries per OSD log file
hostA:
  /var/log/ceph/ceph-osd.58.log
  1826467
hostB:
  /var/log/ceph/ceph-osd.88.log
  2924241
hostC:
  /var/log/ceph/ceph-osd.153.log
  581002
  /var/log/ceph/ceph-osd.164.log
  3278606
hostD:
  /var/log/ceph/ceph-osd.95.log
  1426963
hostE:
  /var/log/ceph/ceph-osd.4.log
  2716914
  /var/log/ceph/ceph-osd.53.log
  943749
hostF:
  /var/log/ceph/ceph-osd.172.log
  4085334


# radosgw-admin gc list --include-all|grep oid |wc -l
302357
#

Can anyone please explain what is going on ?

Thanks!
Jakub

On Tue, Aug 7, 2018 at 3:03 PM Jakub Jaszewski 
wrote:

> Hi,
>
> 8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of records
> like "cls_rgw.cc:3284: gc_iterate_entries end_key=" to the corresponding
> log files, e.g.
>
> 2018-08-07 04:34:06.000585 7fdd8f012700  0 
> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
> end_key=1_01533616446.000580407
> 2018-08-07 04:34:06.001888 7fdd8f012700  0 
> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
> end_key=1_01533616446.001886318
> 2018-08-07 04:34:06.003395 7fdd8f012700  0 
> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
> end_key=1_01533616446.003390299
> 2018-08-07 04:34:06.005205 7fdd8f012700  0 
> /build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
> end_key=1_01533616446.005200341
>
> # grep '2018-08-07 04:34:06' /var/log/ceph/ceph-osd.4.log |wc -l
> 712
> #
>
> At the same time there were like 500 000 expired garbage collection
> objects.
>
> Log level of OSD subsystem is set to default 1/5 across all OSDs.
>
> I wonder why only few OSDs record this information and is it something to
> be logged in log level = 1 or maybe higher?
> https://github.com/ceph/ceph/blob/v12.2.5/src/cls/rgw/cls_rgw.cc#L3284
>
> Thanks
> Jakub
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Tons of "cls_rgw.cc:3284: gc_iterate_entries end_key=" records in OSD logs

2018-08-07 Thread Jakub Jaszewski
Hi,

8 out of 192 OSDs in our cluster (version 12.2.5) write plenty of records
like "cls_rgw.cc:3284: gc_iterate_entries end_key=" to the corresponding
log files, e.g.

2018-08-07 04:34:06.000585 7fdd8f012700  0 
/build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
end_key=1_01533616446.000580407
2018-08-07 04:34:06.001888 7fdd8f012700  0 
/build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
end_key=1_01533616446.001886318
2018-08-07 04:34:06.003395 7fdd8f012700  0 
/build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
end_key=1_01533616446.003390299
2018-08-07 04:34:06.005205 7fdd8f012700  0 
/build/ceph-12.2.5/src/cls/rgw/cls_rgw.cc:3284: gc_iterate_entries
end_key=1_01533616446.005200341

# grep '2018-08-07 04:34:06' /var/log/ceph/ceph-osd.4.log |wc -l
712
#

At the same time there were like 500 000 expired garbage collection objects.

Log level of OSD subsystem is set to default 1/5 across all OSDs.

I wonder why only a few OSDs record this information, and whether it is
something that should be logged at log level 1 or maybe higher?
https://github.com/ceph/ceph/blob/v12.2.5/src/cls/rgw/cls_rgw.cc#L3284

Thanks
Jakub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous 12.2.5 - crashing RGW

2018-07-16 Thread Jakub Jaszewski
Hi,
We run 5 RADOS Gateways on Luminous 12.2.5 as upstream servers in an
active-active nginx setup based on keepalived.
The cluster is 12 Ceph nodes (16x 10TB BlueStore OSDs per node, 2x 10Gb
network links shared by the public and cluster networks); the RGW data pool is EC 9+3.

We recently noticed below entries in RGW logs:

2018-07-11 06:19:13.726392 7f2eeed46700  1 == starting new request
req=0x7f2eeed402c0 =
2018-07-11 06:19:13.871358 7f2eeed46700  0 NOTICE: resharding operation on
bucket index detected, blocking
2018-07-11 06:19:58.953816 7f2eeed46700  0 block_while_resharding ERROR:
bucket is still resharding, please retry
2018-07-11 06:19:58.959424 7f2eeed46700  0 NOTICE: resharding operation on
bucket index detected, blocking
2018-07-11 06:20:44.088045 7f2eeed46700  0 block_while_resharding ERROR:
bucket is still resharding, please retry
2018-07-11 06:20:44.090664 7f2eeed46700  0 NOTICE: resharding operation on
bucket index detected, blocking
2018-07-11 06:21:29.141182 7f2eeed46700  0 block_while_resharding ERROR:
bucket is still resharding, please retry
2018-07-11 06:21:29.146598 7f2eeed46700  0 NOTICE: resharding operation on
bucket index detected, blocking
2018-07-11 06:22:14.178369 7f2eeed46700  0 block_while_resharding ERROR:
bucket is still resharding, please retry
2018-07-11 06:22:14.181697 7f2eeed46700  0 NOTICE: resharding operation on
bucket index detected, blocking
2018-07-11 06:22:34.199763 7f2eeed46700  1 == req done
req=0x7f2eeed402c0 op status=0 http_status=200 ==
2018-07-11 06:22:34.199851 7f2eeed46700  1 civetweb: 0x5599a1158000:
10.195.17.6 - - [11/Jul/2018:06:10:11 +] "PUT
/BUCKET/PATH/OBJECT?partNumber=2&uploadId=2~ol_fQw_u7eKRjuP1qVwnj5V12GxDYXu
HTTP/1.1" 200 0 - -

This causes 'upstream timed out (110: Connection timed out) while reading
response header from upstream' errors and 504 response codes on the nginx side,
due to the 30-second timeout.

Other recurring log entries look like:

2018-07-11 06:20:47.407632 7f2e97c98700  1 == starting new request
req=0x7f2e97c922c0 =
2018-07-11 06:20:47.412455 7f2e97c98700  0 NOTICE: resharding operation on
bucket index detected, blocking
2018-07-11 06:21:32.424983 7f2e97c98700  0 block_while_resharding ERROR:
bucket is still resharding, please retry
2018-07-11 06:21:32.426597 7f2e97c98700  0 NOTICE: resharding operation on
bucket index detected, blocking
2018-07-11 06:22:17.67 7f2e97c98700  0 block_while_resharding ERROR:
bucket is still resharding, please retry
2018-07-11 06:22:17.492217 7f2e97c98700  0 NOTICE: resharding operation on
bucket index detected, blocking

2018-07-11 06:22:32.495254 7f2e97c98700  0 ERROR: update_bucket_id()
new_bucket_id=d644765c-1705-49b2-9609-a8511d3c4fed.151639.105 returned
r=-125
2018-07-11 06:22:32.495386 7f2e97c98700  0 WARNING: set_req_state_err
err_no=125 resorting to 500

2018-07-11 06:22:32.495509 7f2e97c98700  1 == req done
req=0x7f2e97c922c0 op status=-125 http_status=500 ==
2018-07-11 06:22:32.495569 7f2e97c98700  1 civetweb: 0x5599a14f4000:
10.195.17.6 - - [11/Jul/2018:06:19:25 +] "POST PUT
/BUCKET/PATH/OBJECT?uploads HTTP/1.1" 500 0 - -


To avoid the 504 & 500 responses we disabled dynamic resharding via
'rgw_dynamic_resharding = false'. I'm not sure whether setting the nginx
'proxy_read_timeout' option to a value higher than the bucket resharding time
would be a good idea instead.
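For completeness, this is roughly what we set and how we keep an eye on
resharding now (the section name depends on your rgw instance names; the
radosgw-admin subcommands are as I know them from 12.2.x, worth double-checking):

[client.rgw.gateway01]
rgw dynamic resharding = false

# radosgw-admin reshard list                     # queued / in-progress reshard jobs
# radosgw-admin reshard status --bucket BUCKET   # per-bucket resharding status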

Once that was done, 'block_while_resharding ERROR: bucket is still resharding,
please retry' disappeared from the RGW logs; however, another ERROR is now
logged, and then the RGWs catch signal Aborted and get restarted by systemd:

2018-07-13 05:27:31.149618 7f7eb72c7700  1 == starting new request
req=0x7f7eb72c12c0 =
2018-07-13 05:27:52.593413 7f7eb72c7700  0 ERROR: flush_read_list():
d->client_cb->handle_data() returned -5
2018-07-13 05:27:52.594040 7f7eb72c7700  1 == req done
req=0x7f7eb72c12c0 op status=-5 http_status=206 ==
2018-07-13 05:27:52.594633 7f7eb72c7700  1 civetweb: 0x55ab3171b000:
10.195.17.6 - - [13/Jul/2018:05:24:28 +] "GET /BUCKET/PATH/OBJECT_580MB
HTTP/1.1" 206 0 - Hadoop 2.7.3.2.5.3.0-37, aws-sdk-java/1.10.6
Linux/4.4.0-97-generic Java_HotSpot(TM)_64-Bit_Server_VM/25.77-b03/1.8.0_77

We see ~40 such ERRORs (each GET requesting a ~580 MB object) prior to the RGW
crash:

2018-07-13 05:21:43.993778 7fcce6575700  1 == starting new request
req=0x7fcce656f2c0 =
2018-07-13 05:22:16.137676 7fcce6575700 -1
/build/ceph-12.2.5/src/common/buffer.cc: In function 'void
ceph::buffer::list::append(const ceph::buffer::ptr&, unsigned int, unsigned
int)' thread 7fcce6575700 time 2018-07-13 05:22:16.135271
/build/ceph-12.2.5/src/common/buffer.cc: 1967: FAILED assert(len+off <=
bp.length())

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7fcd5b7aab72]
 2: (ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int,
unsigned int)+0x118) [0x7fcd64993cf8]
 3: (RGWPutObj_ObjStore::get_data(ceph::buffer::list&)+0xd

Re: [ceph-users] Replicated pool with an even size - has min_size to be bigger than half the size?

2018-03-29 Thread Jakub Jaszewski
On Thu, Mar 29, 2018 at 12:25 PM, Janne Johansson 
wrote:

>
>
> 2018-03-29 11:50 GMT+02:00 David Rabel :
>
>> On 29.03.2018 11:43, Janne Johansson wrote:
>> > 2018-03-29 11:39 GMT+02:00 David Rabel :
>> >
>> >> For example a replicated pool with size 4: Do i always have to set the
>> >> min_size to 3? Or is there a way to use min_size 2 and use some other
>> >> node as a decision maker in case of split brain?
>> >>
>> >
>> > min_size doesn't arbitrate decisions other than
>> > "can I write if there are only X visible copies?", where X needs to be >
>> > min_size
>> > to allow writes.
>>
I think X >= min_size allows IO to the data pool.


> >
>> > It doesn't control any logic, it controls the risk level you want to
>> take.
>>
>> You are right. But with my above example: If I have min_size 2 and size
>> 4, and because of a network issue the 4 OSDs are split into 2 and 2, is
>> it possible that I have write operations on both sides and therefore
>> have inconsistent data?
>>
>
> You always write to the primary, which in turn sends copies to the 3
> others,
> so in the 2+2 split case, only one side can talk to the primary OSD for
> that pg,
> so writes will just happen on one side at most.
>

It could be that your OSDs will be marked DOWN; see
http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/#osds-check-heartbeats
Regarding MON split see
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025756.html

>
>
> --
> May the most significant bit of your life be positive.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs stuck activating after adding new OSDs

2018-03-28 Thread Jakub Jaszewski
Hi Jon, can you reweight one OSD back to its default value and share the
output of "ceph osd df tree; ceph -s; ceph health detail"?

Recently I was adding a new node (12x 4TB), one disk at a time, and faced the
activating+remapped state for a few hours.

I'm not sure, but maybe that was caused by the "osd_max_backfills" value and
the queue of PGs awaiting backfill.
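If that is the case, throttling backfill at runtime is a one-liner; a sketch
with example values (raise them back once the cluster settles):

# ceph tell osd.* injectargs '--osd_max_backfills=1 --osd_recovery_max_active=1'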

# ceph -s
>   cluster:
> id: 1023c49f-3a10-42de-9f62-9b122db21e1e
> health: HEALTH_WARN
> noscrub,nodeep-scrub flag(s) set
> 1 nearfull osd(s)
> 19 pool(s) nearfull
> 6982/289660233 objects misplaced (11.509%)
> Reduced data availability: 29 pgs inactive
> Degraded data redundancy: 788023/289660233 objects degraded
> (0.272%), 782 pgs unclean, 54 pgs degraded, 48 pgs undersized
>
>   services:
> mon: 3 daemons, quorum mon1,mon2,mon3
> mgr: mon2(active), standbys: mon3, mon1
> osd: 120 osds: 120 up, 120 in; 779 remapped pgs
>  flags noscrub,nodeep-scrub
> rgw: 3 daemons active
>
>   data:
> pools:   19 pools, 3760 pgs
> objects: 38285k objects, 146 TB
> usage:   285 TB used, 150 TB / 436 TB avail
> pgs: 0.771% pgs not active
>  788023/289660233 objects degraded (0.272%)
>  6982/289660233 objects misplaced (11.509%)
>  2978 active+clean
>  646  active+remapped+backfill_wait
>  57   active+remapped+backfilling
>  27   active+undersized+degraded+remapped+backfill_wait
>  25   activating+remapped
>  17   active+undersized+degraded+remapped+backfilling
>  4activating+undersized+degraded+remapped
>  3active+recovery_wait+degraded
>  3active+recovery_wait+degraded+remapped
>
>   io:
> client:   2228 kB/s rd, 54831 kB/s wr, 539 op/s rd, 756 op/s wr
> recovery: 1360 MB/s, 348 objects/s


Now all PGs are active+clean.


Regards
Jakub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All pools full after one OSD got OSD_FULL state

2018-03-05 Thread Jakub Jaszewski
One full OSD caused all pools to be reported as full. Can anyone help me
understand this?

During ongoing PG backfilling I see that the MAX AVAIL values keep changing
while the USED values stay constant.


GLOBAL:
    SIZE  AVAIL  RAW USED  %RAW USED
    425T  145T   279T      65.70
POOLS:
    NAME                      ID  USED    %USED  MAX AVAIL  OBJECTS
    volumes                   3   41011G  91.14  3987G      10520026
    default.rgw.buckets.data  20  105T    93.11  7974G      28484000


GLOBAL:
    SIZE  AVAIL  RAW USED  %RAW USED
    425T  146T   279T      65.66
POOLS:
    NAME                      ID  USED    %USED  MAX AVAIL  OBJECTS
    volumes                   3   41013G  88.66  5246G      10520539
    default.rgw.buckets.data  20  105T    91.13  10492G     28484000


From what I can read in the docs, the MAX AVAIL value is a complicated
function of the replication or erasure code used, the CRUSH rule that maps
storage to devices, the utilization of those devices, and the configured
mon_osd_full_ratio.

Any clue what more I can do to make better use of the available raw storage?
Increase the number of PGs for more balanced OSD utilization?
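If you decide to grow the pools, the commands themselves are simple (example
pool and values below); keep in mind that on Luminous pgp_num has to be raised
separately and the change triggers a rebalance, so it's better done in small
steps and outside peak hours:

# ceph osd pool set volumes pg_num 2048
# ceph osd pool set volumes pgp_num 2048
# ceph osd df tree    # then watch how the per-OSD %USE spread evolves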

Thanks
Jakub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] All pools full after one OSD got OSD_FULL state

2018-03-03 Thread Jakub Jaszewski
Hi Ceph Admins,

Last night our Ceph cluster reported all pools 100% full. This happened after
osd.56 (95% used) reached the OSD_FULL state.

ceph versions 12.2.2

Logs

2018-03-03 17:15:22.560710 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224452
: cluster [ERR] overall HEALTH_ERR noscrub,nodeep-scrub flag(s) set; 1
backfillfull osd(s); 5 nearfull osd(s); 21 pool(s) backfillfull;
638551/287271738 objects misplaced (0.222%); Degraded data redundancy:
253066/287271738 objects degraded (0.088%), 25 pgs unclean; Degraded data
redundancy (low space): 25 pgs backfill_toofull
2018-03-03 17:15:42.513194 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224515
: cluster [WRN] Health check update: 638576/287284518 objects misplaced
(0.222%) (OBJECT_MISPLACED)
2018-03-03 17:15:42.513256 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224516
: cluster [WRN] Health check update: Degraded data redundancy:
253266/287284518 objects degraded (0.088%), 25 pgs unclean (PG_DEGRADED)
2018-03-03 17:15:44.684928 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224524
: cluster [ERR] Health check failed: 1 full osd(s) (OSD_FULL)
2018-03-03 17:15:44.684969 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224525
: cluster [WRN] Health check failed: 21 pool(s) full (POOL_FULL)
2018-03-03 17:15:44.684987 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224526
: cluster [INF] Health check cleared: OSD_BACKFILLFULL (was: 1 backfillfull
osd(s))
2018-03-03 17:15:44.685013 mon.cephnode01 mon.0 10.212.32.18:6789/0 5224527
: cluster [INF] Health check cleared: POOL_BACKFILLFULL (was: 21 pool(s)
backfillfull)


# ceph df detail from the time of the crash
GLOBAL:
SIZE AVAIL RAW USED %RAW USED OBJECTS
381T  102T 278T 73.05  38035k
POOLS:
NAME   ID QUOTA OBJECTS QUOTA BYTES
 USED   %USED  MAX AVAIL OBJECTS  DIRTY  READ
 WRITE  RAW USED
rbd0  N/A   N/A
  0  0 00  0  1
 134k0
vms1  N/A   N/A
  0  0 00  0  0
  00
images 2  N/A   N/A
  7659M 100.00 0 1022   1022   110k
 5668   22977M
volumes3  N/A   N/A
 40991G 100.00 0 10514980 10268k  3404M
4087M 120T
.rgw.root  4  N/A   N/A
   1588 100.00 04  4   402k
  4 4764
default.rgw.control5  N/A   N/A
  0  0 08  8  0
  00
default.rgw.data.root  6  N/A   N/A
  94942 100.00 0  339339   257k
 6422 278k
default.rgw.gc 7  N/A   N/A
  0  0 0   32 32  3125M
7410k0
default.rgw.log8  N/A   N/A
  0  0 0  186186 27222k
 18146k0
default.rgw.users.uid  9  N/A   N/A
   4252 100.00 0   17 17   262k
6456112756
default.rgw.usage  10 N/A   N/A
  0  0 08  8   332k
 665k0
default.rgw.users.email11 N/A   N/A
 87 100.00 04  4  0
  4  261
default.rgw.users.keys 12 N/A   N/A
206 100.00 0   11 11459
 23  618
default.rgw.users.swift13 N/A   N/A
 40 100.00 03  3  0
  3  120
default.rgw.buckets.index  14 N/A   N/A
  0  0 0  210210   321M
 41709k0
default.rgw.buckets.non-ec 16 N/A   N/A
  0  0 0  114114  18006
120550
default.rgw.buckets.extra  17 N/A   N/A
  0  0 00  0  0
  00
.rgw.buckets.extra 18 N/A   N/A
  0  0 00  0  0
  00
default.rgw.buckets.data   20 N/A   N/A
   104T 100.00 0 28334451 27670k   160M
 156M 156T
benchmark_replicated   21 N/A   N/A
 87136M 100.00 021792  21792  1450k
4497k 255G
benchmark_erasure_coded22 N/A   N/A
   292G 100.00 074779  

Re: [ceph-users] High apply latency

2018-02-06 Thread Jakub Jaszewski
Hi Frederic,

I haven't enabled debug-level logging on all OSDs, just on one for the test;
I need to double-check that.
But it looks like merging is ongoing on a few OSDs, or some OSDs are faulty;
I will dig into that tomorrow.
Write bandwidth is very erratic:

# rados bench -p default.rgw.buckets.data 120 write
hints = 1
Maintaining 16 concurrent writes of 4194432 bytes to objects of size
4194432 for up to 120 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_59104
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
lat(s)
0   0 0 0 0 0   -
 0
1  16   155   139555.93   556.017   0.0750027
 0.10687
2  16   264   248   495.936   436.0130.154185
0.118693
3  16   330   314   418.616   264.0080.118476
0.142667
4  16   415   399   398.953340.01   0.0873379
 0.15102
5  16   483   467   373.557   272.0080.750453
0.159819
6  16   532   516   343.962   196.006   0.0298334
0.171218
7  16   617   601   343.391340.010.192698
0.177288
8  16   700   684   341.963332.01   0.0281355
0.171277
9  16   762   746   331.521   248.008   0.0962037
0.163734
   10  16   804   788   315.167   168.005 1.40356
0.196298
   11  16   897   881320.33   372.011   0.0369085
 0.19496
   12  16   985   969   322.966   352.011   0.0290563
0.193986
   13  15  1106  1091   335.657   488.015   0.0617642
0.188703
   14  16  1166  1150   328.537   236.007   0.0401884
0.186206
   15  16  1251  1235   329.299340.010.171256
0.190974
   16  16  1339  1323   330.716   352.0110.024222
0.189901
   17  16  1417  1401   329.613312.01   0.0289473
0.186562
   18  16  1465  1449   321.967   192.0060.028123
0.189153
   19  16  1522  1506317.02   228.0070.265448
0.188288
2018-02-06 13:43:21.412512 min lat: 0.0204657 max lat: 3.61509 avg lat:
0.18918
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
lat(s)
   20  16  1564  1548   309.568   168.005   0.0327581
 0.18918
   21  16  1636  1620308.54   288.009   0.0715159
0.187381
   22  16  1673  1657   301.242   148.005 1.57285
0.191596
   23  16  1762  1746   303.621   356.011 6.00352
0.206217
   24  16  1885  1869   311.468   492.015   0.0298435
0.203874
   25  16  2010  1994   319.008   500.015   0.0258761
0.199652
   26  16  2116  2100   323.044   424.013   0.0533319
 0.19631
   27  16  2201  2185323.67340.010.134796
0.195953
   28  16  2257  2241320.11   224.0070.473629
0.196464
   29  16  2333  2317   319.554   304.009   0.0362741
0.198054
   30  16  2371  2355   313.968   152.0050.438141
0.200265
   31  16  2459  2443   315.194   352.011   0.0610629
0.200858
   32  16  2525  2509   313.593   264.008   0.0234799
0.201008
   33  16  2612  2596   314.635   348.0110.072019
0.199094
   34  16  2682  2666   313.615   280.009 0.10062
0.197586
   35  16  2757  2741   313.225   300.009   0.0552581
0.196981
   36  16  2849  2833   314.746   368.0110.257323
 0.19565
   37  16  2891  2875   310.779   168.005   0.0918386
 0.19556
   38  16  2946  2930308.39   220.007   0.0276621
0.195792
   39  16  2975  2959   303.456   116.004   0.0588971
 0.19952
2018-02-06 13:43:41.415107 min lat: 0.0204657 max lat: 7.9873 avg lat:
0.198749
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
lat(s)
   40  16  3060  3044   304.369340.01   0.0217136
0.198749
   41  16  3098  3082   300.652   152.005   0.0717398
0.199052
   42  16  3141  3125   297.589   172.005   0.0257422
0.201899
   43  15  3241  3226   300.063   404.012   0.0733869
0.209446
   44  16  3332  3316   301.424   360.011   0.0327249
0.206686
   45  16  3430  3414   303.436   392.012   0.0413156
0.203727
   46  16  3534  3518   305.882   416.0130.033638
0.202182
   47  16  3602  3586   305.161   272.008   0.0453557
0.200028
   48  16  3663  3647   303.886   244.007   0.0779019
0.199777
   49  16  3736  3720   303.643   292.009   0.0285231
0.206274
   50  16  3849  3833   306.609   452.014   0.0537071
0.208127
   51  16  3909  3893   305.303   240.007   0.0366709
0.207793
   52  16  3972  3956   304.277   252.008   0.0289131
0.207989
   53  16  4048  4032   304.272   304.009   0.0348617
0.207844
   54  16  4114  4098   303.525   264.008   0.0799526
 0.20701
   55  16  419

Re: [ceph-users] High apply latency

2018-02-02 Thread Jakub Jaszewski
IOPS: 0
Average Latency(s):   0.0399277
Max latency(s):   2.54257
Min latency(s):   0.00632494
#

# rados bench -p benchmark_replicated 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
lat(s)
0   0 0 0 0 0   -
 0
1  15   515   500   1999.26  2000   0.0260058
 0.0289045
2  16  1008   992   1983.45  1968  0.00914651
 0.0297668
3  16  1512  1496   1994.23  2016   0.0163293
 0.0309669
4  16  1996  19801979.6  1936   0.0123961
 0.0313833
5  15  2486  2471   1976.43  1964   0.0318256
 0.0312294
6  16  2992  2976   1983.64  2020   0.0346031
 0.0313301
7  15  3498  3483   1989.94  2028   0.0119796
 0.0314029
8  16  4018  4002   2000.65  2076   0.0374133
 0.0312428
9  16  4558  4542   2018.33  21600.024143
 0.0308669
   10  15  5101  5086   2034.07  2176   0.0317191
 0.0307552
Total time run:   10.032364
Total reads made: 5101
Read size:4194304
Object size:  4194304
Bandwidth (MB/sec):   2033.82
Average IOPS: 508
Stddev IOPS:  20
Max IOPS: 544
Min IOPS: 484
Average Latency(s):   0.0307879
Max latency(s):   1.3466
Min latency(s):   0.00688148
#

Regards
Jakub

On Thu, Feb 1, 2018 at 3:33 PM, Jakub Jaszewski 
wrote:

> Regarding split & merge, I have default values
> filestore_merge_threshold = 10
> filestore_split_multiple = 2
>
> according to https://bugzilla.redhat.com/show_bug.cgi?id=1219974 the
> recommended values are
>
> filestore_merge_threshold = 40
> filestore_split_multiple = 8
>
> Is it something that I can easily change to default or lower values than
> proposed in case of further performance degradation ?
>
> I did tests of 4 pools: 2 replicated pools (x3 ) and 2 EC  pools (k=6,m=3)
>
> The pool with the lowest bandwidth has osd tree structure like
> ├── 20.115s1_head
> │   └── DIR_5
> │   └── DIR_1
> │   ├── DIR_1
> │   │   ├── DIR_0
> │   │   ├── DIR_1
> │   │   ├── DIR_2
> │   │   │   ├── DIR_0
> │   │   │   ├── DIR_1
> │   │   │   ├── DIR_2
> │   │   │   ├── DIR_3
> │   │   │   ├── DIR_4
> │   │   │   ├── DIR_5
> │   │   │   ├── DIR_6
> │   │   │   ├── DIR_7
> │   │   │   ├── DIR_8
> │   │   │   ├── DIR_9
> │   │   │   ├── DIR_A
> │   │   │   ├── DIR_B
> │   │   │   ├── DIR_C
> │   │   │   ├── DIR_D
> │   │   │   ├── DIR_E
> │   │   │   └── DIR_F
>
>
> Tests results
>
> # rados bench -p default.rgw.buckets.data 10 write
> hints = 1
> Maintaining 16 concurrent writes of 4194432 bytes to objects of size
> 4194432 for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_sg08-09_180679
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
> lat(s)
> 0   0 0 0 0 0   -
>  0
> 1  16   129   113   451.975   452.014   0.0376714
> 0.128027
> 2  16   209   193   385.964320.010.119609
> 0.138517
> 3  16   235   219   291.974   104.003   0.0337624
>  0.13731
> 4  16   235   219   218.981 0   -
>  0.13731
> 5  16   266   250   199.983   62.00190.111673
> 0.238424
> 6  16   317   301   200.649   204.006   0.0340569
> 0.298489
> 7  16   396   380   217.124316.01   0.0379956
> 0.283458
> 8  16   444   428   213.981   192.006   0.0304383
> 0.274193
> 9  16   485   469   208.426   164.0050.391956
> 0.283421
>10  16   496   480   191.983   44.00130.104497
> 0.292074
>11  16   497   481   174.894   4.000120.85
> 0.293545
>12  16   497   481160.32 0   -
> 0.293545
>13  16   497   481   147.987 0   -
> 0.293545
>14  16   497   481   137.417 0   -
> 0.293545
> Total time run: 14.493353
> Total writes made:  497
> Write size: 4194432
> Object size:4194432
> Bandwidth (MB/sec): 137.171
> Stddev Bandwidth:   147.001
> Max bandwidth (MB/sec): 452.014
> Min bandwidth (MB/sec): 0
> Average IOPS:   34
> Stddev IOPS:36
> Max IOPS:   113
> Min IOPS:   0
> Average Latency(s): 0.464281
> Stddev Latency(s):  1.09388
> Max latency

Re: [ceph-users] High apply latency

2018-02-01 Thread Jakub Jaszewski
Regarding split & merge, I have default values
filestore_merge_threshold = 10
filestore_split_multiple = 2

according to https://bugzilla.redhat.com/show_bug.cgi?id=1219974 the
recommended values are

filestore_merge_threshold = 40
filestore_split_multiple = 8

Is this something I can easily change back to the defaults, or to values lower
than the proposed ones, in case of further performance degradation?
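For reference, the change itself would just be ceph.conf on the OSD hosts plus
an OSD restart (the values below are the ones from the Red Hat ticket);
existing directories are only re-split or merged when they next cross a
threshold, so the effect is gradual:

[osd]
filestore merge threshold = 40
filestore split multiple = 8

If I read the docs right there is also an offline ceph-objectstore-tool
--op apply-layout-settings to pre-split existing PG directories, but I haven't
tried it myself.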

I did tests on 4 pools: 2 replicated pools (x3) and 2 EC pools (k=6, m=3).

The pool with the lowest bandwidth has osd tree structure like
├── 20.115s1_head
│   └── DIR_5
│   └── DIR_1
│   ├── DIR_1
│   │   ├── DIR_0
│   │   ├── DIR_1
│   │   ├── DIR_2
│   │   │   ├── DIR_0
│   │   │   ├── DIR_1
│   │   │   ├── DIR_2
│   │   │   ├── DIR_3
│   │   │   ├── DIR_4
│   │   │   ├── DIR_5
│   │   │   ├── DIR_6
│   │   │   ├── DIR_7
│   │   │   ├── DIR_8
│   │   │   ├── DIR_9
│   │   │   ├── DIR_A
│   │   │   ├── DIR_B
│   │   │   ├── DIR_C
│   │   │   ├── DIR_D
│   │   │   ├── DIR_E
│   │   │   └── DIR_F


Tests results

# rados bench -p default.rgw.buckets.data 10 write
hints = 1
Maintaining 16 concurrent writes of 4194432 bytes to objects of size
4194432 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_180679
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
lat(s)
0   0 0 0 0 0   -
 0
1  16   129   113   451.975   452.014   0.0376714
0.128027
2  16   209   193   385.964320.010.119609
0.138517
3  16   235   219   291.974   104.003   0.0337624
 0.13731
4  16   235   219   218.981 0   -
 0.13731
5  16   266   250   199.983   62.00190.111673
0.238424
6  16   317   301   200.649   204.006   0.0340569
0.298489
7  16   396   380   217.124316.01   0.0379956
0.283458
8  16   444   428   213.981   192.006   0.0304383
0.274193
9  16   485   469   208.426   164.0050.391956
0.283421
   10  16   496   480   191.983   44.00130.104497
0.292074
   11  16   497   481   174.894   4.000120.85
0.293545
   12  16   497   481160.32 0   -
0.293545
   13  16   497   481   147.987 0   -
0.293545
   14  16   497   481   137.417 0   -
0.293545
Total time run: 14.493353
Total writes made:  497
Write size: 4194432
Object size:4194432
Bandwidth (MB/sec): 137.171
Stddev Bandwidth:   147.001
Max bandwidth (MB/sec): 452.014
Min bandwidth (MB/sec): 0
Average IOPS:   34
Stddev IOPS:36
Max IOPS:   113
Min IOPS:   0
Average Latency(s): 0.464281
Stddev Latency(s):  1.09388
Max latency(s): 6.3723
Min latency(s): 0.023835
Cleaning up (deleting benchmark objects)
Removed 497 objects
Clean up completed and total clean up time :10.622382
#


# rados bench -p benchmark_erasure_coded 10 write
hints = 1
Maintaining 16 concurrent writes of 4202496 bytes to objects of size
4202496 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_sg08-09_180807
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
lat(s)
0   0 0 0 0 0   -
 0
1  16   424   408   1635.11   1635.19   0.0490434
 0.0379616
2  16   828   812   1627.03   1619.16   0.0616501
 0.0388467
3  16  1258  1242   1659.06   1723.36   0.0304412
 0.0384537
4  16  1659  1643   1646.03   1607.13   0.0155402
 0.0387351
5  16  2053  2037   1632.61   1579.08   0.0453354
 0.0390236
6  16  2455  2439  1629   1611.14   0.0485313
 0.0392376
7  16  2649  2633   1507.34   777.516   0.0148972
 0.0393161
8  16  2858  2842   1423.61   837.633   0.0157639
 0.0449088
9  16  3245  3229   1437.75   1551.02   0.0200845
 0.0444847
   10  16  3629  3613   1447.85  1539   0.0654451
 0.0441569
Total time run: 10.229591
Total writes made:  3630
Write size: 4202496
Object size:4202496
Bandwidth (MB/sec): 1422.18
Stddev Bandwidth:   341.609
Max bandwidth (MB/sec): 1723.36
Min bandwidth (MB/sec): 777.516
Average IOPS:   354
Stddev IOPS:85
Max IOPS:   430
Min IOPS:   194
Average Latency(s): 0.0448612
Stddev Latency(s):  0.0712224
Max latency(s): 1.08353
Min latency(s): 0.0134629
Cleaning up (deleting benchmark objects)
Removed 3630 objects
Clean up completed and total clean up time :2.321669
#



# rados bench -p volumes 10 write
hints = 1
Maintain

Re: [ceph-users] High apply latency

2018-02-01 Thread Jakub Jaszewski
  0.022683
 0.0450504
   17  16  6035  6019   1416.05  1376   0.0702069
 0.0451103
   18  16  6397  6381   1417.82  1448   0.0231964
 0.0450781
   19  16  6750  67341417.5  1412   0.0131453
 0.0450462
2018-02-01 08:30:57.941618 min lat: 0.0117176 max lat: 0.794775 avg lat:
0.0451095
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
lat(s)
   20  16  7100  7084   1416.62  1400   0.0239063
 0.0451095
Total time run: 20.040338
Total writes made:  7100
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 1417.14
Stddev Bandwidth:   40.6598
Max bandwidth (MB/sec): 1496
Min bandwidth (MB/sec): 1360
Average IOPS:   354
Stddev IOPS:10
Max IOPS:   374
Min IOPS:   340
Average Latency(s): 0.0451394
Stddev Latency(s):  0.0264402
Max latency(s): 0.794775
Min latency(s): 0.0117176
Cleaning up (deleting benchmark objects)
Removed 7100 objects
Clean up completed and total clean up time :0.658175


Thanks
Jakub



On Thu, Feb 1, 2018 at 12:43 AM, Sergey Malinin  wrote:

> Deep scrub is I/O-expensive. If deep scrub is unnecessary, you can disable
> it with "ceph osd pool set  nodeep-scrub".
>
> On Thursday, February 1, 2018 at 00:10, Jakub Jaszewski wrote:
>
>  3active+clean+scrubbing+deep
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] High apply latency

2018-01-31 Thread Jakub Jaszewski
Hi Luis,

Thanks for your comment. I see high %util for a few HDDs on each Ceph node,
but there is actually very little client traffic.

iostat -xd shows ongoing operations

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz
avgqu-sz   await r_await w_await  svctm  %util
sda   0,00 1,600,003,00 0,0018,4012,27
   0,000,000,000,00   0,00   0,00
sdb   0,00 0,200,608,0014,40   488,10   116,86
   0,000,568,000,00   0,56   0,48
sdc   0,00 0,00  153,801,80 25304,0031,30   325,65
   1,127,207,280,00   4,21  65,52
sdd   0,00 5,40  406,80   44,00 102275,20  3295,60
 468,37 1,854,124,292,53   2,11  95,12
sde   0,00 0,603,20   12,0051,20  2461,50   330,62
   0,074,32   12,252,20   2,63   4,00
sdf   0,00 0,401,408,2044,00  1424,90   306,02
   0,011,50   10,290,00   1,50   1,44
sdg   0,00 0,60   92,80   19,00  5483,20  2998,90   151,74
   0,988,74   10,360,84   7,40  82,72
sdh   0,00 0,00  154,401,40 25299,2074,20   325,72
   1,076,886,940,00   4,07  63,44
sdi   0,00 0,000,207,8012,80   397,50   102,58
   0,000,30   12,000,00   0,30   0,24
sdj   0,00 0,200,004,00 0,00   645,60   322,80
   0,000,000,000,00   0,00   0,00
sdk   0,00 0,201,40   15,6032,00  1956,50   233,94
   0,021,08   13,140,00   1,08   1,84
sdl   0,00 0,400,604,0016,80   447,00   201,65
   0,024,35   20,002,00   2,78   1,28
sdm   0,00 0,00   10,004,40   232,00   521,80   104,69
   0,085,898,480,00   4,89   7,04
dm-0  0,00 0,000,004,60 0,0018,40 8,00
   0,000,000,000,00   0,00   0,00
nvme0n1   0,00 0,000,00  124,80 0,00 10366,40   166,13
   0,010,120,000,12   0,03   0,32

when ceph -s shows low client traffic

# ceph -s
  cluster:
id: 1023c49f-3b10-42de-9f62-9b122db32a9a
health: HEALTH_OK

  services:
mon: 3 daemons, quorum host01,host02,host03
mgr: host02(active), standbys: host01, host03
osd: 108 osds: 106 up, 106 in
rgw: 3 daemons active

  data:
pools:   22 pools, 4880 pgs
objects: 31121k objects, 119 TB
usage:   241 TB used, 143 TB / 385 TB avail
pgs: 4875 active+clean
 3active+clean+scrubbing+deep
 2active+clean+scrubbing

  io:
client:   17646 B/s rd, 19038 kB/s wr, 4 op/s rd, 175 op/s wr


Are there any background tasks running and utilizing the disks? Is it
scrubbing that generates this load?
 3    active+clean+scrubbing+deep
 2    active+clean+scrubbing
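One way I could check is something like the following, which should show which
PGs (and thus which OSDs) are scrubbing, and whether the load disappears with
scrubs paused (not forgetting to unset the flags afterwards):

# ceph pg dump pgs_brief | grep scrubbing
# ceph osd set noscrub ; ceph osd set nodeep-scrub
# ceph osd unset noscrub ; ceph osd unset nodeep-scrub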




 Thanks
Jakub

On Wed, Jan 31, 2018 at 3:59 PM, Luis Periquito  wrote:

> on a cursory look of the information it seems the cluster is
> overloaded with the requests.
>
> Just a guess, but if you look at IO usage on those spindles they'll be
> at or around 100% usage most of the time.
>
> If that is the case then increasing the pg_num and pgp_num won't help,
> and short term, will make it worse.
>
> Metadata pools (like default.rgw.buckets.index) really excel in a SSD
> pool, even if small. I carved a small OSD in the journal SSDs for
> those kinds of workloads.
>
> On Wed, Jan 31, 2018 at 2:26 PM, Jakub Jaszewski
>  wrote:
> > Is it safe to increase pg_num and pgp_num from 1024 up to 2048 for
> volumes
> > and default.rgw.buckets.data pools?
> > How will it impact cluster behavior? I guess cluster rebalancing will
> occur
> > and will take long time considering amount of data we have on it ?
> >
> > Regards
> > Jakub
> >
> >
> >
> > On Wed, Jan 31, 2018 at 1:37 PM, Jakub Jaszewski <
> jaszewski.ja...@gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> I'm wondering why slow requests are being reported mainly when the
> request
> >> has been put into the queue for processing by its PG  (queued_for_pg ,
> >> http://docs.ceph.com/docs/master/rados/troubleshooting/
> troubleshooting-osd/#debugging-slow-request).
> >> Could it be due too low pg_num/pgp_num ?
> >>
> >> It looks that slow requests are mainly addressed to
> >> default.rgw.buckets.data (pool id 20) , volumes (pool id 3) and
> >> default.rgw.buckets.index (pool id 14)
> >>
> >> 2018-01-31 12:06:55.899557 osd.59 osd.59 10.212.32.22:6806/4413 38 :
> >> cluster [WRN] slow request 30.125793 seconds old, received at 2018-01-31
> >> 12:06:25.773675: osd_op(client.857003.0:126171692 3.a4fec1ad 3.a4fec1ad
>

Re: [ceph-users] High apply latency

2018-01-31 Thread Jakub Jaszewski
Is it safe to increase pg_num and pgp_num from 1024 up to 2048 for the
volumes and default.rgw.buckets.data pools?
How will it impact cluster behavior? I guess cluster rebalancing will occur
and will take a long time considering the amount of data we have on it?
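
A sketch of how I would try to do it, step by step (assuming injectargs can
change the backfill limits at runtime - please correct me if a different
approach is recommended):

# throttle recovery/backfill before touching pg_num
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

# grow in smaller increments instead of jumping straight to 2048
ceph osd pool set volumes pg_num 1280
ceph osd pool set volumes pgp_num 1280
# wait for HEALTH_OK, then repeat until 2048, then do the same for
# default.rgw.buckets.data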

Regards
Jakub



On Wed, Jan 31, 2018 at 1:37 PM, Jakub Jaszewski 
wrote:

> Hi,
>
> I'm wondering why slow requests are being reported mainly when the request
> has been put into the queue for processing by its PG  (queued_for_pg ,
> http://docs.ceph.com/docs/master/rados/troubleshooting/
> troubleshooting-osd/#debugging-slow-request).
> Could it be due too low pg_num/pgp_num ?
>
> It looks that slow requests are mainly addressed to
>  default.rgw.buckets.data (pool id 20) , volumes (pool id 3) and
> default.rgw.buckets.index (pool id 14)
>
> 2018-01-31 12:06:55.899557 osd.59 osd.59 10.212.32.22:6806/4413 38 :
> cluster [WRN] slow request 30.125793 seconds old, received at 2018-01-31
> 12:06:25.773675: osd_op(client.857003.0:126171692 3.a4fec1ad 3.a4fec1ad
> (undecoded) ack+ondisk+write+known_if_redirected e5722) currently
> queued_for_pg
>
> Btw how can I get more human-friendly client information from log entry
> like above ?
>
> Current pg_num/pgp_num
>
> pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
> stripe_width 0 application rbd
> removed_snaps [1~3]
> pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2
> crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags
> hashpspool stripe_width 0 application rgw
> pool 20 'default.rgw.buckets.data' erasure size 9 min_size 6 crush_rule 1
> object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags
> hashpspool stripe_width 4224 application rgw
>
> Usage
>
> GLOBAL:
> SIZE AVAIL RAW USED %RAW USED OBJECTS
> 385T  144T 241T 62.54  31023k
> POOLS:
> NAME ID QUOTA OBJECTS QUOTA BYTES
> USED   %USED MAX AVAIL OBJECTS  DIRTY  READ
> WRITE  RAW USED
> volumes  3  N/A   N/A
> 40351G 70.9116557G 10352314 10109k  2130M
>  2520M 118T
> default.rgw.buckets.index14 N/A   N/A
>  0 016557G  205205   160M
> 27945k0
> default.rgw.buckets.data 20 N/A   N/A
> 79190G 70.5133115G 20865953 20376k   122M
> 113M 116T
>
>
>
> # ceph osd pool ls detail
> pool 0 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 64 pgp_num 64 last_change 4502 flags hashpspool
> stripe_width 0 application rbd
> pool 1 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
> stripe_width 0 application rbd
> pool 2 'images' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 5175 flags hashpspool
> stripe_width 0 application rbd
> removed_snaps [1~7,14~2]
> pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
> stripe_width 0 application rbd
> removed_snaps [1~3]
> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
> rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width
> 0 application rgw
> pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
> stripe_width 0 application rgw
> pool 6 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
> stripe_width 0 application rgw
> pool 7 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
> stripe_width 0 application rgw
> pool 8 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
> stripe_width 0 application rgw
> pool 9 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
> stripe_width 0 application rgw
> pool 10 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 8 

Re: [ceph-users] High apply latency

2018-01-31 Thread Jakub Jaszewski
Hi,

I'm wondering why slow requests are being reported mainly when the request
has been put into the queue for processing by its PG  (queued_for_pg ,
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#debugging-slow-request
).
Could it be due to too low pg_num/pgp_num?

It looks like the slow requests are mainly addressed to
default.rgw.buckets.data (pool id 20), volumes (pool id 3) and
default.rgw.buckets.index (pool id 14).

2018-01-31 12:06:55.899557 osd.59 osd.59 10.212.32.22:6806/4413 38 :
cluster [WRN] slow request 30.125793 seconds old, received at 2018-01-31
12:06:25.773675: osd_op(client.857003.0:126171692 3.a4fec1ad 3.a4fec1ad
(undecoded) ack+ondisk+write+known_if_redirected e5722) currently
queued_for_pg

Btw, how can I get more human-friendly client information from a log entry
like the one above?
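
One thing I was going to try is the op tracker on the reporting OSD and the
mon session list (a sketch, assuming the admin sockets are in their default
locations - maybe there is something friendlier?):

# on the node hosting osd.59
ceph daemon osd.59 dump_ops_in_flight
ceph daemon osd.59 dump_historic_ops

# map the client.857003 id to an address/session on a mon
ceph daemon mon.host01 sessions | grep 857003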

Current pg_num/pgp_num

pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
stripe_width 0 application rbd
removed_snaps [1~3]
pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule
0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
stripe_width 0 application rgw
pool 20 'default.rgw.buckets.data' erasure size 9 min_size 6 crush_rule 1
object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags
hashpspool stripe_width 4224 application rgw

Usage

GLOBAL:
SIZE AVAIL RAW USED %RAW USED OBJECTS
385T  144T 241T 62.54  31023k
POOLS:
    NAME                       ID  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS   DIRTY   READ    WRITE   RAW USED
    volumes                    3   N/A            N/A          40351G  70.91  16557G     10352314  10109k  2130M   2520M   118T
    default.rgw.buckets.index  14  N/A            N/A          0       0      16557G     205       205     160M    27945k  0
    default.rgw.buckets.data   20  N/A            N/A          79190G  70.51  33115G     20865953  20376k  122M    113M    116T



# ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 4502 flags hashpspool stripe_width 0
application rbd
pool 1 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins
pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool stripe_width 0
application rbd
pool 2 'images' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 5175 flags hashpspool
stripe_width 0 application rbd
removed_snaps [1~7,14~2]
pool 3 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 4502 flags hashpspool
stripe_width 0 application rbd
removed_snaps [1~3]
pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool stripe_width
0 application rgw
pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
stripe_width 0 application rgw
pool 6 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
stripe_width 0 application rgw
pool 7 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
stripe_width 0 application rgw
pool 8 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
stripe_width 0 application rgw
pool 9 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
stripe_width 0 application rgw
pool 10 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
stripe_width 0 application rgw
pool 11 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 owner
18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 12 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 owner
18446744073709551615 flags hashpspool stripe_width 0 application rgw
pool 13 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
stripe_width 0 application rgw
pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule
0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 4502 flags hashpspool
stripe_width 0 application rgw
pool 15 'default.rgw.buckets.data.old' replicated size 3 min_size 2
crush_rule 0 ob

[ceph-users] High apply latency

2018-01-30 Thread Jakub Jaszewski
Hi,

We observe high apply_latency(ms) and, I believe, poor write performance.
The logs contain repeated slow request warnings related to different OSDs
and servers.
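
For context, a sketch of the commands I look at for the latency numbers (in
case I am reading the wrong counters):

# per-OSD commit/apply latency snapshot
ceph osd perf

# details of the slow ops on one of the reported OSDs
ceph daemon osd.0 dump_historic_ops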

ceph versions 12.2.2

Cluster HW description:
9x Dell PowerEdge R730xd

1x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (10C/20T)
256 GB 2133 MHz DDR4
PERC H730 Mini 1GB cache
12x 4TB TOSHIBA MG04ACA400N (each disk configured as RAID 0 for single OSD)
2x 480GB SSDSC2BB480G7R (in RAID 1 for operating system)
1x NVMe PCIe SSD - Intel SSD DC D3700 Series for journaling, one for 12 OSDs
1x QLogic 57800 2x 10Gb DAC SFP+ and 2x 1Gb (configured as 2x10Gb in
802.3ad Dynamic link aggregation)


/etc/ceph/ceph.conf
[global]
  fsid = 1023c49f-3b10-42de-9f62-9b122db32a9a
  mon_initial_members = host01,host02,host03
  mon_host = 10.212.32.18,10.212.32.19,10.212.32.20
  auth_supported = cephx
  public_network = 10.212.32.0/24
  cluster_network = 10.212.14.0/24
  rgw thread pool size = 256
[client.rgw.host01]
  rgw host = host01
  rgw enable usage log = true
[client.rgw.host02]
  rgw host = host02
  rgw enable usage log = true
[client.rgw.host03]
  rgw host = host03
  rgw enable usage log = true
[osd]
  filestore xattr use omap = true
  osd journal size = 10240
  osd mount options xfs = noatime,inode64,logbsize=256k,logbufs=8
  osd crush location hook = /usr/bin/ceph-crush-location.sh
  osd pool default size = 3
[mon]
  mon compact on start = true
  mon compact on trim = true


The OSD topology is shown below. Actually, I wonder why only some OSDs have
the "hdd" class assigned.
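
In case it is relevant, this is how I was thinking of checking and
normalizing the device classes (a sketch only, not applied yet):

# list the existing device classes
ceph osd crush class ls

# assign the class explicitly where it is missing, e.g. for osd.0
ceph osd crush set-device-class hdd osd.0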

ID  CLASS WEIGHTREWEIGHT SIZE   USEAVAIL  %USE  VAR  PGS TYPE NAME

 -1   392.73273-   385T   236T   149T 61.24 1.00   - root
default
 -6   392.73273-   385T   236T   149T 61.24 1.00   - region
region01
 -5   392.73273-   385T   236T   149T 61.24 1.00   -
 datacenter dc01
 -4   392.73273-   385T   236T   149T 61.24 1.00   -
 room room01
 -843.63699- 44684G 26573G 18111G 59.47 0.97   -
 rack rack01
 -743.63699- 44684G 26573G 18111G 59.47 0.97   -
 host host01
  0 3.63599  1.0  3723G  2405G  1318G 64.59 1.05 210
 osd.0
  2 3.63599  1.0  3723G  2004G  1719G 53.82 0.88 190
 osd.2
  4 3.63599  1.0  3723G  2460G  1262G 66.08 1.08 214
 osd.4
  6 3.63599  1.0  3723G  2474G  1249G 66.45 1.09 203
 osd.6
  8 3.63599  1.0  3723G  2308G  1415G 61.99 1.01 220
 osd.8
 11 3.63599  1.0  3723G  2356G  1367G 63.28 1.03 214
 osd.11
 12 3.63599  1.0  3723G  2303G  1420G 61.86 1.01 206
 osd.12
 14 3.63599  1.0  3723G  1920G  1803G 51.57 0.84 178
 osd.14
 16 3.63599  1.0  3723G  2236G  1486G 60.07 0.98 203
 osd.16
 18 3.63599  1.0  3723G  2203G  1520G 59.17 0.97 193
 osd.18
 20 3.63599  1.0  3723G  1904G  1819G 51.15 0.84 179
 osd.20
 22 3.63599  1.0  3723G  1995G  1728G 53.58 0.88 192
 osd.22
 -343.63699- 44684G 27090G 17593G 60.63 0.99   -
 rack rack02
 -243.63699- 44684G 27090G 17593G 60.63 0.99   -
 host host02
  1   hdd   3.63599  1.0  3723G  2447G  1275G 65.74 1.07 213
 osd.1
  3   hdd   3.63599  1.0  3723G  2696G  1027G 72.41 1.18 210
 osd.3
  5   hdd   3.63599  1.0  3723G  2290G  1433G 61.51 1.00 188
 osd.5
  7   hdd   3.63599  1.0  3723G  2171G  1551G 58.33 0.95 194
 osd.7
  9   hdd   3.63599  1.0  3723G  2129G  1594G 57.18 0.93 204
 osd.9
 10   hdd   3.63599  1.0  3723G  2153G  1570G 57.82 0.94 184
 osd.10
 13   hdd   3.63599  1.0  3723G  2142G  1580G 57.55 0.94 188
 osd.13
 15   hdd   3.63599  1.0  3723G  2147G  1576G 57.66 0.94 192
 osd.15
 17   hdd   3.63599  1.0  3723G  2093G  1630G 56.21 0.92 201
 osd.17
 19   hdd   3.63599  1.0  3723G  2079G  1643G 55.86 0.91 192
 osd.19
 21   hdd   3.63599  1.0  3723G  2266G  1457G 60.87 0.99 201
 osd.21
 23   hdd   3.63599  1.0  3723G  2472G  1251G 66.39 1.08 197
 osd.23
-1043.63699- 44684G 27247G 17436G 60.98 1.00   -
 rack rack03
 -943.63699- 44684G 27247G 17436G 60.98 1.00   -
 host host03
 24 3.63599  1.0  3723G  2289G  1433G 61.49 1.00 195
 osd.24
 25 3.63599  1.0  3723G  2584G  1138G 69.42 1.13 217
 osd.25
 26 3.63599  1.0  3723G  2183G  1539G 58.65 0.96 198
 osd.26
 28 3.63599  1.0  3723G  2540G  1182G 68.23 1.11 215
 osd.28
 30 3.63599  1.0  3723G  2134G  1589G 57.31 0.94 207
 osd.30
 32 3.63599  1.0  3723G  1715G  2008G 46.06 0.75 183
 osd.32
 34 3.63599  1.000

[ceph-users] Ceph with multiple public networks

2017-12-18 Thread Jakub Jaszewski
Hi,

We have a Ceph cluster running Luminous 12.2.2. It has a public network and
a cluster network configured.

The cluster provides services for two big groups of clients and some
individual clients.
One group uses RGW and the other uses RBD.
Ceph's public network and the two mentioned groups are located in three
different VLANs. Each client group generates traffic above the limit of the
routing devices.

Right now RGW and MON roles are served by the same hosts.

I'd like to add an additional VLAN-tagged interface to all MON and OSD ceph
nodes to streamline communication with the big group of clients using RBD and
keep the current public network for individual requests.
From what I can find it is supported to have more than one public network,
according to
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/#id1
Is it possible to have a MON host with two public addresses assigned? Or do I
need to designate other hosts to handle MON roles with different public IP
addresses?

How should I approach the RGW service? In this case I also need to provide
RGW for the big group of clients in a dedicated VLAN and keep access for
individual requests coming to the IP in the currently set public network.
Is it possible to bind one civetweb instance to two IP addresses, or do I need
separate instances per network address? (A rough sketch of what I have in mind
follows the current config below.)

Current ceph.conf is

[global]
  fsid = 1023c49f-3a10-42de-9f62-9b122db32f1f
  mon_initial_members = host01,host02,host03
  mon_host = 10.212.32.18,10.212.32.19,10.212.32.20
  auth_supported = cephx
  public_network = 10.212.32.0/24
  cluster_network = 10.212.14.0/24
[client.rgw.host01]
  rgw host = host01
  rgw enable usage log = true
#  debug_rgw = 20
[client.rgw.host02]
  rgw host = host02
  rgw enable usage log = true
[client.rgw.host03]
  rgw host = host03
  rgw enable usage log = true
[osd]
  filestore xattr use omap = true
  osd journal size = 10240
  osd mount options xfs = noatime,inode64,logbsize=256k,logbufs=8
  osd crush location hook = /usr/bin/opera-ceph-crush-location.sh
  osd pool default size = 3
[mon]
  mon compact on start = true
  mon compact on trim = true
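
And this is the kind of change I have in mind (just a sketch - I am not sure
whether public_network really accepts a comma-separated list here, nor whether
civetweb will bind two ip:port pairs joined with '+', which is exactly what I
am asking about; the second subnet is only an example):

[global]
  # existing public network plus the new RBD client VLAN
  public_network = 10.212.32.0/24, 10.212.33.0/24
  cluster_network = 10.212.14.0/24
[client.rgw.host01]
  rgw host = host01
  rgw enable usage log = true
  # one instance listening on both VLANs?
  rgw frontends = civetweb port=10.212.32.18:8080+10.212.33.18:8080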


Thanks
Jakub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH rule seems to work fine not for all PGs in erasure coded pools

2017-11-30 Thread Jakub Jaszewski
I've just done a ceph upgrade from Jewel to Luminous and am facing the same case...

# EC profile
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8

There are 5 hosts in the cluster and I ran systemctl stop ceph.target on one
of them. Some PGs from the EC pool were remapped (active+clean+remapped state)
even though there were not enough hosts in the cluster, but some are still in
the active+undersized+degraded state.


root@host01:~# ceph status
  cluster:
id: a6f73750-1972-47f6-bcf5-a99753be65ad
health: HEALTH_WARN
Degraded data redundancy: 876/9115 objects degraded (9.611%),
540 pgs unclean, 540 pgs degraded, 540 pgs undersized

  services:
mon: 3 daemons, quorum host01,host02,host03
mgr: host01(active), standbys: host02, host03
osd: 60 osds: 48 up, 48 in; 484 remapped pgs
rgw: 3 daemons active

  data:
pools:   19 pools, 3736 pgs
objects: 1965 objects, 306 MB
usage:   5153 MB used, 174 TB / 174 TB avail
pgs: 876/9115 objects degraded (9.611%)
 2712 active+clean
 540  active+undersized+degraded
 484  active+clean+remapped

  io:
client:   17331 B/s rd, 20 op/s rd, 0 op/s wr

root@host01:~#



Is anyone here able to explain this behavior to me?

Jakub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CRUSH rule seems to work fine not for all PGs in erasure coded pools

2017-11-28 Thread Jakub Jaszewski
Hi David, thanks for the quick feedback.

Then why were some PGs remapped and some not?

# IT LOOKS LIKE 338 PGs IN ERASURE CODED POOLS HAVE BEEN REMAPPED
# I DON'T GET WHY 540 PGs STILL ENCOUNTER THE active+undersized+degraded STATE

root@host01:~# ceph pg dump pgs_brief | grep 'active+remapped'
dumped pgs_brief in format plain
16.6f active+remapped [43,2147483647,2,31,12] 43 [43,33,2,31,12] 43
16.6e active+remapped [10,5,35,44,2147483647] 10 [10,5,35,44,41] 10
root@host01:~# egrep '16.6f|16.6e' PGs_on_HOST_host05
16.6f active+clean [43,33,2,59,12] 43 [43,33,2,59,12] 43
16.6e active+clean [10,5,49,35,41] 10 [10,5,49,35,41] 10
root@host01:~#

Take PG 16.6f: prior to stopping the ceph services it was on [43,33,2,59,12],
then it was remapped to [43,33,2,31,12], so OSD.31 and OSD.33 are on the
same HOST.
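
For reference, the crush location of both OSDs can be checked like this (a
sketch; ceph osd find should report the host each one sits on):

root@host01:~# ceph osd find 31
root@host01:~# ceph osd find 33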

But, for example, PG 16.ee got into the active+undersized+degraded state;
prior to the services stop it was on

pg_stat state up up_primary acting acting_primary
16.ee active+clean [5,22,33,55,45] 5 [5,22,33,55,45] 5

after stopping the services on the host it was not remapped:

16.ee active+undersized+degraded [5,22,33,2147483647,45] 5 [5,22,33,2147483647,45] 5
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CRUSH rule seems to work fine not for all PGs in erasure coded pools

2017-11-28 Thread Jakub Jaszewski
Hi, I'm trying to understand erasure coded pools and why CRUSH rules seem
to work for only part of the PGs in EC pools.

Basically, what I'm trying to do is check erasure coded pool recovery
behaviour after a single OSD or single HOST failure.
I noticed that in the case of a HOST failure only part of the PGs get
recovered to active+remapped while other PGs remain in the
active+undersized+degraded state. Why??
The EC pool profile I use is k=3, m=2.

Also, I'm not really sure about the meaning of all the steps of the crush
rule below (perhaps it is the root cause).
rule ecpool_3_2 {
ruleset 1
type erasure
min_size 3
max_size 5
step set_chooseleaf_tries 5 # should I maybe try to increase this number of
retries? Can I apply the change to an existing EC crush rule and pool, or do I
need to create a new one?
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host # Does this allow choosing more than one OSD
from a single HOST, while first trying to pick only one OSD per HOST if there
are enough HOSTs in the cluster?
step emit
}

ceph version 10.2.9 (jewel)
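
To check the rule behaviour myself I was planning to replay the mappings
offline with crushtool (a sketch; ruleset id 1 as in the rule above):

# export the compiled crush map from the cluster
ceph osd getcrushmap -o crushmap.bin

# simulate placements for the EC rule with 5 chunks and report bad mappings
crushtool -i crushmap.bin --test --rule 1 --num-rep 5 --show-mappings
crushtool -i crushmap.bin --test --rule 1 --num-rep 5 --show-bad-mappings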

# INITIAL CLUSTER STATE
root@host01:~# ceph osd tree
ID  WEIGHTTYPE NAMEUP/DOWN REWEIGHT
PRIMARY-AFFINITY
 -1 218.18401 root default

 -6 218.18401 region MyRegion

 -5 218.18401 datacenter MyDC

 -4 218.18401 room MyRoom

 -3  43.63699 rack Rack01

 -2  43.63699 host host01

  0   3.63599 osd.0 up  1.0
 1.0
  3   3.63599 osd.3 up  1.0
 1.0
  4   3.63599 osd.4 up  1.0
 1.0
  6   3.63599 osd.6 up  1.0
 1.0
  8   3.63599 osd.8 up  1.0
 1.0
 10   3.63599 osd.10up  1.0
 1.0
 12   3.63599 osd.12up  1.0
 1.0
 14   3.63599 osd.14up  1.0
 1.0
 16   3.63599 osd.16up  1.0
 1.0
 19   3.63599 osd.19up  1.0
 1.0
 22   3.63599 osd.22up  1.0
 1.0
 25   3.63599 osd.25up  1.0
 1.0
 -8  43.63699 rack Rack02

 -7  43.63699 host host02

  1   3.63599 osd.1 up  1.0
 1.0
  2   3.63599 osd.2 up  1.0
 1.0
  5   3.63599 osd.5 up  1.0
 1.0
  7   3.63599 osd.7 up  1.0
 1.0
  9   3.63599 osd.9 up  1.0
 1.0
 11   3.63599 osd.11up  1.0
 1.0
 13   3.63599 osd.13up  1.0
 1.0
 15   3.63599 osd.15up  1.0
 1.0
 17   3.63599 osd.17up  1.0
 1.0
 20   3.63599 osd.20up  1.0
 1.0
 23   3.63599 osd.23up  1.0
 1.0
 26   3.63599 osd.26up  1.0
 1.0
-10 130.91000 rack Rack03

 -9  43.63699 host host03

 18   3.63599 osd.18up  1.0
 1.0
 21   3.63599 osd.21up  1.0
 1.0
 24   3.63599 osd.24up  1.0
 1.0
 27   3.63599 osd.27up  1.0
 1.0
 28   3.63599 osd.28up  1.0
 1.0
 29   3.63599 osd.29up  1.0
 1.0
 30   3.63599 osd.30up  1.0
 1.0
 31   3.63599 osd.31up  1.0
 1.0
 32   3.63599 osd.32up  1.0
 1.0
 33   3.63599 osd.33up  1.0
 1.0
 34   3.63599 osd.34up  1.0
 1.0
 35   3.63599 osd.35up  1.0
 1.0
-11  43.63699 host host04

 36   3.63599 osd.36up  1.0
 1.0
 37   3.63599 osd.37up  1.0
 1.0
 38   3.63599 osd.38up  1.0
 1.0
 39   3.63599 osd.39up  1.0
 1.0
 40   3.63599 osd.40up  1.0
 1.0
 41   3.63599 osd.41up  1.0
 1.0
 42   3.63599 osd.42up  1.0
 1.0
 43   3.63599 osd.43up  1.0
 1.0
 44   3.63599 osd.44up  1.0
 1.0
 45   3.63599 osd.45up  1.0
 1.00