Re: [ceph-users] OSD up takes 15 minutes after machine restarts

2020-01-19 Thread huxia...@horebdata.cn
HI, Igor,

Could this be causing the problem?




huxia...@horebdata.cn
 
From: Igor Fedotov
Date: 2020-01-19 11:41
To: huxia...@horebdata.cn; ceph-users
Subject: Re: [ceph-users] OSD up takes 15 minutes after machine restarts
Hi Samuel,

I am wondering whether you have the bluestore_fsck_on_mount option set to true.
Do you see a high read load on the OSD device(s) during startup?
If so, it might be fsck running, which can take that long.

Thanks,
Igor
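
For reference, a rough way to check this (osd.0 and the paths below are only
placeholders; adjust them to your setup):

# check the effective value via the admin socket on the OSD node
ceph daemon osd.0 config get bluestore_fsck_on_mount

# watch the read load on the backing devices while the OSD starts
iostat -x 1

# with the OSD stopped, time an offline fsck of the same OSD
systemctl stop ceph-osd@0
time ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0
systemctl start ceph-osd@0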

On 1/19/2020 11:53 AM, huxia...@horebdata.cn wrote:
Dear folks,

I ran into a strange situation with a 3-node Ceph cluster on Luminous 12.2.12
with BlueStore. Each machine has 5 OSDs on HDD, and each OSD uses a 30GB DB/WAL
partition on SSD. At the beginning, with little data in the cluster, the OSDs
came back up quickly whenever a node restarted.

Then I ran a 4-day stress test with vdbench and restarted one node; to my
surprise, the OSDs on that node now take about 15 minutes to reach the up state.
How can I speed up the process of getting the OSDs up?

thanks,

Samuel




huxia...@horebdata.cn



[ceph-users] OSD up takes 15 minutes after machine restarts

2020-01-19 Thread huxia...@horebdata.cn
Dear folks,

I ran into a strange situation with a 3-node Ceph cluster on Luminous 12.2.12
with BlueStore. Each machine has 5 OSDs on HDD, and each OSD uses a 30GB DB/WAL
partition on SSD. At the beginning, with little data in the cluster, the OSDs
came back up quickly whenever a node restarted.

Then I ran a 4-day stress test with vdbench and restarted one node; to my
surprise, the OSDs on that node now take about 15 minutes to reach the up state.
How can I speed up the process of getting the OSDs up?

thanks,

Samuel




huxia...@horebdata.cn


Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

2019-10-17 Thread huxia...@horebdata.cn
Hello, Robert,

Thanks for the quick reply. I did test with "osd op queue = wpq" and
"osd op queue cut off = high", together with the following settings:

osd_recovery_op_priority = 1
osd recovery delay start = 20
osd recovery max active = 1
osd recovery max chunk = 1048576
osd recovery sleep = 1
osd recovery sleep hdd = 1
osd recovery sleep ssd = 1
osd recovery sleep hybrid = 1
osd recovery priority = 1
osd max backfills = 1
osd backfill scan max = 16
osd backfill scan min = 4
osd_op_thread_suicide_timeout = 300

But the cluster still showed extremely heavy recovery activity at the beginning
of the recovery, and only after about 5-10 minutes did the recovery gradually
come under control. I guess this is quite similar to what you encountered in
Nov. 2015.

It is really annoying. What else can I do to mitigate this initial-recovery
issue? Any suggestions are much appreciated.

thanks again,

samuel



huxia...@horebdata.cn
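
For reference, the effective values can be double-checked on a running OSD
roughly like this (osd.5 is only a placeholder id; note that the two queue
options only take effect after an OSD restart, while the sleep/backfill values
can be injected at runtime):

# confirm what the OSD is actually running with
ceph daemon osd.5 config show | egrep 'osd_op_queue|osd_recovery_sleep|osd_max_backfills'

# adjust the runtime-changeable throttles on all OSDs
ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 1 --osd_max_backfills 1'

# watch recovery versus client traffic while testing
ceph -s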
 
From: Robert LeBlanc
Date: 2019-10-17 21:23
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph 
recovery
On Thu, Oct 17, 2019 at 12:08 PM huxia...@horebdata.cn wrote:
>
> I happened to find a note that you wrote in Nov 2015:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-November/006173.html
> and I believe I just hit exactly the same behavior: a host going down drags
> the client performance down to about 1/10 (with a 200MB/s recovery workload),
> and it then takes about ten minutes to get the OSD recovery under control.
>
> Could you please share how you eventually solved that issue? Was it by setting
> a fairly large "osd recovery delay start", or some other parameter?
 
Wow! Dusting off the cobwebs here. I think this is what led me to dig
into the code and write the WPQ scheduler. I can't remember doing
anything specific. I'm sorry I'm not much help in this regard.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
 


Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

2019-10-17 Thread huxia...@horebdata.cn
Hello, Robert,

I happened to find a note that you wrote in Nov 2015:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-November/006173.html
and I believe I just hit exactly the same behavior: a host going down drags the
client performance down to about 1/10 (with a 200MB/s recovery workload), and
it then takes about ten minutes to get the OSD recovery under control.

Could you please share how you eventually solved that issue? Was it by setting
a fairly large "osd recovery delay start", or some other parameter?

best regards,

samuel
 



huxia...@horebdata.cn
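
(Something along the lines of the following, applied at runtime; 60 seconds is
only an example value I have not verified:)

ceph tell osd.* injectargs '--osd_recovery_delay_start 60'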
 
From: Robert LeBlanc
Date: 2019-10-16 21:46
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph 
recovery
On Wed, Oct 16, 2019 at 11:53 AM huxia...@horebdata.cn wrote:
>
> My Ceph version is Luminous 12.2.12. Do you think I should upgrade to
> Nautilus, or will Nautilus have better control of recovery/backfilling?
 
We have a Jewel cluster and a Luminous cluster that we changed these
settings on, and it really helped both of them.
 

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
 


Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

2019-10-16 Thread huxia...@horebdata.cn
My Ceph version is Luminous 12.2.12. Do you think I should upgrade to
Nautilus, or will Nautilus have better control of recovery/backfilling?

best regards,

Samuel



huxia...@horebdata.cn
 
From: Robert LeBlanc
Date: 2019-10-14 16:27
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph 
recovery
On Thu, Oct 10, 2019 at 2:23 PM huxia...@horebdata.cn wrote:
>
> Hi, folks,
>
> I have a middle-sized Ceph cluster serving as Cinder backup for OpenStack
> (Queens). During testing, one Ceph node went down unexpectedly and was
> powered up again about 10 minutes later, at which point the cluster started
> PG recovery. To my surprise, VM IOPS dropped dramatically during the
> recovery, from about 13K IOPS to about 400, a factor of roughly 1/30, even
> though I had put stringent throttling on backfill and recovery with the
> following Ceph parameters:
>
> osd_max_backfills = 1
> osd_recovery_max_active = 1
> osd_client_op_priority=63
> osd_recovery_op_priority=1
> osd_recovery_sleep = 0.5
>
> The weirdest thing is:
> 1) When there is no IO activity from any VM (all VMs are quiet, so only the
> recovery IO is present), the recovery bandwidth is about 10MiB/s, 2 objects/s.
> The recovery throttle settings seem to be working properly.
> 2) When running an FIO test inside a VM, the recovery bandwidth climbs
> quickly, reaching above 200MiB/s, 60 objects/s, while FIO performance inside
> the VM is only about 400 IOPS (8KiB block size), around 3MiB/s. The recovery
> throttling obviously does NOT work properly.
> 3) If I stop the FIO test in the VM, the recovery bandwidth drops back to
> 10MiB/s, 2 objects/s again, strangely enough.
>
> How can this weird behavior happen? Is there a way to cap the recovery
> bandwidth at a specific value, or to limit the number of recovery objects per
> second? That would give better control of backfilling/recovery than the
> apparently faulty logic of relative osd_client_op_priority vs
> osd_recovery_op_priority.
>
> Any ideas or suggestions for getting the recovery under control?
>
> best regards,
>
> Samuel
 
Not sure which version of Ceph you are on, but add these to your
/etc/ceph/ceph.conf on all your OSDs and restart them.
 
osd op queue = wpq
osd op queue cut off = high
 
That should really help and make backfills and recovery non-impactful.
This will be the default in Octopus.
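
A minimal sketch of the change on one OSD node (the osd id is a placeholder;
merge the two lines into any existing [osd] section instead of duplicating it):

# add to the [osd] section of /etc/ceph/ceph.conf on every OSD node:
#   osd op queue = wpq
#   osd op queue cut off = high
# then restart the OSDs one at a time and let the cluster settle in between
systemctl restart ceph-osd@0

# verify the running values afterwards
ceph daemon osd.0 config get osd_op_queue
ceph daemon osd.0 config get osd_op_queue_cut_off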
 

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
 


[ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

2019-10-10 Thread huxia...@horebdata.cn
Hi, folks,

I have a middle-sized Ceph cluster serving as Cinder backup for OpenStack
(Queens). During testing, one Ceph node went down unexpectedly and was powered
up again about 10 minutes later, at which point the cluster started PG
recovery. To my surprise, VM IOPS dropped dramatically during the recovery,
from about 13K IOPS to about 400, a factor of roughly 1/30, even though I had
put stringent throttling on backfill and recovery with the following Ceph
parameters:

osd_max_backfills = 1
osd_recovery_max_active = 1
osd_client_op_priority=63
osd_recovery_op_priority=1
osd_recovery_sleep = 0.5

The weirdest thing is:
1) When there is no IO activity from any VM (all VMs are quiet, so only the
recovery IO is present), the recovery bandwidth is about 10MiB/s, 2 objects/s.
The recovery throttle settings seem to be working properly.
2) When running an FIO test inside a VM, the recovery bandwidth climbs quickly,
reaching above 200MiB/s, 60 objects/s, while FIO performance inside the VM is
only about 400 IOPS (8KiB block size), around 3MiB/s. The recovery throttling
obviously does NOT work properly.
3) If I stop the FIO test in the VM, the recovery bandwidth drops back to
10MiB/s, 2 objects/s again, strangely enough.

How can this weird behavior happen? Is there a way to cap the recovery
bandwidth at a specific value, or to limit the number of recovery objects per
second? That would give better control of backfilling/recovery than the
apparently faulty logic of relative osd_client_op_priority vs
osd_recovery_op_priority.

Any ideas or suggestions for getting the recovery under control?

best regards,

Samuel





huxia...@horebdata.cn


[ceph-users] pgs inconsistent

2019-08-15 Thread huxia...@horebdata.cn
Dear folks,

I have a Ceph cluster with replication 2, 3 nodes, each node with 3 OSDs, on
Luminous 12.2.12. Some days ago one OSD went down (the disk itself is still
fine) due to a RocksDB crash. I tried to restart that OSD, but it failed. I
then tried to rebalance, but ran into inconsistent PGs.

What can I do to get the cluster working again?

Thanks a lot for helping me out,

Samuel 

**
# ceph -s
  cluster:
id: 289e3afa-f188-49b0-9bea-1ab57cc2beb8
health: HEALTH_ERR
pauserd,pausewr,noout flag(s) set
191444 scrub errors
Possible data damage: 376 pgs inconsistent
 
  services:
mon: 3 daemons, quorum horeb71,horeb72,horeb73
mgr: horeb73(active), standbys: horeb71, horeb72
osd: 9 osds: 8 up, 8 in
 flags pauserd,pausewr,noout
 
  data:
pools:   1 pools, 1024 pgs
objects: 524.29k objects, 1.99TiB
usage:   3.67TiB used, 2.58TiB / 6.25TiB avail
pgs: 645 active+clean
 376 active+clean+inconsistent
 3   active+clean+scrubbing+deep
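
I assume the first diagnostic steps would be something like the following (the
pool name and PG id are placeholders), but please correct me if repair is the
wrong approach with replication 2:

# list the inconsistent PGs and inspect the scrub errors on one of them
rados list-inconsistent-pg <poolname>
rados list-inconsistent-obj <pgid> --format=json-pretty

# ask the primary OSD to repair a PG (with replication 2 the "good" copy
# is not always obvious, so this needs care)
ceph pg repair <pgid>

# client IO will only resume once the pause flags are cleared
ceph osd unset pause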


[ceph-users] Possibly a bug on rocksdb

2019-08-11 Thread huxia...@horebdata.cn
:11:02.300110 7ff4822f1d80  0  set rocksdb option 
compaction_threads = 32
   -12> 2019-08-11 17:11:02.300121 7ff4822f1d80  0  set rocksdb option 
compression = kNoCompression
   -11> 2019-08-11 17:11:02.300129 7ff4822f1d80  0  set rocksdb option 
flusher_threads = 8
   -10> 2019-08-11 17:11:02.300135 7ff4822f1d80  0  set rocksdb option 
level0_file_num_compaction_trigger = 64
-9> 2019-08-11 17:11:02.300142 7ff4822f1d80  0  set rocksdb option 
level0_slowdown_writes_trigger = 128
-8> 2019-08-11 17:11:02.300146 7ff4822f1d80  0  set rocksdb option 
level0_stop_writes_trigger = 256
-7> 2019-08-11 17:11:02.300150 7ff4822f1d80  0  set rocksdb option 
max_background_compactions = 64
-6> 2019-08-11 17:11:02.300155 7ff4822f1d80  0  set rocksdb option 
max_bytes_for_level_base = 2GB
-5> 2019-08-11 17:11:02.300159 7ff4822f1d80  0  set rocksdb option 
max_write_buffer_number = 64
-4> 2019-08-11 17:11:02.300166 7ff4822f1d80  0  set rocksdb option 
min_write_buffer_number_to_merge = 32
-3> 2019-08-11 17:11:02.300176 7ff4822f1d80  0  set rocksdb option 
recycle_log_file_num = 64
-2> 2019-08-11 17:11:02.300185 7ff4822f1d80  0  set rocksdb option 
target_file_size_base = 4MB
-1> 2019-08-11 17:11:02.300193 7ff4822f1d80  0  set rocksdb option 
write_buffer_size = 4MB
 0> 2019-08-11 17:11:02.819067 7ff4822f1d80 -1 *** Caught signal (Aborted) 
**
 in thread 7ff4822f1d80 thread_name:ceph-osd


huxia...@horebdata.cn


[ceph-users] Luminous OSD can not be up

2019-05-21 Thread huxia...@horebdata.cn
Hi, Folks,

I just encountered an OSD that went down and cannot come up again. The log
messages are attached below. Can anyone tell what is wrong with the OSD, and
what I should do?

thanks in advance,

Samuel

***
# tail -500 /var/log/ceph/ceph-osd.0.log.522 
2019-05-22 09:23:35.139017 7f9d18f71d80  0  set rocksdb option 
min_write_buffer_number_to_merge = 32
2019-05-22 09:23:35.139019 7f9d18f71d80  0  set rocksdb option 
recycle_log_file_num = 64
2019-05-22 09:23:35.139023 7f9d18f71d80  0  set rocksdb option 
target_file_size_base = 4MB
2019-05-22 09:23:35.139025 7f9d18f71d80  0  set rocksdb option 
write_buffer_size = 4MB
2019-05-22 09:23:35.139058 7f9d18f71d80  0  set rocksdb option 
compaction_readahead_size = 2MB
2019-05-22 09:23:35.139063 7f9d18f71d80  0  set rocksdb option compaction_style 
= kCompactionStyleLevel
2019-05-22 09:23:35.139067 7f9d18f71d80  0  set rocksdb option 
compaction_threads = 32
2019-05-22 09:23:35.139069 7f9d18f71d80  0  set rocksdb option compression = 
kNoCompression
2019-05-22 09:23:35.139072 7f9d18f71d80  0  set rocksdb option flusher_threads 
= 8
2019-05-22 09:23:35.139074 7f9d18f71d80  0  set rocksdb option 
level0_file_num_compaction_trigger = 64
2019-05-22 09:23:35.139077 7f9d18f71d80  0  set rocksdb option 
level0_slowdown_writes_trigger = 128
2019-05-22 09:23:35.139079 7f9d18f71d80  0  set rocksdb option 
level0_stop_writes_trigger = 256
2019-05-22 09:23:35.139081 7f9d18f71d80  0  set rocksdb option 
max_background_compactions = 64
2019-05-22 09:23:35.139084 7f9d18f71d80  0  set rocksdb option 
max_bytes_for_level_base = 6GB
2019-05-22 09:23:35.139086 7f9d18f71d80  0  set rocksdb option 
max_write_buffer_number = 64
2019-05-22 09:23:35.139088 7f9d18f71d80  0  set rocksdb option 
min_write_buffer_number_to_merge = 32
2019-05-22 09:23:35.139090 7f9d18f71d80  0  set rocksdb option 
recycle_log_file_num = 64
2019-05-22 09:23:35.139093 7f9d18f71d80  0  set rocksdb option 
target_file_size_base = 4MB
2019-05-22 09:23:35.139095 7f9d18f71d80  0  set rocksdb option 
write_buffer_size = 4MB
2019-05-22 09:23:35.171770 7f9d18f71d80  0  
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.12/rpm/el7/BUILD/ceph-12.2.12/src/cls/cephfs/cls_cephfs.cc:197:
 loading cephfs
2019-05-22 09:23:35.172018 7f9d18f71d80  0  
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.12/rpm/el7/BUILD/ceph-12.2.12/src/cls/hello/cls_hello.cc:296:
 loading cls_hello
2019-05-22 09:23:35.172865 7f9d18f71d80  0 _get_class not permitted to load kvs
2019-05-22 09:23:35.173387 7f9d18f71d80  0 _get_class not permitted to load lua
2019-05-22 09:23:35.178435 7f9d18f71d80  0 _get_class not permitted to load sdk
2019-05-22 09:23:35.179798 7f9d18f71d80  0 osd.0 723 crush map has features 
288514051259236352, adjusting msgr requires for clients
2019-05-22 09:23:35.179806 7f9d18f71d80  0 osd.0 723 crush map has features 
288514051259236352 was 8705, adjusting msgr requires for mons
2019-05-22 09:23:35.179809 7f9d18f71d80  0 osd.0 723 crush map has features 
1009089991638532096, adjusting msgr requires for osds
2019-05-22 09:23:35.245033 7f9d18f71d80  0 osd.0 723 load_pgs
2019-05-22 09:25:01.427911 7f9d18f71d80 -1 *** Caught signal (Segmentation 
fault) **
 in thread 7f9d18f71d80 thread_name:ceph-osd

 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous 
(stable)
 1: (()+0xa64ee1) [0x55fc16d31ee1]
 2: (()+0xf6d0) [0x7f9d1621e6d0]
 3: 
(std::__detail::_List_node_base::_M_hook(std::__detail::_List_node_base*)+0x4) 
[0x7f9d15b5ce54]
 4: (pg_log_t::pg_log_t(pg_log_t const&)+0x109) [0x55fc168d1e79]
 5: (PGLog::IndexedLog::operator=(PGLog::IndexedLog const&)+0x29) 
[0x55fc168dcb29]
 6: (void PGLog::read_log_and_missing >(ObjectStore*, 
coll_t, coll_t, ghobject_t, pg_info_t const&, PGLog::IndexedLog&, 
pg_missing_set&, bool, std::basic_ostringstream, std::allocator >&, bool, bool*, 
DoutPrefixProvider const*, std::set, 
std::allocator >*, bool)+0x793) [0x55fc168ddde3]
 7: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x52b) [0x55fc1687fc2b]
 8: (OSD::load_pgs()+0x9b4) [0x55fc167cdd64]
 9: (OSD::init()+0x2169) [0x55fc167ec8a9]
 10: (main()+0x2d07) [0x55fc166edef7]
 11: (__libc_start_main()+0xf5) [0x7f9d1522b445]
 12: (()+0x4c0dc3) [0x55fc1678ddc3]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.
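
If it helps with the diagnosis, I assume the PG logs could be inspected offline
with ceph-objectstore-tool while the OSD is stopped (the PG id below is a
placeholder; I have not tried this yet):

# list the PGs stored on the failing OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op list-pgs

# dump the pg_log of a suspect PG (the crash above happens while reading it)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid <pgid> --op log

# export a copy of a PG before attempting anything destructive
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid <pgid> \
  --op export --file /tmp/<pgid>.export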




Re: [ceph-users] How to tune Ceph RBD mirroring parameters to speed up replication

2019-04-04 Thread huxia...@horebdata.cn
Thanks a lot, Jason.

How much performance loss should I expect from enabling RBD mirroring? I really
need to minimize any performance impact while using this disaster-recovery
feature. Will a dedicated journal on an Intel Optane NVMe help? If so, how big
should it be?

cheers,

Samuel



huxia...@horebdata.cn
 
From: Jason Dillaman
Date: 2019-04-03 23:03
To: huxia...@horebdata.cn
CC: ceph-users
Subject: Re: [ceph-users] How to tune Ceph RBD mirroring parameters to speed up 
replication
For better or worse, out of the box, librbd and rbd-mirror are
configured to conserve memory at the expense of performance to support
the potential case of thousands of images being mirrored and only a
single "rbd-mirror" daemon attempting to handle the load.
 
You can optimize writes by adding "rbd_journal_max_payload_bytes =
8388608" to the "[client]" section on the librbd client nodes.
Normally, writes larger than 16KiB are broken into multiple journal
entries to allow the remote "rbd-mirror" daemon to make forward
progress w/o using too much memory, so this will ensure large IOs only
require a single journal entry.
 
You can also add "rbd_mirror_journal_max_fetch_bytes = 33554432" to
the "[client]" section on the "rbd-mirror" daemon nodes and restart
the daemon for the change to take effect. Normally, the daemon tries
to nibble the per-image journal events to prevent excessive memory use
in the case where potentially thousands of images are being mirrored.
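
A minimal sketch of those two changes (the systemd instance and pool names are
placeholders; merge the lines into any existing [client] section):

# on the librbd client nodes, in /etc/ceph/ceph.conf:
#   [client]
#   rbd_journal_max_payload_bytes = 8388608
#
# on the rbd-mirror daemon nodes, in /etc/ceph/ceph.conf:
#   [client]
#   rbd_mirror_journal_max_fetch_bytes = 33554432
#
# restart the rbd-mirror daemon so the fetch setting takes effect
systemctl restart ceph-rbd-mirror@<instance>

# check replication status afterwards
rbd mirror pool status <poolname> --verbose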
 
On Wed, Apr 3, 2019 at 4:34 PM huxia...@horebdata.cn wrote:
>
> Hello, folks,
>
> I am setting up two Ceph clusters to test async replication via RBD
> mirroring. The two clusters are very close, in two buildings about 20m apart,
> and the networking is very good as well: a 10Gb fiber connection. In this
> case, how should I tune the relevant RBD mirroring parameters to accelerate
> the replication?
>
> thanks in advance,
>
> Samuel
 
 
 
-- 
Jason
 


[ceph-users] How to tune Ceph RBD mirroring parameters to speed up replication

2019-04-03 Thread huxia...@horebdata.cn
Hello, folks,

I am setting up two Ceph clusters to test async replication via RBD mirroring.
The two clusters are very close, in two buildings about 20m apart, and the
networking is very good as well: a 10Gb fiber connection. In this case, how
should I tune the relevant RBD mirroring parameters to accelerate the
replication?

thanks in advance,

Samuel