The issue we have is large leveldbs. Do we have any setting to disable compaction of leveldb on OSD start?

in.linkedin.com/in/nikhilravindra
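For reference, the settings that are supposed to control this (already in our ceph.conf, quoted further down the thread) are leveldb_compact_on_mount and osd_compact_leveldb_on_mount. A minimal sketch of how to confirm what a running OSD actually picked up via the admin socket, using osd.83 from the logs below purely as an example id:

    # value the daemon is actually using, not just what ceph.conf says
    ceph daemon osd.83 config get leveldb_compact_on_mount
    ceph daemon osd.83 config get osd_compact_leveldb_on_mount

    # effective split/merge settings after the restart
    ceph daemon osd.83 config show | grep -E 'filestore_(split_multiple|merge_threshold)'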
On Fri, Mar 29, 2019 at 7:44 PM Nikhil R <nikh.ravin...@gmail.com> wrote:
> Any help on this would be much appreciated, as our prod has been down for a
> day and each OSD restart is taking 4-5 hours.
> in.linkedin.com/in/nikhilravindra
>
>
> On Fri, Mar 29, 2019 at 7:43 PM Nikhil R <nikh.ravin...@gmail.com> wrote:
>
>> We have maxed out the files per dir. CEPH is trying to do an online split,
>> due to which OSDs are crashing. We increased the split_multiple and
>> merge_threshold for now and are restarting OSDs. Now on these restarts the
>> leveldb compaction is taking a long time. Below are some of the logs.
>>
>> 2019-03-29 06:25:37.082055 7f3c6320a8c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2019-03-29 06:25:37.082064 7f3c6320a8c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
>> 2019-03-29 06:25:37.082079 7f3c6320a8c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: splice is supported
>> 2019-03-29 06:25:37.096658 7f3c6320a8c0  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2019-03-29 06:25:37.096703 7f3c6320a8c0  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-83) detect_feature: extsize is disabled by conf
>> 2019-03-29 06:25:37.295577 7f3c6320a8c0  1 leveldb: Recovering log #1151738
>> 2019-03-29 06:25:37.445516 7f3c6320a8c0  1 leveldb: Delete type=0 #1151738
>> 2019-03-29 06:25:37.445574 7f3c6320a8c0  1 leveldb: Delete type=3 #1151737
>> 2019-03-29 07:11:50.619313 7ff6c792b700  1 leveldb: Compacting 1@3 + 12@4 files
>> 2019-03-29 07:11:50.639795 7ff6c792b700  1 leveldb: Generated table #1029200: 7805 keys, 2141956 bytes
>> 2019-03-29 07:11:50.649315 7ff6c792b700  1 leveldb: Generated table #1029201: 4464 keys, 1220994 bytes
>> 2019-03-29 07:11:50.660485 7ff6c792b700  1 leveldb: Generated table #1029202: 7813 keys, 2142882 bytes
>> 2019-03-29 07:11:50.672235 7ff6c792b700  1 leveldb: Generated table #1029203: 6283 keys, 1712810 bytes
>> 2019-03-29 07:11:50.697949 7ff6c792b700  1 leveldb: Generated table #1029204: 7805 keys, 2142841 bytes
>> 2019-03-29 07:11:50.714648 7ff6c792b700  1 leveldb: Generated table #1029205: 5173 keys, 1428905 bytes
>> 2019-03-29 07:11:50.757146 7ff6c792b700  1 leveldb: Generated table #1029206: 7888 keys, 2143304 bytes
>> 2019-03-29 07:11:50.774357 7ff6c792b700  1 leveldb: Generated table #1029207: 5168 keys, 1425634 bytes
>> 2019-03-29 07:11:50.830276 7ff6c792b700  1 leveldb: Generated table #1029208: 7821 keys, 2146114 bytes
>> 2019-03-29 07:11:50.849116 7ff6c792b700  1 leveldb: Generated table #1029209: 6106 keys, 1680947 bytes
>> 2019-03-29 07:11:50.909866 7ff6c792b700  1 leveldb: Generated table #1029210: 7799 keys, 2142782 bytes
>> 2019-03-29 07:11:50.921143 7ff6c792b700  1 leveldb: Generated table #1029211: 5737 keys, 1574963 bytes
>> 2019-03-29 07:11:50.923357 7ff6c792b700  1 leveldb: Generated table #1029212: 1149 keys, 310202 bytes
>> 2019-03-29 07:11:50.923388 7ff6c792b700  1 leveldb: Compacted 1@3 + 12@4 files => 22214334 bytes
>> 2019-03-29 07:11:50.924224 7ff6c792b700  1 leveldb: compacted to: files[ 0 3 54 715 6304 24079 0 ]
>> 2019-03-29 07:11:50.942586 7ff6c792b700  1 leveldb: Delete type=2 #1029109
>>
>> Is there a way I can skip this?
>>
>> in.linkedin.com/in/nikhilravindra
>>
>>
>> On Fri, Mar 29, 2019 at 11:32 AM huang jun <hjwsm1...@gmail.com> wrote:
>>
>>> On Fri, Mar 29, 2019 at 1:44 PM Nikhil R <nikh.ravin...@gmail.com> wrote:
>>> >
>>> > If I comment out filestore_split_multiple = 72 and filestore_merge_threshold = 480
>>> > in the ceph.conf, won't ceph take the default values of 2 and 10, and wouldn't we
>>> > end up with more splits and crashes?
>>> >
>>> Yes, but that is aimed at making it clear what causes the long start time:
>>> leveldb compaction or filestore splitting?
>>> > in.linkedin.com/in/nikhilravindra
>>> >
>>> >
>>> > On Fri, Mar 29, 2019 at 6:55 AM huang jun <hjwsm1...@gmail.com> wrote:
>>> >>
>>> >> It seems like the split settings caused the problem.
>>> >> What about commenting out those settings and then seeing whether it still
>>> >> takes that long to restart?
>>> >> From a quick search in the code, these two
>>> >> filestore_split_multiple = 72
>>> >> filestore_merge_threshold = 480
>>> >> don't support online change.
>>> >>
>>> >> On Thu, Mar 28, 2019 at 6:33 PM Nikhil R <nikh.ravin...@gmail.com> wrote:
>>> >> >
>>> >> > Thanks huang for the reply.
>>> >> > It is the disk compaction taking more time;
>>> >> > the disk I/O is completely utilized, up to 100%.
>>> >> > It looks like both osd_compact_leveldb_on_mount = false and
>>> >> > leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
>>> >> > Is there a way to turn off compaction?
>>> >> >
>>> >> > Also, the reason we are restarting OSDs is the splitting, and we increased
>>> >> > split_multiple and merge_threshold.
>>> >> > Is there a way we could inject these? Are OSD restarts the only solution?
>>> >> >
>>> >> > Thanks in advance
>>> >> >
>>> >> > in.linkedin.com/in/nikhilravindra
>>> >> >
>>> >> >
>>> >> > On Thu, Mar 28, 2019 at 3:58 PM huang jun <hjwsm1...@gmail.com> wrote:
>>> >> >>
>>> >> >> Was the time really spent on the db compact operation?
>>> >> >> You can turn on debug_osd=20 to see what happens.
>>> >> >> What about the disk util during the start?
>>> >> >>
>>> >> >> On Thu, Mar 28, 2019 at 4:36 PM Nikhil R <nikh.ravin...@gmail.com> wrote:
>>> >> >> >
>>> >> >> > CEPH OSD restarts are taking too long a time.
>>> >> >> > Below is my ceph.conf:
>>> >> >> > [osd]
>>> >> >> > osd_compact_leveldb_on_mount = false
>>> >> >> > leveldb_compact_on_mount = false
>>> >> >> > leveldb_cache_size = 1073741824
>>> >> >> > leveldb_compression = false
>>> >> >> > osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k"
>>> >> >> > osd_max_backfills = 1
>>> >> >> > osd_recovery_max_active = 1
>>> >> >> > osd_recovery_op_priority = 1
>>> >> >> > filestore_split_multiple = 72
>>> >> >> > filestore_merge_threshold = 480
>>> >> >> > osd_max_scrubs = 1
>>> >> >> > osd_scrub_begin_hour = 22
>>> >> >> > osd_scrub_end_hour = 3
>>> >> >> > osd_deep_scrub_interval = 2419200
>>> >> >> > osd_scrub_sleep = 0.1
>>> >> >> >
>>> >> >> > It looks like both osd_compact_leveldb_on_mount = false and
>>> >> >> > leveldb_compact_on_mount = false aren't working as expected on ceph v10.2.9.
>>> >> >> >
>>> >> >> > Any ideas on a fix would be appreciated asap.
>>> >> >> > in.linkedin.com/in/nikhilravindra
>>> >> >>
>>> >> >> --
>>> >> >> Thank you!
>>> >> >> HuangJun
>>> >>
>>> >> --
>>> >> Thank you!
>>> >> HuangJun
>>>
>>> --
>>> Thank you!
>>> HuangJun
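If the compaction cannot simply be skipped, one option worth testing is to compact the leveldb offline while the OSD is stopped, so the work is not paid inside the OSD start path. This is only a sketch, assuming the default FileStore omap location (/var/lib/ceph/osd/ceph-<id>/current/omap) and that the ceph-kvstore-tool shipped with v10.2.9 supports the compact command; osd.83 is again just an example id:

    # run only while the OSD daemon is stopped
    systemctl stop ceph-osd@83        # or the sysvinit equivalent on older setups
    ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-83/current/omap compact
    systemctl start ceph-osd@83

Note this only moves the compaction out of the OSD boot; it does not reduce the total compaction work, so the disk will still be busy while the tool runs.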
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com