Hi Jakub,
For the crashing OSD, could you please set
debug_bluestore=10
bluestore_bluefs_balance_failure_dump_interval=1
and collect more logs?
This will hopefully provide more insight into why additional space isn't
allocated to BlueFS.
Thanks,
Igor
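Because the OSD cannot be restarted successfully, the simplest place for these two settings is probably the OSD's own section in ceph.conf, so they take effect on the next start attempt. Below is a minimal Python sketch that renders such a section. The helper name `render_debug_section` is hypothetical, and osd.6 is only the OSD that appears in the log quoted below.

```python
import configparser
import io


def render_debug_section(osd_id: int) -> str:
    """Hypothetical helper: render a ceph.conf-style [osd.N] section
    containing the two debug overrides requested above."""
    conf = configparser.ConfigParser()
    conf.optionxform = str  # keep option names exactly as written
    conf[f"osd.{osd_id}"] = {
        "debug_bluestore": "10",
        "bluestore_bluefs_balance_failure_dump_interval": "1",
    }
    buf = io.StringIO()
    conf.write(buf)  # writes "key = value" lines under the section header
    return buf.getvalue()


print(render_debug_section(6))
```

The printed text can be appended to ceph.conf on the OSD host and removed again once the logs have been collected.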
On 8/14/2018 12:41 PM, Jakub Stańczak wrote:
Hello All!
I am running a Mimic cluster that is entirely BlueStore, with a pure RGW
workload. We use the AWS i3 instance family for the OSD machines: each
instance has one NVMe disk split into 4 partitions, and each partition is
used as a BlueStore block device. There is a single device per OSD (no
separate DB or WAL device), so everything is managed by BlueStore internally.
The problem is that under write-heavy load the DB grows quickly, and at
some point BlueFS stops getting more space, which kills the OSD. There is
no recovery from this error: when BlueFS runs out of space for RocksDB,
the OSD dies and cannot be restarted.
On this particular OSD there is plenty of free space, yet BlueFS cannot
get any more of it; the log repeatedly shows the puzzling message
'_balance_bluefs_freespace no allocate on 0x80000000'.
I have also done some BlueFS tuning, because I previously had similar
problems where it appeared that BlueStore could not keep up with
providing enough space to BlueFS.
BlueFS settings:
bluestore_bluefs_balance_interval = 0.333
bluestore_bluefs_gift_ratio = 0.05
bluestore_bluefs_min_free = 3221225472
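To put these numbers in perspective: bluestore_bluefs_min_free = 3221225472 is exactly 3 GiB, and a 0.333 s interval means roughly three balance passes per second. The sketch below also assumes, from my reading of the Mimic source rather than any documentation, that a single gift to BlueFS is hard-capped at 1 << 31 bytes so it fits in 32 bits; if that holds, a large enough deficit is requested as exactly 0x80000000 (2 GiB), the value in the log message. `capped_min_free_gift` is a hypothetical simplification that ignores bluestore_bluefs_gift_ratio and the other ratio checks.

```python
GiB = 1 << 30

# Values from the BlueFS settings quoted above.
balance_interval_s = 0.333    # bluestore_bluefs_balance_interval
gift_ratio = 0.05             # bluestore_bluefs_gift_ratio (5%, not modeled below)
min_free_bytes = 3221225472   # bluestore_bluefs_min_free

# Assumption: one gift is hard-capped at 1 << 31 bytes (2 GiB).
GIFT_CAP = 1 << 31


def capped_min_free_gift(bluefs_free: int) -> int:
    """Hypothetical model: top BlueFS up to min_free, capped at GIFT_CAP."""
    deficit = max(0, min_free_bytes - bluefs_free)
    return min(deficit, GIFT_CAP)


print(min_free_bytes / GiB)              # 3.0
print(round(1 / balance_interval_s, 2))  # 3.0 passes per second
print(hex(capped_min_free_gift(0)))      # 0x80000000
```

The _balance_bluefs_freespace messages in the log below repeat roughly every 0.35 to 0.5 s, which is in line with the configured interval.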
Snippet from the OSD log:
2018-08-13 18:15:10.960 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:11.330 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:11.752 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14590: 304401 keys, 68804532 bytes
2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184111786253, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14590, "file_size": 68804532, "table_properties": {"data_size": 67112437, "index_size": 777792, "filter_size": 913252, "raw_key_size": 13383306, "raw_average_key_size": 43, "raw_value_size": 58673606, "raw_average_value_size": 192, "num_data_blocks": 17090, "num_entries": 304401, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:12.245 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:12.664 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14591: 313351 keys, 68830515 bytes
2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184112744129, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14591, "file_size": 68830515, "table_properties": {"data_size": 67109446, "index_size": 785852, "filter_size": 934166, "raw_key_size": 13762246, "raw_average_key_size": 43, "raw_value_size": 58469928, "raw_average_value_size": 186, "num_data_blocks": 17124, "num_entries": 313351, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:13.025 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:13.405 7f6a5b882700 1 bluefs _allocate failed to allocate 0x4200000 on bdev 1, free 0x3500000; fallback to bdev 2
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _allocate failed to allocate 0x4200000 on bdev 2, dne
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x419db1f
2018-08-13 18:15:13.405 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x80000000 min_alloc_size 0x2000
2018-08-13 18:15:13.409 7f6a5b882700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f6a5b882700 time 2018-08-13 18:15:13.406645
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 == "bluefs enospc")

ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f6a6b660e1f]
2: (()+0x284fe7) [0x7f6a6b660fe7]
3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1ac6) [0x55f6c6db9146]
4: (BlueRocksWritableFile::Flush()+0x3d) [0x55f6c6dcf0cd]
5: (rocksdb::WritableFileWriter::Flush()+0x196) [0x55f6c6faf7c6]
6: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55f6c6fafa8e]
7: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0x73b) [0x55f6c6fed26b]
8: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x77f) [0x55f6c6feff3f]
9: (rocksdb::CompactionJob::Run()+0x2c8) [0x55f6c6ff1508]
10: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xab4) [0x55f6c6e57da4]
11: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xd0) [0x55f6c6e59680]
12: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55f6c6e59b6a]
13: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x266) [0x55f6c7034536]
14: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f) [0x55f6c70346bf]
15: (()+0x6ae17f) [0x7f6a6ba8a17f]
16: (()+0x7e25) [0x7f6a681c5e25]
17: (clone()+0x6d) [0x7f6a672b5bad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
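Decoding the hex values just before the assert makes the failure concrete. The sketch below parses the three bluefs lines from the log above with Python's re module: BlueFS wanted 0x4200000 (66 MiB) on bdev 1, but only 0x3500000 (53 MiB) was free there, and bdev 2 reports "dne" (presumably "does not exist", consistent with the single-device layout). The flush length 0x419db1f is 68803359 bytes, about the size of the ~68.8 MB tables this compaction job had just generated. The assumption that the request is the flush length rounded up to a 1 MiB BlueFS allocation unit is mine, not taken from the log.

```python
import re

MiB = 1 << 20

# The three BlueFS lines from the log above, without timestamps.
LOG = """\
bluefs _allocate failed to allocate 0x4200000 on bdev 1, free 0x3500000; fallback to bdev 2
bluefs _allocate failed to allocate 0x4200000 on bdev 2, dne
bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x419db1f
"""


def decode(log: str) -> dict:
    """Pull the requested size, free space and flush length out of the log."""
    want = re.search(r"failed to allocate (0x[0-9a-f]+) on bdev 1", log).group(1)
    free = re.search(r"free (0x[0-9a-f]+)", log).group(1)
    length = re.search(r"length: (0x[0-9a-f]+)", log).group(1)
    return {"want": int(want, 16), "free": int(free, 16), "length": int(length, 16)}


d = decode(LOG)
print(d["want"] // MiB, d["free"] // MiB)  # 66 53
print(d["length"])                         # 68803359
# Assumption: request = flush length rounded up to a 1 MiB allocation unit.
print(-(-d["length"] // MiB) * MiB == d["want"])  # True
```

In other words, a single ~66 MiB compaction output did not fit into the 53 MiB BlueFS had left, and with no second device to fall back to, BlueFS hit the enospc assert.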
Has anyone run into a similar problem? It looks like a bug to me: it has
already happened on several OSDs, each time with a different BlueFS size
and a different OSD fill level.
Best regards,
Kuba Stańczak
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com