Re: [ceph-users] mimic/bluestore cluster can't allocate space for bluefs

2018-08-30 Thread David Turner
I have 2 OSDs failing to start due to this [1] segfault.  What is happening
matches what Sage said about this [2] bug.  The OSDs are NVMe disks and
rocksdb is compacting omaps.  I tried setting `bluestore_bluefs_min_free
= 10737418240` and then starting the OSDs, but they both segfaulted with the
same error.  The segfault is immediate, happening within 5 seconds of OSD
start.  Is there any testing that would help figure this out and/or get
these 2 OSDs back up?  All data has successfully migrated off of them, so
I'm at health OK with them marked out.
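
For reference, a small sketch that converts the two `bluestore_bluefs_min_free` values mentioned in this thread into GiB. It assumes the option is expressed in bytes; the helper name `to_gib` is only illustrative and not part of Ceph.

```python
# Convert the bluestore_bluefs_min_free values quoted in this thread to GiB.
# Assumption: the option value is a byte count.
GIB = 1024 ** 3


def to_gib(num_bytes: int) -> float:
    """Return num_bytes expressed in GiB."""
    return num_bytes / GIB


values = {
    "tried in this message": 10737418240,
    "from the original report": 3221225472,
}

for label, value in values.items():
    print(f"{label}: {value} bytes = {to_gib(value):g} GiB")
```

Under that assumption, the value tried here is 10 GiB, against 3 GiB in the original report.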


[1] FAILED assert(0 == "bluefs enospc")

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1600138


On Tue, Aug 14, 2018 at 12:29 PM Igor Fedotov wrote:

> Hi Jakub,
>
> for the crashing OSD could you please set
>
> debug_bluestore=10
>
> bluestore_bluefs_balance_failure_dump_interval=1
>
>
> and collect more logs.
>
> This will hopefully provide more insight on why additional space isn't
> allocated for bluefs.
>
> Thanks,
>
> Igor
>
> On 8/14/2018 12:41 PM, Jakub Stańczak wrote:
>
> Hello All!
>
> I am running a mimic cluster that is entirely bluestore, with a pure RGW
> workload. We use the AWS i3 instance family for the OSD machines. Each
> instance has 1 NVMe disk, which is split into 4 partitions, and each of
> those partitions is dedicated to a bluestore block device. We use 1 device
> per partition, so everything is managed by bluestore internally.
>
> The problem is that under write-heavy conditions the DB device grows
> quickly, and at some point bluefs stops getting more space, which kills the
> OSD. There is no recovery from this error: once bluefs runs out of space
> for rocksdb, the OSD dies and cannot be restarted.
>
> This particular OSD has plenty of free space, but we can see that it cannot
> allocate more space and reports an odd address:
> '_balance_bluefs_freespace no allocate on 0x8000'.
>
> I also did some bluefs tuning, because I previously had similar problems
> where it appeared that bluestore could not keep up with providing enough
> storage for bluefs.
>
> bluefs settings:
> bluestore_bluefs_balance_interval = 0.333
> bluestore_bluefs_gift_ratio = 0.05
> bluestore_bluefs_min_free = 3221225472
>
> snippet from osd logs:
>
> 2018-08-13 18:15:10.960 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
> 2018-08-13 18:15:11.330 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
> 2018-08-13 18:15:11.752 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
> 2018-08-13 18:15:11.785 7f6a5b882700  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14590: 304401 keys, 68804532 bytes
> 2018-08-13 18:15:11.785 7f6a5b882700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184111786253, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14590, "file_size": 68804532, "table_properties": {"data_size": 67112437, "index_size": 92, "filter_size": 913252, "raw_key_size": 13383306, "raw_average_key_size": 43, "raw_value_size": 58673606, "raw_average_value_size": 192, "num_data_blocks": 17090, "num_entries": 304401, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
> 2018-08-13 18:15:12.245 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
> 2018-08-13 18:15:12.664 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
> 2018-08-13 18:15:12.743 7f6a5b882700  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14591: 313351 keys, 68830515 bytes
> 2018-08-13 18:15:12.743 7f6a5b882700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184112744129, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14591, "file_size": 68830515, "table_properties": {"data_size": 67109446, "index_size": 785852, "filter_size": 934166, "raw_key_size": 13762246, "raw_average_key_size": 43, "raw_value_size": 58469928, "raw_average_value_size": 186, "num_data_blocks": 17124, "num_entries": 313351, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
> 2018-08-13 18:15:13.025 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000

Re: [ceph-users] mimic/bluestore cluster can't allocate space for bluefs

2018-08-14 Thread Igor Fedotov

Hi Jakub,

for the crashing OSD could you please set

debug_bluestore=10

bluestore_bluefs_balance_failure_dump_interval=1


and collect more logs.

This will hopefully provide more insight on why additional space isn't 
allocated for bluefs.


Thanks,

Igor


On 8/14/2018 12:41 PM, Jakub Stańczak wrote:

Hello All!

I am running a mimic cluster that is entirely bluestore, with a pure RGW
workload. We use the AWS i3 instance family for the OSD machines. Each
instance has 1 NVMe disk, which is split into 4 partitions, and each of
those partitions is dedicated to a bluestore block device. We use 1 device
per partition, so everything is managed by bluestore internally.

The problem is that under write-heavy conditions the DB device grows
quickly, and at some point bluefs stops getting more space, which kills the
OSD. There is no recovery from this error: once bluefs runs out of space
for rocksdb, the OSD dies and cannot be restarted.

This particular OSD has plenty of free space, but we can see that it cannot
allocate more space and reports an odd address:
'_balance_bluefs_freespace no allocate on 0x8000'.

I also did some bluefs tuning, because I previously had similar problems
where it appeared that bluestore could not keep up with providing enough
storage for bluefs.
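
To make the hex values in that message easier to read, here is a minimal sketch that pulls them out of a log line and prints them in decimal. The sample line is copied from the log in this thread; the regex and the `decode` helper are illustrative assumptions, and the sketch does not establish whether the first value is an address or a length.

```python
import re

# Matches the bluestore message quoted in this thread and captures both hex values.
PATTERN = re.compile(
    r"_balance_bluefs_freespace no allocate on (0x[0-9a-fA-F]+)"
    r" min_alloc_size (0x[0-9a-fA-F]+)"
)


def decode(line):
    """Return (first_value, min_alloc_size) as ints, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


sample = (
    "2018-08-13 18:15:10.960 7f6a54073700  0 bluestore(/var/lib/ceph/osd/ceph-6) "
    "_balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000"
)

print(decode(sample))  # (32768, 8192)
```

In decimal, 0x8000 is 32768 and 0x2000 is 8192.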


bluefs settings:
bluestore_bluefs_balance_interval = 0.333
bluestore_bluefs_gift_ratio = 0.05
bluestore_bluefs_min_free = 3221225472


snippet from osd logs:
2018-08-13 18:15:10.960 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:11.330 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:11.752 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14590: 304401 keys, 68804532 bytes
2018-08-13 18:15:11.785 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184111786253, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14590, "file_size": 68804532, "table_properties": {"data_size": 67112437, "index_size": 92, "filter_size": 913252, "raw_key_size": 13383306, "raw_average_key_size": 43, "raw_value_size": 58673606, "raw_average_value_size": 192, "num_data_blocks": 17090, "num_entries": 304401, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:12.245 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:12.664 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 41] Generated table #14591: 313351 keys, 68830515 bytes
2018-08-13 18:15:12.743 7f6a5b882700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1534184112744129, "cf_name": "default", "job": 41, "event": "table_file_creation", "file_number": 14591, "file_size": 68830515, "table_properties": {"data_size": 67109446, "index_size": 785852, "filter_size": 934166, "raw_key_size": 13762246, "raw_average_key_size": 43, "raw_value_size": 58469928, "raw_average_value_size": 186, "num_data_blocks": 17124, "num_entries": 313351, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "0", "kMergeOperands": "0"}}
2018-08-13 18:15:13.025 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:13.405 7f6a5b882700 1 bluefs _allocate failed to allocate 0x420 on bdev 1, free 0x350; fallback to bdev 2
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _allocate failed to allocate 0x420 on bdev 2, dne
2018-08-13 18:15:13.405 7f6a5b882700 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x419db1f
2018-08-13 18:15:13.405 7f6a54073700 0 bluestore(/var/lib/ceph/osd/ceph-6) _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x2000
2018-08-13 18:15:13.409 7f6a5b882700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread