Re: [ceph-users] Rocksdb: Try to delete WAL files size....

2018-02-12 Thread Dietmar Rieder
Anyone?

Am 9. Februar 2018 09:59:54 MEZ schrieb Dietmar Rieder 
:
>Hi,
>
>we are running ceph version 12.2.2 (10 nodes, 240 OSDs, 3 mon). While
>monitoring the WAL db used bytes we noticed that there are some OSDs
>that use proportionally more WAL db bytes than others (800Mb vs 300Mb).
>These OSDs eventually exceed the WAL db size (1GB in our case) and
>spill
>over to the HDD data device. So it seems flushing the WAL db does not
>free space.
>
>We looked for some hints in the logs of the OSDs in question and
>spotted
>the following entries:
>
>[...]
>2018-02-08 16:17:27.496695 7f0ffce55700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:684]
>reusing log 152 from recycle list
>2018-02-08 16:17:27.496768 7f0ffce55700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:725]
>[default] New memtable created with log file: #162. Immutable
>memtables: 0.
>2018-02-08 16:17:27.496976 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.496841)
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1158]
>Calling FlushMemTableToOutputFile with column family [default], flush
>slots available 1, compaction slots allowed 1, compaction slots
>scheduled 1
>2018-02-08 16:17:27.496983 7f0fe7e2b700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:264]
>[default] [JOB 6] Flushing memtable with next log file: 162
>2018-02-08 16:17:27.497001 7f0fe7e2b700  4 rocksdb: EVENT_LOG_v1
>{"time_micros": 1518103047496990, "job": 6, "event": "flush_started",
>"num_memtables": 1, "num_entries": 328542, "num_deletes": 66632,
>"memory_usage": 260058032}
>2018-02-08 16:17:27.497006 7f0fe7e2b700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:293]
>[default] [JOB 6] Level-0 flush table #163: started
>2018-02-08 16:17:27.627110 7f0fe7e2b700  4 rocksdb: EVENT_LOG_v1
>{"time_micros": 1518103047627094, "cf_name": "default", "job": 6,
>"event": "table_file_creation", "file_number": 163, "file_size":
>5502182, "table_properties": {"data_size": 5160167, "index_size":
>81548,
>"filter_size": 259478, "raw_key_size": 5138655, "raw_average_key_size":
>51, "raw_value_size": 3606384, "raw_average_value_size": 36,
>"num_data_blocks": 1287, "num_entries": 98984, "filter_policy_name":
>"rocksdb.BuiltinBloomFilter", "kDeletedKeys": "66093",
>"kMergeOperands":
>"192"}}
>2018-02-08 16:17:27.627127 7f0fe7e2b700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:319]
>[default] [JOB 6] Level-0 flush table #163: 5502182 bytes OK
>2018-02-08 16:17:27.627449 7f0fe7e2b700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_files.cc:242]
>adding log 155 to recycle list
>2018-02-08 16:17:27.627457 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.627136)
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:360]
>[default] Level-0 commit table #163 started
>2018-02-08 16:17:27.627461 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.627402)
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:383]
>[default] Level-0 commit table #163: memtable #1 done
>2018-02-08 16:17:27.627474 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.627415) EVENT_LOG_v1 {"time_micros":
>1518103047627409, "job": 6, "event": "flush_finished", "lsm_state": [1,
>2, 3, 0, 0, 0, 0], "immutable_memtables": 0}
>2018-02-08 16:17:27.627476 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.627435)

[ceph-users] Rocksdb: Try to delete WAL files size....

2018-02-09 Thread Dietmar Rieder
Hi,

we are running ceph version 12.2.2 (10 nodes, 240 OSDs, 3 mon). While
monitoring the WAL db used bytes we noticed that there are some OSDs
that use proportionally more WAL db bytes than others (800Mb vs 300Mb).
These OSDs eventually exceed the WAL db size (1GB in our case) and spill
over to the HDD data device. So it seems flushing the WAL db does not
free space.

We looked for some hints in the logs of the OSDs in question and spotted
the following entries:

[...]
2018-02-08 16:17:27.496695 7f0ffce55700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:684]
reusing log 152 from recycle list
2018-02-08 16:17:27.496768 7f0ffce55700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:725]
[default] New memtable created with log file: #162. Immutable memtables: 0.
2018-02-08 16:17:27.496976 7f0fe7e2b700  4 rocksdb: (Original Log Time
2018/02/08-16:17:27.496841)
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1158]
Calling FlushMemTableToOutputFile with column family [default], flush
slots available 1, compaction slots allowed 1, compaction slots scheduled 1
2018-02-08 16:17:27.496983 7f0fe7e2b700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:264]
[default] [JOB 6] Flushing memtable with next log file: 162
2018-02-08 16:17:27.497001 7f0fe7e2b700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1518103047496990, "job": 6, "event": "flush_started",
"num_memtables": 1, "num_entries": 328542, "num_deletes": 66632,
"memory_usage": 260058032}
2018-02-08 16:17:27.497006 7f0fe7e2b700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:293]
[default] [JOB 6] Level-0 flush table #163: started
2018-02-08 16:17:27.627110 7f0fe7e2b700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1518103047627094, "cf_name": "default", "job": 6,
"event": "table_file_creation", "file_number": 163, "file_size":
5502182, "table_properties": {"data_size": 5160167, "index_size": 81548,
"filter_size": 259478, "raw_key_size": 5138655, "raw_average_key_size":
51, "raw_value_size": 3606384, "raw_average_value_size": 36,
"num_data_blocks": 1287, "num_entries": 98984, "filter_policy_name":
"rocksdb.BuiltinBloomFilter", "kDeletedKeys": "66093", "kMergeOperands":
"192"}}
2018-02-08 16:17:27.627127 7f0fe7e2b700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:319]
[default] [JOB 6] Level-0 flush table #163: 5502182 bytes OK
2018-02-08 16:17:27.627449 7f0fe7e2b700  4 rocksdb:
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_files.cc:242]
adding log 155 to recycle list
2018-02-08 16:17:27.627457 7f0fe7e2b700  4 rocksdb: (Original Log Time
2018/02/08-16:17:27.627136)
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:360]
[default] Level-0 commit table #163 started
2018-02-08 16:17:27.627461 7f0fe7e2b700  4 rocksdb: (Original Log Time
2018/02/08-16:17:27.627402)
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:383]
[default] Level-0 commit table #163: memtable #1 done
2018-02-08 16:17:27.627474 7f0fe7e2b700  4 rocksdb: (Original Log Time
2018/02/08-16:17:27.627415) EVENT_LOG_v1 {"time_micros":
1518103047627409, "job": 6, "event": "flush_finished", "lsm_state": [1,
2, 3, 0, 0, 0, 0], "immutable_memtables": 0}
2018-02-08 16:17:27.627476 7f0fe7e2b700  4 rocksdb: (Original Log Time
2018/02/08-16:17:27.627435)
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:132]
[default] Level summary: base level 1 max bytes base 268435456 files[1 2
3 0 0 0