Re: [ceph-users] Rocksdb: Try to delete WAL files size....
Anyone? Am 9. Februar 2018 09:59:54 MEZ schrieb Dietmar Rieder: >Hi, > >we are running ceph version 12.2.2 (10 nodes, 240 OSDs, 3 mon). While >monitoring the WAL db used bytes we noticed that there are some OSDs >that use proportionally more WAL db bytes than others (800Mb vs 300Mb). >These OSDs eventually exceed the WAL db size (1GB in our case) and >spill >over to the HDD data device. So it seems flushing the WAL db does not >free space. > >We looked for some hints in the logs of the OSDs in question and >spotted >the following entries: > >[...] >2018-02-08 16:17:27.496695 7f0ffce55700 4 rocksdb: >[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:684] >reusing log 152 from recycle list >2018-02-08 16:17:27.496768 7f0ffce55700 4 rocksdb: >[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:725] >[default] New memtable created with log file: #162. Immutable >memtables: 0. >2018-02-08 16:17:27.496976 7f0fe7e2b700 4 rocksdb: (Original Log Time >2018/02/08-16:17:27.496841) >[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1158] >Calling FlushMemTableToOutputFile with column family [default], flush >slots available 1, compaction slots allowed 1, compaction slots >scheduled 1 >2018-02-08 16:17:27.496983 7f0fe7e2b700 4 rocksdb: >[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:264] >[default] [JOB 6] Flushing memtable with next log file: 162 >2018-02-08 16:17:27.497001 7f0fe7e2b700 4 rocksdb: EVENT_LOG_v1 >{"time_micros": 1518103047496990, "job": 6, "event": "flush_started", >"num_memtables": 1, "num_entries": 328542, "num_deletes": 66632, >"memory_usage": 260058032} >2018-02-08 16:17:27.497006 7f0fe7e2b700 4 rocksdb: >[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:293] >[default] [JOB 6] Level-0 flush table #163: started >2018-02-08 16:17:27.627110 7f0fe7e2b700 4 rocksdb: EVENT_LOG_v1 >{"time_micros": 1518103047627094, "cf_name": "default", "job": 6, >"event": "table_file_creation", "file_number": 163, "file_size": >5502182, "table_properties": {"data_size": 5160167, "index_size": >81548, >"filter_size": 259478, "raw_key_size": 5138655, "raw_average_key_size": >51, "raw_value_size": 3606384, "raw_average_value_size": 36, >"num_data_blocks": 1287, "num_entries": 98984, "filter_policy_name": >"rocksdb.BuiltinBloomFilter", "kDeletedKeys": "66093", >"kMergeOperands": >"192"}} >2018-02-08 16:17:27.627127 7f0fe7e2b700 4 rocksdb: >[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:319] >[default] [JOB 6] Level-0 flush table #163: 5502182 bytes OK >2018-02-08 16:17:27.627449 7f0fe7e2b700 4 rocksdb: >[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_files.cc:242] >adding log 155 to recycle list >2018-02-08 16:17:27.627457 7f0fe7e2b700 4 rocksdb: (Original Log Time >2018/02/08-16:17:27.627136) >[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:360] >[default] Level-0 commit table #163 started >2018-02-08 16:17:27.627461 7f0fe7e2b700 4 rocksdb: (Original Log Time >2018/02/08-16:17:27.627402) >[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:383] >[default] Level-0 commit table #163: memtable #1 done >2018-02-08 16:17:27.627474 7f0fe7e2b700 4 rocksdb: (Original Log Time >2018/02/08-16:17:27.627415) EVENT_LOG_v1 {"time_micros": >1518103047627409, "job": 6, "event": "flush_finished", "lsm_state": [1, >2, 3, 0, 0, 0, 0], "immutable_memtables": 0} >2018-02-08 16:17:27.627476 7f0fe7e2b700 4 rocksdb: (Original Log Time >2018/02/08-16:17:27.627435)
[ceph-users] Rocksdb: Try to delete WAL files size....
Hi, we are running ceph version 12.2.2 (10 nodes, 240 OSDs, 3 mon). While monitoring the WAL db used bytes we noticed that there are some OSDs that use proportionally more WAL db bytes than others (800Mb vs 300Mb). These OSDs eventually exceed the WAL db size (1GB in our case) and spill over to the HDD data device. So it seems flushing the WAL db does not free space. We looked for some hints in the logs of the OSDs in question and spotted the following entries: [...] 2018-02-08 16:17:27.496695 7f0ffce55700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:684] reusing log 152 from recycle list 2018-02-08 16:17:27.496768 7f0ffce55700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:725] [default] New memtable created with log file: #162. Immutable memtables: 0. 2018-02-08 16:17:27.496976 7f0fe7e2b700 4 rocksdb: (Original Log Time 2018/02/08-16:17:27.496841) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1158] Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1 2018-02-08 16:17:27.496983 7f0fe7e2b700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:264] [default] [JOB 6] Flushing memtable with next log file: 162 2018-02-08 16:17:27.497001 7f0fe7e2b700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1518103047496990, "job": 6, "event": "flush_started", "num_memtables": 1, "num_entries": 328542, "num_deletes": 66632, "memory_usage": 260058032} 2018-02-08 16:17:27.497006 7f0fe7e2b700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:293] [default] [JOB 6] Level-0 flush table #163: started 2018-02-08 16:17:27.627110 7f0fe7e2b700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1518103047627094, "cf_name": "default", "job": 6, "event": "table_file_creation", "file_number": 163, "file_size": 5502182, "table_properties": {"data_size": 5160167, "index_size": 81548, "filter_size": 259478, "raw_key_size": 5138655, "raw_average_key_size": 51, "raw_value_size": 3606384, "raw_average_value_size": 36, "num_data_blocks": 1287, "num_entries": 98984, "filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "66093", "kMergeOperands": "192"}} 2018-02-08 16:17:27.627127 7f0fe7e2b700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:319] [default] [JOB 6] Level-0 flush table #163: 5502182 bytes OK 2018-02-08 16:17:27.627449 7f0fe7e2b700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_files.cc:242] adding log 155 to recycle list 2018-02-08 16:17:27.627457 7f0fe7e2b700 4 rocksdb: (Original Log Time 2018/02/08-16:17:27.627136) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:360] [default] Level-0 commit table #163 started 2018-02-08 16:17:27.627461 7f0fe7e2b700 4 rocksdb: (Original Log Time 2018/02/08-16:17:27.627402) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:383] [default] Level-0 commit table #163: memtable #1 done 2018-02-08 16:17:27.627474 7f0fe7e2b700 4 rocksdb: (Original Log Time 2018/02/08-16:17:27.627415) EVENT_LOG_v1 {"time_micros": 1518103047627409, "job": 6, "event": "flush_finished", "lsm_state": [1, 2, 3, 0, 0, 0, 0], "immutable_memtables": 0} 2018-02-08 16:17:27.627476 7f0fe7e2b700 4 rocksdb: (Original Log Time 2018/02/08-16:17:27.627435) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:132] [default] Level summary: base level 1 max bytes base 268435456 files[1 2 3 0 0 0