On Wed, Mar 25, 2020 at 08:43:41AM +0100, Eneko Lacunza wrote:
> Hi Alwin,
> 
> On 24/3/20 at 14:54, Alwin Antreich wrote:
> > On Tue, Mar 24, 2020 at 01:12:03PM +0100, Eneko Lacunza wrote:
> > > Hi Alwin,
> > > 
> > > On 24/3/20 at 12:24, Alwin Antreich wrote:
> > > > On Tue, Mar 24, 2020 at 10:34:15AM +0100, Eneko Lacunza wrote:
> > > > > We're seeing a spillover issue with Ceph, using 14.2.8:
> > > [...]
> > > > > 3. ceph health detail
> > > > > HEALTH_WARN BlueFS spillover detected on 3 OSD
> > > > > BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
> > > > >      osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of 6.0 GiB) to slow device
> > > > >      osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of 6.0 GiB) to slow device
> > > > >      osd.5 spilled over 5 MiB metadata from 'db' device (551 MiB used of 6.0 GiB) to slow device
> > > > > 
> > > > > I may be overlooking something, any ideas? I also just found the following Ceph issue:
> > > > > 
> > > > > https://tracker.ceph.com/issues/38745
> > > > > 
> > > > > 5 MiB of metadata on the slow device isn't a big problem, but the cluster is permanently in health warning state... :)
> > > > The DB/WAL device is too small and all the new metadata has to be written
> > > > to the slow device. This will destroy performance.
> > > > 
> > > > I think the size changes, as the DB gets compacted.
> > > Yes. But it isn't too small... it's 6 GiB and there's only ~560 MiB of data.
> > Yes, true. I meant the used part of the size. But the message is odd.
> > 
> > You should find the compaction stats in the OSD log files. It could be,
> > as reasoned in the bug tracker, that the compaction needs too much space
> > and spills over to the slow device. Additionally, if not set separately, the WAL
> > will take up 512 MB on the DB device.
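(Side note for anyone following along: the BlueFS usage counters and the compaction
stats quoted below can be pulled per OSD on the node that hosts it. A rough,
untested sketch, using osd.3 as an example and the default log location; the grep
filters are only for illustration:

    # BlueFS usage counters (db_used_bytes, slow_used_bytes, ...)
    ceph daemon osd.3 perf dump | grep -A 32 '"bluefs"'

    # the periodic RocksDB compaction stats end up in the OSD log
    grep -n 'DUMPING STATS' /var/log/ceph/ceph-osd.3.log
)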
> I don't see any indication that compaction needs too much space:
> 
> 2020-03-24 14:24:04.883 7f03ffbee700 4 rocksdb: [db/db_impl.cc:777] ------- DUMPING STATS -------
> 2020-03-24 14:24:04.883 7f03ffbee700 4 rocksdb: [db/db_impl.cc:778]
> ** DB Stats **
> Uptime(secs): 15000.1 total, 600.0 interval
> Cumulative writes: 4646 writes, 18K keys, 4646 commit groups, 1.0 writes per commit group, ingest: 0.01 GB, 0.00 MB/s
> Cumulative WAL: 4646 writes, 1891 syncs, 2.46 writes per sync, written: 0.01 GB, 0.00 MB/s
> Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
> Interval writes: 163 writes, 637 keys, 163 commit groups, 1.0 writes per commit group, ingest: 0.63 MB, 0.00 MB/s
> Interval WAL: 163 writes, 67 syncs, 2.40 writes per sync, written: 0.00 MB, 0.00 MB/s
> Interval stall: 00:00:0.000 H:M:S, 0.0 percent
> 
> ** Compaction Stats [default] **
> Level  Files  Size       Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> L0     0/0    0.00 KB    0.0    0.0       0.0     0.0       0.0        0.0       0.0        1.0    0.0       33.4      0.02       0.00               2          0.009     0      0
> L1     0/0    0.00 KB    0.0    0.0       0.0     0.0       0.0        0.0       0.0        0.8    162.1     134.6     0.09       0.06               1          0.092     127K   10K
> L2     9/0    538.64 MB  0.2    0.5       0.0     0.5       0.5        0.0       0.0        43.6   102.7     101.2     5.32       1.31               1          5.325     1496K  110K
> Sum    9/0    538.64 MB  0.0    0.5       0.0     0.5       0.5        0.0       0.0        961.1  103.3     101.5     5.43       1.37               4          1.358     1623K  121K
> Int    0/0    0.00 KB    0.0    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.00       0.00               0          0.000     0      0
> 
> ** Compaction Stats [default] **
> Priority  Files  Size     Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Low       0/0    0.00 KB  0.0    0.5       0.0     0.5       0.5        0.0       0.0        0.0    103.7     101.7     5.42       1.36               2          2.708     1623K  121K
> High      0/0    0.00 KB  0.0    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       43.9      0.01       0.00               1          0.013     0      0
> User      0/0    0.00 KB  0.0    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.4       0.00       0.00               1          0.004     0      0
> 
> Uptime(secs): 15000.1 total, 600.0 interval
> Flush(GB): cumulative 0.001, interval 0.000
> AddFile(GB): cumulative 0.000, interval 0.000
> AddFile(Total Files): cumulative 0, interval 0
> AddFile(L0 Files): cumulative 0, interval 0
> AddFile(Keys): cumulative 0, interval 0
> Cumulative compaction: 0.54 GB write, 0.04 MB/s write, 0.55 GB read, 0.04 MB/s read, 5.4 seconds
> Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
> Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
> 
> I see the following in a perf dump:
> 
>     "bluefs": {
>         "gift_bytes": 0,
>         "reclaim_bytes": 0,
>         "db_total_bytes": 6442442752,
>         "db_used_bytes": 696246272,
>         "wal_total_bytes": 0,
>         "wal_used_bytes": 0,
>         "slow_total_bytes": 40004222976,
>         "slow_used_bytes": 5242880,
>         "num_files": 20,
>         "log_bytes": 41631744,
>         "log_compactions": 0,
>         "logged_bytes": 40550400,
>         "files_written_wal": 2,
>         "files_written_sst": 41,
>         "bytes_written_wal": 102040973,
>         "bytes_written_sst": 2233090674,
>         "bytes_written_slow": 0,
>         "max_bytes_wal": 0,
>         "max_bytes_db": 1153425408,
>         "max_bytes_slow": 0,
>         "read_random_count": 127832,
>         "read_random_bytes": 2761102524,
>         "read_random_disk_count": 19206,
>         "read_random_disk_bytes": 2330400597,
>         "read_random_buffer_count": 108844,
>         "read_random_buffer_bytes": 430701927,
>         "read_count": 21457,
>         "read_bytes": 1087948189,
>         "read_prefetch_count": 21438,
>         "read_prefetch_bytes": 1086853927
>     },
> 
> > If the above doesn't give any information then you may need to export
> > the bluefs (RocksDB). Then you can run the kvstore-tool on it.
> I'll look into trying this, although I'd say it's some kind of bug.
> 
> > > > The easiest way is to destroy and re-create the OSD with a bigger
> > > > DB/WAL. The guideline from Facebook for RocksDB is 3/30/300 GB.
> > > It's well below the 3 GiB limit in the guideline ;)
> > For now. ;)
> The cluster is 2 years old now and the amount of data is quite stable, so I think it will hold for some time ;)

Hm... Igor reckons that this seems to be normal.
https://tracker.ceph.com/issues/38745#note-28
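If you still want to look inside the DB, the export route mentioned above goes
roughly like this (untested sketch, osd.3 as an example; the OSD has to be stopped
while exporting, and the kvstore-tool sub-command is from memory, so check its man
page first):

    systemctl stop ceph-osd@3
    ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-3 --out-dir /tmp/osd3-bluefs
    ceph-kvstore-tool rocksdb /tmp/osd3-bluefs/db list | less
    systemctl start ceph-osd@3

And should you ever go for bigger DB partitions after all, on PVE 6 that would be
along these lines (one OSD at a time, after marking it out and waiting for the
cluster to be healthy again; option names and the GiB size are as I remember them
from pveceph, so double-check with 'pveceph osd create --help'):

    pveceph osd destroy 3 --cleanup
    pveceph osd create /dev/sdX --db_dev /dev/nvme0n1 --db_size 30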
-- 
Cheers,
Alwin

_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user