I have a problem with the snap_schedule MGR module. It seems to forget at least parts of its configuration after the active MGR is restarted. The following CLI commands (lines starting with '$') and their stdout (lines starting with '>') demonstrate the problem.
$ ceph fs snap-schedule add /shares/users 1h 2021-10-31T18:00
> Schedule set for path /shares/users
$ ceph fs snap-schedule retention add /shares/users 14h10d12m
> Retention added to path /shares/users

Wait until the next complete hour.

$ ceph fs snap-schedule status /shares/users
> {"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path":
> "/shares/users", "schedule": "1h", "retention": {"h": 14, "d": 10, "m": 12},
> "start": "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first":
> "2022-01-27T00:00:00", "last": "2022-01-27T00:00:00", "last_pruned":
> "2022-01-27T00:00:00", "created_count": 1, "pruned_count": 1, "active": true}

Now everything looks and works as expected. However, if I restart the active MGR, no new snapshots are created, and the status command unexpectedly reports null for some of the properties:

$ systemctl restart ceph-mgr@apollon.service
$ ceph fs snap-schedule status /shares/users
> {"fs": "cephfs", "subvol": null, "path": "/shares/users", "rel_path":
> "/shares/users", "schedule": "1h", "retention": {}, "start":
> "2021-10-31T18:00:00", "created": "2022-01-26T23:52:03", "first": null,
> "last": null, "last_pruned": null, "created_count": 0, "pruned_count": 0,
> "active": true}

I looked into the source file mgr/snap_schedule/fs/schedule.py. Since I have never used Python, I do not understand much of it, but I do understand the SQL code it contains. I therefore saved the SQLite DB dump before and after an MGR restart using the following commands.

List the RADOS objects to find the SQLite DB dump:

$ rados --pool fs.metadata-root-pool --namespace cephfs-snap-schedule ls
> snap_db_v0

Copy the SQLite DB dump into a regular file:

$ rados --pool fs.metadata-root-pool --namespace cephfs-snap-schedule get snap_db_v0 /tmp/snap_db_v0

To my surprise, the SQLite DB dump never contains the information for retention, first, last, and last_pruned. It always looks like this:

————————————————
BEGIN TRANSACTION;
CREATE TABLE schedules(
    id INTEGER PRIMARY KEY ASC,
    path TEXT NOT NULL UNIQUE,
    subvol TEXT,
    retention TEXT DEFAULT '{}',
    rel_path TEXT NOT NULL
);
INSERT INTO "schedules" VALUES(2,'/shares/groups',NULL,'{}','/shares/groups');
INSERT INTO "schedules" VALUES(3,'/shares/backup-clients',NULL,'{}','/shares/backup-clients');
INSERT INTO "schedules" VALUES(4,'/shares/users',NULL,'{}','/shares/users');
CREATE TABLE schedules_meta(
    id INTEGER PRIMARY KEY ASC,
    schedule_id INT,
    start TEXT NOT NULL,
    first TEXT,
    last TEXT,
    last_pruned TEXT,
    created TEXT NOT NULL,
    repeat INT NOT NULL,
    schedule TEXT NOT NULL,
    created_count INT DEFAULT 0,
    pruned_count INT DEFAULT 0,
    active INT NOT NULL,
    FOREIGN KEY(schedule_id) REFERENCES schedules(id) ON DELETE CASCADE,
    UNIQUE (schedule_id, start, repeat)
);
INSERT INTO "schedules_meta" VALUES(2,2,'2021-10-31T18:00:00',NULL,NULL,NULL,'2022-01-21T11:41:35',3600,'1h',0,0,1);
INSERT INTO "schedules_meta" VALUES(3,3,'2021-10-31T13:30:00',NULL,NULL,NULL,'2022-01-21T11:41:41',21600,'6h',0,0,1);
INSERT INTO "schedules_meta" VALUES(4,4,'2021-10-31T18:00:00',NULL,NULL,NULL,'2022-01-26T23:52:03',3600,'1h',0,0,1);
COMMIT;
————————————————

Why is the information about retention, first, last, and last_pruned not part of the SQLite dump? Is this the reason why my snapshot scheduling stops working after the active MGR is restarted?
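In case it helps anyone reproduce the inspection: the object retrieved above is a plain SQL text dump, so it can be loaded into a scratch SQLite database and queried directly (a minimal sketch, assuming the sqlite3 CLI is installed; /tmp/snap.db is an arbitrary scratch file):

$ sqlite3 /tmp/snap.db ".read /tmp/snap_db_v0"
$ sqlite3 /tmp/snap.db "SELECT s.path, s.retention, m.first, m.last, m.last_pruned FROM schedules s JOIN schedules_meta m ON m.schedule_id = s.id;"
> /shares/groups|{}|||
> /shares/backup-clients|{}|||
> /shares/users|{}|||

Given the dump above, every row prints an empty-default retention and empty first, last, and last_pruned columns, which matches the null values the status command reports after the restart.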
My ceph version is 16.2.6.

Thanks in advance,
Sebastian