Re: [ceph-users] To backport or not to backport
Hi,

On 7/4/19 3:00 PM, Stefan Kooman wrote:
> - Only backport fixes that do not introduce new functionality, but addresses
> (impaired) functionality already present in the release.

ack, and also my full agreement/support for everything else you wrote, thanks. reading in the changelogs about backported features (in particular the one release that BlueStore was backported to) left me quite scared about upgrading our cluster.

Regards,
Daniel
Re: [ceph-users] Debian Buster builds
On 6/18/19 3:39 PM, Paul Emmerich wrote:
> we maintain (unofficial) Nautilus builds for Buster here:
> https://mirror.croit.io/debian-nautilus/

the repository doesn't contain the source packages. just out of curiosity, to see what you might have changed apart from just (re)building the packages: are they available somewhere?

Regards,
Daniel
Re: [ceph-users] Debian Buster builds
On 6/18/19 3:11 PM, Tobias Gall wrote:
> I would like to switch to debian buster and test the upgrade from
> luminous but there are currently no ceph releases/builds for buster.

shameless plug: we're re-building ceph packages in a repository that we maintain for our university (and a few other users; hence the neutral project name). if you feel comfortable adding a third-party repo, you can use:

# backports on top of buster for packages that are not in debian
deb https://cdn.deb.progress-linux.org/packages engywuck-backports-extras main contrib non-free

(the trust path to the archive signing keys can be established via the progress-linux package in debian)

Regards,
Daniel
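for the sake of completeness, enabling the repository on a buster system would look roughly like this (untested sketch; the progress-linux package name is the one mentioned above, the exact keyring handling is an assumption on my side):

  # establish the trust path first (package is in debian itself)
  apt-get install progress-linux
  # then add the repository and install ceph from it
  echo 'deb https://cdn.deb.progress-linux.org/packages engywuck-backports-extras main contrib non-free' \
    > /etc/apt/sources.list.d/progress-linux.list
  apt-get update
  apt-get install ceph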
Re: [ceph-users] Changing the release cadence
Hi,

I didn't bother to create a twitter account just to be able to participate in the poll, so please count me in for October.

Regards,
Daniel
Re: [ceph-users] Changing the release cadence
On 6/6/19 9:26 AM, Xiaoxi Chen wrote:
> I will vote for November for several reasons: [...]

as an academic institution we're aligned to an August-to-July school year instead of the January-to-December calendar year, so all your reasons (thanks!) are valid for us too, just shifted by 6 months; hence Q1 would be ideal for us. however, given that academic institutions are the minority, I'm convinced now that November is the better choice for everyone.

Regards,
Daniel
Re: [ceph-users] Changing the release cadence
On 6/5/19 5:57 PM, Sage Weil wrote:
> So far the balance of opinion seems to favor a shift to a 12 month
> cycle [...] it seems pretty likely we'll make that shift.

thanks, much appreciated (from a cluster operating point of view).

> Thoughts?

GNOME and a few others are doing April and October releases, which seems balanced and to be good timing for most people; personally I prefer spring rather than autumn for upgrades, hence I would suggest April.

Regards,
Daniel
Re: [ceph-users] Mimic 13.2.3?
On 01/04/2019 07:32 PM, Peter Woodman wrote:
> not to mention that the current released version of mimic (.2) has a
> bug that is potentially catastrophic to cephfs, known about for
> months, yet it's not in the release notes. would have upgraded and
> destroyed data had i not caught a thread on this list.

indeed. we're a big cephfs user here for HPC. every time I get asked about it by my peers, I sadly have to tell them that they should not use it for production, that it's not stable and has serious stability bugs (even though it was declared "stable" upstream some time ago).

(e.g. doing an rsync on, from or to a cephfs, just like someone wrote a couple of days ago on the list, reliably kills it, every time - we have reproduced it with every kernel release and every ceph release since February 2015 on several independent clusters. even more catastrophic is that a single inconsistent file stops the whole cephfs, which then cannot be restored unless the affected cephfs is unmounted on all(!) machines that have it mounted, etc.

we can use cephfs only in our sort-of-stable setup with 12.2.5 because we have mostly non-malicious users that usually behave nicely. but it's too brittle in the end and there is apparently no silver lining ahead. because of that, during the scaling up of our cephfs cluster from 300 TB to 1.2 PB this spring, we'll be moving away from cephfs entirely and switch to mounting RBDs and exporting them with samba instead. we have good experiences with RBDs on other clusters. using RBDs that way is quite painful when you know that cephfs exists - it's slower, and not really HA anymore - but it's overall more reliable than cephfs.)

as much as I like ceph, I unfortunately can't say the same for cephfs :(

Regards,
Daniel
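for reference, the RBD-plus-samba setup mentioned above boils down to something like the following (a rough sketch; pool, image and share names are made up and not our actual configuration):

  # create and map an RBD image, put a local filesystem on it
  rbd create --size 1T rbd/smb-share01
  rbd map rbd/smb-share01          # returns a device such as /dev/rbd0
  mkfs.xfs /dev/rbd0
  mount /dev/rbd0 /srv/smb-share01

  # smb.conf: export the mounted filesystem like any local directory
  [share01]
      path = /srv/smb-share01
      read only = no

the obvious downside (as said) is that a single samba host owns the image, so failover has to be handled outside of ceph.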
Re: [ceph-users] Mimic 13.2.3?
On 01/04/2019 05:07 PM, Matthew Vernon wrote:
> how is it still the case that packages are being pushed onto the official
> ceph.com repos that people
> shouldn't install?

We're still on 12.2.5 because of this. Basically every 12.2.x after that had notes on the mailing list like "don't use, wait for ...", and I don't dare updating to 13.2.

For the 10.2.x and 11.2.x cycles, we upgraded our production cluster within a matter of days after the release of an update. Since the second half of the 12.2.x releases, this no longer seems possible.

Ceph is great and all, but this decrease in release quality seriously harms the image and perception of Ceph as a stable software platform in the enterprise environment, and makes people do the wrong things (letting systems rot update-wise for the sake of stability).

Regards,
Daniel
Re: [ceph-users] Problem with CephFS
Hi,

On 11/21/2018 07:04 PM, Rodrigo Embeita wrote:
> Reduced data availability: 7 pgs inactive, 7 pgs down

this is your first problem: unless you have all data available again, cephfs will not come back. after that, I would take care of the redundancy next, and get the one missing monitor back online. once that is done, get the mds working again and your cephfs should be back in service.

if you encounter problems with any of the steps, send all the necessary commands and outputs to the list and I (or others) can try to help.

Regards,
Daniel
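to see where those steps stand, the usual starting points would be something like (standard ceph CLI; adjust the pg id to your cluster):

  ceph -s                        # overall health, mon quorum, mds state
  ceph health detail             # which PGs are inactive/down and why
  ceph pg dump_stuck inactive    # list the stuck PGs
  ceph pg <pgid> query           # per-PG detail, shows which OSDs are missing
  ceph osd tree                  # check that the affected OSDs are up/in

the output of these is also what would be most useful to post to the list.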
Re: [ceph-users] Mimic and Debian 9
Hi,

On 10/17/2018 04:04 PM, John Spray wrote:
> If there isn't anything
> too hacky involved in the build perhaps your packages could simply be
> the official ones?

being a Debian Developer, I can upload the backports that I maintain/use at work to e.g. people.debian.org/~daniel or so. given time constraints I can't do it right now, but I will by the end of the month.

Regards,
Daniel
Re: [ceph-users] ls operation is too slow in cephfs
On 07/17/2018 11:43 AM, Marc Roos wrote:
> I had a similar thing with doing the ls. Increasing the cache limit helped
> with our test cluster

same here; additionally we also had to use more than one MDS to get good performance (currently 3 active MDS plus 2 standby per FS).

Regards,
Daniel
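for reference, the two knobs mentioned here would be set roughly like this (luminous-style names; on older releases the cache limit is mds_cache_size and counted in inodes rather than bytes, so treat this as a sketch):

  # ceph.conf on the MDS hosts, [mds] section:
  mds_cache_memory_limit = 4294967296   # 4 GB of cache instead of the 1 GB default

  # allow and use three active MDS ranks for the filesystem
  ceph fs set <fsname> allow_multimds true
  ceph fs set <fsname> max_mds 3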
Re: [ceph-users] fuse vs kernel client
On 07/09/2018 10:18 AM, Manuel Sopena Ballesteros wrote:
> FUSE is supposed to run slower.

in our tests with ceph 11.2.x and 12.2.x clusters, cephfs-fuse is always around 10 times slower than the kernel cephfs client.

Regards,
Daniel
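if you want to reproduce the comparison on your own cluster, the two mounts differ only in the client used, e.g. (illustrative monitor/paths, credentials assumed to be in the usual places):

  # kernel client
  mount -t ceph mon1:6789:/ /mnt/cephfs-kernel -o name=admin,secretfile=/etc/ceph/admin.secret

  # FUSE client
  ceph-fuse -n client.admin /mnt/cephfs-fuse

  # then run the same workload (dd, fio, rsync, ...) against both mountpoints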
Re: [ceph-users] samba gateway experiences with cephfs ?
Hi,

On 05/24/2018 02:53 PM, David Disseldorp wrote:
>> [ceph_test]
>>     path = /ceph-kernel
>>     guest ok = no
>>     delete readonly = yes
>>     oplocks = yes
>>     posix locking = no

jftr, we use the following to disable all locking (on samba 4.8.2):

  oplocks = False
  level2 oplocks = False
  kernel oplocks = no

Regards,
Daniel
Re: [ceph-users] Ceph replication factor of 2
Hi,

I couldn't agree more, but just to re-emphasize what others already said: the point of replica 3 is not to have extra safety against (human|software|server) failures, but to have enough data around to allow rebalancing the cluster when disks fail.

after a certain number of disks in a cluster, you're going to get disk failures all the time. if you don't pay extra attention (and waste lots and lots of time/money) to carefully arrange/choose disks of different vendors and production lines/dates, simultaneous disk failures can happen within minutes.

example from our past: on our (at that time small) cluster of 72 disks spread over 6 storage nodes, half of them were Seagate enterprise capacity disks, the other half Western Digital Red Pro. for each disk manufacturer, we bought only half of the disks from the same production batch. so we had:

* 18 disks WD, production batch A
* 18 disks WD, production batch B
* 18 disks Seagate, production batch C
* 18 disks Seagate, production batch D

one day, 6 disks failed simultaneously, spread over two storage nodes. had we had replica 2, we couldn't have recovered and would have lost data. instead, because of replica 3, we didn't lose any data and ceph automatically rebalanced all data before further disks failed.

so: if the data stored on the cluster is valuable (because it costs much time and effort to 're-collect' it, or you can't accept the time it takes to restore from backup, or worse to re-create it from scratch), you have to assume that whatever manufacturer/production batch of HDDs you're using, they *can* all fail at the same time because you could have hit a faulty production run. the only way out here is replica >=3.

(of course, the whole MTBF discussion and why RAID doesn't scale applies as well)

Regards,
Daniel
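for completeness, the pool settings this argument leads to are the usual ones (replace <pool> with your pool name):

  ceph osd pool set <pool> size 3       # keep three replicas of every object
  ceph osd pool set <pool> min_size 2   # keep serving I/O while one replica is missing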
Re: [ceph-users] samba gateway experiences with cephfs ?
Hi,

On 05/21/2018 05:38 PM, Jake Grimmett wrote:
> Unfortunately we have a large number (~200) of Windows and Mac clients
> which need CIFS/SMB access to cephfs.

we too, which is why we're (partially) exporting cephfs over samba as well, 1.5 years in production now. for us, cephfs-over-samba is significantly slower than cephfs directly, but it's not really an issue here (basically, if people use a windows client here, they're already on the slow track anyway). we had to do two things to get it working reliably though:

a) disable all locking in samba (otherwise "opportunistic locking" from windows clients killed all mds within hours (kraken at that time))

b) only allow writes to a specific space on cephfs, reserved for samba (with luminous; otherwise, we'd have problems with data consistency on cephfs with people writing the same files from linux->cephfs and samba->cephfs concurrently). my hunch is that samba caches writes and doesn't write them back appropriately.

> Finally, is the vfs_ceph module for Samba useful? It doesn't seem to be
> widely available pre-compiled for RHEL derivatives. Can anyone
> comment on their experiences using vfs_ceph, or point me to a CentOS 7.x
> repo that has it?

we use debian, with backported kernel and backported samba, which has vfs_ceph pre-compiled. however, we couldn't make vfs_ceph work at all - the snapshot patterns just don't seem to match/align (and nothing we tried seemed to work).

Regards,
Daniel
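in practice, (a) and (b) look roughly like this in smb.conf (a sketch only; the share name and path are made up, the locking options are the ones we posted elsewhere in this thread):

  [cephfs-share]
      # (b) export only a dedicated subtree of cephfs, reserved for samba
      path = /mnt/cephfs/samba-only
      read only = no

      # (a) disable all locking
      oplocks = False
      level2 oplocks = False
      kernel oplocks = no
      posix locking = no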
Re: [ceph-users] (yet another) multi active mds advise needed
On 05/19/2018 01:13 AM, Webert de Souza Lima wrote:
> New question: will it make any difference in the balancing if instead of
> having the MAIL directory in the root of cephfs and the domains's
> subtrees inside it, I discard the parent dir and put all the subtrees right
> in cephfs root?

the balancing between the MDS is influenced by which directories are accessed; the currently accessed directory trees are divided between the MDS's (also check the dirfrag option in the docs). assuming you have the same access pattern, the "fragmentation" between the MDS's happens at these "target directories", so it doesn't matter whether these directories are further up or down in the same filesystem tree.

in the multi-MDS scenario where the MDS serving rank 0 fails, the effects at the moment of the failure for any cephfs client accessing a directory/file are the same (as described in an earlier mail), regardless of which level the directory/file is at within the filesystem.

Regards,
Daniel
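if you want to take the balancer out of the equation entirely, luminous also lets you pin a directory tree to a specific rank via an extended attribute, e.g. (paths are just examples):

  # pin /MAIL (and everything below it) to MDS rank 1
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/MAIL

  # -v -1 removes the pin again and hands the tree back to the balancer
  setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/MAIL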
Re: [ceph-users] (yet another) multi active mds advise needed
On 05/18/2018 11:19 PM, Patrick Donnelly wrote:
> So, you would want to have a standby-replay
> daemon for each rank or just have normal standbys. It will likely
> depend on the size of your MDS (cache size) and available hardware.

jftr, having 3 active mds and 3 standby-replay resulted in a longer downtime for us in October 2017, due to http://tracker.ceph.com/issues/21749 (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/thread.html#21390 - thanks again for the help back then, still much appreciated).

we're not using standby-replay MDS's anymore but only "normal" standbys, and haven't had any problems since (we were running kraken then, and upgraded to luminous last fall).

Regards,
Daniel
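configuration-wise the switch is trivial, since a plain standby needs no configuration at all; what we dropped was the per-daemon standby-replay settings, roughly (pre-mimic option names, sketch only):

  [mds.mds7]
      # standby-replay follows a specific rank; we no longer use this
      #mds_standby_replay = true
      #mds_standby_for_rank = 0

  # any running mds daemon that holds no rank automatically acts as a normal standby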
Re: [ceph-users] Multi-MDS Failover
On 04/27/2018 07:11 PM, Patrick Donnelly wrote:
> The answer is that there may be partial availability from
> the up:active ranks which may hand out capabilities for the subtrees
> they manage or no availability if that's not possible because it
> cannot obtain the necessary locks.

additionally: if rank 0 is lost, the whole FS stands still (no new client can mount the fs; no existing client can change a directory, etc.). my guess is that the root of a cephfs ("/", which is always served by rank 0) is needed in order to do traversals/lookups of any directories on the top level (which can then be served by ranks 1-n).

last year, we had quite some trouble with an unstable cephfs (MDS reliably and reproducibly crashing when hitting them with rsync over multi-TB directories with files all being <<1 MB) and had lots of situations where ranks (most of the time including 0) were down. fortunately we could always get the fs back by unmounting it on all clients and restarting all mds. the last of these instabilities seem to have gone with 12.2.3/12.2.4 (we're now running 12.2.5).

Regards,
Daniel
Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2
ceph is a cluster - so reboots aren't an issue (we do set noout during a planned serial reboot of all machines of the cluster). personally I don't think the hassle of live patching is worth it. it's a very gross hack that only works well in very specific niche cases. ceph (as every proper cluster) is imho not such a use case.

Regards,
Daniel
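the noout dance for such a rolling reboot is simply:

  ceph osd set noout      # prevent OSDs from being marked out while nodes reboot
  # ... reboot the nodes one by one, waiting for the cluster to settle in between ...
  ceph osd unset noout    # re-enable normal out-marking afterwards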
Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?
Hi,

On 01/19/18 14:46, Youzhong Yang wrote:
> Just wondering if anyone has seen the same issue, or it's just me.

we're using debian with our own backported kernels and ceph, and it works rock solid. what you're describing sounds more like a hardware issue to me. if you don't fully "trust"/have confidence in your hardware (and your logs don't reveal anything), I'd recommend running some burn-in tests (memtest, cpuburn, etc.) on the machines for 24 hours each to rule out cpu/ram/etc. issues.

Regards,
Daniel
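a typical burn-in run could look like this (the tool choice is just an example, anything that hammers CPU/RAM/disk for a day will do):

  # memory: boot memtest86+ from rescue media, or in a running system:
  memtester 8G 3                    # test 8 GB of RAM, 3 passes

  # cpu + memory + disk, 24 hours:
  stress-ng --cpu 0 --vm 2 --vm-bytes 75% --hdd 2 --timeout 24h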
Re: [ceph-users] CephFS log jam prevention
Hi,

On 12/05/17 17:58, Dan Jakubiec wrote:
> Is this a configuration problem or a bug?

we had massive problems with both kraken (feb-sept 2017) and luminous (12.2.0), seeing the same behaviour as you. ceph.conf contained defaults only, except that we had to crank up mds_cache_size and mds_bal_fragment_size_max. using dirfrag and multi-mds did not change anything.

even with luminous (12.2.0), basically a single rsync over a large directory tree could kill cephfs for all clients within seconds, where even a waiting period of >8 hours did not help. since the cluster was semi-productive, we couldn't take the downtime, so we switched to unmounting all cephfs, flushing the journal, and re-mounting it.

interestingly with 12.2.1 on kernel 4.13, this doesn't occur anymore (the 'mds lagging behind' still happens, but recovers quickly within minutes, and the rsync does not need to be aborted). i'm not sure if 12.2.1 fixed it itself, or whether it was your config changes happening at the same time:

  mds_session_autoclose = 10
  mds_reconnect_timeout = 10
  mds_blacklist_interval = 10
  mds_session_blacklist_on_timeout = false
  mds_session_blacklist_on_evict = false

Regards,
Daniel
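if you want to try those settings without restarting the daemons, they can usually be injected at runtime and then persisted in ceph.conf afterwards (sketch; I haven't verified that every one of them takes effect without a restart):

  ceph tell mds.* injectargs '--mds_session_blacklist_on_timeout=false --mds_session_blacklist_on_evict=false'
  ceph tell mds.* injectargs '--mds_session_autoclose=10 --mds_reconnect_timeout=10 --mds_blacklist_interval=10'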
Re: [ceph-users] ceph-disk is now deprecated
On 11/30/17 14:04, Fabian Grünbichler wrote:
> point is - you should not purposefully attempt to annoy users and/or
> downstreams by changing behaviour in the middle of an LTS release cycle,

exactly. upgrading the patch level (x.y.z to x.y.z+1) should imho never introduce a behaviour change, regardless of whether it's "just" adding new warnings or not. this is a stable update we're talking about, even more so since it's an LTS release.

you never know how people use stuff (e.g. by parsing stupid things), so such a behaviour change will break stuff for *some* people (granted, most likely a really low number). my expectation of a stable release is that it stays, literally, stable. that's the whole point of having it in the first place. otherwise we would all be running git snapshots and updating randomly to newer ones.

adding deprecation messages in mimic makes sense, and getting rid of it/not providing support for it in mimic+1 is reasonable.

Regards,
Daniel
Re: [ceph-users] CephFS - Mounting a second Ceph file system
On 11/29/17 00:06, Nigel Williams wrote:
> Are there opinions on how stable multiple filesystems per single Ceph
> cluster is in practice?

we've been using a single cephfs in production since February, and switched to three cephfs in September - without any problem so far (running 12.2.1). the workload is backend storage for smb, hpc number crunching, and running generic linux containers on it.

Regards,
Daniel
Re: [ceph-users] CephFS - Mounting a second Ceph file system
On 11/28/17 15:09, Geoffrey Rhodes wrote:
> I'd like to run more than one Ceph file system in the same cluster.
> Can anybody point me in the right direction to explain how to mount the
> second file system?

if you use the kernel client, you can use the mds_namespace option, i.e.:

  mount -t ceph $monitor_address:/ -o mds_namespace=$fsname \
      /mnt/$your_mountpoint

Regards,
Daniel
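for completeness, the fuse client has an equivalent option, and the same thing can go into /etc/fstab (a sketch; monitor name, credentials and mountpoint are placeholders):

  # fuse client
  ceph-fuse --client_mds_namespace=$fsname /mnt/$your_mountpoint

  # /etc/fstab entry for the kernel client
  mon1:6789:/  /mnt/$your_mountpoint  ceph  name=admin,secretfile=/etc/ceph/admin.secret,mds_namespace=$fsname,noatime  0 0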
Re: [ceph-users] how to debug (in order to repair) damaged MDS (rank)?
On 10/10/2017 02:10 PM, John Spray wrote:
> Yes.

worked, rank 6 is back and cephfs is up again. thank you very much.

> Do a final ls to make sure you got all of them -- it is
> dangerous to leave any fragments behind.

will do.

> BTW opened http://tracker.ceph.com/issues/21749 for the underlying bug.

thanks; I've saved all the logs, so I'm happy to provide anything you need.

Regards,
Daniel
Re: [ceph-users] how to debug (in order to repair) damaged MDS (rank)?
Hi John, thank you very much for your help. On 10/10/2017 12:57 PM, John Spray wrote: > A) Do a "rados -p ls | grep "^506\." or similar, to > get a list of the objects done, gives me these: 506. 506.0017 506.001b 506.0019 506.001a 506.001c 506.0018 506.0016 506.001e 506.001f 506.001d > B) Write a short bash loop to do a "rados -p get" on > each of those objects into a file. done, saved them as the object name as filename, resulting in these 11 files: 90 Oct 10 13:17 506. 4.0M Oct 10 13:17 506.0016 4.0M Oct 10 13:17 506.0017 4.0M Oct 10 13:17 506.0018 4.0M Oct 10 13:17 506.0019 4.0M Oct 10 13:17 506.001a 4.0M Oct 10 13:17 506.001b 4.0M Oct 10 13:17 506.001c 4.0M Oct 10 13:17 506.001d 4.0M Oct 10 13:17 506.001e 4.0M Oct 10 13:17 506.001f > C) Stop the MDS, set "debug mds = 20" and "debug journaler = 20", > mark the rank repaired, start the MDS again, and then gather the > resulting log (it should end in the same "Error -22 recovering > write_pos", but have much much more detail about what came before). I've attached the entire log from right before issueing "repaired" until after the mds drops to standby again. > Because you've hit a serious bug, it's really important to gather all > this and share it, so that we can try to fix it and prevent it > happening again to you or others. absolutely, sure. If you need anything more, I'm happy to share. > You have two options, depending on how much downtime you can tolerate: > - carefully remove all the metadata objects that start with 506. -- given the outtage (and people need access to their data), I'd go with this. Just to be safe: that would go like this? rados -p rm 506. rados -p rm 506.0016 [...] Regards, Daniel 2017-10-10 13:21:55.413752 7f3f3011a700 5 mds.mds9 handle_mds_map epoch 96224 from mon.0 2017-10-10 13:21:55.413836 7f3f3011a700 10 mds.mds9 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2} 2017-10-10 13:21:55.413847 7f3f3011a700 10 mds.mds9 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2} 2017-10-10 13:21:55.413852 7f3f3011a700 10 mds.mds9 map says I am 147.87.226.189:6800/1634944095 mds.6.96224 state up:replay 2017-10-10 13:21:55.414088 7f3f3011a700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-10 13:21:55.414141 7f3f3011a700 10 mds.mds9 handle_mds_map: initializing MDS rank 6 2017-10-10 13:21:55.414410 7f3f3011a700 10 mds.6.0 update_log_config log_to_monitors {default=true} 2017-10-10 13:21:55.414415 7f3f3011a700 10 mds.6.0 create_logger 2017-10-10 13:21:55.414635 7f3f3011a700 7 mds.6.server operator(): full = 0 epoch = 0 2017-10-10 13:21:55.414644 7f3f3011a700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-10 13:21:55.414648 7f3f3011a700 4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries 2017-10-10 13:21:55.414660 7f3f3011a700 10 mds.6.server apply_blacklist: killed 0 2017-10-10 13:21:55.414830 7f3f3011a700 10 mds.mds9 handle_mds_map: handling map as rank 6 2017-10-10 13:21:55.414839 7f3f3011a700 1 mds.6.96224 handle_mds_map i am now mds.6.96224 2017-10-10 13:21:55.414843 7f3f3011a700 1 mds.6.96224 handle_mds_map state change up:boot --> up:replay 2017-10-10 13:21:55.414855 7f3f3011a700 10 mds.beacon.mds9 
set_want_state: up:standby -> up:replay 2017-10-10 13:21:55.414859 7f3f3011a700 1 mds.6.96224 replay_start 2017-10-10 13:21:55.414873 7f3f3011a700 7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8 2017-10-10 13:21:55.414883 7f3f3011a700 1 mds.6.96224 recovery set is 0,1,2,3,4,5,7,8 2017-10-10 13:21:55.414893 7f3f3011a700 1 mds.6.96224 waiting for osdmap 18607 (which blacklists prior instance) 2017-10-10 13:21:55.414901 7f3f3011a700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-10 13:21:55.416011 7f3f3011a700 7 mds.6.server operator(): full = 0 epoch = 18608 2017-10-10 13:21:55.416024 7f3f3011a700 4 mds.6.96224 handle_osd_map epoch 18608, 0 new blacklist entries 2017-10-10 13:21:55.416027 7f3f3011a700 10 mds.6.server apply_blacklist: killed 0 2017-10-10 13:21:55.416076 7f3f2a10e700 10 MDSIOContextBase::complete: 12C_IO_Wrapper 2017-10-10 13:21:55.416095 7f3f2a10e700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart 2017-10-10 13:21:55.416101 7f3f2a10e700 2 mds.6.96224 boot_start 0: opening inotable 2017-10-10 13:21:55.416120 7f3f2a10e700 10 mds.6.inotable: load 2017-10-10 13:21:55.416301 7f3f2a10e700 2 mds.6.96224 boot_start 0: opening sessionmap 2017-10-10 13:21:55.416310 7f3f2a10
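for anyone following along: the "short bash loop" from step B) above was nothing fancy, roughly the following (reconstructed sketch; <metadata-pool> is a placeholder for the actual cephfs metadata pool name, which is omitted in the quoted instructions as well):

  # fetch each 506.* metadata object into a local file of the same name
  for obj in $(rados -p <metadata-pool> ls | grep '^506\.'); do
      rados -p <metadata-pool> get "$obj" "$obj"
  done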
[ceph-users] how to debug (in order to repair) damaged MDS (rank)?
Hi all,

unfortunately I'm still struggling to bring cephfs back up after one of the MDS has been marked "damaged" (see messages from Monday).

1. When I mark the rank as "repaired", this is what I get in the monitor log (leaving unrelated leveldb compacting chatter aside):

2017-10-10 10:51:23.177865 7f3290710700 0 log_channel(audit) log [INF] : from='client.? 147.87.226.72:0/1658479115' entity='client.admin' cmd='[{"prefix": "mds repaired", "rank": "6"}]': finished
2017-10-10 10:51:23.177993 7f3290710700 0 log_channel(cluster) log [DBG] : fsmap cephfs-9/9/9 up {0=mds1=up:resolve,1=mds2=up:resolve,2=mds3=up:resolve,3=mds4=up:resolve,4=mds5=up:resolve,5=mds6=up:resolve,6=mds9=up:replay,7=mds7=up:resolve,8=mds8=up:resolve}
[...]
2017-10-10 10:51:23.492040 7f328ab1c700 1 mon.mon1@0(leader).mds e96186 mds mds.? 147.87.226.189:6800/524543767 can't write to fsmap compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
[...]
2017-10-10 10:51:24.291827 7f328d321700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 mds daemon damaged (MDS_DAMAGE)

2. ...and this is what I get on the mds:

2017-10-10 11:21:26.537204 7fcb01702700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch
2017-10-10 11:21:26.537223 7fcb01702700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos

(see attachment for the full mds log during the "repair" action)

I'm really stuck here and would greatly appreciate any help. How can I see what is actually going on/what the problem is? Running ceph-mon/ceph-mds with debug levels just logs "damaged" as quoted above, but doesn't tell me what is wrong or why it's failing.

would going back to a single MDS with "ceph fs reset" allow me to access the data again?
Regards, Daniel 2017-10-10 11:21:26.419394 7fcb0670c700 10 mds.mds9 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=de fault file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2} 2017-10-10 11:21:26.419399 7fcb0670c700 10 mds.mds9 map says I am 147.87.226.189:6800/1182896077 mds.6.96195 state up:replay 2017-10-10 11:21:26.419623 7fcb0670c700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-10 11:21:26.419679 7fcb0670c700 10 mds.mds9 handle_mds_map: initializing MDS rank 6 2017-10-10 11:21:26.419916 7fcb0670c700 10 mds.6.0 update_log_config log_to_monitors {default=true} 2017-10-10 11:21:26.419920 7fcb0670c700 10 mds.6.0 create_logger 2017-10-10 11:21:26.420138 7fcb0670c700 7 mds.6.server operator(): full = 0 epoch = 0 2017-10-10 11:21:26.420146 7fcb0670c700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-10 11:21:26.420150 7fcb0670c700 4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries 2017-10-10 11:21:26.420159 7fcb0670c700 10 mds.6.server apply_blacklist: killed 0 2017-10-10 11:21:26.420338 7fcb0670c700 10 mds.mds9 handle_mds_map: handling map as rank 6 2017-10-10 11:21:26.420347 7fcb0670c700 1 mds.6.96195 handle_mds_map i am now mds.6.96195 2017-10-10 11:21:26.420351 7fcb0670c700 1 mds.6.96195 handle_mds_map state change up:boot --> up:replay 2017-10-10 11:21:26.420366 7fcb0670c700 10 mds.beacon.mds9 set_want_state: up:standby -> up:replay 2017-10-10 11:21:26.420370 7fcb0670c700 1 mds.6.96195 replay_start 2017-10-10 11:21:26.420375 7fcb0670c700 7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8 2017-10-10 11:21:26.420380 7fcb0670c700 1 mds.6.96195 recovery set is 0,1,2,3,4,5,7,8 2017-10-10 11:21:26.420395 7fcb0670c700 1 mds.6.96195 waiting for osdmap 18593 (which blacklists prior instance) 2017-10-10 11:21:26.420401 7fcb0670c700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-10 11:21:26.421206 7fcb0670c700 7 mds.6.server operator(): full = 0 epoch = 18593 2017-10-10 11:21:26.421217 7fcb0670c700 4 mds.6.96195 handle_osd_map epoch 18593, 0 new blacklist entries 2017-10-10 11:21:26.421220 7fcb0670c700 10 mds.6.server apply_blacklist: killed 0 2017-10-10 11:21:26.421253 7fcb00700700 10 MDSIOContextBase::complete: 12C_IO_Wrapper 2017-10-10 11:21:26.421263 7fcb00700700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart 2017-10-10 11:21:26.421267 7fcb00700700 2 mds.6.96195 boot_start 0: opening inotable 2017-10-10 11:21:26.421285 7fcb00700700 10 mds.6.inotable: load 2017-10-10 11:21:26.421441 7fcb00700700 2 mds.6.96195 boot_start 0: opening sessionmap 2017-10-10 11:21:26.421449 7fcb00700700 10 mds.6.sessionmap load 2017-10-10 11:21:26.421551 7fcb00700700 2 mds.6.96195 boot_start 0: opening mds log 2017-10-10 11:21:26.421558 7fcb00700700 5 mds.6.log open discovering log bounds 2017-10-10 11:21:26.421720 7fcaff6fe700 10 mds.6.log _submit_thread start 2017-10-10 11:21:26.423002 7fcb00700700 10 MDSIOContextBase::complete: N12_GLOBAL__N_112C_IO_SM_LoadE 201
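(follow-up note: what eventually produced a useful log, as suggested later in this thread, was raising the MDS debug levels before retrying the repair - roughly:

  # in ceph.conf on the MDS host, [mds] section, then restart the daemon:
  debug mds = 20
  debug journaler = 20

  # mark the rank repaired again and capture the resulting log
  ceph mds repaired 6

the "ceph mds repaired" syntax is the same command the monitor log above shows being executed.)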
Re: [ceph-users] cephfs: how to repair damaged mds rank?
Hi John, On 10/09/2017 10:47 AM, John Spray wrote: > When a rank is "damaged", that means the MDS rank is blocked from > starting because Ceph thinks the on-disk metadata is damaged -- no > amount of restarting things will help. thanks. > The place to start with the investigation is to find the source of the > damage. Look in your monitor log for "marking rank 6 damaged" I found this in the mon log: 2017-10-09 03:24:28.207424 7f3290710700 0 log_channel(cluster) log [DBG] : mds.6 147.87.226.187:6800/1120166215 down:damaged so at the time it was marked damaged, rank 6 was running on mds7. > and then look in your MDS logs at that timestamp (find the MDS that held > rank 6 at the time). looking at mds7 log for that timespan, I think I understand that: * at "early" 03:24, mds7 was serving rank 5 and crashed, restarted automatically twice, and then picked up rank 6 at 03:24:21. * at 03:24:21, mds7 got rank 6 and got into 'standby'-mode(?): 2017-10-09 03:24:21.598446 7f70ca01c240 0 set uid:gid to 64045:64045 (ceph:ceph) 2017-10-09 03:24:21.598469 7f70ca01c240 0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 1337 2017-10-09 03:24:21.601958 7f70ca01c240 0 pidfile_write: ignore empty --pid-file 2017-10-09 03:24:26.108545 7f70c2580700 1 mds.mds7 handle_mds_map standby 2017-10-09 03:24:26.115469 7f70c2580700 1 mds.6.95474 handle_mds_map i am now mds.6.95474 2017-10-09 03:24:26.115479 7f70c2580700 1 mds.6.95474 handle_mds_map state change up:boot --> up:replay 2017-10-09 03:24:26.115493 7f70c2580700 1 mds.6.95474 replay_start 2017-10-09 03:24:26.115502 7f70c2580700 1 mds.6.95474 recovery set is 0,1,2,3,4,5,7,8 2017-10-09 03:24:26.115511 7f70c2580700 1 mds.6.95474 waiting for osdmap 18284 (which blacklists prior instance) 2017-10-09 03:24:26.536629 7f70bc574700 0 mds.6.cache creating system inode with ino:0x106 2017-10-09 03:24:26.537009 7f70bc574700 0 mds.6.cache creating system inode with ino:0x1 2017-10-09 03:24:27.233759 7f70bd576700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch 2017-10-09 03:24:27.233780 7f70bd576700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos 2017-10-09 03:24:27.238820 7f70bd576700 1 mds.mds7 respawn 2017-10-09 03:24:27.238828 7f70bd576700 1 mds.mds7 e: '/usr/bin/ceph-mds' 2017-10-09 03:24:27.238831 7f70bd576700 1 mds.mds7 0: '/usr/bin/ceph-mds' 2017-10-09 03:24:27.238833 7f70bd576700 1 mds.mds7 1: '-f' 2017-10-09 03:24:27.238835 7f70bd576700 1 mds.mds7 2: '--cluster' 2017-10-09 03:24:27.238836 7f70bd576700 1 mds.mds7 3: 'ceph' 2017-10-09 03:24:27.238838 7f70bd576700 1 mds.mds7 4: '--id' 2017-10-09 03:24:27.238839 7f70bd576700 1 mds.mds7 5: 'mds7' 2017-10-09 03:24:27.239567 7f70bd576700 1 mds.mds7 6: '--setuser' 2017-10-09 03:24:27.239579 7f70bd576700 1 mds.mds7 7: 'ceph' 2017-10-09 03:24:27.239580 7f70bd576700 1 mds.mds7 8: '--setgroup' 2017-10-09 03:24:27.239581 7f70bd576700 1 mds.mds7 9: 'ceph' 2017-10-09 03:24:27.239612 7f70bd576700 1 mds.mds7 respawning with exe /usr/bin/ceph-mds 2017-10-09 03:24:27.239614 7f70bd576700 1 mds.mds7 exe_path /proc/self/exe 2017-10-09 03:24:27.268448 7f9c7eafa240 0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 1337 2017-10-09 03:24:27.271987 7f9c7eafa240 0 pidfile_write: ignore empty --pid-file 2017-10-09 03:24:31.325891 7f9c7789c700 1 mds.mds7 handle_mds_map standby 2017-10-09 03:24:31.332376 7f9c7789c700 1 mds.1.0 handle_mds_map i am now mds.28178286.0 replaying mds.1.0 2017-10-09 
03:24:31.332388 7f9c7789c700 1 mds.1.0 handle_mds_map state change up:boot --> up:standby-replay 2017-10-09 03:24:31.332401 7f9c7789c700 1 mds.1.0 replay_start 2017-10-09 03:24:31.332410 7f9c7789c700 1 mds.1.0 recovery set is 0,2,3,4,5,6,7,8 2017-10-09 03:24:31.332425 7f9c7789c700 1 mds.1.0 waiting for osdmap 18285 (which blacklists prior instance) 2017-10-09 03:24:31.351850 7f9c7108f700 0 mds.1.cache creating system inode with ino:0x101 2017-10-09 03:24:31.352204 7f9c7108f700 0 mds.1.cache creating system inode with ino:0x1 2017-10-09 03:24:32.144505 7f9c7008d700 0 mds.1.cache creating system inode with ino:0x100 2017-10-09 03:24:32.144671 7f9c7008d700 1 mds.1.0 replay_done (as standby) 2017-10-09 03:24:33.150117 7f9c71890700 1 mds.1.0 replay_done (as standby) for about two hours, then, the last line repeats unchanged for every following second. where can I go with this? anything I can do further? also, just in case: it seems that at the time of the crash a large (= a lot, lot of small files) 'rm -rf' was running (all clients use kernel 4.13.4 to mount the cephfs, not fuse). Regards, Daniel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
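for anyone retracing this later: the correlation above was done simply by grepping the logs, along the lines of (default debian/ceph log paths assumed):

  # find when and where the rank was marked damaged
  grep 'marking rank 6 damaged\|down:damaged' /var/log/ceph/ceph-mon.*.log

  # then look at the MDS that held the rank around that timestamp
  grep 'mds.6' /var/log/ceph/ceph-mds.mds7.log | less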
Re: [ceph-users] cephfs: how to repair damaged mds rank?
On 10/09/2017 09:17 AM, Daniel Baumann wrote: > The relevant portion from the ceph-mds log (when starting mds9 which > should then take up rank 6; I'm happy to provide any logs): i've turned up the logging (see attachment).. could it be that we hit this bug here? http://tracker.ceph.com/issues/17670 Regards, Daniel 2017-10-09 10:07:14.677308 7f7972bd6700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 6 rtt 0.000642 2017-10-09 10:07:15.547453 7f7972bd6700 5 mds.mds9 handle_mds_map epoch 96022 from mon.0 2017-10-09 10:07:15.547526 7f7972bd6700 10 mds.mds9 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in om ap,7=mds uses inline data,8=file layout v2} 2017-10-09 10:07:15.547546 7f7972bd6700 10 mds.mds9 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in om ap,8=file layout v2} 2017-10-09 10:07:15.547555 7f7972bd6700 10 mds.mds9 map says I am 147.87.226.189:6800/6621615 mds.6.96022 state up:replay 2017-10-09 10:07:15.547825 7f7972bd6700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-09 10:07:15.547882 7f7972bd6700 10 mds.mds9 handle_mds_map: initializing MDS rank 6 2017-10-09 10:07:15.548165 7f7972bd6700 10 mds.6.0 update_log_config log_to_monitors {default=true} 2017-10-09 10:07:15.548171 7f7972bd6700 10 mds.6.0 create_logger 2017-10-09 10:07:15.548410 7f7972bd6700 7 mds.6.server operator(): full = 0 epoch = 0 2017-10-09 10:07:15.548423 7f7972bd6700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-09 10:07:15.548427 7f7972bd6700 4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries 2017-10-09 10:07:15.548439 7f7972bd6700 10 mds.6.server apply_blacklist: killed 0 2017-10-09 10:07:15.548634 7f7972bd6700 10 mds.mds9 handle_mds_map: handling map as rank 6 2017-10-09 10:07:15.548647 7f7972bd6700 1 mds.6.96022 handle_mds_map i am now mds.6.96022 2017-10-09 10:07:15.548650 7f7972bd6700 1 mds.6.96022 handle_mds_map state change up:boot --> up:replay 2017-10-09 10:07:15.548668 7f7972bd6700 10 mds.beacon.mds9 set_want_state: up:standby -> up:replay 2017-10-09 10:07:15.548687 7f7972bd6700 1 mds.6.96022 replay_start 2017-10-09 10:07:15.548699 7f7972bd6700 7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8 2017-10-09 10:07:15.548706 7f7972bd6700 1 mds.6.96022 recovery set is 0,1,2,3,4,5,7,8 2017-10-09 10:07:15.548720 7f7972bd6700 1 mds.6.96022 waiting for osdmap 18484 (which blacklists prior instance) 2017-10-09 10:07:15.548737 7f7972bd6700 4 mds.6.purge_queue operator(): data pool 7 not found in OSDMap 2017-10-09 10:07:15.549521 7f7972bd6700 7 mds.6.server operator(): full = 0 epoch = 18492 2017-10-09 10:07:15.549534 7f7972bd6700 4 mds.6.96022 handle_osd_map epoch 18492, 0 new blacklist entries 2017-10-09 10:07:15.549537 7f7972bd6700 10 mds.6.server apply_blacklist: killed 0 2017-10-09 10:07:15.549582 7f796cbca700 10 MDSIOContextBase::complete: 12C_IO_Wrapper 2017-10-09 10:07:15.549679 7f796cbca700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart 2017-10-09 10:07:15.549685 7f796cbca700 2 mds.6.96022 boot_start 0: opening inotable 2017-10-09 10:07:15.549695 7f796cbca700 10 mds.6.inotable: load 2017-10-09 10:07:15.549880 7f796cbca700 2 mds.6.96022 boot_start 0: opening sessionmap 2017-10-09 10:07:15.549888 7f796cbca700 10 mds.6.sessionmap load 2017-10-09 
10:07:15.549977 7f796cbca700 2 mds.6.96022 boot_start 0: opening mds log 2017-10-09 10:07:15.549984 7f796cbca700 5 mds.6.log open discovering log bounds 2017-10-09 10:07:15.550113 7f796c3c9700 4 mds.6.journalpointer Reading journal pointer '406.' 2017-10-09 10:07:15.550132 7f796bbc8700 10 mds.6.log _submit_thread start 2017-10-09 10:07:15.551165 7f796cbca700 10 MDSIOContextBase::complete: 12C_IO_MT_Load 2017-10-09 10:07:15.551178 7f796cbca700 10 mds.6.inotable: load_2 got 34 bytes 2017-10-09 10:07:15.551184 7f796cbca700 10 mds.6.inotable: load_2 loaded v1 2017-10-09 10:07:15.565382 7f796cbca700 10 MDSIOContextBase::complete: N12_GLOBAL__N_112C_IO_SM_LoadE 2017-10-09 10:07:15.565397 7f796cbca700 10 mds.6.sessionmap _load_finish loaded version 0 2017-10-09 10:07:15.565401 7f796cbca700 10 mds.6.sessionmap _load_finish: omap load complete 2017-10-09 10:07:15.565403 7f796cbca700 10 mds.6.sessionmap _load_finish: v 0, 0 sessions 2017-10-09 10:07:15.565408 7f796cbca700 10 mds.6.sessionmap dump 2017-10-09 10:07:15.583721 7f796c3c9700 1 mds.6.journaler.mdlog(ro) recover start 2017-10-09 10:07:15.583732 7f796c3c9700 1 mds.6.journaler.mdlog(ro) read_head 2017-10-09 10:07:15.583854 7f796c3c9700 4 mds.6.log Waiting for journal 0x206 to recover... 2017-10-09 10:07:15.796523 7f796cbca700 1 mds.6.journaler.mdlog(ro) _finish_read_head loghead(trim 25992101888, expir
[ceph-users] cephfs: how to repair damaged mds rank?
Hi all,

we have a Ceph Cluster (12.2.1) with 9 MDS ranks in multi-mds mode. "out of the blue", rank 6 is marked as damaged (and all other MDS are in state up:resolve) and I can't bring the FS up again.

'ceph -s' says:
[...]
1 filesystem is degraded
1 mds daemon damaged
mds: cephfs-8/9/9 up {0=mds1=up:resolve,1=mds2=up:resolve,2=mds3=up:resolve,3=mds4=up:resolve,4=mds5=up:resolve,5=mds6=up:resolve,7=mds7=up:resolve,8=mds8=up:resolve}, 1 up:standby, 1 damaged
[...]

'ceph fs get cephfs' says:
[...]
max_mds 9
in 0,1,2,3,4,5,6,7,8
up {0=28309098,1=28309128,2=28309149,3=28309188,4=28309209,5=28317918,7=28311732,8=28312272}
failed
damaged 6
stopped
[...]
28309098: 147.87.226.60:6800/2627352929 'mds1' mds.0.95936 up:resolve seq 3
28309128: 147.87.226.61:6800/416822271 'mds2' mds.1.95939 up:resolve seq 3
28309149: 147.87.226.62:6800/1969015920 'mds3' mds.2.95942 up:resolve seq 3
28309188: 147.87.226.184:6800/4074580566 'mds4' mds.3.95945 up:resolve seq 3
28309209: 147.87.226.185:6800/805082194 'mds5' mds.4.95948 up:resolve seq 3
28317918: 147.87.226.186:6800/1913199036 'mds6' mds.5.95984 up:resolve seq 3
28311732: 147.87.226.187:6800/4117561729 'mds7' mds.7.95957 up:resolve seq 3
28312272: 147.87.226.188:6800/2936268159 'mds8' mds.8.95960 up:resolve seq 3

I think I've tried almost everything already, without success :(, including:

* stopping all MDS and bringing them up one after another (works nicely for the first ones up to rank 5, then the next one just grabs rank 7 and no MDS after that wants to take rank 6)
* stopping all MDS, flushing the MDS journal, manually marking rank 6 as repaired, starting all MDS again.
* trying to switch back to only one MDS (stopping all MDS, setting max_mds=1, disallowing multi-mds, disallowing dirfrag, removing "mds_bal_frag=true" from ceph.conf, then starting the first mds) - didn't work, the one single MDS stayed in up:resolve forever.
* during all of the above, all CephFS clients have been unmounted, so there's no access/stale access to the FS
* I did find a few things in the mailing list archive, but nothing seems conclusive on how to get it back online ("formatting" the FS is not possible). I didn't dare try 'ceph mds rmfailed 6' for fear of data loss.

How can I get it back online?
The relevant portion from the ceph-mds log (when starting mds9 which should then take up rank 6; I'm happy to provide any logs): ---snip--- 2017-10-09 08:55:56.418237 7f1ec6ef3240 0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 421 2017-10-09 08:55:56.421672 7f1ec6ef3240 0 pidfile_write: ignore empty --pid-file 2017-10-09 08:56:00.990530 7f1ebf457700 1 mds.mds9 handle_mds_map standby 2017-10-09 08:56:00.997044 7f1ebf457700 1 mds.6.95988 handle_mds_map i am now mds.6.95988 2017-10-09 08:56:00.997053 7f1ebf457700 1 mds.6.95988 handle_mds_map state change up:boot --> up:replay 2017-10-09 08:56:00.997068 7f1ebf457700 1 mds.6.95988 replay_start 2017-10-09 08:56:00.997076 7f1ebf457700 1 mds.6.95988 recovery set is 0,1,2,3,4,5,7,8 2017-10-09 08:56:01.003203 7f1eb8c4a700 0 mds.6.cache creating system inode with ino:0x106 2017-10-09 08:56:01.003592 7f1eb8c4a700 0 mds.6.cache creating system inode with ino:0x1 2017-10-09 08:56:01.016403 7f1eba44d700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch 2017-10-09 08:56:01.016425 7f1eba44d700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos 2017-10-09 08:56:01.019746 7f1eba44d700 1 mds.mds9 respawn 2017-10-09 08:56:01.019762 7f1eba44d700 1 mds.mds9 e: '/usr/bin/ceph-mds' 2017-10-09 08:56:01.019765 7f1eba44d700 1 mds.mds9 0: '/usr/bin/ceph-mds' 2017-10-09 08:56:01.019767 7f1eba44d700 1 mds.mds9 1: '-f' 2017-10-09 08:56:01.019769 7f1eba44d700 1 mds.mds9 2: '--cluster' 2017-10-09 08:56:01.019771 7f1eba44d700 1 mds.mds9 3: 'ceph' 2017-10-09 08:56:01.019772 7f1eba44d700 1 mds.mds9 4: '--id' 2017-10-09 08:56:01.019773 7f1eba44d700 1 mds.mds9 5: 'mds9' 2017-10-09 08:56:01.019774 7f1eba44d700 1 mds.mds9 6: '--setuser' 2017-10-09 08:56:01.019775 7f1eba44d700 1 mds.mds9 7: 'ceph' 2017-10-09 08:56:01.019776 7f1eba44d700 1 mds.mds9 8: '--setgroup' 2017-10-09 08:56:01.019778 7f1eba44d700 1 mds.mds9 9: 'ceph' 2017-10-09 08:56:01.019811 7f1eba44d700 1 mds.mds9 respawning with exe /usr/bin/ceph-mds 2017-10-09 08:56:01.019814 7f1eba44d700 1 mds.mds9 exe_path /proc/self/exe 2017-10-09 08:56:01.046396 7f5ed6090240 0 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process (unknown), pid 421 2017-10-09 08:56:01.049516 7f5ed6090240 0 pidfile_write: ignore empty --pid-file 2017-10-09 08:56:05.162732 7f5ecee32700 1 mds.mds9 handle_mds_map standby [...] ---snap--- Regards, Daniel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.c