On Mon, Feb 18, 2019 at 10:55 PM Hennen, Christian <christian.hen...@uni-trier.de> wrote:
>
> Dear Community,
>
> we are running a Ceph Luminous cluster with CephFS (BlueStore OSDs). During
> setup, we made the mistake of configuring the OSDs on RAID volumes. Initially
> our cluster consisted of 3 nodes, each housing 1 OSD. Currently, we are in
> the process of remediating this. After a loss of metadata
> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025612.html)
> due to resetting the journal (journal entries were not being flushed fast
> enough), we managed to bring the cluster back up and started adding 2
> additional nodes
> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027563.html).
>
> After adding the two additional nodes, we increased the number of placement
> groups, not only to accommodate the new nodes but also to prepare for the
> reinstallation of the misconfigured nodes. Since then, the number of
> placement groups per OSD is of course too high. Despite this, cluster health
> remained fine over the last few months.
>
> However, we are currently observing massive problems: whenever we try to
> access any folder via CephFS, e.g. by listing its contents, there is no
> response. Clients are getting blacklisted, but there is no warning. ceph -s
> shows everything is OK, except for the number of PGs being too high. If I
> grep for „assert“ or „error“ in any of the logs, nothing comes up. Also, it
> is not possible to reduce the number of active MDS daemons to 1: after
> issuing ‚ceph fs set fs_data max_mds 1‘ nothing happens.
>
> Cluster details are available here: https://gitlab.uni-trier.de/snippets/77
>
> The MDS log
> (https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
> contains no „nicely exporting to“ messages as usual, but instead these:
>
> 2019-02-15 08:44:52.464926 7fdb13474700 7 mds.0.server
> try_open_auth_dirfrag: not auth for [dir 0x100011ce7c6 /home/r-admin/
> [2,head] rep@1.1 dir_auth=1 state=0 f(v4 m2019-02-14 13:19:41.300993
> 80=48+32) n(v11339 rc2019-02-14 13:19:41.300993 b10116465260 10869=10202+667)
> hs=7+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1 replicated=0 dirty=0
> waiter=0 authpin=0 tempexporting=0 0x564343eed100], fw to mds.1

The MDS logs show a client got evicted. Nothing else looks abnormal. Do new
cephfs clients also get evicted quickly?
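If you want to check that from the cluster side, something along these lines
should show the current client sessions and blacklist entries (run the first
command on the host where the MDS runs; <id> is the MDS daemon name, so
adjust it to your setup):

    # list the client sessions this MDS currently knows about
    ceph daemon mds.<id> session ls
    # show which client addresses are currently blacklisted
    ceph osd blacklist ls

Regarding 'ceph fs set fs_data max_mds 1': on Luminous, lowering max_mds only
changes the limit; it does not stop the second rank by itself. As far as I
know you still have to deactivate rank 1 explicitly, roughly like this (just a
sketch, assuming your filesystem is named fs_data as in your command):

    # lower the limit, then explicitly stop rank 1
    ceph fs set fs_data max_mds 1
    ceph mds deactivate fs_data:1
    # rank 1 should go to 'stopping' and eventually disappear
    ceph fs status

If rank 1 gets stuck in stopping, that would be worth looking at next.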
> The updates from 12.2.8 to 12.2.11 that I ran last week didn't help.
>
> Anybody got an idea or a hint where I could look next? Any help would be
> greatly appreciated!
>
> Kind regards
>
> Christian Hennen
>
> Project Manager Infrastructural Services
> ZIMK University of Trier
> Germany

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com