On Mon, Feb 18, 2019 at 10:55 PM Hennen, Christian
<christian.hen...@uni-trier.de> wrote:
>
> Dear Community,
>
>
>
> We are running a Ceph Luminous cluster with CephFS (BlueStore OSDs). During
> setup, we made the mistake of configuring the OSDs on RAID volumes. Initially,
> our cluster consisted of 3 nodes, each housing 1 OSD. We are currently in the
> process of remediating this. After a loss of metadata
> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025612.html)
> due to resetting the journal (journal entries were not being flushed fast
> enough), we managed to bring the cluster back up and started adding 2
> additional nodes
> (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027563.html).
>
>
>
> After adding the two additional nodes, we increased the number of placement
> groups, not only to accommodate the new nodes but also to prepare for the
> reinstallation of the misconfigured nodes. Since then, the number of placement
> groups per OSD has of course been too high. Despite this, cluster health
> remained fine over the last few months.
>
>
>
> However, we are currently observing massive problems: whenever we try to
> access any folder via CephFS, e.g. by listing its contents, there is no
> response. Clients are getting blacklisted, but there is no warning. 'ceph -s'
> shows everything is OK, except for the number of PGs being too high. If I
> grep for "assert" or "error" in any of the logs, nothing comes up. Also, it is
> not possible to reduce the number of active MDS daemons to 1: after issuing
> 'ceph fs set fs_data max_mds 1', nothing happens.
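
One note on that last step: on Luminous, lowering max_mds by itself does not
stop the extra rank, so "nothing happens" is expected until the rank is also
deactivated. A minimal sketch of the procedure, assuming the filesystem really
is named fs_data and rank 1 is the one to retire (please double-check against
the Luminous docs before running it):

  ceph fs set fs_data max_mds 1    # cap the number of active ranks at 1
  ceph mds deactivate fs_data:1    # ask rank 1 to stop; it passes through 'stopping' before it disappears
  ceph fs status                   # watch until only rank 0 remains active

If rank 1 gets stuck in 'stopping', that usually points back at the same
underlying MDS or client problem.
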
>
>
>
> Cluster details are available here: https://gitlab.uni-trier.de/snippets/77
>
>
>
> The MDS log
> (https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
> no longer contains the usual "nicely exporting to" messages, but instead entries like this:
>
> 2019-02-15 08:44:52.464926 7fdb13474700  7 mds.0.server 
> try_open_auth_dirfrag: not auth for [dir 0x100011ce7c6 /home/r-admin/ 
> [2,head] rep@1.1 dir_auth=1 state=0 f(v4 m2019-02-14 13:19:41.300993 
> 80=48+32) n(v11339 rc2019-02-14 13:19:41.300993 b10116465260 10869=10202+667) 
> hs=7+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1 replicated=0 dirty=0 
> waiter=0 authpin=0 tempexporting=0 0x564343eed100], fw to mds.1
>
>

The MDS log shows that a client got evicted; nothing else looks abnormal. Do
new cephfs clients also get evicted quickly?
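
If it helps, a rough way to confirm the evictions from the cluster side (the
MDS daemon name below is a placeholder for whatever your MDS is called):

  ceph osd blacklist ls                    # list currently blacklisted client addresses
  ceph daemon mds.<name> session ls        # run on the MDS host; shows client sessions and their state
  ceph osd blacklist rm <addr:port/nonce>  # clear an entry once the affected client has been remounted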

>
> The update from 12.2.8 to 12.2.11 that I ran last week didn't help.
>
>
>
> Does anybody have an idea, or a hint on where I could look next? Any help
> would be greatly appreciated!
>
>
>
> Kind regards
>
> Christian Hennen
>
>
>
> Project Manager Infrastructural Services
> ZIMK University of Trier
>
> Germany
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
