I know this may sound simple.

Have you tried raising the PG-per-OSD limit? I'm sure I have seen people in
the past with the same kind of issue as you, and it turned out to be I/O
being blocked by that limit without anything being actively logged.

mon_max_pg_per_osd = 400

Add it to ceph.conf and then restart all the services, or inject the config
into the running daemons.
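
For the injection, something along these lines should work on Luminous (just
a sketch from memory, so please double-check the option name with 'ceph
daemon mon.<id> config show' before relying on it):

ceph tell mon.* injectargs '--mon_max_pg_per_osd 400'

or per daemon via the admin socket:

ceph daemon mon.<id> config set mon_max_pg_per_osd 400

Note that injected values don't survive a daemon restart, so keep the
ceph.conf entry in place as well.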

On Mon, Feb 18, 2019 at 10:55 PM Hennen, Christian <
christian.hen...@uni-trier.de> wrote:

> Dear Community,
>
>
>
> We are running a Ceph Luminous cluster with CephFS (BlueStore OSDs).
> During setup, we made the mistake of configuring the OSDs on RAID volumes.
> Initially our cluster consisted of 3 nodes, each housing 1 OSD; we are
> currently in the process of remediating this. After a loss of metadata (
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025612.html)
> due to resetting the journal (journal entries were not being flushed fast
> enough), we managed to bring the cluster back up and started adding 2
> additional nodes (
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027563.html)
> .
>
>
>
> After adding the two additional nodes, we increased the number of
> placement groups, not only to accommodate the new nodes but also to prepare
> for the reinstallation of the misconfigured nodes. Since then, the number
> of placement groups per OSD has of course been too high. Despite this,
> cluster health remained fine over the last few months.
>
>
>
> However, we are currently observing massive problems: whenever we try to
> access any folder via CephFS, e.g. by listing its contents, there is no
> response. Clients are getting blacklisted, but there is no warning. ceph -s
> shows everything is OK, except for the number of PGs being too high. If I
> grep for "assert" or "error" in any of the logs, nothing comes up. Also, it
> is not possible to reduce the number of active MDS daemons to 1; after
> issuing 'ceph fs set fs_data max_mds 1', nothing happens.
>
>
>
> Cluster details are available here:
> https://gitlab.uni-trier.de/snippets/77
>
>
>
> The MDS log (
> https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
> no longer contains the usual "nicely exporting to" messages, but instead
> these:
>
> 2019-02-15 08:44:52.464926 7fdb13474700  7 mds.0.server
> try_open_auth_dirfrag: not auth for [dir 0x100011ce7c6 /home/r-admin/
> [2,head] rep@1.1 dir_auth=1 state=0 f(v4 m2019-02-14 13:19:41.300993
> 80=48+32) n(v11339 rc2019-02-14 13:19:41.300993 b10116465260
> 10869=10202+667) hs=7+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1
> replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x564343eed100], fw
> to mds.1
>
>
>
> The updates from 12.2.8 to 12.2.11 that I ran last week didn't help.
>
>
>
> Does anybody have an idea or a hint as to where I could look next? Any help
> would be greatly appreciated!
>
>
>
> Kind regards
>
> Christian Hennen
>
>
>
> Project Manager Infrastructural Services
> ZIMK University of Trier
>
> Germany