Re: [ceph-users] CephFS: client hangs

2019-02-21 Thread Hennen, Christian
Of course, you’re right. After using the right name, the connection worked :) I 
also tried connecting via a newer kernel client (under Ubuntu 16.04) and that 
worked as well. So the issue clearly seems to be related to our client kernel version.
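
For anyone finding this in the archive: the kernel client mount would look 
roughly like this (a sketch, not our exact command; the secretfile path is an 
assumption, its content being the key from ceph.client.cephfs.keyring):

mount -t ceph 192.168.1.17:6789:/ /mnt/cephfs -o name=cephfs,secretfile=/etc/ceph/cephfs.secret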

 

Thank you all very much for your time and help! 

 

 

From: David Turner 
Sent: Tuesday, 19 February 2019 19:32
To: Hennen, Christian 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS: client hangs

 

You're attempting to use a client name and keyring that don't match; they need 
to match. For your example, you would want to use either 
`--keyring /etc/ceph/ceph.client.admin.keyring --name client.admin` or 
`--keyring /etc/ceph/ceph.client.cephfs.keyring --name client.cephfs`. Mixing 
and matching does not work. Treat them like username and password: you 
wouldn't try to log into your computer under your account with the admin 
password.
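
In other words, with the paths from your example, either of these should be 
consistent (assuming the cephfs keyring really does live at 
/etc/ceph/ceph.client.cephfs.keyring):

ceph-fuse --keyring /etc/ceph/ceph.client.admin.keyring --name client.admin -m 192.168.1.17:6789 /mnt/cephfs
ceph-fuse --keyring /etc/ceph/ceph.client.cephfs.keyring --name client.cephfs -m 192.168.1.17:6789 /mnt/cephfs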

 

On Tue, Feb 19, 2019 at 12:58 PM Hennen, Christian 
<christian.hen...@uni-trier.de> wrote:

> sounds like network issue. are there firewall/NAT between nodes?
No, there is currently no firewall in place. Nodes and clients are on the same 
network. MTUs match, ports are opened according to nmap.

> Try running ceph-fuse on the node that runs the MDS, check if it works properly.
When I try to run ceph-fuse on either a client or cephfiler1 (MON, MGR, MDS, OSDs), 
I get
- "operation not permitted" when using the client keyring
- "invalid argument" when using the admin keyring
- "ms_handle_refused" when using the admin keyring and connecting to 127.0.0.1:6789

ceph-fuse --keyring /etc/ceph/ceph.client.admin.keyring --name client.cephfs -m 192.168.1.17:6789 /mnt/cephfs

-----Original Message-----
From: Yan, Zheng <uker...@gmail.com>
Sent: Tuesday, 19 February 2019 11:31
To: Hennen, Christian <christian.hen...@uni-trier.de>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS: client hangs

On Tue, Feb 19, 2019 at 5:10 PM Hennen, Christian 
<christian.hen...@uni-trier.de> wrote:
>
> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the 
> >config into the running admin
>
> I restarted each server (MONs and OSDs weren’t enough) and now the health 
> warning is gone. Still no luck accessing CephFS though.
>
>
> > MDS show a client got evicted. Nothing else looks abnormal.  Do new 
> > cephfs clients also get evicted quickly?
>
> Aside from the fact that evicted clients don’t show up in ceph -s, we observe 
> other strange things:
>
> *   Setting max_mds has no effect
>
> *   ceph osd blacklist ls sometimes lists cluster nodes
>

sounds like network issue. are there firewall/NAT between nodes?

> The only client that is currently running is 'master1'. It also hosts a MON 
> and a MGR. Its syslog (https://gitlab.uni-trier.de/snippets/78) shows 
> messages like:
>
> Feb 13 06:40:33 master1 kernel: [56165.943008] libceph: wrong peer, 
> want 192.168.1.17:6800/-2045158358, got 192.168.1.17:6800/1699349984
>
> Feb 13 06:40:33 master1 kernel: [56165.943014] libceph: mds1 
> 192.168.1.17:6800 wrong peer at address
>
> The other day I did the update from 12.2.8 to 12.2.11, which can also be seen 
> in the logs. These messages appeared again at that point. I assume that's 
> normal operation, since ports can change and daemons have to find each other 
> again? But what about the morning of Feb 13? I didn't do any restarts then.
>
> Also, clients are printing messages like the following on the console:
>
> [1026589.751040] ceph: handle_cap_import: mismatched seq/mseq: ino 
> (1994988.fffe) mds0 seq1 mseq 15 importer mds1 has 
> peer seq 2 mseq 15
>
> [1352658.876507] ceph: build_path did not end path lookup where 
> expected, namelen is 23, pos is 0
>
> Oh, and btw, the ceph nodes are running on Ubuntu 16.04, clients are on 14.04 
> with kernel 4.4.0-133.
>

Try running ceph-fuse on the node that runs the MDS, check if it works properly.


> For reference:
>
> > Cluster details: https://gitlab.uni-trier.de/snippets/77
>
> > MDS log: 
> > https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple
>
>
> Kind regards
> Christian Hennen
>
> Project Manager Infrastructural Services ZIMK University of Trier 
> Germany
>
> From: Ashley Merrick

Re: [ceph-users] CephFS: client hangs

2019-02-19 Thread Hennen, Christian
> sounds like network issue. are there firewall/NAT between nodes?
No, there is currently no firewall in place. Nodes and clients are on the same 
network. MTUs match, ports are opened according to nmap.
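
(For completeness, checks of this kind would look roughly like the following; 
a sketch only, and 6800-7300 is Ceph's default ms_bind port range, which may 
differ if it was changed in ceph.conf:)

nmap -p 6789,6800-7300 192.168.1.17   # MON on 6789, OSD/MDS/MGR daemons in the default range
ping -M do -s 1472 192.168.1.17       # path MTU check for a 1500-byte MTU; adjust -s for jumbo frames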

> Try running ceph-fuse on the node that runs the MDS, check if it works properly.
When I try to run ceph-fuse on either a client or cephfiler1 (MON, MGR, MDS, OSDs), 
I get
- "operation not permitted" when using the client keyring
- "invalid argument" when using the admin keyring
- "ms_handle_refused" when using the admin keyring and connecting to 
127.0.0.1:6789

ceph-fuse --keyring /etc/ceph/ceph.client.admin.keyring --name client.cephfs -m 
192.168.1.17:6789 /mnt/cephfs
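
A quick way to double-check which client names exist and which caps each key 
carries (client.cephfs being the name suggested by the keyring file above) is:

ceph auth get client.admin
ceph auth get client.cephfs
ceph auth list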

-----Original Message-----
From: Yan, Zheng 
Sent: Tuesday, 19 February 2019 11:31
To: Hennen, Christian 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS: client hangs

On Tue, Feb 19, 2019 at 5:10 PM Hennen, Christian 
 wrote:
>
> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the 
> >config into the running admin
>
> I restarted each server (MONs and OSDs weren’t enough) and now the health 
> warning is gone. Still no luck accessing CephFS though.
>
>
> > MDS show a client got evicted. Nothing else looks abnormal.  Do new 
> > cephfs clients also get evicted quickly?
>
> Aside from the fact that evicted clients don’t show up in ceph -s, we observe 
> other strange things:
>
> *   Setting max_mds has no effect
>
> *   ceph osd blacklist ls sometimes lists cluster nodes
>

sounds like network issue. are there firewall/NAT between nodes?

> The only client that is currently running is 'master1'. It also hosts a MON 
> and a MGR. Its syslog (https://gitlab.uni-trier.de/snippets/78) shows 
> messages like:
>
> Feb 13 06:40:33 master1 kernel: [56165.943008] libceph: wrong peer, 
> want 192.168.1.17:6800/-2045158358, got 192.168.1.17:6800/1699349984
>
> Feb 13 06:40:33 master1 kernel: [56165.943014] libceph: mds1 
> 192.168.1.17:6800 wrong peer at address
>
> The other day I did the update from 12.2.8 to 12.2.11, which can also be seen 
> in the logs. These messages appeared again at that point. I assume that's 
> normal operation, since ports can change and daemons have to find each other 
> again? But what about the morning of Feb 13? I didn't do any restarts then.
>
> Also, clients are printing messages like the following on the console:
>
> [1026589.751040] ceph: handle_cap_import: mismatched seq/mseq: ino 
> (1994988.fffe) mds0 seq1 mseq 15 importer mds1 has 
> peer seq 2 mseq 15
>
> [1352658.876507] ceph: build_path did not end path lookup where 
> expected, namelen is 23, pos is 0
>
> Oh, and btw, the ceph nodes are running on Ubuntu 16.04, clients are on 14.04 
> with kernel 4.4.0-133.
>

Try running ceph-fuse on the node that runs the MDS, check if it works properly.


> For reference:
>
> > Cluster details: https://gitlab.uni-trier.de/snippets/77
>
> > MDS log: 
> > https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple
>
>
> Kind regards
> Christian Hennen
>
> Project Manager Infrastructural Services ZIMK University of Trier 
> Germany
>
> From: Ashley Merrick 
> Sent: Monday, 18 February 2019 16:53
> To: Hennen, Christian 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] CephFS: client hangs
>
> Correct, yes, from my experience the OSDs as well.
>
> On Mon, 18 Feb 2019 at 11:51 PM, Hennen, Christian 
>  wrote:
>
> Hi!
>
> >mon_max_pg_per_osd = 400
> >
> >In the ceph.conf and then restart all the services / or inject the 
> >config into the running admin
>
> I restarted all MONs, but I assume the OSDs need to be restarted as well?
>
> > MDS show a client got evicted. Nothing else looks abnormal.  Do new 
> > cephfs clients also get evicted quickly?
>
> Yeah, it seems so. But strangely there is no indication of it in 'ceph 
> -s' or 'ceph health detail'. And they don't seem to be evicted 
> permanently? Right now, only 1 client is connected. The others are shut down 
> since last week.
> 'ceph osd blacklist ls' shows 0 entries.
>
>
> Kind regards
> Christian Hennen
>
> Project Manager Infrastructural Services ZIMK University of Trier 
> Germany
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: client hangs

2019-02-19 Thread Hennen, Christian
Hi!

>mon_max_pg_per_osd = 400
>
>In the ceph.conf and then restart all the services / or inject the config 
>into the running admin

I restarted each server (MONs and OSDs weren’t enough) and now the health 
warning is gone. Still no luck accessing CephFS though.
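
For reference, the change itself is just this in ceph.conf on the MON hosts, 
and it can also be injected into the running monitors (a sketch; in our case, 
as noted above, the warning only cleared after restarting each server):

# ceph.conf on the MON hosts
[global]
mon_max_pg_per_osd = 400

# or inject into the running MONs without a restart
ceph tell mon.* injectargs '--mon_max_pg_per_osd 400'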

> MDS show a client got evicted. Nothing else looks abnormal.  Do new cephfs 
> clients also get evicted quickly?

Aside from the fact that evicted clients don’t show up in ceph -s, we observe 
other strange things (commands sketched below):
*   Setting max_mds has no effect
*   ceph osd blacklist ls sometimes lists cluster nodes
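
Concretely, these were along the lines of (a sketch; the blacklist 
address/nonce comes from the ls output, and removing an entry is only safe if 
the daemon or client behind it is really gone):

ceph fs set fs_data max_mds 1              # appears to have no effect
ceph osd blacklist ls                      # sometimes shows cluster nodes
ceph osd blacklist rm <addr:port/nonce>    # remove a stray entry if needed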

The only client that is currently running is 'master1'. It also hosts a MON and 
a MGR. Its syslog (https://gitlab.uni-trier.de/snippets/78) shows messages like:
Feb 13 06:40:33 master1 kernel: [56165.943008] libceph: wrong peer, want 
192.168.1.17:6800/-2045158358, got 192.168.1.17:6800/1699349984
Feb 13 06:40:33 master1 kernel: [56165.943014] libceph: mds1 192.168.1.17:6800 
wrong peer at address
The other day I did the update from 12.2.8 to 12.2.11, which can also be seen 
in the logs. These messages appeared again at that point. I assume that's normal 
operation, since ports can change and daemons have to find each other again? 
But what about the morning of Feb 13? I didn't do any restarts then.

Also, clients are printing messages like the following on the console:
[1026589.751040] ceph: handle_cap_import: mismatched seq/mseq: ino 
(1994988.fffe) mds0 seq1 mseq 15 importer mds1 has peer seq 2 
mseq 15
[1352658.876507] ceph: build_path did not end path lookup where expected, 
namelen is 23, pos is 0

Oh, and btw, the ceph nodes are running on Ubuntu 16.04, clients are on 14.04 
with kernel 4.4.0-133.

For reference:
> Cluster details: https://gitlab.uni-trier.de/snippets/77 
> MDS log: https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple

Kind regards
Christian Hennen

Project Manager Infrastructural Services ZIMK University of Trier
Germany

From: Ashley Merrick 
Sent: Monday, 18 February 2019 16:53
To: Hennen, Christian 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CephFS: client hangs

Correct, yes, from my experience the OSDs as well.

On Mon, 18 Feb 2019 at 11:51 PM, Hennen, Christian 
<christian.hen...@uni-trier.de> wrote:
Hi!

>mon_max_pg_per_osd = 400
>
>In the ceph.conf and then restart all the services / or inject the config 
>into the running admin

I restarted all MONs, but I assume the OSDs need to be restarted as well?

> MDS show a client got evicted. Nothing else looks abnormal.  Do new cephfs 
> clients also get evicted quickly?

Yeah, it seems so. But strangely there is no indication of it in 'ceph -s' or 
'ceph health detail'. And they don't seem to be evicted permanently? Right 
now, only 1 client is connected. The others are shut down since last week. 
'ceph osd blacklist ls' shows 0 entries.


Kind regards
Christian Hennen

Project Manager Infrastructural Services ZIMK University of Trier
Germany

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: client hangs

2019-02-18 Thread Hennen, Christian
Hi!

>mon_max_pg_per_osd = 400
>
>In the ceph.conf and then restart all the services / or inject the config 
>into the running admin

I restarted all MONs, but I assume the OSDs need to be restarted as well?

> MDS show a client got evicted. Nothing else looks abnormal.  Do new cephfs 
> clients also get evicted quickly?

Yeah, it seems so. But strangely there is no indication of it in 'ceph -s' or 
'ceph health detail'. And they don't seem to be evicted permanently? Right 
now, only 1 client is connected. The others are shut down since last week. 
'ceph osd blacklist ls' shows 0 entries.


Kind regards
Christian Hennen

Project Manager Infrastructural Services ZIMK University of Trier
Germany



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS: client hangs

2019-02-18 Thread Hennen, Christian
Dear Community,

 

we are running a Ceph Luminous Cluster with CephFS (Bluestore OSDs). During
setup, we made the mistake of configuring the OSDs on RAID Volumes.
Initially our cluster consisted of 3 nodes, each housing 1 OSD. Currently,
we are in the process of remediating this. After a loss of metadata
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-March/025612.html)
due to resetting the journal (journal entries were not being flushed fast
enough), we managed to bring the cluster back up and started adding 2
additional nodes
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027563.html).

 

After adding the two additional nodes, we increased the number of placement
groups, not only to accommodate the new nodes but also to prepare for the
reinstallation of the misconfigured nodes. Since then, the number of placement
groups per OSD has of course been too high. Despite this, cluster health
remained fine over the last few months.

 

However, we are currently observing massive problems: whenever we try to
access any folder via CephFS, e.g. by listing its contents, there is no
response. Clients are getting blacklisted, but there is no warning. ceph -s
shows everything is ok, except for the number of PGs being too high. If I
grep for "assert" or "error" in any of the logs, nothing comes up. Also, it
is not possible to reduce the number of active MDS daemons to 1. After issuing
'ceph fs set fs_data max_mds 1' nothing happens.
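
(A side note on that last point: as far as I can tell from the Luminous
documentation, lowering max_mds does not by itself stop an already active
rank; the surplus rank apparently also has to be deactivated explicitly,
roughly like this:)

ceph fs set fs_data max_mds 1
ceph mds deactivate fs_data:1    # stop rank 1 once max_mds has been lowered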

 

Cluster details are available here: https://gitlab.uni-trier.de/snippets/77 

 

The MDS log (https://gitlab.uni-trier.de/snippets/79?expanded=true&viewer=simple)
contains none of the usual "nicely exporting to" messages, but instead these:

2019-02-15 08:44:52.464926 7fdb13474700  7 mds.0.server
try_open_auth_dirfrag: not auth for [dir 0x100011ce7c6 /home/r-admin/
[2,head] rep@1.1 dir_auth=1 state=0 f(v4 m2019-02-14 13:19:41.300993
80=48+32) n(v11339 rc2019-02-14 13:19:41.300993 b10116465260
10869=10202+667) hs=7+0,ss=0+0 | dnwaiter=0 child=1 frozen=0 subtree=1
replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x564343eed100], fw
to mds.1

 

The update from 12.2.8 to 12.2.11 that I ran last week didn't help.

 

Anybody got an idea or a hint as to where I could look next? Any help would
be greatly appreciated!

 

Kind regards

Christian Hennen

 

Project Manager Infrastructural Services
ZIMK University of Trier

Germany



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] MDS reports metadata damage

2018-06-21 Thread Hennen, Christian
Dear Community,

 

here at ZIMK at the University of Trier we operate a Ceph Luminous Cluster
as a filer for an HPC environment via CephFS (Bluestore backend). During setup
last year we made the mistake of not configuring the RAID as JBOD, so
initially the 3 nodes only housed 1 OSD each. Currently, we are in the
process of remediating this. After a loss of metadata due to resetting the
journal (journal entries were not being flushed fast enough), we managed to
bring the cluster back up and started adding 2 additional nodes. Their
hardware is a little bit older than that of the first 3 nodes. We configured
the drives on these individually (RAID-0 on each disk, since there is no
pass-through mode on the controller), and after some rebalancing and
re-weighting, the first of the original nodes is now empty and ready to be
re-installed.

 

However, due to the aforementioned metadata loss, we are currently getting
warnings about metadata damage.

damage ls shows that only one folder is affected. As we don’t need this
folder, we’d like to delete it together with the associated metadata and any
other related information, if possible. Taking the cluster offline for a
data scan right now would be a little bit difficult, so any other suggestions
would be appreciated.
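
For context, the inspection is roughly this (a sketch; mds.0 addresses rank 0,
the damage ID comes from the ls output, and damage rm only clears the entry
from the damage table rather than repairing anything):

ceph tell mds.0 damage ls
ceph tell mds.0 damage rm <damage_id>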

 

Cluster health details are available here:
https://gitlab.uni-trier.de/snippets/65 

 

Regards

Christian Hennen

 

Project Manager Infrastructural Services

Zentrum für Informations-, Medien-

und Kommunikationstechnologie (ZIMK)

Universität Trier

54286 Trier

 

Tel.: +49 651 201 3488

Fax: +49 651 201 3921

E-Mail: christian.hen...@uni-trier.de

Web:   http://zimk.uni-trier.de



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com