[ceph-users] Re: Mon crash when client mounts CephFS

2021-06-15 Thread Phil Merricks
Thanks for the replies folks.

This one was resolved. I wish I could tell you exactly what I changed to fix
it, but several undocumented changes were made to the deployment script I'm
using while I was distracted by something else. Tearing down and redeploying
today no longer hits this particular issue.

I do have a new issue, though it's less concerning. I'll start a new thread.

On Tue, 8 Jun 2021 at 12:48, Robert W. Eckert  wrote:

> When I had issues with the monitors, it was access to the monitor folder
> under /var/lib/ceph//mon./store.db;
> make sure it is owned by the ceph user.
>
> My issues originated from a hardware problem - the memory needed 1.3 V, but
> the motherboard was only supplying 1.2 V (the memory itself had the issue: its
> firmware reported 1.2 V required while the sticker on the side said 1.3 V).  So
> I had a script that copied the store across and fixed the permissions.
>
> The other thing that helped a lot, compared to the crash logs, was to edit
> unit.run and remove the --rm parameter from the podman command.  That lets you
> see the container output with podman logs, which was a bit more detailed.
>
> When you do this, you will need to restore that afterwards, and clean up
> the 'cid' and 'pid' files from /run/ceph-@mon..service-cid
> and /run/ceph-@mon..service-pid
>
> My reference is Red Hat Enterprise Linux 8, so things may be a bit different
> on Ubuntu.
>
> If you get a message about the store.db files being off, it's easiest to
> stop the working node, copy them over, set the user/group to ceph, and
> start things up.
>
> Rob
>
> -Original Message-
> From: Phil Merricks 
> Sent: Tuesday, June 8, 2021 3:18 PM
> To: ceph-users 
> Subject: [ceph-users] Mon crash when client mounts CephFS
>
> Hey folks,
>
> I have deployed a 3 node dev cluster using cephadm.  Deployment went
> smoothly and all seems well.
>
> However, if I try to mount a CephFS from a client node, 2 of the 3 mons crash.
> I've begun picking through the logs to see what I can see, but so far, other
> than the crash showing up in the log itself, it's unclear what is causing it.
>
> Here's a log.  You can see where the crash occurs around the line that
> begins with "Jun 08 18:56:04 okcomputer podman[790987]:".
>
> I would welcome any advice on either what the cause may be, or how I can
> advance the analysis of what's wrong.
>
> Best regards
>
> Phil
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mon crash when client mounts CephFS

2021-06-08 Thread Robert W. Eckert
When I had issues with the monitors, it was access to the monitor folder under
/var/lib/ceph//mon./store.db; make sure
it is owned by the ceph user.
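
Roughly something like this - the fsid and hostname below are placeholders,
substitute the values from your own cephadm deployment:

    # check who owns the mon store (placeholder fsid/hostname)
    ls -ld /var/lib/ceph/<fsid>/mon.<hostname>/store.db
    # if it is not ceph:ceph, fix it recursively
    chown -R ceph:ceph /var/lib/ceph/<fsid>/mon.<hostname>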

My issues originated from a hardware problem - the memory needed 1.3 V, but the
motherboard was only supplying 1.2 V (the memory itself had the issue: its
firmware reported 1.2 V required while the sticker on the side said 1.3 V).  So
I had a script that copied the store across and fixed the permissions.

The other thing that helped a lot, compared to the crash logs, was to edit
unit.run and remove the --rm parameter from the podman command.  That lets you
see the container output with podman logs, which was a bit more detailed.

When you do this, you will need to restore that afterwards, and clean up the 
'cid' and 'pid' files from /run/ceph-@mon..service-cid and 
/run/ceph-@mon..service-pid
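
Roughly, assuming the usual cephadm layout (the fsid, hostname, and container
name below are placeholders):

    # take --rm out of the podman run line in unit.run, then restart the mon
    vi /var/lib/ceph/<fsid>/mon.<hostname>/unit.run
    systemctl restart ceph-<fsid>@mon.<hostname>.service
    # find the mon container name and read its output
    podman ps --format '{{.Names}}'
    podman logs <mon-container-name>
    # when finished, put --rm back and clear the stale cid/pid files
    rm -f /run/ceph-<fsid>@mon.<hostname>.service-cid \
          /run/ceph-<fsid>@mon.<hostname>.service-pid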

My reference is Red Hat Enterprise Linux 8, so things may be a bit different on
Ubuntu.

If you get a message about the store.db files being off, it's easiest to stop
the working node, copy them over, set the user/group to ceph, and start
things up.
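
A rough sketch of what that can look like, run from the broken node (fsid and
hostnames are placeholders; adjust the paths for your own layout):

    # stop the healthy mon so its store.db is quiescent, then copy it across
    ssh <good-host> systemctl stop ceph-<fsid>@mon.<good-host>.service
    rsync -a --delete <good-host>:/var/lib/ceph/<fsid>/mon.<good-host>/store.db/ \
        /var/lib/ceph/<fsid>/mon.<bad-host>/store.db/
    chown -R ceph:ceph /var/lib/ceph/<fsid>/mon.<bad-host>/store.db
    # bring both mons back up
    ssh <good-host> systemctl start ceph-<fsid>@mon.<good-host>.service
    systemctl start ceph-<fsid>@mon.<bad-host>.service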

Rob

-Original Message-
From: Phil Merricks  
Sent: Tuesday, June 8, 2021 3:18 PM
To: ceph-users 
Subject: [ceph-users] Mon crash when client mounts CephFS

Hey folks,

I have deployed a 3 node dev cluster using cephadm.  Deployment went smoothly 
and all seems well.

However, if I try to mount a CephFS from a client node, 2 of the 3 mons crash.
I've begun picking through the logs to see what I can see, but so far, other
than the crash showing up in the log itself, it's unclear what is causing it.

Here's a log.  You can see where the crash occurs around the line that begins
with "Jun 08 18:56:04 okcomputer podman[790987]:".

I would welcome any advice on either what the cause may be, or how I can 
advance the analysis of what's wrong.

Best regards

Phil
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mon crash when client mounts CephFS

2021-06-08 Thread Ilya Dryomov
On Tue, Jun 8, 2021 at 9:20 PM Phil Merricks  wrote:
>
> Hey folks,
>
> I have deployed a 3 node dev cluster using cephadm.  Deployment went
> smoothly and all seems well.
>
> However, if I try to mount a CephFS from a client node, 2 of the 3 mons crash.
> I've begun picking through the logs to see what I can see, but so far, other
> than the crash showing up in the log itself, it's unclear what is causing it.
>
> Here's a log.  You can see where the crash occurs around the line that
> begins with "Jun 08 18:56:04 okcomputer podman[790987]:".

Hi Phil,

I assume you are mounting with the kernel client, not ceph-fuse?  If so,
what is the kernel version on the client node?

ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0)
pacific (stable)
1: /lib64/libpthread.so.0(+0x12b20) [0x7fc36de86b20]
2: gsignal()
3: abort()
4: /lib64/libstdc++.so.6(+0x9009b) [0x7fc36d4a409b]
5: /lib64/libstdc++.so.6(+0x9653c) [0x7fc36d4aa53c]
6: /lib64/libstdc++.so.6(+0x96597) [0x7fc36d4aa597]
7: /lib64/libstdc++.so.6(+0x967f8) [0x7fc36d4aa7f8]
8: /lib64/libstdc++.so.6(+0x92045) [0x7fc36d4a6045]
9: /usr/bin/ceph-mon(+0x4d8da6) [0x563c51ad8da6]
10: (MDSMonitor::check_sub(Subscription*)+0x819) [0x563c51acf329]
11: (Monitor::handle_subscribe(boost::intrusive_ptr)+0xcd8)
[0x563c518c1258]
12: (Monitor::dispatch_op(boost::intrusive_ptr)+0x78d)
[0x563c518e72ed]
13: (Monitor::_ms_dispatch(Message*)+0x670) [0x563c518e8910]
14: (Dispatcher::ms_dispatch2(boost::intrusive_ptr
const&)+0x5c) [0x563c51916fdc]
15: (DispatchQueue::entry()+0x126a) [0x7fc3705c6b1a]
16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fc370676b71]
17: /lib64/libpthread.so.0(+0x814a) [0x7fc36de7c14a]
18: clone()
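
For reference, checking that on the client node would be along these lines (the
mount commands are only generic examples - the monitor address, credentials,
and mount point are placeholders):

    # kernel version on the client
    uname -r
    # a kernel-client mount looks roughly like this...
    mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin,secret=<key>
    # ...whereas ceph-fuse would be
    ceph-fuse -n client.admin /mnt/cephfs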

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io