[ceph-users] RGW pubsub deprecation

2020-11-04 Thread Yuval Lifshitz
Dear Community,
Since Nautilus, we have 2 mechanisms for notifying 3rd parties on changes
in buckets and objects: "bucket notifications" [1] and "pubsub" [2].

In "bucket notifications" (="push mode") the events are sent from the RGW
to an external entity (kafka, rabbitmq etc.), while in "pubsub" (="pull
mode") the events are synched with a special zone, where they are stored
and could be later fetched by an external app.
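
For readers less familiar with the two modes, a rough sketch (endpoint URLs
and names below are placeholders, and the exact parameters and authentication
are in [1] and [2]):

  # "push mode": create a topic that points at an external endpoint,
  # e.g. via the SNS-compatible API
  aws --endpoint-url=http://<rgw>:8000 sns create-topic --name=mytopic \
      --attributes='{"push-endpoint": "kafka://<broker>:9092"}'

  # "pull mode": the application later fetches the stored events from the
  # pubsub zone over REST, e.g.
  curl "http://<pubsub-rgw>:8000/subscriptions/mysub?events&max-entries=25"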

From the communications I've seen so far, users prefer "bucket
notifications" over "pubsub". Since supporting both modes has maintenance
overhead, I was considering deprecating "pubsub".
However, before doing that I would like to see what the community has to
say!

So, if you are currently using pubsub, or plan to use it because "pull mode"
fits your use case better than "push mode", please chime in.

Yuval

[1] https://docs.ceph.com/en/latest/radosgw/notifications/
[2] https://docs.ceph.com/en/latest/radosgw/pubsub-module/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Fwd: File reads are not completing and IO drops to bytes when reading from cephfs

2020-11-04 Thread Amudhan P
An update on the issue where reading a file drops to bytes per second with an error.

When new files are copied to the mount it works fine, and reading those same
files also works with no issue.
But reading old or existing files still hits the same problem, with the error
message below on the client:
"libceph: osd1 10.0.104.1:6891 socket closed (con state CONNECTING)"


-- Forwarded message -
From: Amudhan P 
Date: Wed, Nov 4, 2020 at 6:24 PM
Subject: File reads are not completing and IO drops to bytes when reading
from cephfs
To: ceph-users 


Hi,

In my test Ceph Octopus cluster I was trying to simulate a failure case:
with cephfs mounted through the kernel client and a read/write workload
running, I shut down the entire cluster with the OSD flags nodown, noout,
nobackfill and norecover set.
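
For reference, those flags are set and cleared like this:

  ceph osd set noout
  ceph osd set nodown
  ceph osd set nobackfill
  ceph osd set norecover
  # ... and after bringing the cluster back:
  ceph osd unset noout    # likewise unset nodown, nobackfill, norecover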

The cluster is 4 nodes, composed of 3 mons, 2 mgrs, 2 MDS and 48 OSDs.
Public IP range: 10.0.103.0, cluster IP range: 10.0.104.0.

Writes and reads stalled; after some time the cluster was brought back live
and healthy. But when reading a file through the kernel mount, the read starts
at above 100 MB/s and then suddenly drops to bytes per second and stays there
for a long time. The only error message I could see on the client machine:

[  167.591095] ceph: loaded (mds proto 32)
[  167.600010] libceph: mon0 10.0.103.1:6789 session established
[  167.601167] libceph: client144519 fsid
f8bc7682-0d11-11eb-a332-0cc47a5ec98a
[  272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state
CONNECTING)

What went wrong, and why is this happening?

regards
Amudhan P


[ceph-users] Re: Ceph flash deployment

2020-11-04 Thread Frank Schilder
That's actually an interesting question. On 5.9 kernels cfq does not seem to
be available:

# cat /sys/block/sdj/queue/scheduler
[mq-deadline] kyber bfq none
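
For reference, a scheduler can be switched at runtime via sysfs ("none" below
is just an example, not a recommendation):

# echo none > /sys/block/sdj/queue/scheduler
# cat /sys/block/sdj/queue/scheduler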

What is the recommendation here?

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Seena Fallah 
Sent: 04 November 2020 19:01:18
To: Alexander E. Patrakov
Cc: ceph-users
Subject: [ceph-users] Re: Ceph flash deployment

I see in this thread that someone is saying that bluestore only works well
with the cfq scheduler:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031063.html

For readahead, do you have any suggestions on how I can measure my workload
to see whether I should increase it or not?

Thanks.

On Wed, Nov 4, 2020 at 8:00 AM Alexander E. Patrakov 
wrote:

> With the latest kernel, this is not valid for all-flash clusters.
> Simply because cfq is not an option at all there, and readahead
> usefulness depends on your workload (in other words, it can help or
> hurt) and therefore cannot be included in a universally-applicable set
> of tuning recommendations. Also, look again: the title talks about
> all-flash deployments, while the context of the benchmark talks about
> 7200RPM HDDs!
>
> On Wed, Nov 4, 2020 at 12:37 AM Seena Fallah 
> wrote:
> >
> > Thanks for your useful information.
> >
> > Can you please also say whether the kernel and disk configuration are
> > still valid for bluestore or not? I mean read_ahead_kb and the disk
> > scheduler.
> >
> > Thanks.
> >
> > On Tue, Nov 3, 2020 at 10:55 PM Alexander E. Patrakov <
> patra...@gmail.com> wrote:
> >>
> >> On Tue, Nov 3, 2020 at 6:30 AM Seena Fallah 
> wrote:
> >> >
> >> > Hi all,
> >> >
> >> > Is this guide still valid for a bluestore deployment with nautilus or
> >> > octopus?
> >> >
> https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments
> >>
> >> Some of the guidance is of course outdated.
> >>
> >> E.g., at the time of that writing, 1x 40GbE was indeed state of the
> >> art in the networking world, but now 100GbE network cards are
> >> affordable, and with 6 NVMe drives per server, even that might be a
> >> bottleneck if the clients use a large block size (>64KB) and do an
> >> fsync() only at the end.
> >>
> >> Regarding NUMA tuning, Ceph made some progress. If it finds that your
> >> NVMe and your network card are on the same NUMA node, then, with
> >> Nautilus or later, the OSD will pin itself to that NUMA node
> >> automatically. I.e.: choose strategically which PCIe slots to use,
> >> maybe use two network cards, and you will not have to do any tuning or
> >> manual pinning.
> >>
> >> Partitioning the NVMe was also popular advice in the past, but now
> >> that there are "osd op num shards" and "osd op num threads per shard"
> >> parameters, with sensible default values, this is something that tends
> >> not to help.
> >>
> >> Filesystem considerations in that document obviously apply only to
> >> Filestore, which is something you should not use.
> >>
> >> Large PG number per OSD helps more uniform data distribution, but
> >> actually hurts performance a little bit.
> >>
> >> The advice regarding the "performance" cpufreq governor is valid, but
> >> you might also look at (i.e. benchmark for your workload specifically)
> >> disabling the deepest idle states.
> >>
> >> --
> >> Alexander E. Patrakov
> >> CV: http://pc.cd/PLz7
>
>
>
> --
> Alexander E. Patrakov
> CV: http://pc.cd/PLz7
>


[ceph-users] RBD image stuck and no errors in logs

2020-11-04 Thread Salsa
Hi,

This same error keeps happening to me: after writing some amount of data to an 
RBD image it gets stuck and no read or write operation on it works. Every 
operation hangs. I cannot resize, alter features, read or write data. I can 
mount it, but using parted or fdisk hangs indefinitely. In the end all I can do 
is remove the image.

Again, I see no errors in the logs and Ceph's status is OK. I tried to alter
some log levels, but still no helpful info.

Is there anything I should check? Rados?
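
For reference, a few things that could be checked (pool/image names below are
placeholders):

  rbd status <pool>/<image>     # any watchers still registered on the image?
  rbd info <pool>/<image>
  ceph osd blacklist ls         # stale client sessions holding locks?
  ceph health detail            # slow or blocked requests?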

--
Salsa

Sent with ProtonMail Secure Email.


[ceph-users] Re: Ceph flash deployment

2020-11-04 Thread Seena Fallah
I see in this thread that someone is saying that bluestore only works well
with the cfq scheduler:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/031063.html

For readahead, do you have any suggestions on how I can measure my workload
to see whether I should increase it or not?
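
(For context, the kind of comparison I have in mind; the device name,
readahead value and fio parameters below are placeholders, not a
recommendation. Readahead only affects buffered reads, so the test is a
buffered sequential read through the page cache, e.g. on an RBD kernel client:)

  echo 4096 | sudo tee /sys/block/rbd0/queue/read_ahead_kb
  fio --name=seqread --filename=/dev/rbd0 --rw=read --bs=1M \
      --ioengine=psync --direct=0 --runtime=60 --time_based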

Thanks.

On Wed, Nov 4, 2020 at 8:00 AM Alexander E. Patrakov 
wrote:

> With the latest kernel, this is not valid for all-flash clusters.
> Simply because cfq is not an option at all there, and readahead
> usefulness depends on your workload (in other words, it can help or
> hurt) and therefore cannot be included in a universally-applicable set
> of tuning recommendations. Also, look again: the title talks about
> all-flash deployments, while the context of the benchmark talks about
> 7200RPM HDDs!
>
> On Wed, Nov 4, 2020 at 12:37 AM Seena Fallah 
> wrote:
> >
> > Thanks for your useful information.
> >
> > Can you please also say whether the kernel and disk configuration are
> > still valid for bluestore or not? I mean read_ahead_kb and the disk
> > scheduler.
> >
> > Thanks.
> >
> > On Tue, Nov 3, 2020 at 10:55 PM Alexander E. Patrakov <
> patra...@gmail.com> wrote:
> >>
> >> On Tue, Nov 3, 2020 at 6:30 AM Seena Fallah 
> wrote:
> >> >
> >> > Hi all,
> >> >
> >> > Is this guide still valid for a bluestore deployment with nautilus or
> >> > octopus?
> >> >
> https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments
> >>
> >> Some of the guidance is of course outdated.
> >>
> >> E.g., at the time of that writing, 1x 40GbE was indeed state of the
> >> art in the networking world, but now 100GbE network cards are
> >> affordable, and with 6 NVMe drives per server, even that might be a
> >> bottleneck if the clients use a large block size (>64KB) and do an
> >> fsync() only at the end.
> >>
> >> Regarding NUMA tuning, Ceph made some progress. If it finds that your
> >> NVMe and your network card are on the same NUMA node, then, with
> >> Nautilus or later, the OSD will pin itself to that NUMA node
> >> automatically. I.e.: choose strategically which PCIe slots to use,
> >> maybe use two network cards, and you will not have to do any tuning or
> >> manual pinning.
> >>
> >> Partitioning the NVMe was also popular advice in the past, but now
> >> that there are "osd op num shards" and "osd op num threads per shard"
> >> parameters, with sensible default values, this is something that tends
> >> not to help.
> >>
> >> Filesystem considerations in that document obviously apply only to
> >> Filestore, which is something you should not use.
> >>
> >> Large PG number per OSD helps more uniform data distribution, but
> >> actually hurts performance a little bit.
> >>
> >> The advice regarding the "performance" cpufreq governor is valid, but
> >> you might also look at (i.e. benchmark for your workload specifically)
> >> disabling the deepest idle states.
> >>
> >> --
> >> Alexander E. Patrakov
> >> CV: http://pc.cd/PLz7
>
>
>
> --
> Alexander E. Patrakov
> CV: http://pc.cd/PLz7
>


[ceph-users] Mon went down and won't come back

2020-11-04 Thread Paul Mezzanini
Hi everyone,

I figure it's time to pull in more brain power on this one.  We had an NVMe 
mostly die in one of our monitors and it caused the write latency for the 
machine to spike.  Ceph did the RightThing(tm) and when it lost quorum on that 
machine it was ignored.  I pulled the bad drive out of the array and tried to 
bring the mon and mgr back in (our monitors double-duty as managers).

The manager came up with zero problems, but the monitor got stuck probing.

I removed the bad host from the monmap and stood up a new one on an OSD node to 
get back to 3 active.  That new node was added perfectly using the same methods 
I've tried on the old one.

Network appears to be clean between all hosts.  Packet captures show them 
chatting just fine.  Since we are getting ready to upgrade from RHEL7 to RHEL8 
I took this as an opportunity to reinstall the monitor as an 8 box to get that 
process rolling.  Box is now on RHEL8 with no changes to how ceph-mon is acting.

I install machines with a kickstart and use our own ansible roles to get it 95% 
into service.  I then follow the manual install instructions 
(https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#adding-monitors).
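
For reference, the core of those instructions boils down to roughly this
(using ceph-mon-02 as in my case; the paths are illustrative):

  ceph auth get mon. -o /tmp/mon.keyring         # fetch the mon keyring
  ceph mon getmap -o /tmp/monmap                 # fetch the current monmap
  ceph-mon -i ceph-mon-02 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
  chown -R ceph:ceph /var/lib/ceph/mon/ceph-ceph-mon-02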

Time is in sync, /var/lib/ceph/mon/* is owned by the right UID, keys are in 
sync, configs are in sync.  I pulled the old mon out of "mon initial members" 
and "mon host".  `nc` can talk to all the ports in question and we've tried it 
with firewalld off as well (ditto with selinux).  Cleaned up some stale DNS and 
even tried a different IP (same DNS name). I started all of this with 14.2.12 
but .13 was released while debugging so I've got that on the broken monitor at 
the moment.

I manually start the daemon in debug mode (/usr/bin/ceph-mon -d --cluster ceph 
--id ceph-mon-02 --setuser ceph --setgroup ceph) until it's joined in then use 
the systemd scripts to start it once it's clean.  The current state is:

(Lightly sanitized output)
:snip:
2020-11-04 11:38:57.049 7f4232fb3540  0 mon.ceph-mon-02 does not exist in 
monmap, will attempt to join an existing cluster
2020-11-04 11:38:57.049 7f4232fb3540  0 using public_addr v2:Num.64:0/0 -> 
[v2:Num.64:3300/0,v1:Num.64:6789/0]
2020-11-04 11:38:57.050 7f4232fb3540  0 starting mon.ceph-mon-02 rank -1 at 
public addrs [v2:Num.64:3300/0,v1:Num.64:6789/0] at bind addrs 
[v2:Num.64:3300/0,v1:Num.64:6789/0] mon_data /var/lib/ceph/mon/ceph-ceph-mon-02 
fsid 8514c8d5-4cd3-4dee-b460-27633e3adb1a
2020-11-04 11:38:57.051 7f4232fb3540  1 mon.ceph-mon-02@-1(???) e25 preinit 
fsid 8514c8d5-4cd3-4dee-b460-27633e3adb1a
2020-11-04 11:38:57.051 7f4232fb3540  1 mon.ceph-mon-02@-1(???) e25  
initial_members ceph-mon-01,ceph-mon-03, filtering seed monmap
2020-11-04 11:38:57.051 7f4232fb3540  0 mon.ceph-mon-02@-1(???).mds e430081 new 
map
2020-11-04 11:38:57.051 7f4232fb3540  0 mon.ceph-mon-02@-1(???).mds e430081 
print_map
:snip:
2020-11-04 11:38:57.053 7f4232fb3540  0 mon.ceph-mon-02@-1(???).osd e1198618 
crush map has features 288514119978713088, adjusting msgr requires
2020-11-04 11:38:57.053 7f4232fb3540  0 mon.ceph-mon-02@-1(???).osd e1198618 
crush map has features 288514119978713088, adjusting msgr requires
2020-11-04 11:38:57.053 7f4232fb3540  0 mon.ceph-mon-02@-1(???).osd e1198618 
crush map has features 3314933069571702784, adjusting msgr requires
2020-11-04 11:38:57.053 7f4232fb3540  0 mon.ceph-mon-02@-1(???).osd e1198618 
crush map has features 288514119978713088, adjusting msgr requires
2020-11-04 11:38:57.054 7f4232fb3540  1 
mon.ceph-mon-02@-1(???).paxosservice(auth 54141..54219) refresh upgraded, 
format 0 -> 3
2020-11-04 11:38:57.069 7f421d891700  1 mon.ceph-mon-02@-1(probing) e25 
handle_auth_request failed to assign global_id
 ^^^ last line repeated every few seconds until process killed

I've exhausted everything I can think of so I've just been doing the scientific 
shotgun (one slug at a time) approach to see what changes.  Does anyone else 
have any ideas?

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu

CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
intended only for the person(s) or entity to which it is addressed and may
contain confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon this
information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
destroy any copies of this information.



[ceph-users] Re: Cephadm: module not found

2020-11-04 Thread Nadiia Kotelnikova
I have updated the Docker images to version "v15.2.5", but it still did not
resolve the issue.


Luckily, I found the solution, which is strange: I had
"mgr/cephadm/log_to_file" set to true in the config.


After I removed this option, I am able to execute commands related to the
orchestrator:


> sudo ceph config rm mgr mgr/cephadm/log_to_file
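
To double-check afterwards (assuming a cephadm-managed cluster; exact output
will differ):

> sudo ceph config get mgr mgr/cephadm/log_to_file   # should now be unset/default
> sudo ceph mgr module ls                            # cephadm should be enabled
> sudo ceph orch status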

Thanks for the support :)

On 2020-11-03 20:43, Nadiia Kotelnikova wrote:
Thanks for the really fast answer; however, there is no such line. I have
version v15.2.4. This file is already quite different from the version which
I have.


Probably this backport is for resolving another issue.


Unfortunately, I have only entries like these in the log file of the MGR:

$ sudo ceph orch ls
Error ENOENT: Module not found

log of the MGR:

debug 2020-11-03T14:49:33.848+ 7fadbd5eb700  0 log_channel(audit) 
log [DBG] : from='client.404499 -' entity='client.admin' 
cmd=[{"prefix": "orch ls", "target": ["mon-mgr", ""]}]: dispatch

debug 2020-11-03T14:49:33.852+ 7fadbcdea700 -1 no module 'cephadm'
debug 2020-11-03T14:49:33.852+ 7fadbcdea700 -1 no module 'cephadm'
debug 2020-11-03T14:49:33.852+ 7fadbcdea700 -1 mgr.server reply 
reply (2) No such file or directory Module not found


On 2020-11-03 15:19, 胡 玮文 wrote:

Sorry, it should be “cephadm enter”.


On Nov 3, 2020, at 22:09, 胡 玮文 wrote:

 Hi Nadiia,

Although I don’t have this issue, I think you can apply the fix 
manually. You just need to use “cephadm exec” to get into the mgr 
container, and change one line of python code as in 
https://github.com/ceph/ceph/pull/37141/files#diff-5f6d300f6d71c1b58783257d5dc652d507376cb018f227ab6fa3521db3fc55feR467


Make a backup before you do so. I have not tried this.

On Nov 3, 2020, at 21:51, Nadiia Kotelnikova wrote:


Hi,

I am experiencing the same problem. Could you please advise how to resolve
this issue?
Should the fix be shipped with version 15.2.6 of "ceph-common", or with the
Ceph release itself?


I have my cluster in docker containers and systemd services.

How can I upgrade the cluster to 15.2.6 if the command for upgrading fails?

sudo ceph orch upgrade start --ceph-version 15.2.5
Error ENOENT: Module not found




[ceph-users] Re: Seriously degraded performance after update to Octopus

2020-11-04 Thread Martin Rasmus Lundquist Hansen
Thank you for the suggestion. It does indeed seem to explain why the OSD nodes
are no longer using the buffers for caching.

Unfortunately, changing the value of bluefs_buffered_io does not seem to make
any difference in performance. I will keep looking for clues.
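
For reference, a way to set it cluster-wide and to verify what a running OSD
actually uses (osd.0 is just an example daemon, and depending on the release
an OSD restart may be needed for the change to take effect):

  ceph config set osd bluefs_buffered_io true
  ceph daemon osd.0 config get bluefs_buffered_io   # run on the OSD host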


[ceph-users] Re: How to reset Log Levels

2020-11-04 Thread Ml Ml
Is this still debug output or "normal"?:

Nov 04 10:19:39 ceph01 bash[2648]: audit
2020-11-04T09:19:38.577088+ mon.ceph03 (mon.0) 7738 : audit [DBG]
from='mgr.42824785 10.10.2.103:0/3293316818' entity='mgr.ceph03'
cmd=[{"prefix": "mds metadata", "who": "cephfs.ceph04.hrcvab"}]:
dispatch
Nov 04 10:19:40 ceph01 bash[2648]: cluster
2020-11-04T09:19:38.997145+ mgr.ceph03 (mgr.42824785) 212 :
cluster [DBG] pgmap v214: 2113 pgs: 1 active+clean+scrubbing, 37
active+remapped+backfill_wait, 9 active+remapped+backfilling, 2066
active+clean; 36 TiB data, 112 TiB used, 59 TiB / 172 TiB avail; 66
MiB/s rd, 64 MiB/s wr, 738 op/s; 202190/34023663 objects misplaced
(0.594%)
Nov 04 10:19:40 ceph01 bash[2648]: audit
2020-11-04T09:19:39.578221+ mon.ceph03 (mon.0) 7739 : audit [DBG]
from='mgr.42824785 10.10.2.103:0/3293316818' entity='mgr.ceph03'
cmd=[{"prefix": "mds metadata", "who": "cephfs.ceph04.hrcvab"}]:
dispatch
Nov 04 10:19:41 ceph01 bash[2648]: audit
2020-11-04T09:19:40.578383+ mon.ceph03 (mon.0) 7740 : audit [DBG]
from='mgr.42824785 10.10.2.103:0/3293316818' entity='mgr.ceph03'
cmd=[{"prefix": "mds metadata", "who": "cephfs.ceph04.hrcvab"}]:
dispatch
Nov 04 10:19:42 ceph01 bash[2648]: cluster
2020-11-04T09:19:41.003992+ mgr.ceph03 (mgr.42824785) 213 :
cluster [DBG] pgmap v215: 2113 pgs: 1 active+clean+scrubbing, 37
active+remapped+backfill_wait, 8 active+remapped+backfilling, 2067
active+clean; 36 TiB data, 112 TiB used, 59 TiB / 172 TiB avail; 56
MiB/s rd, 53 MiB/s wr, 639 op/s; 202029/34023711 objects misplaced
(0.594%)
Nov 04 10:19:42 ceph01 bash[2648]: audit
2020-11-04T09:19:41.577839+ mon.ceph03 (mon.0) 7741 : audit [DBG]
from='mgr.42824785 10.10.2.103:0/3293316818' entity='mgr.ceph03'
cmd=[{"prefix": "mds metadata", "who": "cephfs.ceph04.hrcvab"}]:
dispatch
Nov 04 10:19:43 ceph01 bash[2648]: debug 2020-11-04T09:19:43.139+
7f173724d700  1 mon.ceph01@1(peon).osd e638679 _set_new_cache_sizes
cache_size:1020054731 inc_alloc: 146800640 full_alloc: 163577856
kv_alloc: 704643072
Nov 04 10:19:43 ceph01 bash[2648]: audit
2020-11-04T09:19:42.578270+ mon.ceph03 (mon.0) 7742 : audit [DBG]
from='mgr.42824785 10.10.2.103:0/3293316818' entity='mgr.ceph03'
cmd=[{"prefix": "mds metadata", "who": "cephfs.ceph04.hrcvab"}]:
dispatch
Nov 04 10:19:44 ceph01 bash[2648]: cluster
2020-11-04T09:19:43.008288+ mgr.ceph03 (mgr.42824785) 214 :
cluster [DBG] pgmap v216: 2113 pgs: 1 active+clean+scrubbing, 37
active+remapped+backfill_wait, 8 active+remapped+backfilling, 2067
active+clean; 36 TiB data, 112 TiB used, 59 TiB / 172 TiB avail; 37
MiB/s rd, 24 MiB/s wr, 416 op/s; 202029/34023735 objects misplaced
(0.594%); 132 MiB/s, 34 objects/s recovering
Nov 04 10:19:44 ceph01 bash[2648]: audit
2020-11-04T09:19:43.578476+ mon.ceph03 (mon.0) 7743 : audit [DBG]
from='mgr.42824785 10.10.2.103:0/3293316818' entity='mgr.ceph03'
cmd=[{"prefix": "mds metadata", "who": "cephfs.ceph04.hrcvab"}]:
dispatch
Nov 04 10:19:45 ceph01 bash[2648]: audit
2020-11-04T09:19:44.578161+ mon.ceph03 (mon.0) 7744 : audit [DBG]
from='mgr.42824785 10.10.2.103:0/3293316818' entity='mgr.ceph03'
cmd=[{"prefix": "mds metadata", "who": "cephfs.ceph04.hrcvab"}]:
dispatch
Nov 04 10:19:45 ceph01 bash[2648]: cluster
2020-11-04T09:19:45.022173+ mgr.ceph03 (mgr.42824785) 215 :
cluster [DBG] pgmap v217: 2113 pgs: 1 active+clean+scrubbing, 37
active+remapped+backfill_wait, 8 active+remapped+backfilling, 2067
active+clean; 36 TiB data, 112 TiB used, 59 TiB / 172 TiB avail; 71
MiB/s rd, 20 MiB/s wr, 754 op/s; 201814/34023918 objects misplaced
(0.593%); 211 MiB/s, 55 objects/s recovering
Nov 04 10:19:46 ceph01 bash[2648]: audit
2020-11-04T09:19:45.579026+ mon.ceph03 (mon.0) 7745 : audit [DBG]
from='mgr.42824785 10.10.2.103:0/3293316818' entity='mgr.ceph03'
cmd=[{"prefix": "mds metadata", "who": "cephfs.ceph04.hrcvab"}]:
dispatch
Nov 04 10:19:47 ceph01 bash[2648]: audit
2020-11-04T09:19:46.579195+ mon.ceph03 (mon.0) 7746 : audit [DBG]
from='mgr.42824785 10.10.2.103:0/3293316818' entity='mgr.ceph03'
cmd=[{"prefix": "mds metadata", "who": "cephfs.ceph04.hrcvab"}]:
dispatch
Nov 04 10:19:47 ceph01 bash[2648]: cluster
2020-11-04T09:19:47.026027+ mgr.ceph03 (mgr.42824785) 216 :
cluster [DBG] pgmap v218: 2113 pgs: 1 active+clean+scrubbing, 37
active+remapped+backfill_wait, 8 active+remapped+backfilling, 2067
active+clean; 36 TiB data, 112 TiB used, 59 TiB / 172 TiB avail; 63
MiB/s rd, 17 MiB/s wr, 695 op/s; 201787/34024164 objects misplaced
(0.593%); 186 MiB/s, 48 objects/s recovering
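
For reference, a minimal way to check which log-related settings are
overridden and to reset one back to its default (the option name below is
only an example):

  ceph config dump | grep -Ei 'debug|log'        # list non-default settings
  ceph config rm mon mon_cluster_log_file_level  # remove an override, restoring the default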


[ceph-users] File reads are not completing and IO drops to bytes when reading from cephfs

2020-11-04 Thread Amudhan P
Hi,

In my test Ceph Octopus cluster I was trying to simulate a failure case:
with cephfs mounted through the kernel client and a read/write workload
running, I shut down the entire cluster with the OSD flags nodown, noout,
nobackfill and norecover set.

The cluster is 4 nodes, composed of 3 mons, 2 mgrs, 2 MDS and 48 OSDs.
Public IP range: 10.0.103.0, cluster IP range: 10.0.104.0.

Writes and reads stalled; after some time the cluster was brought back live
and healthy. But when reading a file through the kernel mount, the read starts
at above 100 MB/s and then suddenly drops to bytes per second and stays there
for a long time. The only error message I could see on the client machine:

[  167.591095] ceph: loaded (mds proto 32)
[  167.600010] libceph: mon0 10.0.103.1:6789 session established
[  167.601167] libceph: client144519 fsid
f8bc7682-0d11-11eb-a332-0cc47a5ec98a
[  272.132787] libceph: osd1 10.0.104.1:6891 socket closed (con state
CONNECTING)

What went wrong, and why is this happening?

regards
Amudhan P


[ceph-users] bluefs_buffered_io

2020-11-04 Thread Marcel Kuiper
Hi list,

I see a few changes in the (minor) version changelogs to the default of the
bluefs_buffered_io setting. Sometimes it is set to true; in our version
(14.2.11) it is set to false.

Can someone shed some light on this setting? I fail to find any documentation
on it, and "ceph config help" is not entirely clear to me either.

- What does it do exactly when set to true?
- If false, does that mean the Linux buffer cache is always skipped and
caching happens in the OSD process only?
- If enabled, should we lower osd_memory_target to leave more space for the
Linux buffer cache? What percentage of memory should we then assign to
osd_memory_target?
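
For reference, these can be inspected and changed with "ceph config"; the
values below are placeholders, not recommendations:

  ceph config get osd bluefs_buffered_io
  ceph config set osd bluefs_buffered_io true
  ceph config set osd osd_memory_target 3221225472   # e.g. 3 GiB per OSD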

Marcel


[ceph-users] Re: Ceph 14.2 - some PGs stuck peering.

2020-11-04 Thread Eugen Block

Hi,

it's not really clear what happened; I would investigate the root cause
first. Did some of the OSDs fail, and if yes, why?

To increase the recovery speed you can change these values live:

osd_max_backfills
osd_recovery_max_active

Choose carefully and only increase slowly as it can easily impact client I/O.
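
For example (the values here are only illustrative; revert them afterwards):

  ceph tell 'osd.*' injectargs '--osd_max_backfills=2 --osd_recovery_max_active=4'
  # or persistently:
  ceph config set osd osd_max_backfills 2
  ceph config set osd osd_recovery_max_active 4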

In the subject you write that they're stuck peering but you also write:

PGs seem to be slowly migrating from peering to activating but it's  
going very slowly - approx 10PGs during last hour.


So they're not stuck, correct?

Regards,
Eugen


Quoting m.sliwin...@lh.pl:


Hi

We have a weird issue with our ceph cluster - almost all PGs assigned to one
specific pool became stuck, locking out all operations without reporting any
errors.

Story:
We have 3 different pools: hdd-backed, ssd-backed and nvme-backed.
The ssd pool worked fine for a few months.
Today one of the hosts assigned to the nvme pool restarted, triggering
recovery in that pool. It went fast and the cluster went back to the OK state.
During these events or shortly after them the ssd pool became unresponsive.
It was impossible to either read from or write to it.
We decided to slowly restart first the OSDs assigned to it, then, as it
didn't help, all the mons, without breaking quorum of course.
At this moment both the nvme and hdd pools are working fine; the ssd one is
stuck in recovery.
All OSDs in that ssd pool use a large amount of CPU and are exchanging
approx. 1 Mpps per OSD server between each other.


PGs seem to be slowly migrating from peering to activating, but it's going
very slowly - approx. 10 PGs during the last hour.


We were using 14.2.2 OSDs when the issues happened; an upgrade to 14.2.13
didn't help. We increased the heartbeat grace, but it didn't change anything.
It doesn't seem that there's a network problem, as the OSDs don't report
problems connecting to the MONs or to each other. Other OSDs - the nvme ones,
connected to that same set of switches - work without issues.


Can you help? Can you point me to what I should check or do? I looked online
and in the list archives for causes of peering issues and checked most of
them; nothing helped.
I can't use 'ceph pg 28.1cc query' as it hangs, even for PGs that are marked
as active+clean in the results of 'ceph pg dump'.


I checked the status of one of the stuck PGs via ceph-objectstore-tool
--data-path [...] --op info --pgid 28.29d for all three copies and got:


{
"pgid": "28.29d",
"last_update": "68160'205094",
"last_complete": "68160'205094",
"log_tail": "68062'202000",
"last_user_version": 205094,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [
{
"start": "1",
"length": "3"
}
],
"history": {
"epoch_created": 67698,
"epoch_pool_created": 67698,
"last_epoch_started": 68871,
"last_interval_started": 68851,
"last_epoch_clean": 67746,
"last_interval_clean": 67745,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 69447,
"same_interval_since": 69447,
"same_primary_since": 69411,
"last_scrub": "68062'199623",
"last_scrub_stamp": "2020-11-03 03:32:46.895988",
"last_deep_scrub": "68062'177321",
"last_deep_scrub_stamp": "2020-11-02 01:07:15.963916",
"last_clean_scrub_stamp": "2020-11-03 03:32:46.895988"
},
"stats": {
"version": "68160'205094",
"reported_seq": "378496",
"reported_epoch": "69447",
"state": "peering",
"last_fresh": "2020-11-03 20:55:39.247348",
"last_change": "2020-11-03 20:55:39.247348",
"last_active": "2020-11-03 15:26:24.270088",
"last_peered": "2020-11-03 19:04:43.152655",
"last_clean": "2020-11-03 14:45:02.988293",
"last_became_active": "2020-09-01 13:52:40.091759",
"last_became_peered": "2020-11-03 19:04:42.939991",
"last_unstale": "2020-11-03 20:55:39.247348",
"last_undegraded": "2020-11-03 20:55:39.247348",
"last_fullsized": "2020-11-03 20:55:39.247348",
"mapping_epoch": 69447,
"log_start": "68062'202000",
"ondisk_log_start": "68062'202000",
"created": 67698,
"last_epoch_clean": 67746,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "68062'199623",
"last_scrub_stamp": "2020-11-03 03:32:46.895988",
"last_deep_scrub": "68062'177321",
"last_deep_scrub_stamp": "2020-11-02 01:07:15.963916",
"last_clean_scrub_stamp": "2020-11-03 03:32:46.895988",
"log_size": 3094,
"ondisk_log_size": 3094,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": false,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 15173849600,
"num_objects": 3647,
"num_object_clones": 0,
"num_object_copies": 10941,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 3647,
"num_whiteouts": 0,
"num_read": 172836,
"num_read_kb": 6824184,
"num_write": 196190,
"num_write_kb": 21380176,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 0,
"num_bytes_recovered": 0,
"num_keys_recovered": 0,
"num_objec