[ceph-users] Re: 17.2.6 fs 'ls' ok, but 'cat' 'operation not permitted' puzzle

2023-05-02 Thread Harry G Coin
This problem of file systems being inaccessible after an upgrade by clients 
other than client.admin dates back to v14 and carries on through v17.  It also 
applies to any case of specifying other than the default pool names for new 
file systems.  Solved, because Curt remembered a link on this list.  (Thanks 
Curt!)  Here's what the official ceph docs ought to have provided, for 
others who hit this.  YMMV:


   IF

   you have ceph file systems whose data and metadata pools were
   specified in the 'ceph fs new' command (meaning not left to the
   defaults, which create them for you),

   OR

   you have an existing ceph file system and are upgrading to a new
   major version of ceph

   THEN

   for the documented 'ceph fs authorize ...' commands to do as
   documented (and to avoid strange 'operation not permitted' errors when
   doing file I/O, or similar security-related problems, for all users
   other than client.admin), you must first run:

   ceph osd pool application set <your metadata pool name> cephfs
   metadata <filesystem name>

   and

   ceph osd pool application set <your data pool name> cephfs data
   <filesystem name>

   (a worked example with concrete names follows below)

   Otherwise, when the OSDs get a request to read or write data (not
   the directory info, but file data) they won't know which ceph file
   system name to look up, never mind the names you may have chosen for
   the pools, as the 'defaults' themselves changed between major
   releases, from

   data pool = fsname
   metadata pool = fsname_metadata

   to

   data pool = fsname.data and
   metadata pool = fsname.meta

   as the ceph revisions came and went.  Any setup that just used
   'client.admin' for all mounts didn't see the problem, as the admin
   key gave blanket permission.

   A temporary 'fix' is to change mount requests to use 'client.admin'
   and its associated key.  A less drastic but still half-fix is to change
   the osd cap for your user to just 'caps osd = "allow rw"', i.e. delete
   the trailing 'tag cephfs data=...' part.

The only documentation I could find for this upgrade security-related 
ceph-ending catastrophe was in the NFS, not cephfs docs:


https://docs.ceph.com/en/latest/cephfs/nfs/

and the genius-level, much appreciated pointer from Curt here:


On 5/2/23 14:21, Curt wrote:
This thread might be of use; it's for an older version of ceph (14), but it 
might still apply: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/23FDDSYBCDVMYGCUTALACPFAJYITLOHJ/


On Tue, May 2, 2023 at 11:06 PM Harry G Coin  wrote:

In 17.2.6 is there a security requirement that pool names
supporting a
ceph fs filesystem match the filesystem name.data for the data and
name.meta for the associated metadata pool? (multiple file systems
are
enabled)

I have filesystems from older versions with the data pool name
matching
the filesystem and appending _metadata for that,

and even older filesystems with the pool name as in 'library' and
'library_metadata' supporting a filesystem called 'libraryfs'

The pools all have the cephfs tag.

But using the documented:

ceph fs authorize libraryfs client.basicuser / rw

command allows the root user to mount and browse the library
directory
tree, but fails with 'operation not permitted' when even reading
any file.

However, changing the client.basicuser osd auth to 'allow rw'
instead of
'allow rw tag...' allows normal operations.

So:

[client.basicuser]
key = ==
caps mds = "allow rw fsname=libraryfs"
caps mon = "allow r fsname=libraryfs"
caps osd = "allow rw"

works, but the same with

    caps osd = "allow rw tag cephfs data=libraryfs"

leads to the 'operation not permitted' on read, or write or any
actual
access.

It remains a puzzle.  Help appreciated!

Were there upgrade instructions about that, any help pointing me
to them?

Thanks

Harry Coin
Rock Stable Systems

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: architecture help (iscsi, rbd, backups?)

2023-05-02 Thread Bailey Allison
Hey Angelo,

Ya, we are using the RBD driver for quite a few customers in production, and it 
is working quite well!

Hahahaha, I am familiar with the bug you are talking about, I think; I believe 
that may be resolved by now.

I believe the driver is either out of beta now or soon to be out of beta?

I recall watching a talk by the developers where they mentioned it should be 
out of beta soon or already is. In addition, I want to say they are also/would 
also eventually be offering paid support for it, but I can't say for certain on 
that.

Otherwise though like I said we are using it for quite a few customers with 
some really good success.

Also, if you have any further questions re: CephFS and SMB please feel free to 
ask! I am sure other people might also find the same questions and answers 
helpful 

Regards,

Bailey

>-Original Message-
>From: Angelo Hongens  
>Sent: April 29, 2023 11:21 PM
>To: Bailey Allison ; ceph-users@ceph.io
>Subject: [ceph-users] Re: architecture help (iscsi, rbd, backups?)
>
>Bailey,
>
>Thanks for your extensive reply, you got me down the wormhole of CephFS and 
>SMB (and looking at a lot of 45drives videos and knowledge base, Houston 
>dashboard, reading up on CTDB, etc), and this is a really interesting option 
>as well! Thanks for the write-up.
>
>
>By the way, are you using the RBD driver in Windows in production with your 
>customers?
>
>The binaries are still called beta, and last time I tried it in a proof of 
>concept setup (a while back), it would never connect and always crash out on 
>me. After reporting an issue, I did not get a response for almost three 
>months before a dev responded that it was an unsupported ipv6 issue. Not a 
>problem, and all very understandable, it's open source software written 
>mostly by volunteers, but I got a bit cautious about deploying this to 
>production ;)
>
>Angelo.





>On 27/04/2023 18:20, Bailey Allison wrote:
> Hey Angelo,
> 
> Just to make sure I'm understanding correctly, the main idea for the 
> use case is to be able to present Ceph storage to windows clients as SMB?
> 
> If so, you can absolutely use CephFS to get that done. This is 
> something we do all the time with our cluster configurations, if we're 
> looking to present ceph storage to windows clients for the use case of 
> a file server is our standard choice, and to your point of 
> security/ACLs we can make use of joining the samba server that to an 
> existing active directory, and then assigning permissions through Windows.
> 
> I will provide a high level overview of an average setup to hopefully 
> explain it better, and of course if you have any questions please let 
> me know. I understand that this is way different of a setup of what 
> you currently have planned, but it's a different choice that could 
> prove useful in your case.
> 
> Essentially how it works is we have ceph cluster with CephFS 
> configured, of which we map CephFS kernel mounts onto some gateway 
> nodes, at which point we expose to clients via CTDB with SMB shares (CTDB for 
> high availability).
> 
> i.e
> 
> ceph cluster > ceph fs > map cephfs kernel mount on linux client > 
> create smb share on top of cephfs kernel mount > connect to samba 
> share with windows clients.
> 
> The SMB gateway nodes hosting samba also can be joined to an Active 
> Directory to allow setting Windows ACL permissions to allow more in 
> depth control of ACLs.
> 
> Also I will say +1 for the RBD driver on Windows, something we also 
> make use of a lot and have a lot of success with.
> 
> Again, please let me know if you need any insight or clarification, or 
> have any further questions. Hope this is of assistance.
> 
> Regards,
> 
> Bailey
> 
> -Original Message-
>> From: Angelo Höngens 
>> Sent: April 27, 2023 6:06 PM
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] architecture help (iscsi, rbd, backups?)
>>
>> Hey guys and girls,
>>
>> I'm working on a project to build storage for one of our departments, 
>> and I
> want to ask you guys and girls for input on the high-level overview part.
> It's a long one, I hope you read along and comment.
>>
>> SUMMARY
>>
>> I made a plan last year to build a 'storage solution' including ceph 
>> and
> some windows VM's to expose the data over SMB to clients. A year later 
> I finally have the hardware, built a ceph cluster, and I'm doing 
> tests. Ceph itself runs great, but when I wanted to start exposing the 
> data using iscsi to our VMware farm, I ran into some issues. I know 
> the iscsi gateways will introduce some new performance bottlenecks, 
> but I'm seeing really slow performance, still working on that.
>>
>> But then I ran into the warning on the iscsi gateway page: "The iSCSI
> gateway is in maintenance as of November 2022. This means that it is 
> no longer in active development and will not be updated to add new features.".
> Wait, what? Why!? What does this mean? Does this mean that iSCSI is 
> now 'feature complete' and 

[ceph-users] [multisite] "bucket sync status" takes a while

2023-05-02 Thread Yixin Jin
Hi folks,

With a multi-site environment, when I create a bucket-level sync policy with a 
symmetric flow between the master zone and another zone, "bucket sync status" 
immediately shows that the sync is now enabled in the master zone. But it takes 
a while for it to show that in the other zone. I tried "period pull" at the 
other zone and "period push" at the master zone. Neither seem to make a 
difference. Is there a way to speed up this process?

Thanks,
Yixin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Laura Flores
I saw two untracked failures in the upgrade/octopus-x suite. Both failures
seem to indicate a problem with containers, unrelated to the Ceph code.
However, if anyone else can please take a look and confirm, I would
appreciate it.
upgrade/octopus-x (pacific)
https://pulpito.ceph.com/yuriw-2023-04-25_14:52:19-upgrade:octopus-x-pacific-release-distro-default-smithi
https://pulpito.ceph.com/yuriw-2023-04-26_20:20:35-upgrade:octopus-x-pacific-release-distro-default-smithi
https://pulpito.ceph.com/yuriw-2023-04-27_14:50:15-upgrade:octopus-x-pacific-release-distro-default-smithi

Failures:
1. https://tracker.ceph.com/issues/59602 -- new tracker
2. https://tracker.ceph.com/issues/59604 -- new tracker

Details:
1. upgrade:octopus-x (pacific): Error: no container with name or ID -
Ceph
2. upgrade:octopus-x (pacific): StopSignal SIGTERM failed to stop
container ceph-2ba77aa2-e491-11ed-9b00-001a4aab830c-mgr.x in 10 seconds,
resorting to SIGKILL - Ceph

On Tue, May 2, 2023 at 11:22 AM Nizamudeen A  wrote:

> dashboard approved!
>
> Regards,
> Nizam
>
> On Tue, May 2, 2023, 20:48 Yuri Weinstein  wrote:
>
> > Please review the Release Notes -
> https://github.com/ceph/ceph/pull/51301
> >
> > Still seeking approvals for:
> >
> > rados - Neha, Radek, Laura
> >   rook - Sébastien Han
> >   dashboard - Ernesto
> >
> > fs - Venky, Patrick
> > (upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8))
> >
> > ceph-volume - Guillaume
> >
> > On Tue, May 2, 2023 at 8:00 AM Casey Bodley  wrote:
> > >
> > > On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein 
> > wrote:
> > > >
> > > > Details of this release are summarized here:
> > > >
> > > > https://tracker.ceph.com/issues/59542#note-1
> > > > Release Notes - TBD
> > > >
> > > > Seeking approvals for:
> > > >
> > > > smoke - Radek, Laura
> > > > rados - Radek, Laura
> > > >   rook - Sébastien Han
> > > >   cephadm - Adam K
> > > >   dashboard - Ernesto
> > > >
> > > > rgw - Casey
> > >
> > > rgw approved
> > >
> > > > rbd - Ilya
> > > > krbd - Ilya
> > > > fs - Venky, Patrick
> > > > upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> > > > upgrade/pacific-p2p - Laura
> > > > powercycle - Brad (SELinux denials)
> > > > ceph-volume - Guillaume, Adam K
> > > >
> > > > Thx
> > > > YuriW
> > > > ___
> > > > Dev mailing list -- d...@ceph.io
> > > > To unsubscribe send an email to dev-le...@ceph.io
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 17.2.6 fs 'ls' ok, but 'cat' 'operation not permitted' puzzle

2023-05-02 Thread Harry G Coin
In 17.2.6 is there a security requirement that pool names supporting a 
ceph fs filesystem match the filesystem name.data for the data and 
name.meta for the associated metadata pool? (multiple file systems are 
enabled)


I have filesystems from older versions with the data pool name matching 
the filesystem and appending _metadata for that,


and even older filesystems with the pool name as in 'library' and 
'library_metadata' supporting a filesystem called 'libraryfs'


The pools all have the cephfs tag.

But using the documented:

ceph fs authorize libraryfs client.basicuser / rw

command allows the root user to mount and browse the library directory 
tree, but fails with 'operation not permitted' when even reading any file.


However, changing the client.basicuser osd auth to 'allow rw' instead of 
'allow rw tag...' allows normal operations.


So:

[client.basicuser]
   key = ==
   caps mds = "allow rw fsname=libraryfs"
   caps mon = "allow r fsname=libraryfs"
   caps osd = "allow rw"

works, but the same with

   caps osd = "allow rw tag cephfs data=libraryfs"

leads to the 'operation not permitted' on read, or write or any actual 
access.


It remains a puzzle.  Help appreciated!

Were there upgrade instructions about that, any help pointing me to them?

Thanks

Harry Coin
Rock Stable Systems

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Nikola Ciprich
Hello Igor,

On Tue, May 02, 2023 at 05:41:04PM +0300, Igor Fedotov wrote:
> Hi Nikola,
> 
> I'd suggest to start monitoring perf counters for your osds.
> op_w_lat/subop_w_lat ones specifically. I presume they raise eventually,
> don't they?
OK, starting collecting those for all OSDs..

currently values for avgtime are around 0.0003 for subop_w_lat and 0.001-0.002
for op_w_lat

I guess it'll need some time to find some trend, so I'll check tomorrow
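(in case it's useful, I'm sampling the counters roughly like this on each OSD
host -- the osd id is just an example and jq is assumed to be available:

ceph daemon osd.0 perf dump | jq '{op_w_lat: .osd.op_w_lat, subop_w_lat: .osd.subop_w_lat}'

each counter reports avgcount/sum/avgtime, so I'm tracking avgtime over time)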


> 
> Does subop_w_lat grow for every OSD or just a subset of them? How large is
> the delta between the best and the worst OSDs after a one week period? How
> many "bad" OSDs are at this point?
I'll see and report

> 
> 
> And some more questions:
> 
> How large are space utilization/fragmentation for your OSDs?
OSD usage is around 16-18%. Fragmentation should not be very bad; this
cluster has only been deployed for a few months


> 
> Is the same performance drop observed for artificial benchmarks, e.g. 4k
> random writes to a fresh RBD image using fio?
will check again when the slowdown occurs and report


> 
> Is there any RAM utilization growth for OSD processes over time? Or may be
> any suspicious growth in mempool stats?
nope, RAM usage seems to be pretty constant.

however, probably worth noting: historically we're using the following OSD options:
ceph config set osd bluestore_rocksdb_options 
compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB
ceph config set osd bluestore_cache_autotune 0
ceph config set osd bluestore_cache_size_ssd 2G
ceph config set osd bluestore_cache_kv_ratio 0.2
ceph config set osd bluestore_cache_meta_ratio 0.8
ceph config set osd osd_min_pg_log_entries 10
ceph config set osd osd_max_pg_log_entries 10
ceph config set osd osd_pg_log_dups_tracked 10
ceph config set osd osd_pg_log_trim_min 10

so maybe I'll start resetting those to defaults (ie enabling cache autotune etc)
as a first step..
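(i.e., if I understand the config store correctly, something like the
following, followed by restarting the OSDs -- just a sketch:

ceph config rm osd bluestore_rocksdb_options
ceph config rm osd bluestore_cache_autotune
ceph config rm osd bluestore_cache_size_ssd
ceph config rm osd bluestore_cache_kv_ratio
ceph config rm osd bluestore_cache_meta_ratio
)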


> 
> 
> As a blind and brute force approach you might also want to compact RocksDB
> through ceph-kvstore-tool and switch bluestore allocator to bitmap
> (presuming default hybrid one is effective right now). Please do one
> modification at a time to realize what action is actually helpful if any.
will do..
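(concretely, I'm planning something like the following, one change at a time
as you suggest -- the osd id, unit name and data path are placeholders for our
setup:

systemctl stop ceph-osd@0
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
systemctl start ceph-osd@0

and separately, for the allocator switch, which takes effect on OSD restart:

ceph config set osd bluestore_allocator bitmap
)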

thanks again for your hints

BR

nik


> 
> 
> Thanks,
> 
> Igor
> 
> On 5/2/2023 11:32 AM, Nikola Ciprich wrote:
> > Hello dear CEPH users and developers,
> > 
> > we're dealing with strange problems.. we're having 12 node alma linux 9 
> > cluster,
> > initially installed CEPH 15.2.16, then upgraded to 17.2.5. It's running 
> > bunch
> > of KVM virtual machines accessing volumes using RBD.
> > 
> > everything is working well, but there is strange and for us quite serious 
> > issue
> >   - speed of write operations (both sequential and random) is constantly 
> > degrading
> >   drastically to almost unusable numbers (in ~1week it drops from ~70k 4k 
> > writes/s
> >   from 1 VM  to ~7k writes/s)
> > 
> > When I restart all OSD daemons, numbers immediately return to normal..
> > 
> > volumes are stored on replicated pool of 4 replicas, on top of 7*12 = 84
> > INTEL SSDPE2KX080T8 NVMEs.
> > 
> > I've updated cluster to 17.2.6 some time ago, but the problem persists. 
> > This is
> > especially annoying in connection with https://tracker.ceph.com/issues/56896
> > as restarting OSDs is quite painfull when half of them crash..
> > 
> > I don't see anything suspicious, nodes load is quite low, no logs errors,
> > network latency and throughput is OK too
> > 
> > Anyone having simimar issue?
> > 
> > I'd like to ask for hints on what should I check further..
> > 
> > we're running lots of 14.2.x and 15.2.x clusters, none showing similar
> > issue, so I'm suspecting this is something related to quincy
> > 
> > thanks a lot in advance
> > 
> > with best regards
> > 
> > nikola ciprich
> > 
> > 
> > 
> -- 
> Igor Fedotov
> Ceph Lead Developer
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> 

-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 

[ceph-users] Flushing stops as copy-from message being throttled

2023-05-02 Thread lingu2008

Hi all,

On one server with a cache tier on Samsung PM983 SSDs for an EC base 
tier on HDDs, I find the cache tier stops flushing or evicting when the 
cache tier is near full. With quite some gdb-debugging, I find the 
problem may be with the throttling mechanism. When the write traffic is 
high, the cache tier quickly fills its maximum request count and 
throttles further requests. Then flush stops because copy-from requests 
are throttled by the cache tier OSD. Ironically, the 256 requests 
already accepted by the cache tier cannot proceed, either, because the 
cache tier is full and cannot flush/evict.


While we may advise that a cache tier should not go full, this deadlock 
situation is not entirely comprehensible to me, because a full cache 
usually can flush/evict as long as the base tier has space.
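(To keep the tier from reaching full in the first place one would presumably
lower the usual cache-tiering thresholds, e.g. something along these lines --
pool name and values are only placeholders, and this avoids rather than
explains the deadlock:

ceph osd pool set cachepool cache_target_full_ratio 0.7
ceph osd pool set cachepool target_max_bytes 400000000000
)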


I wonder whether there has been some specific reasons for this behavior. 
My test environment is with version 15.2.17 but the code in 17.2.2 
appears to handle this part of logic in the same way.


Cheers,

lin

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-05-02 Thread Adam King
The number of mgr daemons thing is expected. The way it works is it first
upgrades all the standby mgrs (which will be all but one) and then fails
over so the previously active mgr can be upgraded as well. After that
failover is when it's first actually running the newer cephadm code, which
is when you're hitting this issue. Are the logs still saying something
similar about how "sudo which python3" is failing? I'm thinking this might
just be a general issue with the user being used not having passwordless
sudo access, that sort of accidentally working in pacific, but now not
working any more in quincy. If the log lines confirm the same, we might
have to work on something in order to handle this case (making the sudo
optional somehow). As mentioned in the previous email, that setup wasn't
intended to be supported even in pacific, although if it did work, we could
bring something in to make it usable in quincy onward as well.
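If it does turn out to be the sudo requirement, one interim workaround (only
if your security policy allows it) would be giving the cephadm SSH user
passwordless sudo, e.g. a hypothetical sudoers drop-in along these lines,
with the user name adjusted to yours:

# /etc/sudoers.d/cephadm-ssh-user
cephadmin ALL=(root) NOPASSWD: ALL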

On Tue, May 2, 2023 at 10:58 AM Reza Bakhshayeshi 
wrote:

> Hi Adam,
>
> I'm still struggling with this issue. I also checked it one more time with
> newer versions, upgrading the cluster from 16.2.11 to 16.2.12 was
> successful but from 16.2.12 to 17.2.6 failed again with the same ssh errors
> (I checked
> https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#ssh-errors a
> couple of times and all keys/access are fine).
>
> [root@host1 ~]# ceph health detail
> HEALTH_ERR Upgrade: Failed to connect to host host2 at addr (x.x.x.x)
> [ERR] UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect to host host2 at
> addr (x.x.x.x)
> SSH connection failed to host2 at addr (x.x.x.x): Host(s) were marked
> offline: {'host2', 'host6', 'host9', 'host4', 'host3', 'host5', 'host1',
> 'host7', 'host8'}
>
> The interesting thing is that always (total number of mgrs) - 1 get
> upgraded: if I provision 5 MGRs then 4 of them, and with 3, 2 of them!
>
> Since I'm in an internal environment, I also checked the process with the
> Quincy cephadm binary file. FYI I'm using stretch mode on this cluster.
>
> I don't understand why Quincy MGRs cannot ssh into Pacific nodes, if you
> have any more hints I would be really glad to hear.
>
> Best regards,
> Reza
>
>
>
> On Wed, 12 Apr 2023 at 17:18, Adam King  wrote:
>
>> Ah, okay. Someone else had opened an issue about the same thing after
>> the 17.2.5 release I believe. It's changed in 17.2.6 at least to only use
>> sudo for non-root users
>> https://github.com/ceph/ceph/blob/v17.2.6/src/pybind/mgr/cephadm/ssh.py#L148-L153.
>> But it looks like you're also using a non-root user anyway. We've required
>> passwordless sudo access for custom ssh users for a long time I think (e.g.
>> it's in pacific docs
>> https://docs.ceph.com/en/pacific/cephadm/install/#further-information-about-cephadm-bootstrap,
>> see the point on "--ssh-user"). Did this actually work for you before in
>> pacific with a non-root user that doesn't have sudo privileges? I had
>> assumed that had never worked.
>>
>> On Wed, Apr 12, 2023 at 10:38 AM Reza Bakhshayeshi 
>> wrote:
>>
>>> Thank you Adam for your response,
>>>
>>> I tried all your comments and the troubleshooting link you sent. From
>>> the Quincy mgrs containers, they can ssh into all other Pacific nodes
>>> successfully by running the exact command in the log output and vice versa.
>>>
>>> Here are some debug logs from the cephadm while updating:
>>>
>>> 2023-04-12T11:35:56.260958+ mgr.host8.jukgqm (mgr.4468627) 103 :
>>> cephadm [DBG] Opening connection to cephadmin@x.x.x.x with ssh options
>>> '-F /tmp/cephadm-conf-2bbfubub -i /tmp/cephadm-identity-7x2m8gvr'
>>> 2023-04-12T11:35:56.525091+ mgr.host8.jukgqm (mgr.4468627) 144 :
>>> cephadm [DBG] _run_cephadm : command = ls
>>> 2023-04-12T11:35:56.525406+ mgr.host8.jukgqm (mgr.4468627) 145 :
>>> cephadm [DBG] _run_cephadm : args = []
>>> 2023-04-12T11:35:56.525571+ mgr.host8.jukgqm (mgr.4468627) 146 :
>>> cephadm [DBG] mon container image my-private-repo/quay-io/ceph/ceph@sha256
>>> :1b9803c8984bef8b82f05e233e8fe8ed8f0bba8e5cc2c57f6efaccbeea682add
>>> 2023-04-12T11:35:56.525619+ mgr.host8.jukgqm (mgr.4468627) 147 :
>>> cephadm [DBG] args: --image 
>>> my-private-repo/quay-io/ceph/ceph@sha256:1b9803c8984bef8b82f05e233e8fe8ed8f0bba8e5cc2c57f6efaccbeea682add
>>> ls
>>> 2023-04-12T11:35:56.525738+ mgr.host8.jukgqm (mgr.4468627) 148 :
>>> cephadm [DBG] Running command: sudo which python3
>>> 2023-04-12T11:35:56.534227+ mgr.host8.jukgqm (mgr.4468627) 149 :
>>> cephadm [DBG] Connection to host1 failed. Process exited with non-zero exit
>>> status 3
>>> 2023-04-12T11:35:56.534275+ mgr.host8.jukgqm (mgr.4468627) 150 :
>>> cephadm [DBG] _reset_con close host1
>>> 2023-04-12T11:35:56.540135+ mgr.host8.jukgqm (mgr.4468627) 158 :
>>> cephadm [DBG] Host "host1" marked as offline. Skipping gather facts refresh
>>> 2023-04-12T11:35:56.540178+ mgr.host8.jukgqm (mgr.4468627) 159 :
>>> cephadm [DBG] Host "host1" marked as offline. Skipping network refresh
>>> 

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Nizamudeen A
dashboard approved!

Regards,
Nizam

On Tue, May 2, 2023, 20:48 Yuri Weinstein  wrote:

> Please review the Release Notes - https://github.com/ceph/ceph/pull/51301
>
> Still seeking approvals for:
>
> rados - Neha, Radek, Laura
>   rook - Sébastien Han
>   dashboard - Ernesto
>
> fs - Venky, Patrick
> (upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8))
>
> ceph-volume - Guillaume
>
> On Tue, May 2, 2023 at 8:00 AM Casey Bodley  wrote:
> >
> > On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein 
> wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/59542#note-1
> > > Release Notes - TBD
> > >
> > > Seeking approvals for:
> > >
> > > smoke - Radek, Laura
> > > rados - Radek, Laura
> > >   rook - Sébastien Han
> > >   cephadm - Adam K
> > >   dashboard - Ernesto
> > >
> > > rgw - Casey
> >
> > rgw approved
> >
> > > rbd - Ilya
> > > krbd - Ilya
> > > fs - Venky, Patrick
> > > upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> > > upgrade/pacific-p2p - Laura
> > > powercycle - Brad (SELinux denials)
> > > ceph-volume - Guillaume, Adam K
> > >
> > > Thx
> > > YuriW
> > > ___
> > > Dev mailing list -- d...@ceph.io
> > > To unsubscribe send an email to dev-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Janek Bevendorff

Hi Patrick,


Please be careful resetting the journal. It was not necessary. You can
try to recover the missing inode using cephfs-data-scan [2].


Yes. I did that very reluctantly after trying everything else as a last 
resort. But since it only gave me another error, I restored the previous 
state. Downgrading to the previous version only came to mind minutes 
before Dan wrote that there's a new assertion in 16.2.12 (I didn't 
expect a corruption issue to be "fixable" like that).




Thanks for the report. Unfortunately this looks like a false positive.
You're not using snapshots, right?


Or fortunately for me? We have an automated snapshot schedule which 
creates snapshots of certain top-level directories daily. Our main 
folder is /storage, which had this issue.



In any case, if you can reproduce it again with:


ceph config set mds debug_mds 20
ceph config set mds debug_ms 1


I'll try that tomorrow and let you know, thanks!


and upload the logs using ceph-post-file [1], that would be helpful to
understand what happened.

After that you can disable the check as Dan pointed out:

ceph config set mds mds_abort_on_newly_corrupt_dentry false
ceph config set mds mds_go_bad_corrupt_dentry false

NOTE FOR OTHER READERS OF THIS MAIL: it is not recommended to blindly
set these configs as the MDS is trying to catch legitimate metadata
corruption.

[1] https://docs.ceph.com/en/quincy/man/8/ceph-post-file/
[2] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/



--

Bauhaus-Universität Weimar
Bauhausstr. 9a, R308
99423 Weimar, Germany

Phone: +49 3643 58 3577
www.webis.de
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Balancing Reads in Ceph

2023-05-02 Thread Alan Nair
Hi.
I am currently using Ceph for replicated storage to store many objects across 5 
nodes with 3x replication.
When I generate ~1000 read requests to a single object, they all get serviced 
by the same primary OSD. I would like to balance the reads across the replicas.
So I use the following:

auto read_op = rados_create_read_op();
rados_read_op_read(read_op, offset, outSize, buffer, &bytes_read, &rval);
err = rados_read_op_operate(read_op, pool->ioctx, keyName.c_str(), 
LIBRADOS_OPERATION_BALANCE_READS);

However, this does not seem to balance the reads across replicas. I do not see 
what I am doing wrong in the above code.
Could you please guide me on this?
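For completeness, here is a trimmed-down, self-contained sketch of what I am
doing (the config path, pool and object names are placeholders and error
handling is abbreviated; built with: gcc balance_read.c -lrados):

#include <rados/librados.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t ioctx;
    char buf[4096];
    size_t bytes_read = 0;
    int rval = 0, err;

    /* connect with the default client.admin identity and local ceph.conf */
    rados_create(&cluster, NULL);
    rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
    if ((err = rados_connect(cluster)) < 0) {
        fprintf(stderr, "connect failed: %s\n", strerror(-err));
        return 1;
    }
    rados_ioctx_create(cluster, "mypool", &ioctx);

    /* build the read op and ask librados to spread reads across replicas */
    rados_read_op_t op = rados_create_read_op();
    rados_read_op_read(op, 0, sizeof(buf), buf, &bytes_read, &rval);
    err = rados_read_op_operate(op, ioctx, "myobject",
                                LIBRADOS_OPERATION_BALANCE_READS);
    printf("operate=%d per-op rval=%d bytes=%zu\n", err, rval, bytes_read);

    rados_release_read_op(op);
    rados_ioctx_destroy(ioctx);
    rados_shutdown(cluster);
    return 0;
}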

ceph-mon and ceph-osd are run on Ubuntu 22.04 installed via apt-get update ceph 
ceph-mds ceph-volume

If I should ask this question somewhere else, please point me in the right 
direction.

Thanks and regards,
Alan.
The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336. Is e buidheann carthannais a th' ann an Oilthigh 
Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Patrick Donnelly
On Tue, May 2, 2023 at 10:31 AM Janek Bevendorff
 wrote:
>
> Hi,
>
> After a patch version upgrade from 16.2.10 to 16.2.12, our rank 0 MDS
> fails to start. After replaying the journal, it just crashes with
>
> [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry
> #0x1/storage [2,head] auth (dversion lock)
>
> Immediately after the upgrade, I had it running shortly, but then it
> decided to crash for unknown reasons and I cannot get it back up.
>
> We have five ranks in total, the other four seem to be fine. I backed up
> the journal and tried to run cephfs-journal-tool --rank=cephfs.storage:0
> event recover_dentries summary, but it never finishes only eats up a lot
> of RAM. I stopped it after an hour and 50GB RAM.
>
> Resetting the journal makes the MDS crash with a missing inode error on
> another top-level directory, so I re-imported the backed-up journal. Is
> there any way to recover from this without rebuilding the whole file system?

Please be careful resetting the journal. It was not necessary. You can
try to recover the missing inode using cephfs-data-scan [2].

Thanks for the report. Unfortunately this looks like a false positive.
You're not using snapshots, right?

In any case, if you can reproduce it again with:

> ceph config set mds debug_mds 20
> ceph config set mds debug_ms 1

and upload the logs using ceph-post-file [1], that would be helpful to
understand what happened.

After that you can disable the check as Dan pointed out:

ceph config set mds mds_abort_on_newly_corrupt_dentry false
ceph config set mds mds_go_bad_corrupt_dentry false

NOTE FOR OTHER READERS OF THIS MAIL: it is not recommended to blindly
set these configs as the MDS is trying to catch legitimate metadata
corruption.

[1] https://docs.ceph.com/en/quincy/man/8/ceph-post-file/
[2] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How can I use not-replicated pool (replication 1 or raid-0)

2023-05-02 Thread mhnx
Thank you for the explanation Frank.

I also agree with you that Ceph is not designed for this kind of use case,
but I tried to continue with what I know.
My idea was exactly what you described: I was trying to automate
cleaning or recreating on any failure.

As you can see below, rep1 pool is very fast:
- Create: time for i in {1..9}; do head -c 1K randfile$i; done
replication 2 : 31m59.917s
replication 1 : 7m6.046s

- Delete: time rm -rf testdir/
replication 2 : 11m56.994s
replication 1 : 0m40.756s
-

I started learning DRBD, I will also check BeeGFS thanks for the advice.

Regards.

Frank Schilder , 1 May 2023 Pzt, 10:27 tarihinde şunu yazdı:
>
> I think you misunderstood Janne's reply. The main statement is at the end, 
> ceph is not designed for an "I don't care about data" use case. If you need 
> speed for temporary data where you can sustain data loss, go for something 
> simpler. For example, we use beegfs with great success for a burst buffer for 
> an HPC cluster. It is very lightweight and will pull out all performance your 
> drives can offer. In case of disaster it is easily possible to clean up. 
> Beegfs does not care about lost data, such data will simply become 
> inaccessible while everything else just moves on. It will not try to 
> self-heal either. It doesn't even scrub data, so no competition of users with 
> admin IO.
>
> Its pretty much your use case. We clean it up every 6-8 weeks and if 
> something breaks we just redeploy the whole thing from scratch. Performance 
> is great and its a very simple and economic system to administrate. No need 
> for the whole ceph daemon engine with large RAM requirements and extra admin 
> daemons.
>
> Use ceph for data you want to survive a nuclear blast. Don't use it for 
> things its not made for and then complain.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: mhnx 
> Sent: Saturday, April 29, 2023 5:48 AM
> To: Janne Johansson
> Cc: Ceph Users
> Subject: [ceph-users] Re: How can I use not-replicated pool (replication 1 or 
> raid-0)
>
> Hello Janne, thank you for your response.
>
> I understand your advice and be sure that I've designed too many EC
> pools and I know the mess. This is not an option because I need SPEED.
>
> Please let me tell you, my hardware first to meet the same vision.
> Server: R620
> Cpu: 2 x Xeon E5-2630 v2 @ 2.60GHz
> Ram: 128GB - DDR3
> Disk1: 20x Samsung SSD 860 2TB
> Disk2: 10x Samsung SSD 870 2TB
>
> My ssds does not have PLP. Because of that, every ceph write also
> waits for TRIM. I want to know how much latency we are talking about
> because I'm thinking of adding PLP NVME for wal+db cache to gain some
> speed.
> As you can see, I even try to gain from every TRIM command.
> Currently I'm testing replication 2 pool and even this speed is not
> enough for my use case.
> Now I'm trying to boost the deletion speed because I'm writing and
> deleting files all the time and this never ends.
> I write this mail because replication 1 will decrease the deletion
> speed but still I'm trying to tune some MDS+ODS parameters to increase
> delete speed.
>
> Any help and idea will be great for me. Thanks.
> Regards.
>
>
>
> Janne Johansson , 12 Nis 2023 Çar, 10:10
> tarihinde şunu yazdı:
> >
> > Den mån 10 apr. 2023 kl 22:31 skrev mhnx :
> > > Hello.
> > > I have a 10 node cluster. I want to create a non-replicated pool
> > > (replication 1) and I want to ask some questions about it:
> > >
> > > Let me tell you my use case:
> > > - I don't care about losing data,
> > > - All of my data is JUNK and these junk files are usually between 1KB to 
> > > 32MB.
> > > - These files will be deleted in 5 days.
> > > - Writable space and I/O speed is more important.
> > > - I have high Write/Read/Delete operations, minimum 200GB a day.
> >
> > That is "only" 18MB/s which should easily be doable even with
> > repl=2,3,4. or EC. This of course depends on speed of drives, network,
> > cpus and all that, but in itself it doesn't seem too hard to achieve
> > in terms of average speeds. We have EC8+3 rgw backed by some 12-14 OSD
> > hosts with hdd and nvme (for wal+db) that can ingest over 1GB/s if you
> > parallelize the rgw streams, so 18MB/s seems totally doable with 10
> > decent machines. Even with replication.
> >
> > > I'm afraid that, in any failure, I won't be able to access the whole
> > > cluster. Losing data is okay but I have to ignore missing files,
> >
> > Even with repl=1, in case of a failure, the cluster will still aim at
> > fixing itself rather than ignoring currently lost data and moving on,
> > so any solution that involves "forgetting" about lost data would need
> > a ceph operator telling the cluster to ignore all the missing parts
> > and to recreate the broken PGs. This would not be automatic.
> >
> >
> > --
> > May the most significant bit of your 

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Yuri Weinstein
Please review the Release Notes - https://github.com/ceph/ceph/pull/51301

Still seeking approvals for:

rados - Neha, Radek, Laura
  rook - Sébastien Han
  dashboard - Ernesto

fs - Venky, Patrick
(upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8))

ceph-volume - Guillaume

On Tue, May 2, 2023 at 8:00 AM Casey Bodley  wrote:
>
> On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59542#note-1
> > Release Notes - TBD
> >
> > Seeking approvals for:
> >
> > smoke - Radek, Laura
> > rados - Radek, Laura
> >   rook - Sébastien Han
> >   cephadm - Adam K
> >   dashboard - Ernesto
> >
> > rgw - Casey
>
> rgw approved
>
> > rbd - Ilya
> > krbd - Ilya
> > fs - Venky, Patrick
> > upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> > upgrade/pacific-p2p - Laura
> > powercycle - Brad (SELinux denials)
> > ceph-volume - Guillaume, Adam K
> >
> > Thx
> > YuriW
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Janek Bevendorff

Thanks!

I tried downgrading to 16.2.10 and was able to get it running again, but 
after a reboot, got a warning that two of the OSDs on that host had 
broken Bluestore compression. Restarting the two OSDs again got rid of 
it, but that's still a bit concerning.



On 02/05/2023 16:48, Dan van der Ster wrote:

Hi Janek,

That assert is part of a new corruption check added in 16.2.12 -- see
https://github.com/ceph/ceph/commit/1771aae8e79b577acde749a292d9965264f20202

The abort is controlled by a new option:

+Option("mds_abort_on_newly_corrupt_dentry", Option::TYPE_BOOL,
Option::LEVEL_ADVANCED)
+.set_default(true)
+.set_description("MDS will abort if dentry is detected newly corrupted."),

So in theory you could switch that off, but it is concerning that the
metadata is corrupted already.
I'm cc'ing Patrick who has been working on this issue.

Cheers, Dan

__
Clyso GmbH | https://www.clyso.com

On Tue, May 2, 2023 at 7:32 AM Janek Bevendorff
 wrote:

Hi,

After a patch version upgrade from 16.2.10 to 16.2.12, our rank 0 MDS
fails to start. After replaying the journal, it just crashes with

[ERR] : MDS abort because newly corrupt dentry to be committed: [dentry
#0x1/storage [2,head] auth (dversion lock)

Immediately after the upgrade, I had it running shortly, but then it
decided to crash for unknown reasons and I cannot get it back up.

We have five ranks in total, the other four seem to be fine. I backed up
the journal and tried to run cephfs-journal-tool --rank=cephfs.storage:0
event recover_dentries summary, but it never finishes only eats up a lot
of RAM. I stopped it after an hour and 50GB RAM.

Resetting the journal makes the MDS crash with a missing inode error on
another top-level directory, so I re-imported the backed-up journal. Is
there any way to recover from this without rebuilding the whole file system?

Thanks
Janek


Here's the full crash log:


May 02 16:16:53 xxx077 ceph-mds[3047358]:-29>
2023-05-02T16:16:52.761+0200 7f51f878b700  1 mds.0.1711712 Finished
replaying journal
May 02 16:16:53 xxx077 ceph-mds[3047358]:-28>
2023-05-02T16:16:52.761+0200 7f51f878b700  1 mds.0.1711712 making mds
journal writeable
May 02 16:16:53 xxx077 ceph-mds[3047358]:-27>
2023-05-02T16:16:52.761+0200 7f51f878b700  1 mds.0.journaler.mdlog(ro)
set_writeable
May 02 16:16:53 xxx077 ceph-mds[3047358]:-26>
2023-05-02T16:16:52.761+0200 7f51f878b700  2 mds.0.1711712 i am not
alone, moving to state resolve
May 02 16:16:53 xxx077 ceph-mds[3047358]:-25>
2023-05-02T16:16:52.761+0200 7f51f878b700  3 mds.0.1711712 request_state
up:resolve
May 02 16:16:53 xxx077 ceph-mds[3047358]:-24>
2023-05-02T16:16:52.761+0200 7f51f878b700  5 mds.beacon.xxx077
set_want_state: up:replay -> up:resolve
May 02 16:16:53 xxx077 ceph-mds[3047358]:-23>
2023-05-02T16:16:52.761+0200 7f51f878b700  5 mds.beacon.xxx077 Sending
beacon up:resolve seq 15
May 02 16:16:53 xxx077 ceph-mds[3047358]:-22>
2023-05-02T16:16:52.761+0200 7f51f878b700 10 monclient:
_send_mon_message to mon.xxx056 at v2:141.54.133.56:3300/0
May 02 16:16:53 xxx077 ceph-mds[3047358]:-21>
2023-05-02T16:16:53.113+0200 7f51fef98700 10 monclient: tick
May 02 16:16:53 xxx077 ceph-mds[3047358]:-20>
2023-05-02T16:16:53.113+0200 7f51fef98700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after
2023-05-02T16:16:23.118186+0200)
May 02 16:16:53 xxx077 ceph-mds[3047358]:-19>
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.xxx077 Updating MDS map
to version 1711713 from mon.1
May 02 16:16:53 xxx077 ceph-mds[3047358]:-18>
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712
handle_mds_map i am now mds.0.1711712
May 02 16:16:53 xxx077 ceph-mds[3047358]:-17>
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712
handle_mds_map state change up:replay --> up:resolve
May 02 16:16:53 xxx077 ceph-mds[3047358]:-16>
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 resolve_start
May 02 16:16:53 xxx077 ceph-mds[3047358]:-15>
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 reopen_log
May 02 16:16:53 xxx077 ceph-mds[3047358]:-14>
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 recovery set
is 1,2,3,4
May 02 16:16:53 xxx077 ceph-mds[3047358]:-13>
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 recovery set
is 1,2,3,4
May 02 16:16:53 xxx077 ceph-mds[3047358]:-12>
2023-05-02T16:16:53.373+0200 7f5202fa0700 10 monclient: get_auth_request
con 0x5574fe74c400 auth_method 0
May 02 16:16:53 xxx077 ceph-mds[3047358]:-11>
2023-05-02T16:16:53.373+0200 7f52037a1700 10 monclient: get_auth_request
con 0x5574fe40fc00 auth_method 0
May 02 16:16:53 xxx077 ceph-mds[3047358]:-10>
2023-05-02T16:16:53.373+0200 7f520279f700 10 monclient: get_auth_request
con 0x5574f932fc00 auth_method 0
May 02 16:16:53 xxx077 ceph-mds[3047358]: -9>
2023-05-02T16:16:53.373+0200 7f520279f700 10 monclient: get_auth_request
con 0x5574ffce2000 auth_method 

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Gregory Farnum
On Tue, May 2, 2023 at 7:54 AM Igor Fedotov  wrote:
>
>
> On 5/2/2023 11:32 AM, Nikola Ciprich wrote:
> > I've updated cluster to 17.2.6 some time ago, but the problem persists. 
> > This is
> > especially annoying in connection with https://tracker.ceph.com/issues/56896
> > as restarting OSDs is quite painfull when half of them crash..
> > with best regards
> >
> Feel free to set osd_fast_shutdown_timeout to zero to work around the
> above. IMO this assertion is nonsense, and I don't see any usage of
> this timeout parameter other than to just throw an assertion.

This was added by Gabi in
https://github.com/ceph/ceph/commit/9b2a64a5f6ea743b2a4f4c2dbd703248d88b2a96;
presumably he has insight.

I wonder if it's just a debug config so we can see slow shutdowns in
our test runs? In which case it should certainly default to 0 and get
set for those test suites.
-Greg

>
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Casey Bodley
On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59542#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> smoke - Radek, Laura
> rados - Radek, Laura
>   rook - Sébastien Han
>   cephadm - Adam K
>   dashboard - Ernesto
>
> rgw - Casey

rgw approved

> rbd - Ilya
> krbd - Ilya
> fs - Venky, Patrick
> upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> upgrade/pacific-p2p - Laura
> powercycle - Brad (SELinux denials)
> ceph-volume - Guillaume, Adam K
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-05-02 Thread Reza Bakhshayeshi
Hi Adam,

I'm still struggling with this issue. I also checked it one more time with
newer versions, upgrading the cluster from 16.2.11 to 16.2.12 was
successful but from 16.2.12 to 17.2.6 failed again with the same ssh errors
(I checked
https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#ssh-errors a
couple of times and all keys/access are fine).

[root@host1 ~]# ceph health detail
HEALTH_ERR Upgrade: Failed to connect to host host2 at addr (x.x.x.x)
[ERR] UPGRADE_OFFLINE_HOST: Upgrade: Failed to connect to host host2 at
addr (x.x.x.x)
SSH connection failed to host2 at addr (x.x.x.x): Host(s) were marked
offline: {'host2', 'host6', 'host9', 'host4', 'host3', 'host5', 'host1',
'host7', 'host8'}

The interesting thing is that always (total number of mgrs) - 1 get
upgraded: if I provision 5 MGRs then 4 of them, and with 3, 2 of them!

Since I'm in an internal environment, I also checked the process with the
Quincy cephadm binary file. FYI I'm using stretch mode on this cluster.

I don't understand why Quincy MGRs cannot ssh into Pacific nodes, if you
have any more hints I would be really glad to hear.

Best regards,
Reza



On Wed, 12 Apr 2023 at 17:18, Adam King  wrote:

> Ah, okay. Someone else had opened an issue about the same thing after
> the 17.2.5 release I believe. It's changed in 17.2.6 at least to only use
> sudo for non-root users
> https://github.com/ceph/ceph/blob/v17.2.6/src/pybind/mgr/cephadm/ssh.py#L148-L153.
> But it looks like you're also using a non-root user anyway. We've required
> passwordless sudo access for custom ssh users for a long time I think (e.g.
> it's in pacific docs
> https://docs.ceph.com/en/pacific/cephadm/install/#further-information-about-cephadm-bootstrap,
> see the point on "--ssh-user"). Did this actually work for you before in
> pacific with a non-root user that doesn't have sudo privileges? I had
> assumed that had never worked.
>
> On Wed, Apr 12, 2023 at 10:38 AM Reza Bakhshayeshi 
> wrote:
>
>> Thank you Adam for your response,
>>
>> I tried all your comments and the troubleshooting link you sent. From the
>> Quincy mgrs containers, they can ssh into all other Pacific nodes
>> successfully by running the exact command in the log output and vice versa.
>>
>> Here are some debug logs from the cephadm while updating:
>>
>> 2023-04-12T11:35:56.260958+ mgr.host8.jukgqm (mgr.4468627) 103 :
>> cephadm [DBG] Opening connection to cephadmin@x.x.x.x with ssh options
>> '-F /tmp/cephadm-conf-2bbfubub -i /tmp/cephadm-identity-7x2m8gvr'
>> 2023-04-12T11:35:56.525091+ mgr.host8.jukgqm (mgr.4468627) 144 :
>> cephadm [DBG] _run_cephadm : command = ls
>> 2023-04-12T11:35:56.525406+ mgr.host8.jukgqm (mgr.4468627) 145 :
>> cephadm [DBG] _run_cephadm : args = []
>> 2023-04-12T11:35:56.525571+ mgr.host8.jukgqm (mgr.4468627) 146 :
>> cephadm [DBG] mon container image my-private-repo/quay-io/ceph/ceph@sha256
>> :1b9803c8984bef8b82f05e233e8fe8ed8f0bba8e5cc2c57f6efaccbeea682add
>> 2023-04-12T11:35:56.525619+ mgr.host8.jukgqm (mgr.4468627) 147 :
>> cephadm [DBG] args: --image 
>> my-private-repo/quay-io/ceph/ceph@sha256:1b9803c8984bef8b82f05e233e8fe8ed8f0bba8e5cc2c57f6efaccbeea682add
>> ls
>> 2023-04-12T11:35:56.525738+ mgr.host8.jukgqm (mgr.4468627) 148 :
>> cephadm [DBG] Running command: sudo which python3
>> 2023-04-12T11:35:56.534227+ mgr.host8.jukgqm (mgr.4468627) 149 :
>> cephadm [DBG] Connection to host1 failed. Process exited with non-zero exit
>> status 3
>> 2023-04-12T11:35:56.534275+ mgr.host8.jukgqm (mgr.4468627) 150 :
>> cephadm [DBG] _reset_con close host1
>> 2023-04-12T11:35:56.540135+ mgr.host8.jukgqm (mgr.4468627) 158 :
>> cephadm [DBG] Host "host1" marked as offline. Skipping gather facts refresh
>> 2023-04-12T11:35:56.540178+ mgr.host8.jukgqm (mgr.4468627) 159 :
>> cephadm [DBG] Host "host1" marked as offline. Skipping network refresh
>> 2023-04-12T11:35:56.540408+ mgr.host8.jukgqm (mgr.4468627) 160 :
>> cephadm [DBG] Host "host1" marked as offline. Skipping device refresh
>> 2023-04-12T11:35:56.540490+ mgr.host8.jukgqm (mgr.4468627) 161 :
>> cephadm [DBG] Host "host1" marked as offline. Skipping osdspec preview
>> refresh
>> 2023-04-12T11:35:56.540527+ mgr.host8.jukgqm (mgr.4468627) 162 :
>> cephadm [DBG] Host "host1" marked as offline. Skipping autotune
>> 2023-04-12T11:35:56.540978+ mgr.host8.jukgqm (mgr.4468627) 163 :
>> cephadm [DBG] Connection to host1 failed. Process exited with non-zero exit
>> status 3
>> 2023-04-12T11:35:56.796966+ mgr.host8.jukgqm (mgr.4468627) 728 :
>> cephadm [ERR] Upgrade: Paused due to UPGRADE_OFFLINE_HOST: Upgrade: Failed
>> to connect to host host1 at addr (x.x.x.x)
>>
>> As I can see here, it turns out sudo is added to the code to be able to
>> continue:
>>
>>
>> https://github.com/ceph/ceph/blob/v17.2.5/src/pybind/mgr/cephadm/ssh.py#L143
>>
>> I cannot privilege the cephadmin user to run sudo commands for some
>> policy reasons, could this be the root cause of the 

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Yuri Weinstein
Venky, I did plan to cherry-pick this PR if you approve this (this PR
was used for a rerun)

On Tue, May 2, 2023 at 7:51 AM Venky Shankar  wrote:
>
> Hi Yuri,
>
> On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59542#note-1
> > Release Notes - TBD
> >
> > Seeking approvals for:
> >
> > smoke - Radek, Laura
> > rados - Radek, Laura
> >   rook - Sébastien Han
> >   cephadm - Adam K
> >   dashboard - Ernesto
> >
> > rgw - Casey
> > rbd - Ilya
> > krbd - Ilya
> > fs - Venky, Patrick
>
> There are a couple of new failures which are qa/test related - I'll
> have a look at those (they _do not_ look serious).
>
> Also, Yuri, do you plan to merge
>
> https://github.com/ceph/ceph/pull/51232
>
> into the pacific-release branch although it's tagged with one of your
> other pacific runs?
>
> > upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> > upgrade/pacific-p2p - Laura
> > powercycle - Brad (SELinux denials)
> > ceph-volume - Guillaume, Adam K
> >
> > Thx
> > YuriW
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Cheers,
> Venky
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Venky Shankar
Hi Yuri,

On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59542#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> smoke - Radek, Laura
> rados - Radek, Laura
>   rook - Sébastien Han
>   cephadm - Adam K
>   dashboard - Ernesto
>
> rgw - Casey
> rbd - Ilya
> krbd - Ilya
> fs - Venky, Patrick

There are a couple of new failures which are qa/test related - I'll
have a look at those (they _do not_ look serious).

Also, Yuri, do you plan to merge

https://github.com/ceph/ceph/pull/51232

into the pacific-release branch although it's tagged with one of your
other pacific runs?

> upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> upgrade/pacific-p2p - Laura
> powercycle - Brad (SELinux denials)
> ceph-volume - Guillaume, Adam K
>
> Thx
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Igor Fedotov



On 5/2/2023 11:32 AM, Nikola Ciprich wrote:

I've updated cluster to 17.2.6 some time ago, but the problem persists. This is
especially annoying in connection with https://tracker.ceph.com/issues/56896
as restarting OSDs is quite painfull when half of them crash..
with best regards

Feel free to set osd_fast_shutdown_timeout to zero to work around the 
above. IMO this assertion is nonsense, and I don't see any usage of 
this timeout parameter other than to just throw an assertion.
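E.g. (a sketch, takes effect when the OSDs restart):

ceph config set osd osd_fast_shutdown_timeout 0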



--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Dan van der Ster
Hi Janek,

That assert is part of a new corruption check added in 16.2.12 -- see
https://github.com/ceph/ceph/commit/1771aae8e79b577acde749a292d9965264f20202

The abort is controlled by a new option:

+Option("mds_abort_on_newly_corrupt_dentry", Option::TYPE_BOOL,
Option::LEVEL_ADVANCED)
+.set_default(true)
+.set_description("MDS will abort if dentry is detected newly corrupted."),

So in theory you could switch that off, but it is concerning that the
metadata is corrupted already.
I'm cc'ing Patrick who has been working on this issue.

Cheers, Dan

__
Clyso GmbH | https://www.clyso.com

On Tue, May 2, 2023 at 7:32 AM Janek Bevendorff
 wrote:
>
> Hi,
>
> After a patch version upgrade from 16.2.10 to 16.2.12, our rank 0 MDS
> fails to start. After replaying the journal, it just crashes with
>
> [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry
> #0x1/storage [2,head] auth (dversion lock)
>
> Immediately after the upgrade, I had it running shortly, but then it
> decided to crash for unknown reasons and I cannot get it back up.
>
> We have five ranks in total, the other four seem to be fine. I backed up
> the journal and tried to run cephfs-journal-tool --rank=cephfs.storage:0
> event recover_dentries summary, but it never finishes only eats up a lot
> of RAM. I stopped it after an hour and 50GB RAM.
>
> Resetting the journal makes the MDS crash with a missing inode error on
> another top-level directory, so I re-imported the backed-up journal. Is
> there any way to recover from this without rebuilding the whole file system?
>
> Thanks
> Janek
>
>
> Here's the full crash log:
>
>
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-29>
> 2023-05-02T16:16:52.761+0200 7f51f878b700  1 mds.0.1711712 Finished
> replaying journal
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-28>
> 2023-05-02T16:16:52.761+0200 7f51f878b700  1 mds.0.1711712 making mds
> journal writeable
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-27>
> 2023-05-02T16:16:52.761+0200 7f51f878b700  1 mds.0.journaler.mdlog(ro)
> set_writeable
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-26>
> 2023-05-02T16:16:52.761+0200 7f51f878b700  2 mds.0.1711712 i am not
> alone, moving to state resolve
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-25>
> 2023-05-02T16:16:52.761+0200 7f51f878b700  3 mds.0.1711712 request_state
> up:resolve
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-24>
> 2023-05-02T16:16:52.761+0200 7f51f878b700  5 mds.beacon.xxx077
> set_want_state: up:replay -> up:resolve
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-23>
> 2023-05-02T16:16:52.761+0200 7f51f878b700  5 mds.beacon.xxx077 Sending
> beacon up:resolve seq 15
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-22>
> 2023-05-02T16:16:52.761+0200 7f51f878b700 10 monclient:
> _send_mon_message to mon.xxx056 at v2:141.54.133.56:3300/0
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-21>
> 2023-05-02T16:16:53.113+0200 7f51fef98700 10 monclient: tick
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-20>
> 2023-05-02T16:16:53.113+0200 7f51fef98700 10 monclient:
> _check_auth_rotating have uptodate secrets (they expire after
> 2023-05-02T16:16:23.118186+0200)
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-19>
> 2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.xxx077 Updating MDS map
> to version 1711713 from mon.1
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-18>
> 2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712
> handle_mds_map i am now mds.0.1711712
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-17>
> 2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712
> handle_mds_map state change up:replay --> up:resolve
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-16>
> 2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 resolve_start
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-15>
> 2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 reopen_log
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-14>
> 2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 recovery set
> is 1,2,3,4
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-13>
> 2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 recovery set
> is 1,2,3,4
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-12>
> 2023-05-02T16:16:53.373+0200 7f5202fa0700 10 monclient: get_auth_request
> con 0x5574fe74c400 auth_method 0
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-11>
> 2023-05-02T16:16:53.373+0200 7f52037a1700 10 monclient: get_auth_request
> con 0x5574fe40fc00 auth_method 0
> May 02 16:16:53 xxx077 ceph-mds[3047358]:-10>
> 2023-05-02T16:16:53.373+0200 7f520279f700 10 monclient: get_auth_request
> con 0x5574f932fc00 auth_method 0
> May 02 16:16:53 xxx077 ceph-mds[3047358]: -9>
> 2023-05-02T16:16:53.373+0200 7f520279f700 10 monclient: get_auth_request
> con 0x5574ffce2000 auth_method 0
> May 02 16:16:53 xxx077 ceph-mds[3047358]: -8>
> 2023-05-02T16:16:53.377+0200 7f5202fa0700  5 mds.beacon.xxx077 received
> beacon reply 

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Igor Fedotov

Hi Nikola,

I'd suggest starting to monitor perf counters for your OSDs, 
op_w_lat/subop_w_lat ones specifically. I presume they rise over time, 
don't they?
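
Something along these lines should show the current values (the OSD id is 
a placeholder and exact counter names may differ slightly between releases):

  ceph tell osd.<id> perf dump | grep -E 'op_w_lat|subop_w_lat'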


Does subop_w_lat grow for every OSD or just a subset of them? How large 
is the delta between the best and the worst OSDs after a one-week 
period? How many "bad" OSDs are there at this point?



And some more questions:

How high are space utilization and fragmentation for your OSDs?

Is the same performance drop observed for artificial benchmarks, e.g. 4k 
random writes to a fresh RBD image using fio?
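
A quick sketch of such a test, assuming fio is built with RBD support and 
a dedicated throw-away image exists (pool/image names are placeholders):

  fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin \
      --pool=<testpool> --rbdname=<testimage> \
      --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based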


Is there any RAM utilization growth for OSD processes over time? Or maybe 
some suspicious growth in mempool stats?



As a blind, brute-force approach you might also want to compact RocksDB 
through ceph-kvstore-tool and switch the bluestore allocator to bitmap 
(presuming the default hybrid one is in effect right now). Please make 
one modification at a time so you can tell which action, if any, 
actually helps.
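
Roughly like this (a sketch; paths differ for containerized deployments, 
and the allocator change only takes effect after an OSD restart):

  # online compaction
  ceph tell osd.<id> compact

  # or offline, with the OSD stopped
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact

  # switch the allocator
  ceph config set osd bluestore_allocator bitmap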



Thanks,

Igor

On 5/2/2023 11:32 AM, Nikola Ciprich wrote:

Hello dear CEPH users and developers,

we're dealing with a strange problem. We have a 12-node Alma Linux 9 cluster,
initially installed with CEPH 15.2.16, then upgraded to 17.2.5. It's running a
bunch of KVM virtual machines accessing volumes using RBD.

Everything is working well, but there is a strange and, for us, quite serious
issue: the speed of write operations (both sequential and random) is constantly
degrading, drastically, to almost unusable numbers (in ~1 week it drops from
~70k 4k writes/s from 1 VM to ~7k writes/s).

When I restart all OSD daemons, the numbers immediately return to normal.

Volumes are stored on a replicated pool with 4 replicas, on top of 7*12 = 84
INTEL SSDPE2KX080T8 NVMes.

I updated the cluster to 17.2.6 some time ago, but the problem persists. This is
especially annoying in connection with https://tracker.ceph.com/issues/56896,
as restarting OSDs is quite painful when half of them crash.

I don't see anything suspicious: node load is quite low, there are no log
errors, and network latency and throughput are OK too.

Is anyone having a similar issue?

I'd like to ask for hints on what I should check further.

We're running lots of 14.2.x and 15.2.x clusters, none showing a similar
issue, so I suspect this is something related to Quincy.

thanks a lot in advance

with best regards

nikola ciprich




--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Janek Bevendorff

Hi,

After a patch version upgrade from 16.2.10 to 16.2.12, our rank 0 MDS 
fails to start. After replaying the journal, it just crashes with


[ERR] : MDS abort because newly corrupt dentry to be committed: [dentry 
#0x1/storage [2,head] auth (dversion lock)


Immediately after the upgrade, I had it running for a short while, but then it 
decided to crash for unknown reasons and I cannot get it back up.


We have five ranks in total, the other four seem to be fine. I backed up 
the journal and tried to run cephfs-journal-tool --rank=cephfs.storage:0 
event recover_dentries summary, but it never finishes and only eats up a lot 
of RAM. I stopped it after an hour and 50GB of RAM.
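
For reference, the journal backup and restore were done roughly like this 
(the file name is just an example):

  cephfs-journal-tool --rank=cephfs.storage:0 journal export backup.bin
  cephfs-journal-tool --rank=cephfs.storage:0 journal import backup.bin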


Resetting the journal makes the MDS crash with a missing inode error on 
another top-level directory, so I re-imported the backed-up journal. Is 
there any way to recover from this without rebuilding the whole file system?


Thanks
Janek


Here's the full crash log:


May 02 16:16:53 xxx077 ceph-mds[3047358]:    -29> 
2023-05-02T16:16:52.761+0200 7f51f878b700  1 mds.0.1711712 Finished 
replaying journal
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -28> 
2023-05-02T16:16:52.761+0200 7f51f878b700  1 mds.0.1711712 making mds 
journal writeable
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -27> 
2023-05-02T16:16:52.761+0200 7f51f878b700  1 mds.0.journaler.mdlog(ro) 
set_writeable
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -26> 
2023-05-02T16:16:52.761+0200 7f51f878b700  2 mds.0.1711712 i am not 
alone, moving to state resolve
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -25> 
2023-05-02T16:16:52.761+0200 7f51f878b700  3 mds.0.1711712 request_state 
up:resolve
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -24> 
2023-05-02T16:16:52.761+0200 7f51f878b700  5 mds.beacon.xxx077 
set_want_state: up:replay -> up:resolve
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -23> 
2023-05-02T16:16:52.761+0200 7f51f878b700  5 mds.beacon.xxx077 Sending 
beacon up:resolve seq 15
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -22> 
2023-05-02T16:16:52.761+0200 7f51f878b700 10 monclient: 
_send_mon_message to mon.xxx056 at v2:141.54.133.56:3300/0
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -21> 
2023-05-02T16:16:53.113+0200 7f51fef98700 10 monclient: tick
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -20> 
2023-05-02T16:16:53.113+0200 7f51fef98700 10 monclient: 
_check_auth_rotating have uptodate secrets (they expire after 
2023-05-02T16:16:23.118186+0200)
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -19> 
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.xxx077 Updating MDS map 
to version 1711713 from mon.1
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -18> 
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 
handle_mds_map i am now mds.0.1711712
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -17> 
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 
handle_mds_map state change up:replay --> up:resolve
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -16> 
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 resolve_start
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -15> 
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 reopen_log
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -14> 
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 recovery set 
is 1,2,3,4
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -13> 
2023-05-02T16:16:53.373+0200 7f51fff9a700  1 mds.0.1711712 recovery set 
is 1,2,3,4
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -12> 
2023-05-02T16:16:53.373+0200 7f5202fa0700 10 monclient: get_auth_request 
con 0x5574fe74c400 auth_method 0
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -11> 
2023-05-02T16:16:53.373+0200 7f52037a1700 10 monclient: get_auth_request 
con 0x5574fe40fc00 auth_method 0
May 02 16:16:53 xxx077 ceph-mds[3047358]:    -10> 
2023-05-02T16:16:53.373+0200 7f520279f700 10 monclient: get_auth_request 
con 0x5574f932fc00 auth_method 0
May 02 16:16:53 xxx077 ceph-mds[3047358]: -9> 
2023-05-02T16:16:53.373+0200 7f520279f700 10 monclient: get_auth_request 
con 0x5574ffce2000 auth_method 0
May 02 16:16:53 xxx077 ceph-mds[3047358]: -8> 
2023-05-02T16:16:53.377+0200 7f5202fa0700  5 mds.beacon.xxx077 received 
beacon reply up:resolve seq 15 rtt 0.616008
May 02 16:16:53 xxx077 ceph-mds[3047358]: -7> 
2023-05-02T16:16:53.393+0200 7f51fff9a700  5 mds.xxx077 handle_mds_map 
old map epoch 1711713 <= 1711713, discarding
May 02 16:16:53 xxx077 ceph-mds[3047358]: -6> 
2023-05-02T16:16:53.393+0200 7f51fff9a700  5 mds.xxx077 handle_mds_map 
old map epoch 1711713 <= 1711713, discarding
May 02 16:16:53 xxx077 ceph-mds[3047358]: -5> 
2023-05-02T16:16:53.393+0200 7f51fff9a700  5 mds.xxx077 handle_mds_map 
old map epoch 1711713 <= 1711713, discarding
May 02 16:16:53 xxx077 ceph-mds[3047358]: -4> 
2023-05-02T16:16:53.393+0200 7f51fff9a700  5 mds.xxx077 handle_mds_map 
old map epoch 1711713 <= 1711713, discarding
May 02 16:16:53 xxx077 ceph-mds[3047358]: -3> 
2023-05-02T16:16:53.545+0200 7f51fff9a700 -1 

[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-05-02 Thread Eugen Block

Hi,

disclaimer: I haven't used LRC in a real setup yet, so there might be  
some misunderstandings on my side. But I tried to play around with one  
of my test clusters (Nautilus). Because I'm limited in the number of  
hosts (6 across 3 virtual DCs) I tried two different profiles with  
lower numbers to get a feeling for how that works.


# first attempt
ceph:~ # ceph osd erasure-code-profile set LRCprofile plugin=lrc k=4  
m=2 l=3 crush-failure-domain=host


For every l=3 OSDs one local parity chunk is added, so (k+m)/l = 2 more  
chunks to store ==> 8 chunks in total. Since my failure-domain is host and  
I only have 6 hosts, I get incomplete PGs.


# second attempt
ceph:~ # ceph osd erasure-code-profile set LRCprofile plugin=lrc k=2  
m=2 l=2 crush-failure-domain=host


This gives me 6 chunks in total to store across 6 hosts which works:

ceph:~ # ceph pg ls-by-pool lrcpool
PG    OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  STATE         SINCE  VERSION  REPORTED  UP                     ACTING                 SCRUB_STAMP                 DEEP_SCRUB_STAMP
50.0  1        0         0          0        619    0            0           1    active+clean  72s    18410'1  18415:54  [27,13,0,2,25,7]p27    [27,13,0,2,25,7]p27    2023-05-02 14:53:54.322135  2023-05-02 14:53:54.322135
50.1  0        0         0          0        0      0            0           0    active+clean  6m     0'0      18414:26  [27,33,22,6,13,34]p27  [27,33,22,6,13,34]p27  2023-05-02 14:53:54.322135  2023-05-02 14:53:54.322135
50.2  0        0         0          0        0      0            0           0    active+clean  6m     0'0      18413:25  [1,28,14,4,31,21]p1    [1,28,14,4,31,21]p1    2023-05-02 14:53:54.322135  2023-05-02 14:53:54.322135
50.3  0        0         0          0        0      0            0           0    active+clean  6m     0'0      18413:24  [8,16,26,33,7,25]p8    [8,16,26,33,7,25]p8    2023-05-02 14:53:54.322135  2023-05-02 14:53:54.322135


After stopping all OSDs on one host I was still able to read and write  
into the pool, but after stopping a second host one PG from that pool  
went "down". I don't fully understand that yet, but I've just started to  
look into it.
With your setup (12 hosts) I would recommend not utilizing all of  
them, so you have capacity to recover; let's say one "spare" host per  
DC, leaving 9 hosts in total. A profile with k=3 m=3 l=2 could make  
sense here, resulting in 9 total chunks (one more parity chunk for  
every other OSD), min_size 4 (see the sketch below). But as I wrote,  
it probably doesn't have the resiliency for a DC failure, so that needs  
some further investigation.
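
A sketch of what I have in mind (untested on your hardware; profile and 
pool names are just examples, and the usual caveat about not being able to 
change EC profiles of existing pools applies):

  ceph osd erasure-code-profile set LRCprofile93 plugin=lrc k=3 m=3 l=2 \
    crush-failure-domain=host crush-locality=datacenter
  ceph osd pool create lrcpool 16 16 erasure LRCprofile93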


Regards,
Eugen

Zitat von Michel Jouvin :


Hi,

No... our current setup is 3 datacenters with the same  
configuration, i.e. 1 mon/mgr + 4 OSD servers with 16 OSDs each,  
thus the total of 12 OSD servers. As, with the LRC plugin, k+m must be a  
multiple of l, I found that k=9/m=6/l=5 with  
crush-locality=datacenter was achieving my goal of being resilient  
to a datacenter failure. Because of this, I considered that  
lowering the crush failure domain to osd was not a major issue in my  
case (as it would not be worse than a datacenter failure if all the  
shards are on the same server in a datacenter) and was working  
around the lack of hosts for k=9/m=6 (15 OSDs).


Maybe it helps if I give the erasure code profile used:

crush-device-class=hdd
crush-failure-domain=osd
crush-locality=datacenter
crush-root=default
k=9
l=5
m=6
plugin=lrc

The previously mentioned strange number for min_size for the pool  
created with this profile has vanished after the Quincy upgrade, as this  
parameter is no longer in the CRUSH map rule, and the `ceph osd pool  
get` command reports the expected number (10):


-


ceph osd pool get fink-z1.rgw.buckets.data min_size

min_size: 10


Cheers,

Michel

Le 29/04/2023 à 20:36, Curt a écrit :

Hello,

What is your current setup, 1 server per data center with 12 OSDs  
each? What is your current crush rule and LRC crush rule?



On Fri, Apr 28, 2023, 12:29 Michel Jouvin  
 wrote:


   Hi,

   I think I found a possible cause of my PG down but still don't
   understand why.
   As explained in a previous mail, I set up a 15-chunk/OSD EC pool (k=9,
   m=6) but I have only 12 OSD servers in the cluster. To work around the
   problem I defined the failure domain as 'osd', with the reasoning that,
   as I was using the LRC plugin, I had the guarantee that I could lose a
   site without impact, and thus the ability to lose 1 OSD server. Am I
   wrong?

   Best regards,

   Michel

   Le 24/04/2023 à 13:24, Michel Jouvin a écrit :
   > Hi,
   >
    > I'm still interested in getting feedback from those using the LRC
    > plugin about the right way to configure it... Last week I upgraded
   > from Pacific to Quincy (17.2.6) with cephadm which is doing the
   > upgrade host by host, checking if an OSD is ok to stop before
   actually
   > upgrading it. I had the surprise to see 1 or 2 PGs down at some
   points
   > in the upgrade 

[ceph-users] Re: Memory leak in MGR after upgrading to pacific.

2023-05-02 Thread Gary Molenkamp
To follow up on this issue,  I saw the additional comments on 
https://tracker.ceph.com/issues/59580 regarding mgr caps.
By setting the mgr user caps back to the default, I was able to reduce 
the memory leak from several hundred MB/hr to just a few MB/hr.


As the other commenter had posted, in order for zabbix to access OSD 
data via RESTful, the mgr caps were set to:
     ceph auth caps mgr.controller04.lvhgea mon 'allow *' osd 'allow *' 
mds 'allow *'
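
For anyone hitting the same thing, reverting to something close to the 
defaults looked roughly like this for me (double-check the exact default 
caps with 'ceph auth get' against a freshly deployed mgr first):

     ceph auth caps mgr.controller04.lvhgea mon 'allow profile mgr' osd 'allow *' mds 'allow *'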


Gary


On 2023-04-27 08:38, Gary Molenkamp wrote:

Good morning,

After upgrading from Octopus (15.2.17) to Pacific (16.2.12) two days 
ago, I'm noticing that the MGR daemons keep failing over to standby 
and then back every 24hrs.   Watching the output of 'ceph orch ps' I 
can see that the memory consumption of the mgr is steadily growing 
until it becomes unresponsive.


When the mgr becomes unresponsive, tasks such as RESTful calls start 
to fail, and the standby eventually takes over after ~20 minutes. I've 
included a log of memory consumption (in 10 minute intervals) at the 
end of this message. While the cluster recovers during this issue, the 
loss of usage data during the outage, and the fact that it's occurring at 
all, are problematic. Any assistance would be appreciated.


Note, this is a cluster that has been upgraded from an original Jewel-based 
ceph using filestore, through bluestore conversion, container 
conversion, and now to Pacific. The data below shows memory use 
with three mgr modules enabled: cephadm, restful, iostat. By 
disabling iostat, I can reduce the rate of memory growth to about 
200MB/hr.
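
For reference, the mgr modules were enabled/disabled with commands along 
the lines of:

    ceph mgr module disable iostat
    ceph mgr module enable iostat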


Thanks
Gary.



--
Gary Molenkamp  Science Technology Services
Systems Administrator   University of Western Ontario
molen...@uwo.ca http://sts.sci.uwo.ca
(519) 661-2111 x86882   (519) 661-3566
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD mirroring, asking for clarification

2023-05-02 Thread Eugen Block

Hi,

while your assumptions are correct (you can use the rest of the pool  
for other, non-mirrored images; at least I'm not aware of any  
limitations), may I ask for the motivation behind this question? Mixing  
different use cases doesn't seem like a good idea to me. There's  
always a chance that a client with caps for that pool deletes or  
modifies images or even the entire pool. Why not simply create a  
different pool and separate those clients?
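
For completeness, per-image mirroring is set up roughly like this (a 
sketch; pool and image names are placeholders, and the mode can be 
journal or snapshot):

  rbd mirror pool enable <pool> image
  rbd mirror image enable <pool>/<image> snapshot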


Thanks,
Eugen

Zitat von wodel youchi :


Hi,

When using rbd mirroring, the mirroring concerns the images only, not the
whole pool? So we don't need to have a dedicated pool on the destination
site to be mirrored; the only obligation is that the mirrored pools must
have the same name.

In other words, we create two pools with the same name, one on the source
site and the other on the destination site, we create the mirror link (one-way
or two-way replication), and then we choose which images to sync.

Both pools can be used simultaneously on both sites; it's the mirrored
images that cannot be used simultaneously, only the promoted ones.

Is this correct?

Regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-02 Thread Frank Schilder
Hi Arnaud,

thanks, that's a good one. The inode in question should be in cache at this 
time. It actually accepts the hex-code given in the log message and is really 
fast.

I hope I remember that for next time.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: MARTEL Arnaud 
Sent: Tuesday, May 2, 2023 11:20 AM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: client isn't responding to mclientcaps(revoke), 
pending pAsLsXsFsc issued pAsLsXsFsc

Hi,

Or you can query the MDS(s) with:
ceph tell mds.* dump inode <inode number> 2>/dev/null | grep path

for example:
user@server:~$ ceph tell mds.* dump inode 1099836155033 2>/dev/null | grep path
"path": "/ec42/default/joliot/gipsi/gpu_burn.sif",
"stray_prior_path": "",


Arnaud


On 01/05/2023 15:07, Loic Tortay <tor...@cc.in2p3.fr> wrote:


On 01/05/2023 11:35, Frank Schilder wrote:
> Hi all,
>
> I think we might be hitting a known problem 
> (https://tracker.ceph.com/issues/57244 
> ). I don't want to fail the mds yet, 
> because we have troubles with older kclients that miss the mds restart and 
> hold on to cache entries referring to the killed instance, leading to hanging 
> jobs on our HPC cluster.
>
> I have seen this issue before and there was a process in D-state that 
> dead-locked itself. Usually, killing this process succeeded and resolved the 
> issue. However, this time I can't find such a process.
>
> The tracker mentions that one can delete the file/folder. I have the inode 
> number, but really don't want to start a find on a 1.5PB file system. Is 
> there a better way to find what path is causing the issue (ask the MDS 
> directly, look at a cache dump, or similar)? Is there an alternative to 
> deletion or MDS fail?
>
Hello,
If you have the inode number, you can retrieve the name with something like:
rados getxattr -p $POOL ${ino}. parent | \
ceph-dencoder type inode_backtrace_t import - decode dump_json | \
jq -M '[.ancestors[].dname]' | tr -d '[[",\]]' | \
awk 't!=""{t=$1 "/" t;}t==""{t=$1;}END{print t}'


Where $POOL is the "default pool" name (for files) or the metadata pool
name (for directories) and $ino is the inode number (in hexadecimal).




Loïc.
--
| Loïc Tortay <tor...@cc.in2p3.fr> - IN2P3 Computing Centre |
___
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-02 Thread MARTEL Arnaud
Hi,

Or you can query the MDS(s) with:
ceph tell mds.* dump inode <inode number> 2>/dev/null | grep path

for example:
user@server:~$ ceph tell mds.* dump inode 1099836155033 2>/dev/null | grep path
"path": "/ec42/default/joliot/gipsi/gpu_burn.sif",
"stray_prior_path": "",


Arnaud


On 01/05/2023 15:07, Loic Tortay <tor...@cc.in2p3.fr> wrote:


On 01/05/2023 11:35, Frank Schilder wrote:
> Hi all,
> 
> I think we might be hitting a known problem 
> (https://tracker.ceph.com/issues/57244 
> ). I don't want to fail the mds yet, 
> because we have troubles with older kclients that miss the mds restart and 
> hold on to cache entries referring to the killed instance, leading to hanging 
> jobs on our HPC cluster.
> 
> I have seen this issue before and there was a process in D-state that 
> dead-locked itself. Usually, killing this process succeeded and resolved the 
> issue. However, this time I can't find such a process.
> 
> The tracker mentions that one can delete the file/folder. I have the inode 
> number, but really don't want to start a find on a 1.5PB file system. Is 
> there a better way to find what path is causing the issue (ask the MDS 
> directly, look at a cache dump, or similar)? Is there an alternative to 
> deletion or MDS fail?
> 
Hello,
If you have the inode number, you can retrieve the name with something like:
rados getxattr -p $POOL ${ino}. parent | \
ceph-dencoder type inode_backtrace_t import - decode dump_json | \
jq -M '[.ancestors[].dname]' | tr -d '[[",\]]' | \
awk 't!=""{t=$1 "/" t;}t==""{t=$1;}END{print t}'


Where $POOL is the "default pool" name (for files) or the metadata pool 
name (for directories) and $ino is the inode number (in hexadecimal).




Loïc.
-- 
| Loïc Tortay <tor...@cc.in2p3.fr> - IN2P3 Computing Centre |
___
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io 




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Nikola Ciprich
Hello dear CEPH users and developers,

we're dealing with a strange problem. We have a 12-node Alma Linux 9 cluster,
initially installed with CEPH 15.2.16, then upgraded to 17.2.5. It's running a
bunch of KVM virtual machines accessing volumes using RBD.

Everything is working well, but there is a strange and, for us, quite serious
issue: the speed of write operations (both sequential and random) is constantly
degrading, drastically, to almost unusable numbers (in ~1 week it drops from
~70k 4k writes/s from 1 VM to ~7k writes/s).

When I restart all OSD daemons, the numbers immediately return to normal.

Volumes are stored on a replicated pool with 4 replicas, on top of 7*12 = 84
INTEL SSDPE2KX080T8 NVMes.

I updated the cluster to 17.2.6 some time ago, but the problem persists. This is
especially annoying in connection with https://tracker.ceph.com/issues/56896,
as restarting OSDs is quite painful when half of them crash.

I don't see anything suspicious: node load is quite low, there are no log
errors, and network latency and throughput are OK too.

Is anyone having a similar issue?

I'd like to ask for hints on what I should check further.

We're running lots of 14.2.x and 15.2.x clusters, none showing a similar
issue, so I suspect this is something related to Quincy.

thanks a lot in advance

with best regards

nikola ciprich



-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:+420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PVE CEPH OSD heartbeat show

2023-05-02 Thread Fabian Grünbichler
On May 1, 2023 9:30 pm, Peter wrote:
> Hi Fabian,
> 
> Thank you for your prompt response. It's crucial to understand how things 
> work, and I appreciate your assistance.
> 
> After replacing the switch for our Ceph environment, we experienced three 
> days of normalcy before the issue recurred this morning. I noticed that the 
> TCP in/out became unstable, and TCP errors occurred simultaneously. The UDP 
> in/out values were 70K and 150K, respectively, while the errors peaked at 
> around 50K per second.
> 
> I reviewed the Proxmox documentation and found that it is recommended to 
> separate the cluster network and storage network. Currently, we have more 
> than 20 Ceph nodes across five different locations, and only one location has 
> experienced this issue. We are fortunate that it has not happened in other 
> areas. While we plan to separate the network soon, I was wondering if there 
> are any temporary solutions or configurations that could limit the UDP 
> triggering and resolve the "corosync" issue.

The only real solution is separating the links. You can try to
prioritize Corosync traffic (UDP on ports 540X) on your switches to
avoid the links going over the threshold where Corosync marks them as
down. Links going down could cause them to start flapping (if they are
not really down, but just the Corosync heartbeat timing out
occasionally) and trigger an increased amount of traffic because of
retransmits and resync operations trying to reestablish the cluster
membership, which could then in turn also affect other traffic going over
the same links.
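
As a sketch of the host-side part of that (assuming IPv4, the default knet 
ports and nftables; verify table/chain names against your actual firewall 
setup before using), you could mark Corosync packets with a DSCP value 
that your switches then prioritize:

  nft add table inet qos
  nft add chain inet qos postrouting '{ type filter hook postrouting priority -150 ; }'
  nft add rule inet qos postrouting udp dport 5404-5405 ip dscp set cs6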

> I appreciate your help in this matter and look forward to your response.
> 
> Peter
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io