[ceph-users] Re: ceph orch command hung

2023-09-12 Thread Eugen Block
No, it's a flag that you (or someone else?) set before shutting down the
cluster. Look at your initial email; there were multiple flags set:


pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
flag(s) set

When you bring your cluster back online you should unset those flags.
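For reference, after the cluster is back up, clearing that set of flags
would look roughly like this (a sketch; only unset the flags that are
actually set on your cluster):

   ceph osd unpause              # clears pauserd and pausewr
   ceph osd unset nodown
   ceph osd unset noout
   ceph osd unset nobackfill
   ceph osd unset norebalance
   ceph osd unset norecover

Afterwards "ceph status" should no longer list any of those flags.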

Quoting kgh0201...@gmail.com:


Dear Eugen,


you should unpause your cluster (ceph osd unpause) so all services can


Thank you for your advice.
Running "ceph osd unpause" command fixed this problem.
Is this comand required to use the ceph orch command like "ceph orch  
host ls" ?


Sincerely,
Taku Izumi


[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-12 Thread Igor Fedotov

Hi all,

As promised, here is a postmortem analysis of what happened.

The following ticket (https://tracker.ceph.com/issues/62815) and the
accompanying materials provide a low-level overview of the issue.


In a few words it is as follows:

The default hybrid allocator (as well as the AVL one it's based on) can take
a dramatically long time to allocate fairly large (hundreds of MBs)
64K-aligned chunks for BlueFS. On the original cluster this showed up as
20-30 second OSD stalls.


This is apparently not specific to the recent 16.2.14 Pacific release, as I
had seen it at least once before, but https://github.com/ceph/ceph/pull/51773
made it more likely to pop up: from now on RocksDB can preallocate huge WALs
in a single shot.


The issue is definitely bound to aged/fragmented main OSD volumes that
colocate the DB volume. I don't expect it to pop up for standalone DB/WAL
devices.


As already mentioned in this thread, the proposed workaround is to switch
bluestore_allocator to bitmap. This might cause a minor overall performance
drop, so I'm not sure one should apply it unconditionally.
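For those who want to try it, the switch would look roughly like this (a
sketch assuming the centralized config database is in use; the new
allocator is only picked up at startup, so each OSD needs a restart, e.g.
via systemctl restart ceph-osd@<id> on non-cephadm deployments):

   ceph config set osd bluestore_allocator bitmap

Reverting simply means setting the option back to "hybrid" (the default)
and restarting the OSDs again.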


I'd like to apologize for the inconvenience this may have caused.
We're currently working on a proper fix...


Thanks,

Igor

On 07/09/2023 10:05, J-P Methot wrote:

Hi,

We're running latest Pacific on our production cluster and we've been
seeing the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700 had timed
out after 15.00954s' error. We have reason to believe this
happens each time the RocksDB compaction process is launched on an
OSD. My question is: does the cluster detecting that an OSD has timed
out interrupt the compaction process? This seems to be what's
happening, but it's not immediately obvious. We are currently facing
an infinite loop of random OSDs timing out, and if the compaction
process is interrupted without finishing, it may explain that.





[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-12 Thread Konstantin Shalygin
Hi Igor,

> On 12 Sep 2023, at 15:28, Igor Fedotov  wrote:
> 
> Default hybrid allocator (as well as AVL one it's based on) could take 
> dramatically long time to allocate pretty large (hundreds of MBs) 64K-aligned 
> chunks for BlueFS. At the original cluster it was exposed as 20-30 sec OSD 
> stalls.

For the chunks, does this mean the bluestore min alloc size?
Was this cluster deployed pre-Pacific (64k) and not redeployed with the
Pacific default (4k)?


Thanks,
k
Sent from my iPhone



[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-12 Thread Igor Fedotov

Hey Konstantin,

I forgot to mention - indeed, clusters with a 4K bluestore min alloc size
are more likely to be exposed to the issue. The key point is the
difference between the bluestore and bluefs allocation sizes. The issue is
likely to pop up when user and DB data are collocated but different
allocation units are in use. As a result, the allocator needs to locate
properly aligned chunks for BlueFS among a bunch of unsuitable,
misaligned chunks, which can be inefficient in the current
implementation and cause the slowdown.
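To get a feel for whether a given OSD is affected, one can check which
allocator is in use and how fragmented its main device is. A rough sketch
via the CLI and the OSD admin socket (command names as of recent releases;
osd.0 is just an example and the daemon command has to be run on the node
hosting that OSD):

   ceph config get osd bluestore_allocator
   ceph daemon osd.0 bluestore allocator score block

The second command returns a fragmentation rating between 0 (unfragmented)
and 1 (heavily fragmented); badly fragmented, collocated OSDs are the ones
most likely to hit the slow 64K-aligned allocations described above.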



Thanks,

Igor

On 12/09/2023 15:47, Konstantin Shalygin wrote:

Hi Igor,


On 12 Sep 2023, at 15:28, Igor Fedotov  wrote:

Default hybrid allocator (as well as AVL one it's based on) could take 
dramatically long time to allocate pretty large (hundreds of MBs) 64K-aligned 
chunks for BlueFS. At the original cluster it was exposed as 20-30 sec OSD 
stalls.

For the chunks, this mean bluestore min alloc size?
This cluster was deployed pre Pacific (64k) and not redeployed to Pacific 
default (4k)?


Thanks,
k
Sent from my iPhone





[ceph-users] Re: Upgrading OS [and ceph release] nondestructively for oldish Ceph cluster

2023-09-12 Thread Ackermann, Christoph
Hello Sam,

I started with a Ceph Jewel and CentOS 7 (POC) cluster in mid 2017, now
successfully running the latest Quincy version 17.2.6 in production. BUT we
had to recreate all OSDs (DB/WAL) to go from Filestore to Bluestore, and
later once again for the CentOS 8 host migration. :-/

Major stepping stones: Jewel > Luminous > Nautilus > Octopus (on CentOS 8,
later Rocky 8) > Quincy (non-cephadm) > Quincy (cephadm)

- Change from CentOS 7 to CentOS 8 via a complete fresh install, host by
host, with temporary CentOS 8 VMs for mon/mgr/mds
- Upgrade from CentOS 8 to Rocky 8 via the upgrade script (the ceph-volume
package was removed, so it had to be reinstalled)
- After adopting the cluster into "cephadm" we need to run "cephadm
check-host ..." manually (or from rc.local, see the sketch below) after
each host reboot
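A sketch of that last workaround (paths may differ depending on how cephadm
was installed, and rc.local must of course be enabled and executable):

   #!/bin/sh
   # /etc/rc.local - re-run the cephadm host checks after boot
   /usr/sbin/cephadm check-host
   exit 0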

I think it's quite tricky to go to Rocky 9 hosts directly because of the
missing RPMs in https://download.ceph.com/rpm-pacific/el9/

Maybe that helps to find a proper upgrade path.

Christoph


On Thu, 7 Sept 2023 at 17:53, Sam Skipsey  wrote:

> Hello all,
>
> We've had a Nautilus [latest releases] cluster for some years now, and are
> planning the upgrade process - both moving off Centos7 [ideally to a RHEL9
> compatible spin like Alma 9 or Rocky 9] and also moving to a newer Ceph
> release [ideally Pacific or higher to avoid too many later upgrades needed].
>
> As far as ceph release upgrades go, I understand the process in general.
>
> What I'm less certain about (and more nervous about from a potential data
> loss perspective) is the OS upgrade.
> For Ceph bluestore OSDs, I assume all the relevant metadata is on the OSD
> disk [or on the separate disk configured for RocksDB etc if you have nvme],
> and none is on the OS itself?
> For Mons and Mgrs, what stuff do I need to retain across the OS upgrade to
> have things "just work" [since they're relatively stateless, I assume
> mostly the /etc/ceph/ stuff and ceph cluster keys?]
> For the MDS, I assume it's similar to MGRS? The MDS, IIRC, mainly works as
> a caching layer so I assume there's not much state that can be lost
> permanently?
>
> Has anyone gone through this process who would be happy to share their
> experience? (There's not a lot on this on the wider internet - lots on
> upgrading ceph, much less on the OS)
>
> Sam


[ceph-users] Re: MDS daemons don't report any more

2023-09-12 Thread Frank Schilder
Hi Patrick,

I'm not sure that it's exactly the same issue. I observed that "ceph tell
mds.xyz session ls" had all counters at 0. On the Friday before, we had a power
loss on a rack that took out a JBOD with a few metadata disks, and I suspect
that the reporting of zeroes started after this crash. No hard evidence though.

I uploaded all logs with a bit of explanation to tag 
1c022c43-04a7-419d-bdb0-e33c97ef06b8.

I don't have any more detail recorded than that. It was 3 other MDSes that
restarted when rank 0 was failed, not 5 as I wrote before. The readme.txt
contains pointers to find the info in the logs.

We didn't have any user complaints. Therefore, I'm reasonably confident that 
the file system was actually accessible the whole time (from Friday afternoon 
until Sunday night when I restarted everything).

I hope you can find something useful.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Patrick Donnelly 
Sent: Monday, September 11, 2023 7:51 PM
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: MDS daemons don't report any more

Hello Frank,

On Mon, Sep 11, 2023 at 8:39 AM Frank Schilder  wrote:
>
> Update: I did a systematic fail of all MDSes that didn't report, starting 
> with the stand-by daemons and continuing from high to low ranks. One by one 
> they started showing up again with version and stats and the fail went as 
> usual with one exception: rank 0.

It might be https://tracker.ceph.com/issues/24403

> The moment I failed rank 0 it took 5 other MDSes down with it.

Can you be more precise about what happened? Can you share logs?

> This is, in fact, the second time I have seen such an event, failing an MDS 
> crashes others. Given the weird observation in my previous e-mail together 
> with what I saw when restarting everything, does this indicate a problem with 
> data integrity or is this an annoying yet harmless bug?

It sounds like an annoyance but certainly one we'd like to track down.
Keep in mind that the "fix" for https://tracker.ceph.com/issues/24403
is not going to Octopus. You need to upgrade and Pacific will soon be
EOL.

> Thanks for any help!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Frank Schilder 
> Sent: Sunday, September 10, 2023 12:39 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] MDS daemons don't report any more
>
> Hi all, I'm making a weird observation: 8 out of 12 MDS daemons seem not to 
> report to the cluster any more:
>
> # ceph fs status
> con-fs2 - 1625 clients
> ===
> RANK  STATE   MDS      ACTIVITY       DNS    INOS
>  0    active  ceph-16  Reqs:    0 /s      0      0
>  1    active  ceph-09  Reqs:  128 /s  4251k  4250k
>  2    active  ceph-17  Reqs:    0 /s      0      0
>  3    active  ceph-15  Reqs:    0 /s      0      0
>  4    active  ceph-24  Reqs:  269 /s  3567k  3567k
>  5    active  ceph-11  Reqs:    0 /s      0      0
>  6    active  ceph-14  Reqs:    0 /s      0      0
>  7    active  ceph-23  Reqs:    0 /s      0      0
>         POOL            TYPE      USED   AVAIL
>    con-fs2-meta1       metadata   2169G  7081G
>    con-fs2-meta2         data         0  7081G
>     con-fs2-data         data     1248T  4441T
>  con-fs2-data-ec-ssd     data      705G  22.1T
>    con-fs2-data2         data     3172T  4037T
> STANDBY MDS
>   ceph-08
>   ceph-10
>   ceph-12
>   ceph-13
> VERSION                                                                            DAEMONS
> None                                                                               ceph-16, ceph-17, ceph-15, ceph-11, ceph-14, ceph-23, ceph-10, ceph-12
> ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)   ceph-09, ceph-24, ceph-08, ceph-13
>
> Version is "none" for these and there are no stats. Ceph versions reports 
> only 4 MDSes out of the 12. 8 are not shown at all:
>
> [root@gnosis ~]# ceph versions
> {
>     "mon": {
>         "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)": 5
>     },
>     "mgr": {
>         "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)": 5
>     },
>     "osd": {
>         "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)": 1282
>     },
>     "mds": {
>         "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)": 4
>     },
>     "overall": {
>         "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)": 1296
>     }
> }
>
> Ceph status reports everything as up and OK:
>
> [root@gnosis ~]# ceph status
>   cluster:
>     id:     e4ece518-f2cb-4708-b00f-b6bf511e91d9
>     health: HEALTH_OK
>
>   services:
>     mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 2w)
>     mgr: ceph-03(active, since 61s), standbys: 

[ceph-users] Re: [ceph v16.2.10] radosgw crash

2023-09-12 Thread Tobias Urdin
Hello,

That was solved in 16.2.11 in tracker [1] with fix [2].

Best regards
Tobias

[1] https://tracker.ceph.com/issues/55765
[2] https://github.com/ceph/ceph/pull/47194/commits

> On 12 Sep 2023, at 05:29, Louis Koo  wrote:
> 
> radosgw crash again with:
> ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific 
> (stable)
> 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fdffc5d4ce0]
> 2: (rgw::ARN::ARN(rgw_bucket const&)+0x42) [0x7fe00727dbc2]
> 3: (verify_bucket_permission(DoutPrefixProvider const*, perm_state_base*, 
> rgw_bucket const&, RGWAccessControlPolicy*, RGWAccessControlPolicy*, 
> boost::optional const&, std::vector std::allocator > const&, std::vector std::allocator > const&, unsigned long)+0xa2) 
> [0x7fe0072ce412]
> 4: (verify_bucket_permission(DoutPrefixProvider const*, req_state*, unsigned 
> long)+0x83) [0x7fe0072cf243]
> 5: (RGWListBucket::verify_permission(optional_yield)+0x12e) [0x7fe0074a2d3e]
> 6: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, 
> req_state*, optional_yield, bool)+0x81b) [0x7fe00714effb]
> 7: (process_request(rgw::sal::RGWRadosStore*, RGWREST*, RGWRequest*, 
> std::__cxx11::basic_string, std::allocator 
> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSink*, 
> optional_yield, rgw::dmclock::Scheduler*, std::__cxx11::basic_string std::char_traits, std::allocator >*, 
> std::chrono::duration >*, 
> int*)+0x2891) [0x7fe0071531b1]
> 8: /lib64/libradosgw.so.2(+0x440703) [0x7fe0070d5703]
> 9: make_fcontext()
> NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.


[ceph-users] Re: cannot create new OSDs - ceph version 17.2.6 (810db68029296377607028a6c6da1ec06f5a2b27) quincy (stable)

2023-09-12 Thread Konold, Martin

Hi Igor,

I recreated the log with full debugging enabled.

https://www.konsec.com/download/full-debug-20-ceph-osd.43.log.gz

and another without the debug settings

https://www.konsec.com/download/failed-ceph-osd.43.log.gz
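For anyone wanting to reproduce the same level of detail: the "full
debugging" mentioned above can be enabled per OSD along these lines (a
sketch; osd.43 as in the log names above, and the settings should be
reverted afterwards):

   ceph config set osd.43 debug_bluestore 20
   ceph config set osd.43 debug_bluefs 20
   systemctl restart ceph-osd@43   # the failing startup attempt then
                                   # writes the detailed log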

I hope you can draw some conclusions from it and I am looking forward to 
your response.


Regards
ppa. Martin Konold

--
Martin Konold - Prokurist, CTO
KONSEC GmbH -⁠ make things real
Amtsgericht Stuttgart, HRB 23690
Managing Director: Andreas Mack
Im Köller 3, 70794 Filderstadt, Germany

On 2023-09-11 22:08, Igor Fedotov wrote:

Hi Martin,

could you please share the full existing log and also set
debug_bluestore and debug_bluefs to 20 and collect new osd startup
log.


Thanks,

Igor

On 11/09/2023 20:53, Konold, Martin wrote:

Hi,

I want to create a new OSD on a 4TB Samsung MZ1L23T8HBLA-00A07 
enterprise nvme device in a hyper-converged proxmox 8 environment.


Creating the OSD works, but it cannot be initialized and therefore does not
start.


In the log I see an entry about a failed assert.

./src/os/bluestore/fastbmap_allocator_impl.cc: 405: FAILED 
ceph_assert((aligned_extent.length % l0_granularity) == 0)


Is this the culprit?

In addition, at the end of the logfile a failed mount and a failed OSD init
are mentioned.


2023-09-11T16:30:04.708+0200 7f99aa28f3c0 -1 bluefs _check_allocations 
OP_FILE_UPDATE_INC invalid extent 1: 0x14~1: duplicate 
reference, ino 30
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 -1 bluefs mount failed to 
replay log: (14) Bad address

2023-09-11T16:30:04.708+0200 7f99aa28f3c0 20 bluefs _stop_alloc
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 -1 
bluestore(/var/lib/ceph/osd/ceph-43) _open_bluefs failed bluefs mount: 
(14) Bad address
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 10 bluefs 
maybe_verify_layout no memorized_layout in bluefs superblock
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 -1 
bluestore(/var/lib/ceph/osd/ceph-43) _open_db failed to prepare db 
environment:
2023-09-11T16:30:04.708+0200 7f99aa28f3c0  1 bdev(0x5565c261fc00 
/var/lib/ceph/osd/ceph-43/block) close
2023-09-11T16:30:04.940+0200 7f99aa28f3c0 -1 osd.43 0 OSD:init: unable 
to mount object store
2023-09-11T16:30:04.940+0200 7f99aa28f3c0 -1  ** ERROR: osd init 
failed: (5) Input/output error


I verified that the hardware of the new nvme is working fine.




[ceph-users] Re: rgw: strong consistency for (bucket) policy settings?

2023-09-12 Thread Matthias Ferdinand
On Mon, Sep 11, 2023 at 02:37:59PM -0400, Matt Benjamin wrote:
> Yes, it's also strongly consistent.  It's also last writer wins, though, so
> two clients somehow permitted to contend for updating policy could
> overwrite each other's changes, just as with objects.

Hi, thank you for confirming this!

Matthias
> 
> On Mon, Sep 11, 2023 at 2:21 PM Matthias Ferdinand 
> wrote:
> 
> > Hi,
> >
> > while I don't currently use rgw, I still am curious about consistency
> > guarantees.
> >
> > Usually, S3 has strong read-after-write consistency guarantees (for
> > requests that do not overlap). According to
> > https://docs.ceph.com/en/latest/dev/radosgw/bucket_index/
> > in Ceph this is also true for per-object ACLs.
> >
> > Is there also a strong consistency guarantee for (bucket) policies? The
> > documentation at
> > https://docs.ceph.com/en/latest/radosgw/bucketpolicy/
> > apparently does not say anything about this.
> >
> > How would multiple rgw instances synchronize a policy change? Is this
> > effective immediately with strong consistency or is there some propagation
> > delay (hopefully with some upper bound)?
> >
> >
> > Best regards
> > Matthias
> 
> -- 
> 
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://www.redhat.com/en/technologies/storage
> 
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309


[ceph-users] Re: Separating Mons and OSDs in Ceph Cluster

2023-09-12 Thread Joachim Kraftmayer - ceph ambassador

Another possibility is ceph mon discovery via DNS:

https://docs.ceph.com/en/quincy/rados/configuration/mon-lookup-dns/#looking-up-monitors-through-dns
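As a rough illustration, the SRV records could look like this in the zone
file (assuming the default mon_dns_srv_name of "ceph-mon", the example.com
domain and the standard messenger ports - adapt to your environment):

   _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon1.example.com.
   _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 3300 mon1.example.com.
   _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 6789 mon2.example.com.
   _ceph-mon._tcp.example.com. 3600 IN SRV 10 60 3300 mon2.example.com.

Clients and daemons then resolve the monitors via DNS instead of needing an
explicit mon_host entry in ceph.conf.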

Regards, Joachim

___
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

On 11.09.23 at 09:32, Robert Sander wrote:

Hi,

On 9/9/23 09:34, Ramin Najjarbashi wrote:


The primary goal is to deploy new Monitors on different servers without
causing service interruptions or disruptions to data availability.


Just do that. New MONs will be added to the mon map which will be 
distributed to all running components. All OSDs will immediately know 
about the new MONs.
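A quick way to confirm the new MONs have joined (just a sketch):

   ceph mon dump        # shows the current mon map and its epoch
   ceph quorum_status   # shows which MONs are currently in quorum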


The same goes when removing an old MON.

After that you have to update the ceph.conf on each host to make the 
change "reboot safe".


No need to restart any other component including OSDs.

Regards



[ceph-users] Re: Awful new dashboard in Reef

2023-09-12 Thread Nizamudeen A
Thank you Nicola,

We are collecting this feedback. For a while we weren't focusing on the
mobile view of the dashboard. If there are users relying on it, we'll look
into it as well. We'll let everyone know about the improvements in the UI
soon.

Regards,
Nizam

On Mon, Sep 11, 2023 at 2:23 PM Nicola Mori  wrote:

> Hi Nizam,
>
> many thanks for the tip. And sorry for the quite rude subject of my
> post; I really appreciate the dashboard revamp effort, but I was
> frustrated about the malfunctioning and missing features. By the way,
> one of the things that really needs to be improved is the support for
> mobile devices: the dashboard on my phone is quite unusable, both the
> old and the new one (although the old gets better when browsing in
> desktop mode).
> Thanks again,
>
> Nicola