[ceph-users] Libvirt and Ceph: libvirtd tries to open random RBD images

2023-12-01 Thread Jayanth Reddy
Hello Users,
We're using libvirt with KVM, and the orchestrator is CloudStack. I already
raised the issue with CloudStack at
https://github.com/apache/cloudstack/issues/8211, but it appears to be a
libvirtd issue. I did the same on the libvirt ML at
https://lists.libvirt.org/archives/list/us...@lists.libvirt.org/thread/SA2I4QZGVVEIKPJU7E2KAFYYFZLJZDMV/
but I'm now here looking for answers.

Below is our environment & issue description:

Ceph: v17.2.0
Pool: replicated
Number of block images in this pool: more than 1250

# virsh pool-info c15508c7-5c2c-317f-aa2e-29f307771415
Name:   c15508c7-5c2c-317f-aa2e-29f307771415
UUID:   c15508c7-5c2c-317f-aa2e-29f307771415
State:  running
Persistent: no
Autostart:  no
Capacity:   1.25 PiB
Allocation: 489.52 TiB
Available:  787.36 TiB

# kvm --version
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.27)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

# libvirtd --version
libvirtd (libvirt) 6.0.0

It appears that one of our CloudStack KVM clusters, consisting of 8 hosts, is
affected. We run HCI on these 8 hosts with around 700+ VMs. Strangely enough,
logs like the ones below appear on the hosts.


Oct 25 13:38:11 hv-01 libvirtd[9464]: failed to open the RBD image
'087bb114-448a-41d2-9f5d-6865b62eed15': No such file or directory
Oct 25 20:35:22 hv-01 libvirtd[9464]: failed to open the RBD image
'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory
Oct 26 09:48:33 hv-01 libvirtd[9464]: failed to open the RBD image
'a3fe82f8-afc9-4604-b55e-91b676514a18': No such file or directory

We've got DNS servers with an `A` record resolving to the IPv4 addresses of
all 5 monitors, and there have not been any issues with DNS resolution. The
issue of "failed to open the RBD image
'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory" gets even
weirder because the VM that is making use of such an RBD image, let's say
"087bb114-448a-41d2-9f5d-6865b62eed15", is running on an altogether
different host, e.g. "hv-06". On further inspection of that specific virtual
machine, it has been running on that host "hv-06" for more than 4 months or
so. Fortunately, the virtual machine has no issues and has kept running
since then. There are absolutely no issues with any of the virtual machines
because of these warnings.

From the libvirt mailing list, one of the community members helped me
understand that libvirt only tries to get the info of the images and does
not open them for reading or writing. Every host running libvirtd does the
same. We manually ran "virsh pool-refresh", which CloudStack itself takes
care of at regular intervals, and the warning messages still appear. Please
help me find the cause, and let me know if further information is needed.
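
For reference, this is roughly what we can run to cross-check one of the
reported images and the libvirt pool definition (the RBD pool name below is
a placeholder for our actual pool):

# does the image libvirtd complains about exist in the backing pool?
rbd ls <rbd-pool> | grep 087bb114-448a-41d2-9f5d-6865b62eed15
rbd info <rbd-pool>/087bb114-448a-41d2-9f5d-6865b62eed15
# which Ceph pool and monitors does the libvirt storage pool point at?
virsh pool-dumpxml c15508c7-5c2c-317f-aa2e-29f307771415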

Thanks,
Jayanth Reddy
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph 17.2.7 to 18.2.0 issues

2023-12-01 Thread pclark6063
Hi All,

Recently I upgraded my cluster from Quincy to Reef. Everything appeared to go
smoothly and without any issues arising.
I was then forced to power off the cluster, performing the usual procedures
beforehand, and everything appears to have come back fine. Every service
reports green across the board, except for the following:

If I try to copy any file from a CephFS mountpoint, whether kernel or FUSE,
the actual copy hangs. ls/stat etc. all work, which indicates the metadata
appears fine, but copying always hangs.

I can copy objects directly using the rados toolset, which indicates the
underlying data exists.

The system itself reports no errors and thinks it's healthy.

The entire cluster and all CephFS clients are on Rocky 9.

Any advice would be much appreciated. I'd find this easier to deal with if
the cluster actually gave me an error.
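
If it helps, I'm happy to run checks like the following and share the output
(just a sketch; the MDS name is a placeholder):

ceph health detail
ceph fs status
# operations currently stuck in flight on the MDS
ceph tell mds.<mds-name> dump_ops_in_flight
# on a kernel client: OSD requests still pending for the mount
cat /sys/kernel/debug/ceph/*/osdc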
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Compilation failure when building Ceph on Ubuntu

2023-12-01 Thread Yong Yuan
Hi,

I'm trying to build a DEBUG version of Ceph Reef on a virtual Ubuntu LTS
22.04 running on Lima, following the README in Ceph's GitHub repo. The build
failed, and the last CMake error was "g++-11: error: unrecognized
command-line option '-Wimplicit-const-int-float-conversion'". Does anyone
know what I can do to fix the compilation error? I could try different gcc
versions, but I'd assume Ceph's build scripts would install and verify all
the dependencies. Thanks,

The system configuration is as follows:

> uname -a
Linux lima-ceph-dev 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49
UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy

I followed the instructions in the README of Ceph's GitHub repo
(https://github.com/ceph/ceph), and the command ./do_cmake.sh failed at step
[137/2150], which builds the frontend dashboard, with the error message
"ninja: build stopped: subcommand failed." The last error logged in
CMakeError.log has to do with "g++-11: error: unrecognized command-line
option '-Wimplicit-const-int-float-conversion'".

Below is the last error message from CMakeError.log:

Performing C++ SOURCE FILE Test
COMPILER_SUPPORTS_WARN_IMPLICIT_CONST_INT_FLOAT_CONVERSION failed with the
following output:
Change Dir: /home/dyuan.linux/ceph/build/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/ninja cmTC_bab6d && [1/2] Building CXX object
CMakeFiles/cmTC_bab6d.dir/src.cxx.o
FAILED: CMakeFiles/cmTC_bab6d.dir/src.cxx.o
/usr/bin/g++-11
-DCOMPILER_SUPPORTS_WARN_IMPLICIT_CONST_INT_FLOAT_CONVERSION  -fPIE
-Wimplicit-const-int-float-conversion -std=c++20 -o
CMakeFiles/cmTC_bab6d.dir/src.cxx.o -c
/home/dyuan.linux/ceph/build/CMakeFiles/CMakeTmp/src.cxx
g++-11: error: unrecognized command-line option
'-Wimplicit-const-int-float-conversion'
ninja: build stopped: subcommand failed.
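
In case it turns out to be a compiler issue, this is a sketch of what I could
try next, assuming do_cmake.sh forwards extra arguments to cmake (the gcc-12
paths are an assumption on my side, not something from the README):

rm -rf build
./do_cmake.sh -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_C_COMPILER=/usr/bin/gcc-12 -DCMAKE_CXX_COMPILER=/usr/bin/g++-12
cd build && ninja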
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stray host/daemon

2023-12-01 Thread Jeremy Hansen
Found my previous post regarding this issue.

Fixed by restarting mgr daemons.
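
For reference, on a cephadm-managed cluster that amounts to something like
the following (exact invocation may differ depending on the release):

# restart all mgr daemons managed by the orchestrator
ceph orch restart mgr
# or simply fail over the active mgr
ceph mgr fail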

-jeremy

> On Friday, Dec 01, 2023 at 3:04 AM, Me <jer...@skidrow.la> wrote:
> I think I ran in to this before but I forget the fix:
>
> HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
> [WRN] CEPHADM_STRAY_HOST: 1 stray host(s) with 1 daemon(s) not managed by 
> cephadm
> stray host cn06.ceph.fu.intra has 1 stray daemons: ['mon.cn03']
>
>
> Pacific 16.2.11
>
> How do I clear this?
>
> Thanks
> -jeremy
>
>
>


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd dump_historic_ops

2023-12-01 Thread E Taka
This small (Bash) wrapper around the "ceph daemon" command is quite helpful
IMHO, especially the auto-completion with the TAB key:
https://github.com/test-erik/ceph-daemon-wrapper

On Fri, 1 Dec 2023 at 15:03, Phong Tran Thanh <tranphong...@gmail.com> wrote:

> It works!!!
>
> Thanks Kai Stian Olstad
>
> On Fri, 1 Dec 2023 at 17:06, Kai Stian Olstad <ceph+l...@olstad.com> wrote:
>
> > On Fri, Dec 01, 2023 at 04:33:20PM +0700, Phong Tran Thanh wrote:
> > >I have a problem with my osd, i want to show dump_historic_ops of osd
> > >I follow the guide:
> > >
> >
> https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
> > >But when i run command
> > >
> > >ceph daemon osd.8 dump_historic_ops show the error, the command run on
> > node
> > >with osd.8
> > >Can't get admin socket path: unable to get conf option admin_socket for
> > >osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
> > >types are: auth, mon, osd, mds, mgr, client\n"
> > >
> > >I am running ceph cluster reef version by cephadmin install
> > >
> > >What should I do?
> >
> > The easiest is use tell, then you can run it on any node that have access
> > to ceph.
> >
> >  ceph tell osd.8 dump_historic_ops
> >
> >
> >  ceph tell osd.8 help
> > will give you all you can do with tell.
> >
> > --
> > Kai Stian Olstad
> >
>
>
> --
> Best regards,
>
> 
>
> *Tran Thanh Phong*
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.1 QE Validation status

2023-12-01 Thread Igor Fedotov

Hi Yuri,

Looks like it's not as critical and complicated as originally thought. A user
has to change bluefs_shared_alloc_size to be exposed to the issue. So
hopefully I'll submit a patch on Monday to close this gap and we'll be able
to proceed.
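
(For anyone wanting to check whether they are exposed, something along these
lines should show whether the default has been overridden; exact scoping may
differ per cluster:)

ceph config get osd bluefs_shared_alloc_size
ceph config dump | grep bluefs_shared_alloc_size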



Thanks,

Igor

On 01/12/2023 18:16, Yuri Weinstein wrote:

Venky, pls review the test results for smoke and fs after the PRs were merged.

Radek, Igor, Adam - any updates on https://tracker.ceph.com/issues/63618?

Thx

On Thu, Nov 30, 2023 at 8:08 AM Yuri Weinstein  wrote:

The fs PRs:
https://github.com/ceph/ceph/pull/54407
https://github.com/ceph/ceph/pull/54677
were approved/tested and ready for merge.

What is the status/plan for https://tracker.ceph.com/issues/63618?

On Wed, Nov 29, 2023 at 10:51 AM Igor Fedotov  wrote:

https://tracker.ceph.com/issues/63618 to be considered as a blocker for
the next Reef release.

On 07/11/2023 00:30, Yuri Weinstein wrote:

Details of this release are summarized here:

https://tracker.ceph.com/issues/63443#note-1

Seeking approvals/reviews for:

smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
rados - Neha, Radek, Travis, Ernesto, Adam King
rgw - Casey
fs - Venky
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade/quincy-x (reef) - Laura PTL
powercycle - Brad
perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)

Please reply to this email with approval and/or trackers of known
issues/PRs to address them.

TIA
YuriW
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph osd slow ops

2023-12-01 Thread Josh Baergen
Given that this is S3, are the slow ops on index or data OSDs? (You mentioned
HDD, but I don't want to assume that means the OSD you mentioned is a data
OSD.)

Josh

On Fri, Dec 1, 2023 at 7:05 AM VÔ VI  wrote:
>
> Hi Stefan,
>
> I am running replicate x3 with a failure domain as host and setting
> min_size pool is 1. Because my cluster s3 traffic real time and can't stop
> or block IO, the data may be lost but IO alway available. I hope my cluster
> can run with two nodes unavailable.
> After that two nodes is down at the same time, and then nodes up, client IO
> and recover running in the same time, and some disk warning is slowops,
> what is the problem, may be my disk is overload, but the disk utilization
> only 60 -80%
>
> Thanks Stefan
>
> On Fri, 1 Dec 2023 at 16:40, Stefan Kooman wrote:
>
> > On 01-12-2023 08:45, VÔ VI wrote:
> > > Hi community,
> > >
> > > My cluster running with 10 nodes and 2 nodes goes down, sometimes the log
> > > shows the slow ops, what is the root cause?
> > > My osd is HDD and block.db and wal is 500GB SSD per osd.
> > >
> > > Health check update: 13 slow ops, oldest one blocked for 167 sec, osd.10
> > > has slow ops (SLOW_OPS)
> >
> > Most likely you have a crush rule that spreads objects over hosts as a
> > failure domain. For size=3, min_size=2 (default for replicated pools)
> > you might end up in a situation where two of the nodes that are offline
> > have PGs where min_size=2 requirement is not fulfilled, and will hence
> > by inactive and slow ops will occur.
> >
> > When host is your failure domain, you should not reboot more than one at
> > the same time. If the hosts are somehow organized (different racks,
> > datacenters) you could make a higher level bucket and put your hosts
> > there. And create a crush rule using that bucket type as failure domain,
> > and have your pools use that.
> >
> > Gr. Stefan
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to identify the index pool real usage?

2023-12-01 Thread Anthony D'Atri
>> 
>> Today we had a big issue with slow ops on the nvme drives which holding
>> the index pool.
>> 
>> Why the nvme shows full if on ceph is barely utilized? Which one I should
>> belive?
>> 
>> When I check the ceph osd df it shows 10% usage of the osds (1x 2TB nvme
>> drive has 4x osds on it):

Why split each device into 4 very small OSDs?  You're losing a lot of capacity 
to overhead.

>> 
>> ID   CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA OMAP META  
>> AVAIL%USE   VAR   PGS  STATUS
>> 195   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   656 MiB  
>> 400 GiB  10.47  0.21   64  up
>> 252   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   845 MiB  
>> 401 GiB  10.35  0.21   64  up
>> 253   nvme  0.43660   1.0  447 GiB   46 GiB  229 MiB   45 GiB   662 MiB  
>> 401 GiB  10.26  0.21   66  up
>> 254   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.3 GiB  
>> 401 GiB  10.26  0.21   65  up
>> 255   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   1.2 GiB  
>> 400 GiB  10.58  0.21   64  up
>> 288   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.2 GiB  
>> 401 GiB  10.25  0.21   64  up
>> 289   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   641 MiB  
>> 401 GiB  10.33  0.21   64  up
>> 290   nvme  0.43660   1.0  447 GiB   45 GiB  229 MiB   44 GiB   668 MiB  
>> 402 GiB  10.14  0.21   65  up
>> 
>> However in nvme list it says full:
>> Node SN   ModelNamespace Usage   
>>Format   FW Rev
>>   
>> --- -
>> --  

>> /dev/nvme0n1 90D0A00XTXTR KCD6XLUL1T92 1   1.92  TB 
>> /   1.92  TB512   B +  0 B   GPK6
>> /dev/nvme1n1 60P0A003TXTR KCD6XLUL1T92 1   1.92  TB 
>> /   1.92  TB512   B +  0 B   GPK6

That command isn't telling you what you think it is. It has no awareness of
actual data; it's looking at NVMe namespaces.

>> 
>> With some other node the test was like:
>> 
>>  *   if none of the disk full, no slow ops.
>>  *   If 1x disk full and the other not, has slow ops but not too much
>>  *   if none of the disk full, no slow ops.
>> 
>> The full disks are very highly utilized during recovery and they are
>> holding back the operations from the other nvmes.
>> 
>> What's the reason that even if the pgs are the same in the cluster +/-1
>> regarding space they are not equally utilized.
>> 
>> Thank you
>> 
>> 
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to identify the index pool real usage?

2023-12-01 Thread David C.
Hi,

It looks like a trim/discard problem.

I would try my luck by activating discard on one disk, to validate.

I have no feedback on the reliability of the bdev_*_discard parameters.
Maybe dig a little deeper into the subject, or see if anyone has feedback...
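
A minimal sketch of what I mean, assuming a release where these options
exist; I would test it on a single OSD first (osd.195 is taken from your
listing):

ceph config set osd.195 bdev_enable_discard true
ceph config set osd.195 bdev_async_discard true
# the OSD probably needs a restart afterwards for this to take effect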



Cordialement,

*David CASIER*





On Fri, 1 Dec 2023 at 16:15, Szabo, Istvan (Agoda) wrote:

> Hi,
>
> Today we had a big issue with slow ops on the nvme drives which holding
> the index pool.
>
> Why the nvme shows full if on ceph is barely utilized? Which one I should
> belive?
>
> When I check the ceph osd df it shows 10% usage of the osds (1x 2TB nvme
> drive has 4x osds on it):
>
> ID   CLASS  WEIGHT   REWEIGHT  SIZE RAW USE  DATA OMAP META
>   AVAIL%USE   VAR   PGS  STATUS
> 195   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   656
> MiB  400 GiB  10.47  0.21   64  up
> 252   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   845
> MiB  401 GiB  10.35  0.21   64  up
> 253   nvme  0.43660   1.0  447 GiB   46 GiB  229 MiB   45 GiB   662
> MiB  401 GiB  10.26  0.21   66  up
> 254   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.3
> GiB  401 GiB  10.26  0.21   65  up
> 255   nvme  0.43660   1.0  447 GiB   47 GiB  161 MiB   46 GiB   1.2
> GiB  400 GiB  10.58  0.21   64  up
> 288   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   44 GiB   1.2
> GiB  401 GiB  10.25  0.21   64  up
> 289   nvme  0.43660   1.0  447 GiB   46 GiB  161 MiB   45 GiB   641
> MiB  401 GiB  10.33  0.21   64  up
> 290   nvme  0.43660   1.0  447 GiB   45 GiB  229 MiB   44 GiB   668
> MiB  402 GiB  10.14  0.21   65  up
>
> However in nvme list it says full:
> Node SN   Model
> Namespace Usage  Format   FW Rev
>  
>  -
> --  
> /dev/nvme0n1 90D0A00XTXTR KCD6XLUL1T92
>  1   1.92  TB /   1.92  TB512   B +  0 B   GPK6
> /dev/nvme1n1 60P0A003TXTR KCD6XLUL1T92
>  1   1.92  TB /   1.92  TB512   B +  0 B   GPK6
>
> With some other node the test was like:
>
>   *   if none of the disk full, no slow ops.
>   *   If 1x disk full and the other not, has slow ops but not too much
>   *   if none of the disk full, no slow ops.
>
> The full disks are very highly utilized during recovery and they are
> holding back the operations from the other nvmes.
>
> What's the reason that even if the pgs are the same in the cluster +/-1
> regarding space they are not equally utilized.
>
> Thank you
>
>
>
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.1 QE Validation status

2023-12-01 Thread Yuri Weinstein
Venky, pls review the test results for smoke and fs after the PRs were merged.

Radek, Igor, Adam - any updates on https://tracker.ceph.com/issues/63618?

Thx

On Thu, Nov 30, 2023 at 8:08 AM Yuri Weinstein  wrote:
>
> The fs PRs:
> https://github.com/ceph/ceph/pull/54407
> https://github.com/ceph/ceph/pull/54677
> were approved/tested and ready for merge.
>
> What is the status/plan for https://tracker.ceph.com/issues/63618?
>
> On Wed, Nov 29, 2023 at 10:51 AM Igor Fedotov  wrote:
> >
> > https://tracker.ceph.com/issues/63618 to be considered as a blocker for
> > the next Reef release.
> >
> > On 07/11/2023 00:30, Yuri Weinstein wrote:
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/63443#note-1
> > >
> > > Seeking approvals/reviews for:
> > >
> > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
> > > rados - Neha, Radek, Travis, Ernesto, Adam King
> > > rgw - Casey
> > > fs - Venky
> > > orch - Adam King
> > > rbd - Ilya
> > > krbd - Ilya
> > > upgrade/quincy-x (reef) - Laura PTL
> > > powercycle - Brad
> > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> > >
> > > Please reply to this email with approval and/or trackers of known
> > > issues/PRs to address them.
> > >
> > > TIA
> > > YuriW
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to identify the index pool real usage?

2023-12-01 Thread Szabo, Istvan (Agoda)
Hi,

Today we had a big issue with slow ops on the NVMe drives holding the
index pool.

Why does the NVMe show as full when, according to Ceph, it is barely
utilized? Which one should I believe?

When I check ceph osd df it shows ~10% usage of the OSDs (one 2TB NVMe drive
has 4 OSDs on it):

ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
195   nvme  0.43660   1.0      447 GiB   47 GiB  161 MiB  46 GiB  656 MiB  400 GiB  10.47  0.21   64  up
252   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  45 GiB  845 MiB  401 GiB  10.35  0.21   64  up
253   nvme  0.43660   1.0      447 GiB   46 GiB  229 MiB  45 GiB  662 MiB  401 GiB  10.26  0.21   66  up
254   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  44 GiB  1.3 GiB  401 GiB  10.26  0.21   65  up
255   nvme  0.43660   1.0      447 GiB   47 GiB  161 MiB  46 GiB  1.2 GiB  400 GiB  10.58  0.21   64  up
288   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  44 GiB  1.2 GiB  401 GiB  10.25  0.21   64  up
289   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  45 GiB  641 MiB  401 GiB  10.33  0.21   64  up
290   nvme  0.43660   1.0      447 GiB   45 GiB  229 MiB  44 GiB  668 MiB  402 GiB  10.14  0.21   65  up

However in nvme list it says full:
Node             SN            Model         Namespace  Usage                Format       FW Rev
---------------- ------------- ------------- ---------- -------------------- ------------ ------
/dev/nvme0n1     90D0A00XTXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB    512 B + 0 B  GPK6
/dev/nvme1n1     60P0A003TXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB    512 B + 0 B  GPK6

On some other nodes the test went like this:

  *   if none of the disks is full: no slow ops
  *   if one disk is full and the other is not: slow ops, but not too many
  *   if none of the disks is full: no slow ops

The full disks are very highly utilized during recovery, and they hold back
the operations from the other NVMes.

What's the reason that, even though the PGs are the same across the cluster
(+/-1), they are not equally utilized space-wise?

Thank you




This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Duplicated device IDs

2023-12-01 Thread Nicola Mori

Dear Ceph users,

I am replacing some small disks on one of my hosts with bigger ones. I
delete the OSD from the web UI, preserving the ID for replacement, then
after the rebalancing is finished I change the disk and the cluster
automatically re-creates the OSD with the same ID. Then I adjust the
CRUSH weight.
Everything works fine except for the handling of the device IDs of some
of the new disks. As you can see below, there are 5 IDs each associated with
2 devices and 2 OSDs, while these are actually different disks, since the
OSDs see different and correct sizes.


[ceph: root@bofur /]# ceph device ls-by-host romolo
DEVICE                                     DEV      DAEMONS          EXPECTED FAILURE
AMCC_9650SE-16M_DISK_82723576349B5E000984  sdc      osd.42
AMCC_9650SE-16M_DISK_83214021349B63000A50  sdd      osd.56
AMCC_9650SE-16M_DISK_83450671349B680004B3  sdf      osd.68
AMCC_9650SE-16M_DISK_83471183349B680021DA  sde      osd.65
AMCC_9650SE-16M_DISK_9QG58JCX349B59EE      sdb      osd.13
AMCC_9650SE-16M_DISK_AF248795608D6A16      sdq      osd.62
AMCC_9650SE-16M_DISK_J0210858              sdi sdn  osd.105 osd.20
AMCC_9650SE-16M_DISK_J0210926              sdg sdl  osd.36 osd.5
AMCC_9650SE-16M_DISK_N0ECFHAL              sdj sdo  osd.25 osd.60
AMCC_9650SE-16M_DISK_N0R5P9WT              sdk sdp  osd.51 osd.70
AMCC_9650SE-16M_DISK_PBGDG6EE              sdh sdm  osd.45 osd.9
SanDisk_SSD_PLUS_21089P443002              sda      mon.romolo

I really don't understand what happened, if I did something wrong, or 
how to fix this.
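
If it helps with the diagnosis, I can run something like the following on
romolo and post the output (sdi/sdn and osd.105 are taken from the listing
above):

# compare what the kernel reports for two devices that share an ID
smartctl -i /dev/sdi
smartctl -i /dev/sdn
# and what Ceph has recorded for one of the OSDs involved
ceph device ls-by-daemon osd.105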

Any help is greatly appreciated.

Nicola


smime.p7s
Description: S/MIME Cryptographic Signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph osd slow ops

2023-12-01 Thread VÔ VI
Hi Stefan,

I am running 3x replication with host as the failure domain, and the pool's
min_size is set to 1. Because my cluster serves real-time S3 traffic and
can't stop or block IO, data may be lost but IO must always be available. I
hope my cluster can keep running with two nodes unavailable.
After those two nodes went down at the same time and then came back up,
client IO and recovery were running at the same time, and some disks warned
about slow ops. What is the problem? Maybe my disks are overloaded, but the
disk utilization is only 60-80%.

Thanks Stefan

On Fri, 1 Dec 2023 at 16:40, Stefan Kooman wrote:

> On 01-12-2023 08:45, VÔ VI wrote:
> > Hi community,
> >
> > My cluster running with 10 nodes and 2 nodes goes down, sometimes the log
> > shows the slow ops, what is the root cause?
> > My osd is HDD and block.db and wal is 500GB SSD per osd.
> >
> > Health check update: 13 slow ops, oldest one blocked for 167 sec, osd.10
> > has slow ops (SLOW_OPS)
>
> Most likely you have a crush rule that spreads objects over hosts as a
> failure domain. For size=3, min_size=2 (default for replicated pools)
> you might end up in a situation where two of the nodes that are offline
> have PGs where min_size=2 requirement is not fulfilled, and will hence
> by inactive and slow ops will occur.
>
> When host is your failure domain, you should not reboot more than one at
> the same time. If the hosts are somehow organized (different racks,
> datacenters) you could make a higher level bucket and put your hosts
> there. And create a crush rule using that bucket type as failure domain,
> and have your pools use that.
>
> Gr. Stefan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd dump_historic_ops

2023-12-01 Thread Phong Tran Thanh
It works!!!

Thanks Kai Stian Olstad

On Fri, 1 Dec 2023 at 17:06, Kai Stian Olstad <ceph+l...@olstad.com> wrote:

> On Fri, Dec 01, 2023 at 04:33:20PM +0700, Phong Tran Thanh wrote:
> >I have a problem with my osd, i want to show dump_historic_ops of osd
> >I follow the guide:
> >
> https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
> >But when i run command
> >
> >ceph daemon osd.8 dump_historic_ops show the error, the command run on
> node
> >with osd.8
> >Can't get admin socket path: unable to get conf option admin_socket for
> >osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
> >types are: auth, mon, osd, mds, mgr, client\n"
> >
> >I am running ceph cluster reef version by cephadmin install
> >
> >What should I do?
>
> The easiest is use tell, then you can run it on any node that have access
> to ceph.
>
>  ceph tell osd.8 dump_historic_ops
>
>
>  ceph tell osd.8 help
> will give you all you can do with tell.
>
> --
> Kai Stian Olstad
>


-- 
Best regards,


*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-01 Thread Frank Schilder
Hi Xiubo,

I uploaded a test script with session output showing the issue. When I look at 
your scripts, I can't see the stat-check on the second host anywhere. Hence, I 
don't really know what you are trying to compare.

If you want me to run your test scripts on our system for comparison, please 
include the part executed on the second host explicitly in an ssh-command. 
Running your scripts alone in their current form will not reproduce the issue.
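
Roughly, the shape of the two-host check I mean, with placeholder hostnames
and paths (not the actual test script):

# on host1: copy a file within the shared CephFS mount (shutil.copy, i.e. without metadata)
python3 -c 'import shutil; shutil.copy("/mnt/kcephfs/src", "/mnt/kcephfs/dst")'
# the size is correct when checked on host1
stat -c '%n %s' /mnt/kcephfs/dst
# the essential part: the same stat executed on the second client
ssh host2 stat -c '%n %s' /mnt/kcephfs/dst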

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Monday, November 27, 2023 3:59 AM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent


On 11/24/23 21:37, Frank Schilder wrote:
> Hi Xiubo,
>
> thanks for the update. I will test your scripts in our system next week. 
> Something important: running both scripts on a single client will not produce 
> a difference. You need 2 clients. The inconsistency is between clients, not 
> on the same client. For example:

Frank,

Yeah, I did this with 2 different kclients.

Thanks

> Setup: host1 and host2 with a kclient mount to a cephfs under /mnt/kcephfs
>
> Test 1
> - on host1: execute shutil.copy2
> - execute ls -l /mnt/kcephfs/ on host1 and host2: same result
>
> Test 2
> - on host1: shutil.copy
> - execute ls -l /mnt/kcephfs/ on host1 and host2: file size=0 on host 2 while 
> correct on host 1
>
> Your scripts only show output of one host, but the inconsistency requires two 
> hosts for observation. The stat information is updated on host1, but not 
> synchronized to host2 in the second test. In case you can't reproduce that, I 
> will append results from our system to the case.
>
> Also it would be important to know the python and libc versions. We observe 
> this only for newer versions of both.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Xiubo Li 
> Sent: Thursday, November 23, 2023 3:47 AM
> To: Frank Schilder; Gregory Farnum
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent
>
> I just raised one tracker to follow this:
> https://tracker.ceph.com/issues/63510
>
> Thanks
>
> - Xiubo
>
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Stray host/daemon

2023-12-01 Thread Jeremy Hansen
I think I ran in to this before but I forget the fix:

HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_HOST: 1 stray host(s) with 1 daemon(s) not managed by 
cephadm
stray host cn06.ceph.fu.intra has 1 stray daemons: ['mon.cn03']

Pacific 16.2.11

How do I clear this?

Thanks
-jeremy



signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd dump_historic_ops

2023-12-01 Thread Kai Stian Olstad

On Fri, Dec 01, 2023 at 04:33:20PM +0700, Phong Tran Thanh wrote:

I have a problem with my osd, i want to show dump_historic_ops of osd
I follow the guide:
https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
But when i run command

ceph daemon osd.8 dump_historic_ops show the error, the command run on node
with osd.8
Can't get admin socket path: unable to get conf option admin_socket for
osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
types are: auth, mon, osd, mds, mgr, client\n"

I am running ceph cluster reef version by cephadmin install

What should I do?


The easiest is use tell, then you can run it on any node that have access to 
ceph.

ceph tell osd.8 dump_historic_ops


ceph tell osd.8 help
will give you all you can do with tell.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd dump_historic_ops

2023-12-01 Thread Robert Sander

On 12/1/23 10:33, Phong Tran Thanh wrote:


ceph daemon osd.8 dump_historic_ops show the error, the command run on node
with osd.8
Can't get admin socket path: unable to get conf option admin_socket for
osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
types are: auth, mon, osd, mds, mgr, client\n"

I am running ceph cluster reef version by cephadmin install


When the daemons run in containers managed by the cephadm orchestrator,
the socket file has a different location and the command-line tool ceph
(run outside the container) does not find it automatically.


You can run

# ceph daemon /var/run/ceph/$FSID/ceph-osd.$OSDID.asok dump_historic_ops

to use the socket outside the container.

Or you enter the container with

# cephadm enter --name osd.$OSDID

and then execute

# ceph daemon osd.$OSDID dump_historic_ops

inside the container.

$FSID is the UUID of the Ceph cluster, $OSDID is the OSD id.

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About ceph osd slow ops

2023-12-01 Thread Stefan Kooman

On 01-12-2023 08:45, VÔ VI wrote:

Hi community,

My cluster running with 10 nodes and 2 nodes goes down, sometimes the log
shows the slow ops, what is the root cause?
My osd is HDD and block.db and wal is 500GB SSD per osd.

Health check update: 13 slow ops, oldest one blocked for 167 sec, osd.10
has slow ops (SLOW_OPS)


Most likely you have a CRUSH rule that spreads objects over hosts as the
failure domain. With size=3, min_size=2 (the default for replicated pools),
you might end up in a situation where, for some PGs, the two offline nodes
hold two of the replicas, so the min_size=2 requirement is not fulfilled;
those PGs will hence be inactive and slow ops will occur.


When host is your failure domain, you should not reboot more than one at
the same time. If the hosts are somehow organized (different racks,
datacenters), you could create a higher-level bucket and put your hosts
there, then create a CRUSH rule using that bucket type as the failure
domain and have your pools use it.
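
A sketch of what that could look like, with example bucket, host and rule
names:

# create rack buckets under the default root and move the hosts into them
ceph osd crush add-bucket rack1 rack
ceph osd crush move rack1 root=default
ceph osd crush move node01 rack=rack1
ceph osd crush add-bucket rack2 rack
ceph osd crush move rack2 root=default
ceph osd crush move node02 rack=rack2
# replicated rule with rack as the failure domain, then point the pool at it
ceph osd crush rule create-replicated replicated_rack default rack
ceph osd pool set <pool> crush_rule replicated_rack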


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph osd dump_historic_ops

2023-12-01 Thread Phong Tran Thanh
Hi community,

I have a problem with my osd, i want to show dump_historic_ops of osd
I follow the guide:
https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
But when i run command

ceph daemon osd.8 dump_historic_ops show the error, the command run on node
with osd.8
Can't get admin socket path: unable to get conf option admin_socket for
osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
types are: auth, mon, osd, mds, mgr, client\n"

I am running ceph cluster reef version by cephadmin install

What should I do?

Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io