[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-25 Thread Radoslaw Zarzynski
rados approved

On Thu, Aug 24, 2023 at 12:33 AM Laura Flores  wrote:

> Rados summary is here:
> https://tracker.ceph.com/projects/rados/wiki/PACIFIC#Pacific-v16214-httpstrackercephcomissues62527note-1
>
> Most are known, except for two new trackers I raised:
>
>1. https://tracker.ceph.com/issues/62557 - rados/dashboard: Teuthology
>test failure due to "MDS_CLIENTS_LAGGY" warning - Ceph - RADOS
>2. https://tracker.ceph.com/issues/62559 - rados/cephadm/dashboard:
>test times out due to host stuck in maintenance mode - Ceph - Orchestrator
>
> #1 is related to a similar issue we saw where the MDS_CLIENTS_LAGGY
> warning was coming up in the Jenkins api check, where these kinds of
> conditions are expected. In that case, I would call #1 more of a test
> issue, and say that the fix is to whitelist the warning for that test.
> Would be good to have someone from CephFS weigh in though-- @Patrick
> Donnelly  @Dhairya Parmar 
>
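For readers who want to see what such a whitelist looks like: in the qa suites this kind
of warning is normally silenced with a log-ignorelist override in the test's yaml. A
minimal sketch (the exact file and placement within the suite are assumptions, not the
actual fix that was merged):

# hypothetical override fragment for the affected rados/dashboard job
overrides:
  ceph:
    log-ignorelist:
      - MDS_CLIENTS_LAGGY
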
> #2 looks new to me. @Adam King  can you take a look
> and see if it's something to be concerned about? The same test failed for a
> different reason in the rerun, so the failure did not reproduce.
>
> On Wed, Aug 23, 2023 at 1:08 PM Laura Flores  wrote:
>
>> Thanks Yuri! I will take a look for rados and get back to this thread.
>>
>> On Wed, Aug 23, 2023 at 9:41 AM Yuri Weinstein 
>> wrote:
>>
>>> Details of this release are summarized here:
>>>
>>> https://tracker.ceph.com/issues/62527#note-1
>>> Release Notes - TBD
>>>
>>> Seeking approvals for:
>>>
>>> smoke - Venky
>>> rados - Radek, Laura
>>>   rook - Sébastien Han
>>>   cephadm - Adam K
>>>   dashboard - Ernesto
>>>
>>> rgw - Casey
>>> rbd - Ilya
>>> krbd - Ilya
>>> fs - Venky, Patrick
>>>
>>> upgrade/pacific-p2p - Laura
>>> powercycle - Brad (SELinux denials)
>>>
>>>
>>> Thx
>>> YuriW
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
>>
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage 
>>
>> Chicago, IL
>>
>> lflo...@ibm.com | lflo...@redhat.com 
>> M: +17087388804
>>
>>
>>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] What does 'removed_snaps_queue' [d5~3] mean?

2023-08-25 Thread Work Ceph
Hello guys,
We are seeing an unexpected mark on one of our pools. Do you know what
"removed_snaps_queue" means? We see a notation such as "d5~3" after this tag. What does
it mean? We tried to look into the docs, but could not find anything meaningful.

We are running Ceph Octopus on top of Ubuntu 18.04.
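
For readers hitting the same question: as far as I know the value is an interval set of
snapshot ids that have been deleted but whose data the OSDs are still purging, printed
as start~length in hexadecimal, so "d5~3" would cover snap ids 0xd5 through 0xd7. If
memory serves it shows up in the per-pool detail listing:

$ ceph osd pool ls detail

Once the OSDs finish trimming those snapshots, the entry should disappear on its own.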
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm to setup wal/db on nvme

2023-08-25 Thread Anthony D'Atri


> Thank you for the reply,
> 
> I have created two device classes, SSD and NVMe, and assigned them in the CRUSH map.

You don't have enough drives to keep them separate.  Set the NVMe drives back 
to "ssd" and just make one pool.

> 
> $ ceph osd crush rule ls
> replicated_rule
> ssd_pool
> nvme_pool
> 
> 
> Running benchmarks, the NVMe pool is the worst performing; the SSD pool shows much
> better results compared to NVMe.

You have more SATA SSDs, and thus more OSDs, than NVMe SSDs.


> The NVMe model is Samsung_SSD_980_PRO_1TB.

Client-grade, don't expect much from it.
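
For what it's worth, the usual way to gauge whether a drive copes with this kind of small
sync write is a single-threaded O_DSYNC 4k fio run, roughly like the sketch below (it
writes to the raw device, so only run it on an unused disk; the device path is a
placeholder):

$ fio --name=synctest --filename=/dev/nvme0n1 --direct=1 --sync=1 \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=30 --time_based

Client-grade NVMe tends to collapse on this pattern, which would match the rados bench
numbers below.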


> 
>  NvME pool benchmark with 3x replication
> 
> # rados -p test-nvme -t 64 -b 4096 bench 10 write
> hints = 1
> Maintaining 64 concurrent writes of 4096 bytes to objects of size 4096 for
> up to 10 seconds or 0 objects
> Object prefix: benchmark_data_os-ctrl1_1931595
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>     0       0         0         0         0         0            -          0
>     1      64      5541      5477   21.3917   21.3945    0.0134898  0.0116529
>     2      64     11209     11145   21.7641   22.1406   0.00939951  0.0114506
>     3      64     17036     16972   22.0956   22.7617   0.00938263  0.0112938
>     4      64     23187     23123   22.5776   24.0273   0.00863939  0.0110473
>     5      64     29753     29689   23.1911   25.6484   0.00925603  0.0107662
>     6      64     36222     36158   23.5369   25.2695    0.0100759   0.010606
>     7      63     42997     42934   23.9551   26.4688   0.00902186  0.0104246
>     8      64     49859     49795   24.3102   26.8008   0.00884379  0.0102765
>     9      64     56429     56365   24.4601   25.6641   0.00989885  0.0102124
>    10      31     62727     62696   24.4869   24.7305    0.0115833  0.0102027
> Total time run: 10.0064
> Total writes made:  62727
> Write size: 4096
> Object size:4096
> Bandwidth (MB/sec): 24.4871
> Stddev Bandwidth:   1.85423
> Max bandwidth (MB/sec): 26.8008   <   Only 26MB/s for nvme
> disk
> Min bandwidth (MB/sec): 21.3945
> Average IOPS:   6268
> Stddev IOPS:474.683
> Max IOPS:   6861
> Min IOPS:   5477
> Average Latency(s): 0.0102022
> Stddev Latency(s):  0.00170505
> Max latency(s): 0.0365743
> Min latency(s): 0.00641319
> Cleaning up (deleting benchmark objects)
> Removed 62727 objects
> Clean up completed and total clean up time :8.23223
> 
> 
> 
> ### SSD pool benchmark
> 
> (venv-openstack) root@os-ctrl1:~# rados -p test-ssd -t 64 -b 4096 bench 10
> write
> hints = 1
> Maintaining 64 concurrent writes of 4096 bytes to objects of size 4096 for
> up to 10 seconds or 0 objects
> Object prefix: benchmark_data_os-ctrl1_1933383
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
>     0       0         0         0         0         0            -            0
>     1      63     43839     43776   170.972       171  0.000991462   0.00145833
>     2      64     92198     92134   179.921   188.898   0.00211419     0.001387
>     3      64    141917    141853   184.675   194.215   0.00106326   0.00135174
>     4      63    193151    193088   188.534   200.137   0.00179379   0.00132423
>     5      63    243104    243041   189.847   195.129  0.000831263   0.00131512
>     6      63    291045    290982   189.413    187.27   0.00120208   0.00131807
>     7      64    341295    341231   190.391   196.285   0.00102127   0.00131137
>     8      63    393336    393273   191.999   203.289  0.000958149   0.00130041
>     9      63    442459    442396   191.983   191.887   0.00123453   0.00130053
> Total time run: 10.0008
> Total writes made:  488729
> Write size: 4096
> Object size:4096
> Bandwidth (MB/sec): 190.894
> Stddev Bandwidth:   9.35224
> Max bandwidth (MB/sec): 203.289
> Min bandwidth (MB/sec): 171
> Average IOPS:   48868
> Stddev IOPS:2394.17
> Max IOPS:   52042
> Min IOPS:   43776
> Average Latency(s): 0.00130796
> Stddev Latency(s):  0.000604629
> Max latency(s): 0.0268462
> Min latency(s): 0.000628738
> Cleaning up (deleting benchmark objects)
> Removed 488729 objects
> Clean up completed and total clean up time :8.84114
> 
> 
> 
> 
> 
> 
> 
> 
> On Wed, Aug 23, 2023 at 1:25 PM Adam King  wrote:
> 
>> this should be possible by specifying a "data_devices" and "db_devices"
>> fields in the OSD spec file each with different filters. There's some
>> examples in the docs
>> https://docs.ceph.com/en/latest/cephadm/services/osd/#the-simple-case that
>> show roughly how that's done, and some other sections (
>> https://docs.ceph.com/en/latest/cephadm/services/osd/#filters) that go
>> more in depth on the different filtering options available so you can try
>> and find one that works for your disks. You can check the output of "ceph
>> orch device ls --format json | jq" to see things like what cephadm
>> considers the 

[ceph-users] Re: A couple OSDs not starting after host reboot

2023-08-25 Thread Eugen Block

Hi,
one thing coming to mind is maybe the device names have changed from  
/dev/sdX to /dev/sdY? Something like that has been reported a couple  
of times in the last months.
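
A quick way to check is to compare what ceph-volume recorded for the OSD with what the
kernel currently sees on the affected host, e.g. (a generic sketch):

# lists the LVs/devices each OSD was created on
$ cephadm ceph-volume lvm list
# shows the block devices as they are named after the reboot
$ lsblk -o NAME,SIZE,TYPE,MOUNTPOINT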


Zitat von Alison Peisker :


Hi all,

We rebooted all the nodes in our 17.2.5 cluster after performing  
kernel updates, but 2 of the OSDs on different nodes are not coming  
back up. This is a production cluster using cephadm.


The error message from the OSD log is ceph-osd[87340]:  ** ERROR:  
unable to open OSD superblock on /var/lib/ceph/osd/ceph-665: (2) No  
such file or directory


The error message from ceph-volume is 2023-08-23T16:12:43.452-0500  
7f0cad968600  2  
bluestore(/dev/mapper/ceph--febad5a5--ba44--41aa--a39e--b9897f757752-osd--block--87e548f4--b9b5--4ed8--aca8--de703a341a50) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed  
input


We tried restarting the daemons and rebooting the node again, but  
still see the same error.

Has anyone experienced this issue before? How do we fix this?

Thanks,
Alison
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] A couple OSDs not starting after host reboot

2023-08-25 Thread Alison Peisker
Hi all,

We rebooted all the nodes in our 17.2.5 cluster after performing kernel 
updates, but 2 of the OSDs on different nodes are not coming back up. This is a 
production cluster using cephadm.

The error message from the OSD log is ceph-osd[87340]:  ** ERROR: unable to 
open OSD superblock on /var/lib/ceph/osd/ceph-665: (2) No such file or directory

The error message from ceph-volume is 2023-08-23T16:12:43.452-0500 7f0cad968600 
 2 
bluestore(/dev/mapper/ceph--febad5a5--ba44--41aa--a39e--b9897f757752-osd--block--87e548f4--b9b5--4ed8--aca8--de703a341a50)
 _read_bdev_label unable to decode label at offset 102: void 
bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) 
decode past end of struct encoding: Malformed input

We tried restarting the daemons and rebooting the node again, but still see the 
same error.
Has anyone experienced this issue before? How do we fix this?

Thanks,
Alison
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm to setup wal/db on nvme

2023-08-25 Thread Satish Patel
Thank you for the reply,

I have created two device classes, SSD and NVMe, and assigned them in the CRUSH map.

$ ceph osd crush rule ls
replicated_rule
ssd_pool
nvme_pool
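
(As a side note, class-based rules like the two above are typically created with the
device class as the last argument; the rule names below match the listing, everything
else is the default:)

$ ceph osd crush rule create-replicated ssd_pool default host ssd
$ ceph osd crush rule create-replicated nvme_pool default host nvme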


Running benchmarks, the NVMe pool is the worst performing; the SSD pool shows much
better results compared to NVMe. The NVMe model is Samsung_SSD_980_PRO_1TB.

 NvME pool benchmark with 3x replication

# rados -p test-nvme -t 64 -b 4096 bench 10 write
hints = 1
Maintaining 64 concurrent writes of 4096 bytes to objects of size 4096 for
up to 10 seconds or 0 objects
Object prefix: benchmark_data_os-ctrl1_1931595
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -          0
    1      64      5541      5477   21.3917   21.3945    0.0134898  0.0116529
    2      64     11209     11145   21.7641   22.1406   0.00939951  0.0114506
    3      64     17036     16972   22.0956   22.7617   0.00938263  0.0112938
    4      64     23187     23123   22.5776   24.0273   0.00863939  0.0110473
    5      64     29753     29689   23.1911   25.6484   0.00925603  0.0107662
    6      64     36222     36158   23.5369   25.2695    0.0100759   0.010606
    7      63     42997     42934   23.9551   26.4688   0.00902186  0.0104246
    8      64     49859     49795   24.3102   26.8008   0.00884379  0.0102765
    9      64     56429     56365   24.4601   25.6641   0.00989885  0.0102124
   10      31     62727     62696   24.4869   24.7305    0.0115833  0.0102027
Total time run: 10.0064
Total writes made:  62727
Write size: 4096
Object size:4096
Bandwidth (MB/sec): 24.4871
Stddev Bandwidth:   1.85423
Max bandwidth (MB/sec): 26.8008   <   Only 26MB/s for nvme
disk
Min bandwidth (MB/sec): 21.3945
Average IOPS:   6268
Stddev IOPS:474.683
Max IOPS:   6861
Min IOPS:   5477
Average Latency(s): 0.0102022
Stddev Latency(s):  0.00170505
Max latency(s): 0.0365743
Min latency(s): 0.00641319
Cleaning up (deleting benchmark objects)
Removed 62727 objects
Clean up completed and total clean up time :8.23223



### SSD pool benchmark

(venv-openstack) root@os-ctrl1:~# rados -p test-ssd -t 64 -b 4096 bench 10
write
hints = 1
Maintaining 64 concurrent writes of 4096 bytes to objects of size 4096 for
up to 10 seconds or 0 objects
Object prefix: benchmark_data_os-ctrl1_1933383
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
    0       0         0         0         0         0            -            0
    1      63     43839     43776   170.972       171  0.000991462   0.00145833
    2      64     92198     92134   179.921   188.898   0.00211419     0.001387
    3      64    141917    141853   184.675   194.215   0.00106326   0.00135174
    4      63    193151    193088   188.534   200.137   0.00179379   0.00132423
    5      63    243104    243041   189.847   195.129  0.000831263   0.00131512
    6      63    291045    290982   189.413    187.27   0.00120208   0.00131807
    7      64    341295    341231   190.391   196.285   0.00102127   0.00131137
    8      63    393336    393273   191.999   203.289  0.000958149   0.00130041
    9      63    442459    442396   191.983   191.887   0.00123453   0.00130053
Total time run: 10.0008
Total writes made:  488729
Write size: 4096
Object size:4096
Bandwidth (MB/sec): 190.894
Stddev Bandwidth:   9.35224
Max bandwidth (MB/sec): 203.289
Min bandwidth (MB/sec): 171
Average IOPS:   48868
Stddev IOPS:2394.17
Max IOPS:   52042
Min IOPS:   43776
Average Latency(s): 0.00130796
Stddev Latency(s):  0.000604629
Max latency(s): 0.0268462
Min latency(s): 0.000628738
Cleaning up (deleting benchmark objects)
Removed 488729 objects
Clean up completed and total clean up time :8.84114








On Wed, Aug 23, 2023 at 1:25 PM Adam King  wrote:

> this should be possible by specifying a "data_devices" and "db_devices"
> fields in the OSD spec file each with different filters. There's some
> examples in the docs
> https://docs.ceph.com/en/latest/cephadm/services/osd/#the-simple-case that
> show roughly how that's done, and some other sections (
> https://docs.ceph.com/en/latest/cephadm/services/osd/#filters) that go
> more in depth on the different filtering options available so you can try
> and find one that works for your disks. You can check the output of "ceph
> orch device ls --format json | jq" to see things like what cephadm
> considers the model, size etc. for the devices to be for use in the
> filtering.
>
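To illustrate the kind of spec Adam describes, a sketch of a drivegroup that puts the
DBs on the NVMe could look roughly like this (the size and model filters below are
assumptions; check them against your own "ceph orch device ls" output before applying):

$ cat > osd-spec.yaml <<'EOF'
service_type: osd
service_id: osd_ssd_data_nvme_db
placement:
  host_pattern: '*'
spec:
  data_devices:
    size: '2TB:'                  # the 2.9 TB SATA SSDs
  db_devices:
    model: 'Samsung SSD 980 PRO'  # the 1 TB NVMe
EOF
$ ceph orch apply -i osd-spec.yaml
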
> On Wed, Aug 23, 2023 at 1:13 PM Satish Patel  wrote:
>
>> Folks,
>>
>> I have 3 nodes with each having 1x NvME (1TB) and 3x 2.9TB SSD. Trying to
>> build ceph storage using cephadm on Ubuntu 22.04 distro.
>>
>> If I want to use NvME for Journaling (WAL/DB) for my SSD based OSDs then
>> how does cephadm handle it?
>>
>> Trying to find a document where I can tell cephadm to deploy 

[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-25 Thread Ilya Dryomov
On Fri, Aug 25, 2023 at 5:26 PM Laura Flores  wrote:
>
> All known issues in pacific p2p and smoke. @Ilya Dryomov
>  and @Casey Bodley  may want to
> double-check that the two for pacific p2p are acceptable, but they are
> known.
>
> pacific p2p:
> - TestClsRbd.mirror_snapshot failure in pacific p2p - Ceph - RBD
>https://tracker.ceph.com/issues/62586

I confirm that this is a known, expected failure.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-25 Thread Yuri Weinstein
Josh, Neha

We are looking good but missing some clarifications (see Laura's replies
below)

If you approve 16.2.14 as is we can start building today.

PLMK

On Fri, Aug 25, 2023 at 8:37 AM Yuri Weinstein  wrote:

> Thx Laura
>
> On the issue related to the smoke suite, pls see
> https://tracker.ceph.com/issues/62508
> @Venky Shankar  ^
>
> On Fri, Aug 25, 2023 at 8:25 AM Laura Flores  wrote:
>
>> All known issues in pacific p2p and smoke. @Ilya Dryomov
>>  and @Casey Bodley  may want to
>> double-check that the two for pacific p2p are acceptable, but they are
>> known.
>>
>> pacific p2p:
>> - TestClsRbd.mirror_snapshot failure in pacific p2p - Ceph - RBD
>>https://tracker.ceph.com/issues/62586
>> - "[ FAILED ] CmpOmap.cmp_vals_u64_invalid_default" in
>> upgrade:pacific-p2p-pacific - Ceph - RGW
>>https://tracker.ceph.com/issues/52590
>>
>> smoke
>> - cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an
>> application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
>>   https://tracker.ceph.com/issues/59192
>>
>> On Fri, Aug 25, 2023 at 9:41 AM Laura Flores  wrote:
>>
>>> Wanted to bring this tracker to attention:
>>> https://tracker.ceph.com/issues/58946
>>>
>>> Users have recently reported experiencing this bug in v16.2.13. There is
>>> a main fix available, but it's currently undergoing testing. @Adam King
>>>  what are your thoughts on getting this fix into
>>> 16.2.14?
>>>
>>> On Fri, Aug 25, 2023 at 9:12 AM Travis Nielsen 
>>> wrote:
>>>
 Approved for rook.

 For future approvals, Blaine or I could approve, as Seb is on another
 project now.

 Thanks,
 Travis

 On Fri, Aug 25, 2023 at 7:06 AM Venky Shankar 
 wrote:

> On Fri, Aug 25, 2023 at 7:17 AM Patrick Donnelly 
> wrote:
> >
> > On Wed, Aug 23, 2023 at 10:41 AM Yuri Weinstein 
> wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/62527#note-1
> > > Release Notes - TBD
> > >
> > > Seeking approvals for:
> > >
> > > smoke - Venky
> > > rados - Radek, Laura
> > >   rook - Sébastien Han
> > >   cephadm - Adam K
> > >   dashboard - Ernesto
> > >
> > > rgw - Casey
> > > rbd - Ilya
> > > krbd - Ilya
> > > fs - Venky, Patrick
> >
> > approved
> >
> > https://tracker.ceph.com/projects/cephfs/wiki/Pacific#2023-August-22
>
> You beat me to this. Thanks, Patrick.
>
> >
> >
> > --
> > Patrick Donnelly, Ph.D.
> > He / Him / His
> > Red Hat Partner Engineer
> > IBM, Inc.
> > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Cheers,
> Venky
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
 ___
 Dev mailing list -- d...@ceph.io
 To unsubscribe send an email to dev-le...@ceph.io

>>>
>>>
>>> --
>>>
>>> Laura Flores
>>>
>>> She/Her/Hers
>>>
>>> Software Engineer, Ceph Storage 
>>>
>>> Chicago, IL
>>>
>>> lflo...@ibm.com | lflo...@redhat.com 
>>> M: +17087388804
>>>
>>>
>>>
>>
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage 
>>
>> Chicago, IL
>>
>> lflo...@ibm.com | lflo...@redhat.com 
>> M: +17087388804
>>
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-25 Thread Yuri Weinstein
Thx Laura

On the issue related to the smoke suite, pls see
https://tracker.ceph.com/issues/62508
@Venky Shankar  ^

On Fri, Aug 25, 2023 at 8:25 AM Laura Flores  wrote:

> All known issues in pacific p2p and smoke. @Ilya Dryomov
>  and @Casey Bodley  may want to
> double-check that the two for pacific p2p are acceptable, but they are
> known.
>
> pacific p2p:
> - TestClsRbd.mirror_snapshot failure in pacific p2p - Ceph - RBD
>https://tracker.ceph.com/issues/62586
> - "[ FAILED ] CmpOmap.cmp_vals_u64_invalid_default" in
> upgrade:pacific-p2p-pacific - Ceph - RGW
>https://tracker.ceph.com/issues/52590
>
> smoke
> - cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an
> application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
>   https://tracker.ceph.com/issues/59192
>
> On Fri, Aug 25, 2023 at 9:41 AM Laura Flores  wrote:
>
>> Wanted to bring this tracker to attention:
>> https://tracker.ceph.com/issues/58946
>>
>> Users have recently reported experiencing this bug in v16.2.13. There is
>> a main fix available, but it's currently undergoing testing. @Adam King
>>  what are your thoughts on getting this fix into
>> 16.2.14?
>>
>> On Fri, Aug 25, 2023 at 9:12 AM Travis Nielsen 
>> wrote:
>>
>>> Approved for rook.
>>>
>>> For future approvals, Blaine or I could approve, as Seb is on another
>>> project now.
>>>
>>> Thanks,
>>> Travis
>>>
>>> On Fri, Aug 25, 2023 at 7:06 AM Venky Shankar 
>>> wrote:
>>>
 On Fri, Aug 25, 2023 at 7:17 AM Patrick Donnelly 
 wrote:
 >
 > On Wed, Aug 23, 2023 at 10:41 AM Yuri Weinstein 
 wrote:
 > >
 > > Details of this release are summarized here:
 > >
 > > https://tracker.ceph.com/issues/62527#note-1
 > > Release Notes - TBD
 > >
 > > Seeking approvals for:
 > >
 > > smoke - Venky
 > > rados - Radek, Laura
 > >   rook - Sébastien Han
 > >   cephadm - Adam K
 > >   dashboard - Ernesto
 > >
 > > rgw - Casey
 > > rbd - Ilya
 > > krbd - Ilya
 > > fs - Venky, Patrick
 >
 > approved
 >
 > https://tracker.ceph.com/projects/cephfs/wiki/Pacific#2023-August-22

 You beat me to this. Thanks, Patrick.

 >
 >
 > --
 > Patrick Donnelly, Ph.D.
 > He / Him / His
 > Red Hat Partner Engineer
 > IBM, Inc.
 > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
 > ___
 > ceph-users mailing list -- ceph-users@ceph.io
 > To unsubscribe send an email to ceph-users-le...@ceph.io



 --
 Cheers,
 Venky
 ___
 Dev mailing list -- d...@ceph.io
 To unsubscribe send an email to dev-le...@ceph.io

>>> ___
>>> Dev mailing list -- d...@ceph.io
>>> To unsubscribe send an email to dev-le...@ceph.io
>>>
>>
>>
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage 
>>
>> Chicago, IL
>>
>> lflo...@ibm.com | lflo...@redhat.com 
>> M: +17087388804
>>
>>
>>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-25 Thread Laura Flores
All known issues in pacific p2p and smoke. @Ilya Dryomov
 and @Casey Bodley  may want to
double-check that the two for pacific p2p are acceptable, but they are
known.

pacific p2p:
- TestClsRbd.mirror_snapshot failure in pacific p2p - Ceph - RBD
   https://tracker.ceph.com/issues/62586
- "[ FAILED ] CmpOmap.cmp_vals_u64_invalid_default" in
upgrade:pacific-p2p-pacific - Ceph - RGW
   https://tracker.ceph.com/issues/52590

smoke
- cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do not have an
application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
  https://tracker.ceph.com/issues/59192

On Fri, Aug 25, 2023 at 9:41 AM Laura Flores  wrote:

> Wanted to bring this tracker to attention:
> https://tracker.ceph.com/issues/58946
>
> Users have recently reported experiencing this bug in v16.2.13. There is a
> main fix available, but it's currently undergoing testing. @Adam King
>  what are your thoughts on getting this fix into
> 16.2.14?
>
> On Fri, Aug 25, 2023 at 9:12 AM Travis Nielsen 
> wrote:
>
>> Approved for rook.
>>
>> For future approvals, Blaine or I could approve, as Seb is on another
>> project now.
>>
>> Thanks,
>> Travis
>>
>> On Fri, Aug 25, 2023 at 7:06 AM Venky Shankar 
>> wrote:
>>
>>> On Fri, Aug 25, 2023 at 7:17 AM Patrick Donnelly 
>>> wrote:
>>> >
>>> > On Wed, Aug 23, 2023 at 10:41 AM Yuri Weinstein 
>>> wrote:
>>> > >
>>> > > Details of this release are summarized here:
>>> > >
>>> > > https://tracker.ceph.com/issues/62527#note-1
>>> > > Release Notes - TBD
>>> > >
>>> > > Seeking approvals for:
>>> > >
>>> > > smoke - Venky
>>> > > rados - Radek, Laura
>>> > >   rook - Sébastien Han
>>> > >   cephadm - Adam K
>>> > >   dashboard - Ernesto
>>> > >
>>> > > rgw - Casey
>>> > > rbd - Ilya
>>> > > krbd - Ilya
>>> > > fs - Venky, Patrick
>>> >
>>> > approved
>>> >
>>> > https://tracker.ceph.com/projects/cephfs/wiki/Pacific#2023-August-22
>>>
>>> You beat me to this. Thanks, Patrick.
>>>
>>> >
>>> >
>>> > --
>>> > Patrick Donnelly, Ph.D.
>>> > He / Him / His
>>> > Red Hat Partner Engineer
>>> > IBM, Inc.
>>> > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>>> > ___
>>> > ceph-users mailing list -- ceph-users@ceph.io
>>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>>
>>>
>>> --
>>> Cheers,
>>> Venky
>>> ___
>>> Dev mailing list -- d...@ceph.io
>>> To unsubscribe send an email to dev-le...@ceph.io
>>>
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
>>
>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
>

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-25 Thread Laura Flores
Wanted to bring this tracker to attention:
https://tracker.ceph.com/issues/58946

Users have recently reported experiencing this bug in v16.2.13. There is a
main fix available, but it's currently undergoing testing. @Adam King
 what are your thoughts on getting this fix into 16.2.14?

On Fri, Aug 25, 2023 at 9:12 AM Travis Nielsen  wrote:

> Approved for rook.
>
> For future approvals, Blaine or I could approve, as Seb is on another
> project now.
>
> Thanks,
> Travis
>
> On Fri, Aug 25, 2023 at 7:06 AM Venky Shankar  wrote:
>
>> On Fri, Aug 25, 2023 at 7:17 AM Patrick Donnelly 
>> wrote:
>> >
>> > On Wed, Aug 23, 2023 at 10:41 AM Yuri Weinstein 
>> wrote:
>> > >
>> > > Details of this release are summarized here:
>> > >
>> > > https://tracker.ceph.com/issues/62527#note-1
>> > > Release Notes - TBD
>> > >
>> > > Seeking approvals for:
>> > >
>> > > smoke - Venky
>> > > rados - Radek, Laura
>> > >   rook - Sébastien Han
>> > >   cephadm - Adam K
>> > >   dashboard - Ernesto
>> > >
>> > > rgw - Casey
>> > > rbd - Ilya
>> > > krbd - Ilya
>> > > fs - Venky, Patrick
>> >
>> > approved
>> >
>> > https://tracker.ceph.com/projects/cephfs/wiki/Pacific#2023-August-22
>>
>> You beat me to this. Thanks, Patrick.
>>
>> >
>> >
>> > --
>> > Patrick Donnelly, Ph.D.
>> > He / Him / His
>> > Red Hat Partner Engineer
>> > IBM, Inc.
>> > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>>
>> --
>> Cheers,
>> Venky
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
>>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-25 Thread Travis Nielsen
Approved for rook.

For future approvals, Blaine or I could approve, as Seb is on another
project now.

Thanks,
Travis

On Fri, Aug 25, 2023 at 7:06 AM Venky Shankar  wrote:

> On Fri, Aug 25, 2023 at 7:17 AM Patrick Donnelly 
> wrote:
> >
> > On Wed, Aug 23, 2023 at 10:41 AM Yuri Weinstein 
> wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/62527#note-1
> > > Release Notes - TBD
> > >
> > > Seeking approvals for:
> > >
> > > smoke - Venky
> > > rados - Radek, Laura
> > >   rook - Sébastien Han
> > >   cephadm - Adam K
> > >   dashboard - Ernesto
> > >
> > > rgw - Casey
> > > rbd - Ilya
> > > krbd - Ilya
> > > fs - Venky, Patrick
> >
> > approved
> >
> > https://tracker.ceph.com/projects/cephfs/wiki/Pacific#2023-August-22
>
> You beat me to this. Thanks, Patrick.
>
> >
> >
> > --
> > Patrick Donnelly, Ph.D.
> > He / Him / His
> > Red Hat Partner Engineer
> > IBM, Inc.
> > GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Cheers,
> Venky
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-25 Thread Venky Shankar
On Fri, Aug 25, 2023 at 7:17 AM Patrick Donnelly  wrote:
>
> On Wed, Aug 23, 2023 at 10:41 AM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/62527#note-1
> > Release Notes - TBD
> >
> > Seeking approvals for:
> >
> > smoke - Venky
> > rados - Radek, Laura
> >   rook - Sébastien Han
> >   cephadm - Adam K
> >   dashboard - Ernesto
> >
> > rgw - Casey
> > rbd - Ilya
> > krbd - Ilya
> > fs - Venky, Patrick
>
> approved
>
> https://tracker.ceph.com/projects/cephfs/wiki/Pacific#2023-August-22

You beat me to this. Thanks, Patrick.

>
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Red Hat Partner Engineer
> IBM, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: lun allocation failure

2023-08-25 Thread Eugen Block

On a Pacific cluster I have the same error message:

---snip---
2023-08-25T11:56:47.222407+02:00 ses7-host1 conmon[1383161]: debug  
(LUN.add_dev_to_lio) Adding image 'iscsi-pool/image3' to LIO backstore  
rbd
2023-08-25T11:56:47.714375+02:00 ses7-host1 kernel: [12861743.121824]  
rbd: rbd1: capacity 5368709120 features 0x3d

2023-08-25T11:56:47.746390+02:00 ses7-host1 conmon[1383161]: /dev/rbd1
2023-08-25T11:56:47.764072+02:00 ses7-host1 conmon[1383161]: debug  
failed to add iscsi-pool/image3 to LIO - error([Errno 22] Invalid  
argument)
2023-08-25T11:56:47.764563+02:00 ses7-host1 conmon[1383161]: debug LUN  
alloc problem - failed to add iscsi-pool/image3 to LIO - error([Errno  
22] Invalid argument)
2023-08-25T11:56:47.766292+02:00 ses7-host1 kernel: [12861743.172626]  
target_core_rbd: RBD: emulate_legacy_capacity must be disabled for  
RBD_FEATURE_OBJECT_MAP images
2023-08-25T11:56:47.766304+02:00 ses7-host1 kernel: [12861743.172653]  
target_core_rbd: RBD: emulate_legacy_capacity must be disabled for  
RBD_FEATURE_OBJECT_MAP images
2023-08-25T11:56:47.770242+02:00 ses7-host1 conmon[1383161]: debug  
:::127.0.0.1 - - [25/Aug/2023 09:56:47] "#033[35m#033[1mPUT  
/api/_disk/iscsi-pool/image3 HTTP/1.1#033[0m" 500 -
2023-08-25T11:56:47.770640+02:00 ses7-host1 conmon[1383161]:  
:::127.0.0.1 - - [25/Aug/2023 09:56:47] "#033[35m#033[1mPUT  
/api/_disk/iscsi-pool/image3 HTTP/1.1#033[0m" 500 -
2023-08-25T11:56:47.772916+02:00 ses7-host1 conmon[1383161]: debug  
_disk change on localhost failed with 500
2023-08-25T11:56:47.773753+02:00 ses7-host1 conmon[1383161]: debug  
:::192.168.168.81 - - [25/Aug/2023 09:56:47] "#033[35m#033[1mPUT  
/api/disk/iscsi-pool/image3 HTTP/1.1#033[0m" 500 -
2023-08-25T11:56:47.776186+02:00 ses7-host1 conmon[1741104]: iscsi  
REST API failed PUT req status: 500
2023-08-25T11:56:47.777479+02:00 ses7-host1 conmon[1741104]: Error  
while calling Task(ns=iscsi/target/edit, md={'target_iqn':  
'iqn.2001-07.com.ceph:1692955254223'})
2023-08-25T11:56:47.777689+02:00 ses7-host1 conmon[1741104]: Traceback  
(most recent call last):
2023-08-25T11:56:47.777900+02:00 ses7-host1 conmon[1741104]:   File  
"/usr/share/ceph/mgr/dashboard/controllers/iscsi.py", line 789, in  
_create
2023-08-25T11:56:47.778333+02:00 ses7-host1 conmon[1741104]:  
controls=controls)
2023-08-25T11:56:47.778725+02:00 ses7-host1 conmon[1741104]:   File  
"/usr/share/ceph/mgr/dashboard/rest_client.py", line 535, in  
func_wrapper

2023-08-25T11:56:47.778879+02:00 ses7-host1 conmon[1741104]: **kwargs)
2023-08-25T11:56:47.779103+02:00 ses7-host1 conmon[1741104]:   File  
"/usr/share/ceph/mgr/dashboard/services/iscsi_client.py", line 126, in  
create_disk

2023-08-25T11:56:47.779264+02:00 ses7-host1 conmon[1741104]: 'wwn': wwn
2023-08-25T11:56:47.779412+02:00 ses7-host1 conmon[1741104]:   File  
"/usr/share/ceph/mgr/dashboard/rest_client.py", line 324, in __call__
2023-08-25T11:56:47.779549+02:00 ses7-host1 conmon[1741104]: data,  
raw_content, headers)
2023-08-25T11:56:47.779715+02:00 ses7-host1 conmon[1741104]:   File  
"/usr/share/ceph/mgr/dashboard/rest_client.py", line 453, in do_request

2023-08-25T11:56:47.779861+02:00 ses7-host1 conmon[1741104]: resp.content)
2023-08-25T11:56:47.780025+02:00 ses7-host1 conmon[1741104]:  
dashboard.rest_client.RequestException: iscsi REST API failed request  
with status code 500
2023-08-25T11:56:47.780189+02:00 ses7-host1 conmon[1741104]:  
(b'{"message":"disk create/update failed on ses7-host1. LUN all'
2023-08-25T11:56:47.780332+02:00 ses7-host1 conmon[1741104]:   
b'ocation failure"}\n')

---snip---

I'm wondering why on Pacific I have the rbd image mapped while on Reef  
I don't. But isn't iscsi-gw deprecated [1]?


The iSCSI gateway is in maintenance as of November 2022. This means  
that it is no longer in active development and will not be updated  
to add new features.


Not sure if it makes sense to dig deeper here...

[1] https://docs.ceph.com/en/quincy/rbd/iscsi-overview/
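
If anyone does want to dig further: going by the emulate_legacy_capacity message above,
one experiment (purely an assumption on my side, not a verified fix) would be to retry
with an image that does not carry the object-map feature, e.g.:

# drop the features the target_core_rbd module complains about (test image only)
$ rbd feature disable iscsi-pool/image3 fast-diff
$ rbd feature disable iscsi-pool/image3 object-map
# or create a test image without them from the start
$ rbd create iscsi-pool/image4 --size 5G --image-feature layering --image-feature exclusive-lock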

Zitat von Eugen Block :


Hi,

that's quite interesting, I tried to reproduce with 18.2.0 but it  
worked for me. The cluster runs on openSUSE Leap 15.4. There are two  
things that seem to differ in my attempt.

1. I had to run 'modprobe iscsi_target_mod' to get rid of this error message:

(b'{"message":"iscsi target \'init\' process failed for  
iqn.2001-07.com.ceph:'

b'1692952282206 - Could not load module: iscsi_target_mod"}\n')

2. I don't have tcmu runner up which is reported every 10 seconds  
but it seems to work anyway:


debug there is no tcmu-runner data available

In a Pacific test cluster I do see that a tcmu runner is deployed  
alongside the iscsi gw. Anyway, this is my output:


---snip---
[...]
  o- disks ....................................... [10G, Disks: 1]
  | o- test-pool ...

[ceph-users] Re: lun allocation failure

2023-08-25 Thread Eugen Block

Hi,

that's quite interesting, I tried to reproduce with 18.2.0 but it  
worked for me. The cluster runs on openSUSE Leap 15.4. There are two  
things that seem to differ in my attempt.

1. I had to run 'modprobe iscsi_target_mod' to get rid of this error message:

(b'{"message":"iscsi target \'init\' process failed for iqn.2001-07.com.ceph:'
b'1692952282206 - Could not load module: iscsi_target_mod"}\n')

2. I don't have tcmu runner up which is reported every 10 seconds but  
it seems to work anyway:


debug there is no tcmu-runner data available

In a Pacific test cluster I do see that a tcmu runner is deployed  
alongside the iscsi gw. Anyway, this is my output:


---snip---
[...]
  o- disks .......................................... [10G, Disks: 1]
  | o- test-pool ..................................... [test-pool (10G)]
  |   o- image1 ........................ [test-pool/image1 (Unknown, 10G)]
  o- iscsi-targets ..................... [DiscoveryAuth: None, Targets: 1]
    o- iqn.2001-07.com.ceph:1692952282206 ....... [Auth: None, Gateways: 1]
      o- disks ................................................ [Disks: 1]
      | o- test-pool/image1 ....................... [Owner: reef01, Lun: 0]
      o- gateways ................................. [Up: 1/1, Portals: 1]
      | o- reef01 ............................................ [<IP> (UP)]
      o- host-groups ......................................... [Groups : 0]
      o- hosts ............................. [Auth: ACL_DISABLED, Hosts: 0]

---snip---

This is my exact ceph version:

"ceph version 18.2.0  
(5dd24139a1eada541a3bc16b6941c5dde975e26d) reef (stable)": 3



Zitat von Opánszki Gábor :


Hi folks,

we deployed a new Reef cluster in our lab.

All of the nodes are up and running, but we can't allocate a LUN to a target.

On the GUI we get the message "disk create/update failed on ceph-iscsigw0. LUN
allocation failure".


We created the images in the GUI.

Do you have any idea?

Thanks

root@ceph-mgr0:~# ceph -s
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
  cluster:
    id: ad0aede2-4100-11ee-bc14-1c40244f5c21
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum  
ceph-mgr0,ceph-mgr1,ceph-osd5,ceph-osd7,ceph-osd6 (age 28h)
    mgr: ceph-mgr0.sapbav(active, since 45h), standbys:  
ceph-mgr1.zwzyuc

    osd: 44 osds: 44 up (since 4h), 44 in (since 4h)
    tcmu-runner: 1 portal active (1 hosts)

  data:
    pools:   5 pools, 3074 pgs
    objects: 27 objects, 453 KiB
    usage:   30 GiB used, 101 TiB / 101 TiB avail
    pgs: 3074 active+clean

  io:
    client:   2.7 KiB/s rd, 2 op/s rd, 0 op/s wr

root@ceph-mgr0:~#

root@ceph-mgr0:~# rados lspools
.mgr
ace1
1T-r3-01
ace0
x
root@ceph-mgr0:~# rbd ls 1T-r3-01
111

bb
pool2
teszt
root@ceph-mgr0:~# rbd ls x
x-a
root@ceph-mgr0:~#

root@ceph-mgr0:~# rbd info 1T-r3-01/111
rbd image '111':
    size 1 GiB in 256 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 5f927ce161de
    block_name_prefix: rbd_data.5f927ce161de
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:
    flags:
    create_timestamp: Thu Aug 24 17:33:37 2023
    access_timestamp: Thu Aug 24 17:33:37 2023
    modify_timestamp: Thu Aug 24 17:33:37 2023
root@ceph-mgr0:~# rbd info 1T-r3-01/
rbd image '':
    size 1 GiB in 256 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 5f926a0e299f
    block_name_prefix: rbd_data.5f926a0e299f
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:
    flags:
    create_timestamp: Thu Aug 24 17:18:06 2023
    access_timestamp: Thu Aug 24 17:18:06 2023
    modify_timestamp: Thu Aug 24 17:18:06 2023
root@ceph-mgr0:~# rbd info x/x-a
rbd image 'x-a':
    size 1 GiB in 256 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 5f922dbdf6c6
    block_name_prefix: rbd_data.5f922dbdf6c6
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    op_features:
    flags:
    create_timestamp: Thu Aug 24 17:48:28 2023
    access_timestamp: Thu Aug 24 17:48:28 2023
    modify_timestamp: Thu Aug 24 17:48:28 2023
root@ceph-mgr0:~#

root@ceph-mgr0:~# ceph orch ls --service_type iscsi
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH 

[ceph-users] Re: rgw replication sync issue

2023-08-25 Thread Eugen Block

Hi,

according to [1] the error 125 means there was a race condition:


failed to sync bucket instance: (125) Operation canceled
A racing condition exists between writes to the same RADOS object.


Can you rewrite just the affected object? Not sure about the other  
error, maybe try rewriting that object as well? But I'm not sure how  
that would lead to a 25 TB difference. Or could this condition impact  
the entire sync? Hopefully someone with more multisite knowledge can  
comment. Is ceph healthy? No inactive PGs or anything?


[1]  
https://www.ibm.com/docs/en/storage-ceph/6?topic=gateway-error-code-definitions-ceph-object
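
In case it is useful, one way to do the rewrites Eugen suggests is with radosgw-admin;
the bucket and object names below are placeholders for the redacted ones from the error
list:

# rewrite the object showing the input/output error so it gets picked up by sync again
$ radosgw-admin object rewrite --bucket=bucket1 --object='S01/1/120/<object>.txt'
# and/or trigger a manual sync run for the affected bucket on the secondary zone
$ radosgw-admin bucket sync run --bucket=bucket1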


Zitat von ankit raikwar :


Hello Users,

  We have the environment described below. Both environments are  
zones of one RGW multisite zonegroup, where the DC zone is currently the  
primary and the DR zone the secondary.


DC

 Ceph Version: 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)  
quincy (stable)

 Number of rgw daemons : 25

DR
 Ceph Version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)  
quincy (stable)

 Number of rgw daemons : 25

Environment description:
   Both the mentioned zones are in production, and the RGW multisite  
replication runs over an MPLS link of around 3 Gbps.


Issue description:
   We enabled multisite between DC and DR about a month ago. The total  
data at the DC zone is around 159 TiB, and the sync had been progressing  
as expected. But once the sync reached around 120 TiB, the speed dropped  
drastically: it had been around 2 Gbps and fell to well below 10 Mbps,  
even though the link is not saturated. After checking "# radosgw-admin  
sync status", the output says "metadata is caught up with master" and  
"data is caught up with source", yet the DR zone is still almost 25 TB  
behind the DC. The bucket sync status ("radosgw-admin bucket sync status  
--bucket=") also shows the bucket is behind on shards.  
Attaching the log and the output below.


   Issuing a resync of the data from the beginning is not feasible in  
our case. The "# radosgw-admin sync error list" output is also attached  
(with some information redacted), and we see errors.


radosgw-sync status


 radosgw-admin sync status
  realm 6a7fab77-64e3-453e-b54b-066bc8af2f00 (realm0)
  zonegroup be660604-d853-4f8e-a576-579cae2e07c2 (zg0)
   zone d06a8dd3-5bcb-486c-945b-2a98969ccd5f (fbd)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: d09d3d16-8601-448b-bf3d-609b8a29647d (ahd)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source

 radosgw-admin bucket sync status --bucket=

 realm 6a7fab77-64e3-453e-b54b-066bc8af2f00 (realm0)
  zonegroup be660604-d853-4f8e-a576-579cae2e07c2 (zg0)
   zone d06a8dd3-5bcb-486c-945b-2a98969ccd5f (fbd)
 bucket :tc**rc-b1[d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1])

source zone d09d3d16-8601-448b-bf3d-609b8a29647d (ahd)
  source bucket  
:tc***arc-b1[d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1])

full sync: 14/9221 shards
full sync: 49448693 objects completed
incremental sync: 9207/9221 shards
bucket is behind on 25 shards
behind shards:  
[9,111,590,826,1774,2968,3132,3382,3386,3409,3685,3820,4174,4544,4708,4811,5733,6285,6558,7288,7417,7443,7876,8151,8878]


Error:  radosgw-admin sync error list

 "id": "1_1690799008.725414_3926410.1",
"section": "data",
"name":  
"bucket0:d09d3d16-8601-448b-bf3d-609b8a29647d.89871.1:1949",

"timestamp": "2023-07-31T10:23:28.725414Z",
"info": {
"source_zone": "d09d3d16-8601-448b-bf3d-609b8a29647d",
"error_code": 125,
"message": "failed to sync bucket instance:  
(125) Operation canceled"


 "id": "1_1690804503.144829_3759212.1",
"section": "data",
"name":  
"bucket1:d09d3d16-8601-448b-bf3d-609b8a29647d.38987.1:1232/S01/1/120/2b7ea802-efad-41d3-9d90-9**523.txt",

"timestamp": "2023-07-31T11:54:53.233451Z",
"info": {
"source_zone": "d09d3d16-8601-448b-bf3d-609b8a29647d",
"error_code": 5,
"message": "failed to sync object(5) Input/output error"



Thanks
Ankit
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 

[ceph-users] Re: Can ceph-volume manage the LVs optionally used for DB / WAL at all?

2023-08-25 Thread Eugen Block

Hi,

I'm still not sure if we're on the same page.

By looking at  
https://docs.ceph.com/en/latest/man/8/ceph-volume/#cmdoption-ceph-volume-lvm-prepare-block.db it seems that ceph-volume wants an LV or partition. So it's apparently not just taking a VG itself? Also if there were multiple VGs / devices , I likely would need to at least pick  
those.


ceph-volume creates all required VGs/LVs automatically, and the OSD  
creation happens in batch mode, for example when run by cephadm:


ceph-volume lvm batch --yes /dev/sdb /dev/sdc /dev/sdd

In a non-cephadm deployment you can fiddle with ceph-volume manually,  
where you also can deploy single OSDs, with or without providing your  
own pre-built VGs/LVs. In a cephadm deployment manually creating OSDs  
will result in "stray daemons not managed by cephadm" warnings.
Before we upgraded to Pacific we did manage our block.db devices  
manually with pre-built LVs, e.g.:


$ lvcreate -L 30G -n bluefsdb-30 ceph-journals
$ ceph-volume lvm create --data /dev/sdh --block.db ceph-journals/bluefsdb-30

Sorry for the confusion. I was not talking about any migrations,  
just the initial creation of spinning rust OSDs with DB or WAL on  
fast storage.


So the question is, is your cluster (or multiple clusters) managed by  
cephadm? If so, you don't need to worry about ceph-volume, it will be  
handled for you in batch mode (you can inspect the ceph-volume.log  
afterwards). You just need to provide a yaml file that fits your needs  
with regards to block.db and data devices.


Zitat von Christian Rohmann :


On 11.08.23 16:06, Eugen Block wrote:
if you deploy OSDs from scratch you don't have to create LVs  
manually, that is handled entirely by ceph-volume (for example on  
cephadm based clusters you only provide a drivegroup definition).


By looking at  
https://docs.ceph.com/en/latest/man/8/ceph-volume/#cmdoption-ceph-volume-lvm-prepare-block.db it seems that ceph-volume wants an LV or partition. So it's apparently not just taking a VG itself? Also if there were multiple VGs / devices , I likely would need to at least pick  
those.


But I suppose this orchestration would then require cephadm  
(https://docs.ceph.com/en/latest/cephadm/services/osd/#drivegroups)  
and cannot be done via ceph-volume which merely takes care of ONE  
OSD at a time.



I'm not sure if automating db/wal migration has been considered, it  
might be (too) difficult. But moving the db/wal devices to  
new/different devices doesn't seem to be a reoccuring issue (corner  
case?), so maybe having control over that process for each OSD  
individually is the safe(r) option in case something goes wrong.


Sorry for the confusion. I was not talking about any migrations,  
just the initial creation of spinning rust OSDs with DB or WAL on  
fast storage.



Regards


Christian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can ceph-volume manage the LVs optionally used for DB / WAL at all?

2023-08-25 Thread Christian Rohmann

On 11.08.23 16:06, Eugen Block wrote:
if you deploy OSDs from scratch you don't have to create LVs manually, 
that is handled entirely by ceph-volume (for example on cephadm based 
clusters you only provide a drivegroup definition). 


By looking at 
https://docs.ceph.com/en/latest/man/8/ceph-volume/#cmdoption-ceph-volume-lvm-prepare-block.db 
it seems that ceph-volume wants an LV or partition. So it's apparently 
not just taking a VG itself? Also if there were multiple VGs / devices , 
I likely would need to at least pick those.


But I suppose this orchestration would then require cephadm 
(https://docs.ceph.com/en/latest/cephadm/services/osd/#drivegroups) and 
cannot be done via ceph-volume which merely takes care of ONE OSD at a time.



I'm not sure if automating db/wal migration has been considered, it 
might be (too) difficult. But moving the db/wal devices to 
new/different devices doesn't seem to be a reoccuring issue (corner 
case?), so maybe having control over that process for each OSD 
individually is the safe(r) option in case something goes wrong. 


Sorry for the confusion. I was not talking about any migrations, just 
the initial creation of spinning rust OSDs with DB or WAL on fast storage.



Regards


Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io