[ceph-users] Re: active+recovery_unfound+degraded in Pacific

2021-04-28 Thread Konstantin Shalygin
Hi,

You should crush reweight this OSD (sde) to zero, and Ceph will remap all PGs to 
other OSDs; after draining you may replace your drive.
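
For example (a sketch; osd.15 is assumed to be the OSD backed by sde, based on your log):

ceph osd crush reweight osd.15 0
ceph -s            # watch backfill/remapping progress
ceph osd out 15    # once the OSD is drained, before replacing the disk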



k

Sent from my iPhone

> On 29 Apr 2021, at 06:00, Lomayani S. Laizer  wrote:
> 
> Any advice on this? I'm stuck because one VM is not working now. It looks like
> there is a read error in the primary OSD (15) for this PG. Should I mark OSD 15
> down or out? Is there any risk in doing this?
> 
> Apr 28 20:22:31 ceph-node3 kernel: [369172.974734] sd 0:2:4:0: [sde]
> tag#358 CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
> Apr 28 20:22:31 ceph-node3 kernel: [369172.974739] blk_update_request: I/O
> error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio
> class 0
> Apr 28 21:14:11 ceph-node3 kernel: [372273.275801] sd 0:2:4:0: [sde] tag#28
> FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
> Apr 28 21:14:11 ceph-node3 kernel: [372273.275809] sd 0:2:4:0: [sde] tag#28
> CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
> Apr 28 21:14:11 ceph-node3 kernel: [372273.275813] blk_update_request: I/O
> error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio
> class 0
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Double slashes in s3 name

2021-04-28 Thread Robin H. Johnson
On Tue, Apr 27, 2021 at 11:55:15AM -0400, Gavin Chen wrote:
> Hello,
> 
> We’ve got some issues when uploading s3 objects with a double slash //
> in the name, and we were wondering if anyone else has observed this issue
> when uploading objects to the radosgw?
> 
> When connecting to the cluster to upload an object with the key
> ‘test/my//bucket’ the request returns with a 403
> (SignatureDoesNotMatch) error. 
> 
> Wondering if anyone else has observed this behavior and has any
> workarounds to work with double slashes in the object key name.
I'm not aware of any issues with this, but that absolutely doesn't mean it's
bug-free.

For debugging it, what client are you using? I'd suggest using
debug_rgw=20 AND maximal debug on the client side, and comparing the
signature construction to see why it doesn't match.

This goes doubly if the bug exists with only one of v2 or v4 signatures!
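
For example, to bump the RGW debug level (a sketch; <instance> is a placeholder
for your actual rgw daemon name):

ceph daemon client.rgw.<instance> config set debug_rgw 20   # live, via the admin socket on the rgw host
ceph config set client.rgw.<instance> debug_rgw 20          # or persistently via the central config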

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: active+recovery_unfound+degraded in Pacific

2021-04-28 Thread Lomayani S. Laizer
Hello,

Any advice on this? I'm stuck because one VM is not working now. It looks like
there is a read error in the primary OSD (15) for this PG. Should I mark OSD 15
down or out? Is there any risk in doing this?

Apr 28 20:22:31 ceph-node3 kernel: [369172.974734] sd 0:2:4:0: [sde]
tag#358 CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
Apr 28 20:22:31 ceph-node3 kernel: [369172.974739] blk_update_request: I/O
error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio
class 0
Apr 28 21:14:11 ceph-node3 kernel: [372273.275801] sd 0:2:4:0: [sde] tag#28
FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
Apr 28 21:14:11 ceph-node3 kernel: [372273.275809] sd 0:2:4:0: [sde] tag#28
CDB: Read(16) 88 00 00 00 00 00 51 be e7 80 00 00 00 80 00 00
Apr 28 21:14:11 ceph-node3 kernel: [372273.275813] blk_update_request: I/O
error, dev sde, sector 1371465600 op 0x0:(READ) flags 0x0 phys_seg 16 prio
class 0


On Thu, Apr 29, 2021 at 12:24 AM Lomayani S. Laizer 
wrote:

> Hello,
> Last week I upgraded my production cluster to Pacific. The cluster was
> healthy until a few hours ago.
> A scrub that ran 4 hours ago left the cluster in an inconsistent state. I then
> issued the command ceph pg repair 7.182 to try to repair the cluster, but it
> ended up active+recovery_unfound+degraded.
>
> All OSDs are up and all running bluestore, with replication of 3 and a
> minimum size of 2. I have restarted all OSDs, but that has not helped.
>
> Any recommendations on how to recover the cluster safely?
>
> I have attached the result of ceph pg 7.182 query.
>
>  ceph health detail
> HEALTH_ERR 1/2459601 objects unfound (0.000%); Possible data damage: 1 pg
> recovery_unfound; Degraded data redundancy: 3/7045706 objects degraded
> (0.000%), 1 pg degraded
> [WRN] OBJECT_UNFOUND: 1/2459601 objects unfound (0.000%)
> pg 7.182 has 1 unfound objects
> [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
> pg 7.182 is active+recovery_unfound+degraded, acting [15,1,11], 1
> unfound
> [WRN] PG_DEGRADED: Degraded data redundancy: 3/7045706 objects degraded
> (0.000%), 1 pg degraded
> pg 7.182 is active+recovery_unfound+degraded, acting [15,1,11], 1
> unfound
>
>
>
> ceph -w
>   cluster:
> id: 4b9f6959-fead-4ada-ac58-de5d7b149286
> health: HEALTH_ERR
> 1/2459586 objects unfound (0.000%)
> Possible data damage: 1 pg recovery_unfound
> Degraded data redundancy: 3/7045661 objects degraded (0.000%),
> 1 pg degraded
>
>   services:
> mon: 3 daemons, quorum mon-a,mon-b,mon-c (age 38m)
> mgr: mon-a(active, since 38m)
> osd: 46 osds: 46 up (since 25m), 46 in (since 3w)
>
>   data:
> pools:   4 pools, 705 pgs
> objects: 2.46M objects, 9.1 TiB
> usage:   24 TiB used, 95 TiB / 119 TiB avail
> pgs: 3/7045661 objects degraded (0.000%)
>  1/2459586 objects unfound (0.000%)
>  701 active+clean
>  3   active+clean+scrubbing+deep
>  1   active+recovery_unfound+degraded
>
> ceph pg 7.182 list_unfound
> {
> "num_missing": 1,
> "num_unfound": 1,
> "objects": [
> {
> "oid": {
> "oid": "rbd_data.2f18f2a67fad72.0002021a",
> "key": "",
> "snapid": -2,
> "hash": 3951004034,
> "max": 0,
> "pool": 7,
> "namespace": ""
> },
> "need": "184249'118613008",
> "have": "0'0",
> "flags": "none",
> "clean_regions": "clean_offsets: [], clean_omap: 0,
> new_object: 1",
> "locations": []
> }
> ],
> "state": "NotRecovering",
> "available_might_have_unfound": true,
> "might_have_unfound": [],
> "more": false
> }
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Unable to add osds with ceph-volume

2021-04-28 Thread Andrei Mikhailovsky
Hello everyone, 

I am running ceph version 15.2.8 on Ubuntu servers. I am using bluestore osds 
with data on hdd and db and wal on ssd drives. Each ssd has been partitioned 
such that it holds 5 dbs and 5 wals. The ssds were prepared a while back, 
probably when I was running ceph 13.x. I have been gradually adding new osd 
drives as needed. Recently, I've tried to add more osds, which have failed, to 
my surprise. Previously I've had no issues adding the drives. However, it seems 
that I can no longer do that with version 15.2.x.

Here is what I get: 


root@arh-ibstorage4-ib  /home/andrei  ceph-volume lvm prepare --bluestore 
--data /dev/sds --block.db /dev/ssd3/db5 --block.wal /dev/ssd3/wal5 
Running command: /usr/bin/ceph-authtool --gen-print-key 
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 
6aeef34b-0724-4d20-a10b-197cab23e24d 
Running command: /usr/sbin/vgcreate --force --yes 
ceph-1c7cef26-327a-4785-96b3-dcb1b97e8e2f /dev/sds 
stderr: WARNING: PV /dev/sdp in VG ceph-bc7587b5-0112-4097-8c9f-4442e8ea5645 is 
using an old PV header, modify the VG to update. 
stderr: WARNING: PV /dev/sdo in VG ceph-33eda27c-53ed-493e-87a8-39e1862da809 is 
using an old PV header, modify the VG to update. 
stderr: WARNING: PV /dev/sdn in VG ssd2 is using an old PV header, modify the 
VG to update. 
stderr: WARNING: PV /dev/sdm in VG ssd1 is using an old PV header, modify the 
VG to update. 
stderr: WARNING: PV /dev/sdj in VG ceph-9d8da00c-f6b9-473f-b499-fa60d74b46c5 is 
using an old PV header, modify the VG to update. 
stderr: WARNING: PV /dev/sdi in VG ceph-1603149e-1e50-4b86-a360-1372f4243603 is 
using an old PV header, modify the VG to update. 
stderr: WARNING: PV /dev/sdh in VG ceph-a5f4416c-8e69-4a66-a884-1d1229785acb is 
using an old PV header, modify the VG to update. 
stderr: WARNING: PV /dev/sde in VG ceph-aac71121-e308-4e25-ae95-ca51bca7aaff is 
using an old PV header, modify the VG to update. 
stderr: WARNING: PV /dev/sdd in VG ceph-1e216580-c01b-42c5-a10f-293674a55c4c is 
using an old PV header, modify the VG to update. 
stderr: WARNING: PV /dev/sdc in VG ceph-630f7716-3d05-41bb-92c9-25402e9bb264 is 
using an old PV header, modify the VG to update. 
stderr: WARNING: PV /dev/sdb in VG ceph-a549c28d-9b06-46d5-8ba3-3bd99ff54f57 is 
using an old PV header, modify the VG to update. 
stderr: WARNING: PV /dev/sda in VG ceph-70943bd0-de71-4651-a73d-c61bc624755f is 
using an old PV header, modify the VG to update. 
stdout: Physical volume "/dev/sds" successfully created. 
stdout: Volume group "ceph-1c7cef26-327a-4785-96b3-dcb1b97e8e2f" successfully 
created 
Running command: /usr/sbin/lvcreate --yes -l 3814911 -n 
osd-block-6aeef34b-0724-4d20-a10b-197cab23e24d 
ceph-1c7cef26-327a-4785-96b3-dcb1b97e8e2f 
stdout: Logical volume "osd-block-6aeef34b-0724-4d20-a10b-197cab23e24d" 
created. 
--> blkid could not detect a PARTUUID for device: /dev/ssd3/wal5 
--> Was unable to complete a new OSD, will rollback changes 
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd 
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.15 
--yes-i-really-mean-it 
stderr: 2021-04-28T20:05:52.290+0100 7f76bbfa9700 -1 auth: unable to find a 
keyring on 
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc
 
/ceph/keyring.bin,: (2) No such file or directory 
2021-04-28T20:05:52.290+0100 7f76bbfa9700 -1 AuthRegistry(0x7f76b4058e60) no 
keyring found at 
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyrin
 
g,/etc/ceph/keyring.bin,, disabling cephx 
stderr: purged osd.15 
--> RuntimeError: unable to use device 

I have tried to find a solution, but wasn't able to resolve the problem. I am 
sure that I've previously added new volumes using the above command. 

lvdisplay shows: 

--- Logical volume --- 
LV Path /dev/ssd3/wal5 
LV Name wal5 
VG Name ssd3 
LV UUID WPQJs9-olAj-ACbU-qnEM-6ytu-aLMv-hAABYy 
LV Write Access read/write 
LV Creation host, time arh-ibstorage4-ib, 2020-07-29 23:45:17 +0100 
LV Status available 
# open 0 
LV Size 1.00 GiB 
Current LE 256 
Segments 1 
Allocation inherit 
Read ahead sectors auto 
- currently set to 256 
Block device 253:6 


--- Logical volume --- 
LV Path /dev/ssd3/db5 
LV Name db5 
VG Name ssd3 
LV UUID FVT2Mm-a00P-eCoQ-FZAf-AulX-4q9r-PaDTC6 
LV Write Access read/write 
LV Creation host, time arh-ibstorage4-ib, 2020-07-29 23:46:01 +0100 
LV Status available 
# open 0 
LV Size 177.00 GiB 
Current LE 45312 
Segments 1 
Allocation inherit 
Read ahead sectors auto 
- currently set to 256 
Block device 253:11 



How do I resolve the errors and create the new osd? 

Cheers 

Andrei 




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] active+recovery_unfound+degraded in Pacific

2021-04-28 Thread Lomayani S. Laizer
Hello,
Last week I upgraded my production cluster to Pacific. The cluster was
healthy until a few hours ago.
A scrub that ran 4 hours ago left the cluster in an inconsistent state. I then
issued the command ceph pg repair 7.182 to try to repair the cluster, but it
ended up active+recovery_unfound+degraded.

All OSDs are up and all running bluestore, with replication of 3 and a minimum
size of 2. I have restarted all OSDs, but that has not helped.

Any recommendations on how to recover the cluster safely?

I have attached the result of ceph pg 7.182 query.

 ceph health detail
HEALTH_ERR 1/2459601 objects unfound (0.000%); Possible data damage: 1 pg
recovery_unfound; Degraded data redundancy: 3/7045706 objects degraded
(0.000%), 1 pg degraded
[WRN] OBJECT_UNFOUND: 1/2459601 objects unfound (0.000%)
pg 7.182 has 1 unfound objects
[ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
pg 7.182 is active+recovery_unfound+degraded, acting [15,1,11], 1
unfound
[WRN] PG_DEGRADED: Degraded data redundancy: 3/7045706 objects degraded
(0.000%), 1 pg degraded
pg 7.182 is active+recovery_unfound+degraded, acting [15,1,11], 1
unfound



ceph -w
  cluster:
id: 4b9f6959-fead-4ada-ac58-de5d7b149286
health: HEALTH_ERR
1/2459586 objects unfound (0.000%)
Possible data damage: 1 pg recovery_unfound
Degraded data redundancy: 3/7045661 objects degraded (0.000%),
1 pg degraded

  services:
mon: 3 daemons, quorum mon-a,mon-b,mon-c (age 38m)
mgr: mon-a(active, since 38m)
osd: 46 osds: 46 up (since 25m), 46 in (since 3w)

  data:
pools:   4 pools, 705 pgs
objects: 2.46M objects, 9.1 TiB
usage:   24 TiB used, 95 TiB / 119 TiB avail
pgs: 3/7045661 objects degraded (0.000%)
 1/2459586 objects unfound (0.000%)
 701 active+clean
 3   active+clean+scrubbing+deep
 1   active+recovery_unfound+degraded

ceph pg 7.182 list_unfound
{
"num_missing": 1,
"num_unfound": 1,
"objects": [
{
"oid": {
"oid": "rbd_data.2f18f2a67fad72.0002021a",
"key": "",
"snapid": -2,
"hash": 3951004034,
"max": 0,
"pool": 7,
"namespace": ""
},
"need": "184249'118613008",
"have": "0'0",
"flags": "none",
"clean_regions": "clean_offsets: [], clean_omap: 0, new_object:
1",
"locations": []
}
],
"state": "NotRecovering",
"available_might_have_unfound": true,
"might_have_unfound": [],
"more": false
}
ceph pg 7.182 query
{
"snap_trimq": "[]",
"snap_trimq_len": 0,
"state": "active+recovery_unfound+degraded",
"epoch": 184487,
"up": [
15,
1,
11
],
"acting": [
15,
1,
11
],
"acting_recovery_backfill": [
"1",
"11",
"15"
],
"info": {
"pgid": "7.182",
"last_update": "184487'118622945",
"last_complete": "0'0",
"log_tail": "184260'118615934",
"last_user_version": 174805058,
"last_backfill": "MAX",
"purged_snaps": [],
"history": {
"epoch_created": 80613,
"epoch_pool_created": 826,
"last_epoch_started": 184402,
"last_interval_started": 184401,
"last_epoch_clean": 184066,
"last_interval_clean": 184056,
"last_epoch_split": 80613,
"last_epoch_marked_full": 0,
"same_up_since": 184401,
"same_interval_since": 184401,
"same_primary_since": 184401,
"last_scrub": "184250'118615197",
"last_scrub_stamp": "2021-04-28T21:24:42.693619+0300",
"last_deep_scrub": "184250'118615197",
"last_deep_scrub_stamp": "2021-04-28T21:24:42.693619+0300",
"last_clean_scrub_stamp": "2021-04-28T21:24:42.693619+0300",
"prior_readable_until_ub": 12.74273018101
},
"stats": {
"version": "184487'118622945",
"reported_seq": "126997747",
"reported_epoch": "184487",
"state": "active+recovery_unfound+degraded",
"last_fresh": "2021-04-29T00:17:31.577010+0300",
"last_change": "2021-04-28T23:40:16.308380+0300",
"last_active": "2021-04-29T00:17:31.577010+0300",
"last_peered": "2021-04-29T00:17:31.577010+0300",
"last_clean": "2021-04-28T21:24:38.946369+0300",
"last_became_active": "2021-04-28T23:40:03.565550+0300",
"last_became_peered": "2021-04-28T23:40:03.565550+0300",
"last_unstale": "2021-04-29T00:17:31.577010+0300",
"last_undegraded": "2021-04-28T23:40:03.531480+0300",
"last_fullsized": "2021-04-29T00:17:31.577010+0300",
"mapping_epoch": 184401,
"log_start": "184260'118615

[ceph-users] Re: Unable to add osds with ceph-volume

2021-04-28 Thread Eugen Block

Hi,

when specifying the db device you should use --block.db VG/LV, not /dev/VG/LV.
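
I.e. something like this (an untested sketch, using the LVs from your output and
the same VG/LV form for the wal as well):

ceph-volume lvm prepare --bluestore --data /dev/sds --block.db ssd3/db5 --block.wal ssd3/wal5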

Quoting Andrei Mikhailovsky:


Hello everyone,

I am running ceph version 15.2.8 on Ubuntu servers. I am using  
bluestore osds with data on hdd and db and wal on ssd drives. Each  
ssd has been partitioned such that it holds 5 dbs and 5 wals. The  
ssds were prepared a while back, probably when I was running ceph  
13.x. I have been gradually adding new osd drives as needed.  
Recently, I've tried to add more osds, which have failed to my  
surprise. Previously I've had no issues adding the drives. However,  
it seems that I can no longer do that with version 15.2.x


Here is what I get:


root@arh-ibstorage4-ib  /home/andrei  ceph-volume lvm prepare  
--bluestore --data /dev/sds --block.db /dev/ssd3/db5 --block.wal  
/dev/ssd3/wal5

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name  
client.bootstrap-osd --keyring  
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new  
6aeef34b-0724-4d20-a10b-197cab23e24d
Running command: /usr/sbin/vgcreate --force --yes  
ceph-1c7cef26-327a-4785-96b3-dcb1b97e8e2f /dev/sds
stderr: WARNING: PV /dev/sdp in VG  
ceph-bc7587b5-0112-4097-8c9f-4442e8ea5645 is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sdo in VG  
ceph-33eda27c-53ed-493e-87a8-39e1862da809 is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sdn in VG ssd2 is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sdm in VG ssd1 is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sdj in VG  
ceph-9d8da00c-f6b9-473f-b499-fa60d74b46c5 is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sdi in VG  
ceph-1603149e-1e50-4b86-a360-1372f4243603 is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sdh in VG  
ceph-a5f4416c-8e69-4a66-a884-1d1229785acb is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sde in VG  
ceph-aac71121-e308-4e25-ae95-ca51bca7aaff is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sdd in VG  
ceph-1e216580-c01b-42c5-a10f-293674a55c4c is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sdc in VG  
ceph-630f7716-3d05-41bb-92c9-25402e9bb264 is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sdb in VG  
ceph-a549c28d-9b06-46d5-8ba3-3bd99ff54f57 is using an old PV header,  
modify the VG to update.
stderr: WARNING: PV /dev/sda in VG  
ceph-70943bd0-de71-4651-a73d-c61bc624755f is using an old PV header,  
modify the VG to update.

stdout: Physical volume "/dev/sds" successfully created.
stdout: Volume group "ceph-1c7cef26-327a-4785-96b3-dcb1b97e8e2f"  
successfully created
Running command: /usr/sbin/lvcreate --yes -l 3814911 -n  
osd-block-6aeef34b-0724-4d20-a10b-197cab23e24d  
ceph-1c7cef26-327a-4785-96b3-dcb1b97e8e2f
stdout: Logical volume  
"osd-block-6aeef34b-0724-4d20-a10b-197cab23e24d" created.

--> blkid could not detect a PARTUUID for device: /dev/ssd3/wal5
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name  
client.bootstrap-osd --keyring  
/var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.15  
--yes-i-really-mean-it
stderr: 2021-04-28T20:05:52.290+0100 7f76bbfa9700 -1 auth: unable to  
find a keyring on  
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc

/ceph/keyring.bin,: (2) No such file or directory
2021-04-28T20:05:52.290+0100 7f76bbfa9700 -1  
AuthRegistry(0x7f76b4058e60) no keyring found at  
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyrin

g,/etc/ceph/keyring.bin,, disabling cephx
stderr: purged osd.15
--> RuntimeError: unable to use device

I have tried to find a solution, but wasn't able to resolve the  
problem. I am sure that I've previously added new volumes using the  
above command.


lvdisplay shows:

--- Logical volume ---
LV Path /dev/ssd3/wal5
LV Name wal5
VG Name ssd3
LV UUID WPQJs9-olAj-ACbU-qnEM-6ytu-aLMv-hAABYy
LV Write Access read/write
LV Creation host, time arh-ibstorage4-ib, 2020-07-29 23:45:17 +0100
LV Status available
# open 0
LV Size 1.00 GiB
Current LE 256
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:6


--- Logical volume ---
LV Path /dev/ssd3/db5
LV Name db5
VG Name ssd3
LV UUID FVT2Mm-a00P-eCoQ-FZAf-AulX-4q9r-PaDTC6
LV Write Access read/write
LV Creation host, time arh-ibstorage4-ib, 2020-07-29 23:46:01 +0100
LV Status available
# open 0
LV Size 177.00 GiB
Current LE 45312
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:11



How do I resolve the errors and create the new osd?

Cheers

Andrei




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send

[ceph-users] recovering damaged rbd volume

2021-04-28 Thread mike brown
hello all,

I had an incident with one of my very important rbd volumes, holding 5 TB of data, 
which is managed by OpenStack.
I was about to increase the volume size live, but I unintentionally shrank the 
volume by running a wrong "virsh qemu-monitor-command" command. I then realized 
it and expanded it again, but obviously I lost my data. Can you help me or give 
me hints on how I can recover the data of this rbd volume?

Unfortunately, I don't have any backup for this part of the data, and it's really 
important (I know I made a big mistake). Also, I can't stop the cluster, since 
it's under heavy production load.

It seems qemu has been using the old set of APIs that allows shrinking by 
default without any warnings, even in the latest version. (All other standard 
ways of resizing a volume do not allow shrinking.)

  *   Ceph: 
https://sourcegraph.com/github.com/ceph/ceph@luminous/-/blob/src/librbd/librbd.cc#L815
  *   Qemu: https://sourcegraph.com/github.com/qemu/qemu/-/blob/block/rbd.c#L832
  *   Libvirt: 
https://sourcegraph.com/github.com/libvirt/libvirt/-/blob/src/storage/storage_backend_rbd.c#L1280
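
For comparison, a plain rbd resize refuses to shrink unless explicitly forced 
(a sketch; pool and image names are placeholders):

rbd resize --size 5T volumes/volume-xxxx                  # refuses to shrink and errors out
rbd resize --size 5T --allow-shrink volumes/volume-xxxx   # shrinking must be requested explicitly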


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to set bluestore_rocksdb_options_annex

2021-04-28 Thread ceph
Hello Anthony,

 it was introduced in Octopus 15.2.10.
See: https://docs.ceph.com/en/latest/releases/octopus/

Do you know how you would set it in Pacific? :)
Guess there shouldn't be much difference...
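
Something like this should work on both, I guess (untested; the rocksdb option 
values are only examples, and the OSDs probably need a restart to pick them up):

ceph config set osd bluestore_rocksdb_options_annex 'max_background_compactions=4,compaction_readahead_size=2097152'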

Thank you
Mehmet

On 28 April 2021 19:21:19 CEST, Anthony D'Atri wrote:
>I think that’s new with Pacific.
>
>> On Apr 28, 2021, at 1:26 AM, c...@elchaka.de wrote:
>> 
>> 
>> 
>> Hello,
>> 
>> I have an Octopus cluster and want to change some (multiple) values - but I
>> cannot find any documentation on how to set them with
>> 
>> bluestore_rocksdb_options_annex
>> 
>> Could someone give me some examples.
>> I would like to do this like ceph config set ...
>> 
>> Thanks in advance
>> Mehmet
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG repair leaving cluster unavailable

2021-04-28 Thread Gesiel Galvão Bernardes
Complementing the information: I'm using Mimic (13.2) on the cluster. I
noticed that during the PG repair process the entire cluster was extremely
slow; however, there was no excessive load on the OSD nodes. The load on these
nodes, which in normal production is between 10.00 and 20.00, was less than
5. When the repair finished (after 4 hours), the cluster went back to normal.

Is this result expected?


On Tue, 27 Apr 2021 at 14:16, Gesiel Galvão Bernardes <gesiel.bernar...@gmail.com> wrote:

> Hi,
>
> I have 3 pools, which I use exclusively for RBD images. Two of them are
> mirrored and one is erasure coded. It turns out that today I received the
> warning that a PG was inconsistent in the erasure pool, and then I ran
> "ceph pg repair" on that PG. After that the entire cluster
> became extremely slow, to the point that no VM works.
>
>
> This is the output of "ceph -s":
> # ceph -s
>   cluster:
> id: 4ea72929-6f9e-453a-8cd5-bb0712f6b874
> health: HEALTH_ERR
> 1 scrub errors
> Possible data damage: 1 pg inconsistent, 1 pg repair
>
>   services:
> mon: 2 daemons, quorum cmonitor, cmonitor2
> mgr: cmonitor (active), standbys: cmonitor2
> osd: 87 osds: 87 up, 87 in
> tcmu-runner: 10 active daemons
>
>   data:
> pools: 7 pools, 3072 pgs
> objects: 30.00M objects, 113 TiB
> usage: 304 TiB used, 218 TiB / 523 TiB avail
> pgs: 3063 active+clean
>  8 active+clean+scrubbing+deep
>  1 active+clean+scrubbing+deep+inconsistent+repair
>
>   io:
> client: 24 MiB/s rd, 23 MiB/s wr, 629 op/s rd, 519 op/s wr
> cache: 5.9 MiB/s flush, 35 MiB/s evict, 9 op/s promote
>
> Does anyone have any idea how to make it available again?
>
> Regards,
> Gesiel
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: libceph: get_reply osd2 tid 1459933 data 3248128 > preallocated 131072, skipping

2021-04-28 Thread Ilya Dryomov
On Sun, Apr 25, 2021 at 11:42 AM Ilya Dryomov  wrote:
>
> On Sun, Apr 25, 2021 at 12:37 AM Markus Kienast  wrote:
> >
> > I am seeing these messages when booting from RBD and booting hangs there.
> >
> > libceph: get_reply osd2 tid 1459933 data 3248128 > preallocated
> > 131072, skipping
> >
> > However, Ceph Health is OK, so I have no idea what is going on. I
> > reboot my 3 node cluster and it works again for about two weeks.
> >
> > How can I find out more about this issue, how can I dig deeper? Also
> > there has been at least one report about this issue before on this
> > mailing list - "[ceph-users] Strange Data Issue - Unexpected client
> > hang on OSD I/O Error" - but no solution has been presented.
> >
> > This report was from 2018, so no idea if this is still an issue for
> > Dyweni the original reporter. If you read this, I would be happy to
> > hear how you solved the problem.
>
> Hi Markus,
>
> What versions of ceph and the kernel are in use?
>
> Are you also seeing I/O errors and "missing primary copy of ..., will
> try copies on ..." messages in the OSD logs (in this case osd2)?

For the sake of archives, the "[ceph-users] Strange Data Issue
- Unexpected client hang on OSD I/O Error" instance has been fixed
in 12.2.12, 13.2.5 and 14.2.0:

https://tracker.ceph.com/issues/37680

I also tried to reply to that thread but it didn't go through because
the old ceph-us...@lists.ceph.com mailing list is decommissioned.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph export not producing file?

2021-04-28 Thread Piotr Baranowski
Hey all!

rbd export images/ec8a7ff8-6609-4b7d-8bdd-fadcf3b7973e /root/foo.img
DOES NOT produce the target file

No matter if I use the --pool/--image format or the one above, the target
file is not there.

The progress bar shows up and prints the percentage, and it ends with exit 0:

[root@controller-0 mnt]# rbd export  --pool=images db8290c3-93fd-4a4e-
ad71-7c131070ad6f /mnt/cirros2.img
Exporting image: 100% complete...done.
[root@controller-0 mnt]# echo $?
0

Any idea what's going on here?
It's ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94)
luminous (stable)
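
One thing that might help narrow it down (just a guess: if the rbd command is
actually wrapped to run inside a container, the file may be written to the
container's filesystem rather than the host's): export to stdout and redirect, e.g.

rbd export --pool=images db8290c3-93fd-4a4e-ad71-7c131070ad6f - > /mnt/cirros2.img
ls -lh /mnt/cirros2.img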

Any hints will be much appreciated

best regards
Piotr

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] one of 3 monitors keeps going down

2021-04-28 Thread Robert W. Eckert
Hi,
On a daily basis, one of my monitors goes down

[root@cube ~]# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s); 1/3 mons down, quorum 
rhel1.robeckert.us,story
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon mon.cube on cube.robeckert.us is in error state
[WRN] MON_DOWN: 1/3 mons down, quorum rhel1.robeckert.us,story
mon.cube (rank 2) addr [v2:192.168.2.142:3300/0,v1:192.168.2.142:6789/0] is 
down (out of quorum)
[root@cube ~]# ceph --version
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)

I have a script that will copy the mon data from another server and it restarts 
and runs well for a while.

It is always the same monitor, and when I look at the logs the only thing I 
really see is the cephadm log showing it down

2021-04-28 10:07:26,173 DEBUG Running command: /usr/bin/podman --version
2021-04-28 10:07:26,217 DEBUG /usr/bin/podman: stdout podman version 2.2.1
2021-04-28 10:07:26,222 DEBUG Running command: /usr/bin/podman inspect --format 
{{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index .Config.Labels 
"io.ceph.version"}} ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-osd.2
2021-04-28 10:07:26,326 DEBUG /usr/bin/podman: stdout 
fab17e5242eb4875e266df19ca89b596a2f2b1d470273a99ff71da2ae81eeb3c,docker.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da452188daf2af72e,2021-04-26
 17:13:15.54183375 -0400 EDT,
2021-04-28 10:07:26,328 DEBUG Running command: systemctl is-enabled 
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08...@mon.cube
2021-04-28 10:07:26,334 DEBUG systemctl: stdout enabled
2021-04-28 10:07:26,335 DEBUG Running command: systemctl is-active 
ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08...@mon.cube
2021-04-28 10:07:26,340 DEBUG systemctl: stdout failed
2021-04-28 10:07:26,340 DEBUG Running command: /usr/bin/podman --version
2021-04-28 10:07:26,395 DEBUG /usr/bin/podman: stdout podman version 2.2.1
2021-04-28 10:07:26,402 DEBUG Running command: /usr/bin/podman inspect --format 
{{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index .Config.Labels 
"io.ceph.version"}} ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-mon.cube
2021-04-28 10:07:26,526 DEBUG /usr/bin/podman: stdout 
04e7c673cbacf5160427b0c3eb2f0948b2f15d02c58bd1d9dd14f975a84cfc6f,docker.io/ceph/ceph:v15,5b724076c58f97872fc2f7701e8405ec809047d71528f79da452188daf2af72e,2021-04-28
 08:54:57.614847512 -0400 EDT,

I don't know if it matters, but this server is an AMD 3600XT, while my other 
two servers, which have had no issues, are Intel based.

The root file system was originally on an SSD and I switched to NVMe, so I have 
eliminated controller or drive issues. (I didn't see anything in dmesg anyway.)

If someone could point me in the right direction on where to troubleshoot next, 
I would appreciate it.
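
A couple of places that might be worth checking (a sketch, using the daemon and
fsid names from the cephadm output above):

cephadm logs --name mon.cube                                          # journal for the mon container
systemctl status ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867@mon.cube   # why systemd thinks the unit failed
podman logs ceph-fe3a7cb0-69ca-11eb-8d45-c86000d08867-mon.cube        # container stdout/stderr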

Thanks,
Rob Eckert
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Suspicious newsletter] Re: Getting `InvalidInput` when trying to create a notification topic with Kafka endpoint

2021-04-28 Thread Szabo, Istvan (Agoda)
Hi,

What we have found seems to be a blocking issue when I terminate https on a 
loadbalancer and use http between the loadbalancer and the rgw. So it seems the 
ssl termination has to be done on the rgw and can't be done on the 
loadbalancer? Or is there any idea how we can work around it?

Here are the debug logs:

With loadbalancer https endpoint: https://justpaste.it/5d93w
Directly with rgw ip without loadbalancer: https://justpaste.it/9rn28

In both cases the issue is this: "endpoint validation error: sending password 
over insecure transport"

To be honest, I want to do the ssl on the loadbalancer and don't want to do it 
on the rgw. Maybe you can suggest something.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Yuval Lifshitz 
Sent: Tuesday, April 27, 2021 11:49 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io; Raveendran, Vigneshwaran (Agoda) 

Subject: [Suspicious newsletter] [ceph-users] Re: Getting `InvalidInput` when 
trying to create a notification topic with Kafka endpoint

On Tue, Apr 27, 2021 at 1:59 PM Szabo, Istvan (Agoda) < istvan.sz...@agoda.com> 
wrote:

> Hello,
>
> Thank you very much to pickup the question and sorry for the late response.
>
> Yes, we are sending in cleartext also using HTTPS, but how it should
> be sent if not like this?
>
>
If you send the user/password using an HTTPS connection between the client and the 
RGW, there should be no error. Could you please provide the RGW debug log, to 
see why "invalid argument" was returned?


> Also connected to this issue a bit, when we subscribe a bucket to a
> topic with non-ACL kafka topic, any operations (PUT or DELETE) is
> simply blocking and not returning. Not even any error response.
>
This would be the case when the kafka broker is down (or the parameters
you provided to the topic were incorrect). A workaround for this issue is to 
mark the endpoint with "kafka-ack-level=none"; this will not block for the 
reply, but note that if the broker is down or misconfigured, the notification 
will be lost.
A better option (if you are using "pacific" and up) is to mark the topic with 
the "persistent" flag. This would mean that even if the broker is down or 
misconfigured, the notification will be retried until successful and, in 
addition, will not block the request.
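
For example, using the AWS CLI against the RGW's SNS-compatible endpoint (a
sketch; endpoint, broker and credentials are placeholders, and the attribute
set is only illustrative):

aws --endpoint-url https://<rgw-endpoint> sns create-topic --name=mytopic \
  --attributes='{"push-endpoint": "kafka://user:password@<broker>:9093", "use-ssl": "true", "kafka-ack-level": "broker", "persistent": "true"}'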



> $ s3cmd -c ~/.s3cfg put --add-header x-amz-meta-foo:bar3
> certificate.pdf s3://vig-test
> WARNING: certificate.pdf: Owner groupname not known. Storing
> GID=1354917867 instead.
> WARNING: Module python-magic is not available. Guessing MIME types
> based on file extensions.
> upload: 'certificate.pdf' -> 's3://vig-test/certificate.pdf'  [1 of 1]
>  65536 of 9122471% in0s   291.17 KB/s
>
>
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>
>
> *From:* Yuval Lifshitz 
> *Sent:* Wednesday, April 21, 2021 10:34 PM
> *To:* Szabo, Istvan (Agoda) 
> *Cc:* ceph-users@ceph.io
> *Subject:* Re: [ceph-users] Getting `InvalidInput` when trying to
> create a notification topic with Kafka endpoint
>
>
>
> Hi Istvan,
>
> Can you please share the relevant part for the radosgw log, indicating
> which input was invalid?
>
> The only way I managed to reproduce that error is by sending the
> request to a non-HTTPS radosgw (which does not seem to be your case).
> In such a case it replies with "InvalidInput" because we are trying to
> send user/password in cleartext.
>
> I used curl, similarly to what you did against a vstart cluster based
> off of master: https://paste.sh/SQ_8IrB5#BxBYbh1kTh15n7OKvjB5wEOM
>
>
>
> Yuval
>
>
>
> On Wed, Apr 21, 2021 at 11:23 AM Szabo, Istvan (Agoda) <
> istvan.sz...@agoda.com> wrote:
>
> Hi Ceph Users,
> Here is the latest request I tried but still not working
>
> curl -v -H 'Date: Tue, 20 Apr 2021 16:05:47 +' -H 'Authorization:
> AWS :' -L -H 'content-type:
> application/x-www-form-urlencoded' -k -X POST https://servername -d
> Action=CreateTopic&Name=test-ceph-event-replication&Attributes.entry.8
> .key=push-endpoint&Attributes.entry.8.value=kafka://: rd>@servername2:9093&Attributes.entry.5.key=use-ssl&Attributes.entry.5
> .value=true
>
> And the response I get is still Invalid Input  encoding="UTF-8"?>InvalidInputtx000
> 0007993081-00607efbdd-1c7e96b-hkg1c7e96b-hkg-d
> ata
> Can someone please help with this?
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
>
> 

[ceph-users] Re: [Suspicious newsletter] Re: Getting `InvalidInput` when trying to create a notification topic with Kafka endpoint

2021-04-28 Thread Yuval Lifshitz
I don't think there is a way around that. The RGW code does not allow
user/password on a non-SSL transport.
What is the issue with SSL between the balancer and the RGW?
If you have issues with self-signed certificates, maybe there is a way on
the balancer to not verify them?

On Wed, Apr 28, 2021 at 1:18 PM Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com> wrote:

> Hi,
>
> What we have found seems to be a blocking issue when I terminate https on
> a loadbalancer and use http between the loadbalancer and the rgw. So it seems
> the ssl termination has to be done on the rgw and can't be done on the
> loadbalancer? Or is there any idea how we can work around it?
>
> Here are the debug logs:
>
> With loadbalancer https endpoint: https://justpaste.it/5d93w
> Directly with rgw ip without loadbalancer: https://justpaste.it/9rn28
>
> In both cases the issue is this: "endpoint validation error: sending
> password over insecure transport"
>
> To be honest, I want to do the ssl on the loadbalancer and don't want to do
> it on the rgw. Maybe you can suggest something.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> -Original Message-
> From: Yuval Lifshitz 
> Sent: Tuesday, April 27, 2021 11:49 PM
> To: Szabo, Istvan (Agoda) 
> Cc: ceph-users@ceph.io; Raveendran, Vigneshwaran (Agoda) <
> vigneshwaran.raveend...@agoda.com>
> Subject: [Suspicious newsletter] [ceph-users] Re: Getting `InvalidInput`
> when trying to create a notification topic with Kafka endpoint
>
> On Tue, Apr 27, 2021 at 1:59 PM Szabo, Istvan (Agoda) <
> istvan.sz...@agoda.com> wrote:
>
> > Hello,
> >
> > Thank you very much to pickup the question and sorry for the late
> response.
> >
> > Yes, we are sending in cleartext also using HTTPS, but how it should
> > be send if not like this?
> >
> >
> if you send the user/password using HTTPS connection between the client
> and the RGW there should be no error. could you please provide the RGW
> debug log, to see why "invalid argument" was replied?
>
>
> > Also connected to this issue a bit, when we subscribe a bucket to a
> > topic with non-ACL kafka topic, any operations (PUT or DELETE) is
> > simply blocking and not returning. Not even any error response.
> >
> > this would be the case when the kafka broker is down (or the
> > parameters
> you provided to the topic were incorrect). a workaround for this issue is
> to mark the endpoint with "kafka-ack-level=none", this will not block for
> the reply, but note that if the broker is down or misconfigured, the
> notification will be lost.
> a better option (if you are using "pacific" and up) is to mark the topic
> with the "persistent" flag. this would mean that even if the broker is down
> or misconfigured, the notification will be retired until successful, and,
> in addition, will not block the request.
>
>
>
> > $ s3cmd -c ~/.s3cfg put --add-header x-amz-meta-foo:bar3
> > certificate.pdf s3://vig-test
> > WARNING: certificate.pdf: Owner groupname not known. Storing
> > GID=1354917867 instead.
> > WARNING: Module python-magic is not available. Guessing MIME types
> > based on file extensions.
> > upload: 'certificate.pdf' -> 's3://vig-test/certificate.pdf'  [1 of 1]
> >  65536 of 9122471% in0s   291.17 KB/s
> >
> >
> >
> > Istvan Szabo
> > Senior Infrastructure Engineer
> > ---
> > Agoda Services Co., Ltd.
> > e: istvan.sz...@agoda.com
> > ---
> >
> >
> >
> > *From:* Yuval Lifshitz 
> > *Sent:* Wednesday, April 21, 2021 10:34 PM
> > *To:* Szabo, Istvan (Agoda) 
> > *Cc:* ceph-users@ceph.io
> > *Subject:* Re: [ceph-users] Getting `InvalidInput` when trying to
> > create a notification topic with Kafka endpoint
> >
> >
> >
> > Hi Istvan,
> >
> > Can you please share the relevant part for the radosgw log, indicating
> > which input was invalid?
> >
> > The only way I managed to reproduce that error is by sending the
> > request to a non-HTTPS radosgw (which does not seem to be your case).
> > In such a case it replies with "InvalidInput" because we are trying to
> > send user/password in cleartext.
> >
> > I used curl, similarly to what you did against a vstart cluster based
> > off of master: https://paste.sh/SQ_8IrB5#BxBYbh1kTh15n7OKvjB5wEOM
> >
> >
> >
> > Yuval
> >
> >
> >
> > On Wed, Apr 21, 2021 at 11:23 AM Szabo, Istvan (Agoda) <
> > istvan.sz...@agoda.com> wrote:
> >
> > Hi Ceph Users,
> > Here is the latest request I tried but still not working
> >
> > curl -v -H 'Date: Tue, 20 Apr 2021 16:05:47 +' -H 'Authorization:
> > AWS :' -L -H 'content-type:
> > application/x-www-form-urlencoded' -k -X POST https://servername -d
> > Action=CreateTopic&Name=test-ceph-event-replication&Attributes.entry.8
> > .key=push-endpoint&Attributes.entry.8.value=kafka://: > rd>@servername2:9093&Attributes.entry

[ceph-users] BlueFS.cc ceph_assert(bl.length() <= runway): protection against bluefs log file growth

2021-04-28 Thread Konstantin Shalygin
Hi,

Protection against infinite BlueFS log growth was recently added [1]; I am 
hitting this assert on 14.2.19:

/build/ceph-14.2.19/src/os/bluestore/BlueFS.cc: 2404: FAILED 
ceph_assert(bl.length() <= runway)

Then the OSD dies. Should I open a tracker (or does one already exist?), and 
are logs of interest for this case?



[1] https://github.com/ceph/ceph/pull/37948 


Thanks,

k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Pacifif and Openstack Wallaby - ERROR cinder.scheduler.flows.create_volume

2021-04-28 Thread Eugen Block

Hi,

Yes, if I set "show_image_direct_url" to false, creation of volumes  
from images works fine.
But creation takes much more time, because the data moves out of and back  
into the ceph cluster, instead of using the snapshot and copy-on-write approach.


All documentation recommends "show_image_direct_url" be set to true  
with Ceph storage, even if it exposes image locations.



Hm, that's not good. I think this should be addressed again in the  
openstack community. It sounds like it's still not properly resolved.



Zitat von "Tecnologia Charne.Net" :


Thanks, Eugen, for your quick answer!

Yes, if I set "show_image_direct_url" to false, creation of volumes  
from images works fine.
But creation takes much more time, because the data moves out of and back  
into the ceph cluster, instead of using the snapshot and copy-on-write approach.


All documentation recommends "show_image_direct_url" be set to true  
with Ceph storage, even if it exposes image locations.


Thanks again!


On 28/4/21 at 03:41, Eugen Block wrote:

Hi,

the glance option "show_image_direct_url" has been marked as  
deprecated for quite some time because it's a security issue, but  
without it the interaction between glance and ceph didn't work very  
well, I can't quite remember what the side effects were. It seems  
that they now actually tried to get rid of it, and it seems to work  
for you if you set it to false, right? Do you see any other side  
effects when you set it to false?


Regards,
Eugen


Zitat von "Tecnología CHARNE.NET" :


Hello!

I'm working with Openstack Wallaby (1 controller, 2 compute nodes)  
connected to Ceph Pacific cluster in a devel environment.


With Openstack Victoria and Ceph Pacific (before last friday  
update) everything was running like a charm.


Then, I upgraded Openstack to Wallaby and Ceph  to version 16.2.1.  
(Because of auth_allow_insecure_global_id_reclaim I had to upgrade  
many clients... but that's another story...)


After upgrade, when I try to create a volume from image,

 openstack volume create --image  
f1df058d-be99-4401-82d9-4af9410744bc debian10_volume1 --size 5


with "show_image_direct_url = True", I get "No valid backend" in  
/var/log/cinder/cinder-scheduler.log


2021-04-26 20:35:24.957 41348 ERROR  
cinder.scheduler.flows.create_volume  
[req-651937e5-148f-409c-8296-33f200892e48  
c048e887df994f9cb978554008556546 f02ae99c34cf44fd8ab3b1fd1b3be964  
- - -] Failed to run task  
cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create: No valid backend was found. Exceeded max scheduling attempts 3 for resource 56fbb645-2c34-477d-9a59-beec78f4fd3f: cinder.exception.NoValidBackend: No valid backend was found. Exceeded max scheduling attempts 3 for resource  
56fbb645-2c34-477d-9a59-beec78f4fd3f


and

2021-04-26 20:35:24.968 41347 ERROR oslo_messaging.rpc.server  
[req-651937e5-148f-409c-8296-33f200892e48  
c048e887df994f9cb978554008556546 f02ae99c34cf44fd8ab3b1fd
1b3be964 - - -] Exception during message handling:  
rbd.InvalidArgument: [errno 22] RBD invalid argument (error  
creating clone)


in /var/log/cinder/cinder-volume.log


If I disable "show_image_direct_url = False", volume creation from  
image works fine.



I have spent the last four days googling and reading lots of docs,  
old and new ones, unluckily...


Does anybody have a clue, (please)?

Thanks in advance!


Javier.-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Pacifif and Openstack Wallaby - ERROR cinder.scheduler.flows.create_volume

2021-04-28 Thread Tecnologia Charne.Net

Thanks, Eugen, for your quick answer!

Yes, if I set "show_image_direct_url" to false, creation of volumes from 
images works fine.
But creation takes much more time, because the data moves out of and back into 
the ceph cluster, instead of using the snapshot and copy-on-write approach.


All documentation recommends "show_image_direct_url" be set to true with 
Ceph storage, even if it exposes image locations.
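
For reference, this is the setting in question in glance-api.conf (a sketch of 
the relevant snippet only):

[DEFAULT]
show_image_direct_url = True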


Thanks again!


On 28/4/21 at 03:41, Eugen Block wrote:

Hi,

the glance option "show_image_direct_url" has been marked as 
deprecated for quite some time because it's a security issue, but 
without it the interaction between glance and ceph didn't work very 
well, I can't quite remember what the side effects were. It seems that 
they now actually tried to get rid of it, and it seems to work for you 
if you set it to false, right? Do you see any other side effects when 
you set it to false?


Regards,
Eugen


Zitat von "Tecnología CHARNE.NET" :


Hello!

I'm working with Openstack Wallaby (1 controller, 2 compute nodes) 
connected to Ceph Pacific cluster in a devel environment.


With Openstack Victoria and Ceph Pacific (before last friday update) 
everything was running like a charm.


Then, I upgraded Openstack to Wallaby and Ceph  to version 16.2.1. 
(Because of auth_allow_insecure_global_id_reclaim I had to upgrade 
many clients... but that's another story...)


After upgrade, when I try to create a volume from image,

 openstack volume create --image 
f1df058d-be99-4401-82d9-4af9410744bc debian10_volume1 --size 5


with "show_image_direct_url = True", I get "No valid backend" in 
/var/log/cinder/cinder-scheduler.log


2021-04-26 20:35:24.957 41348 ERROR 
cinder.scheduler.flows.create_volume 
[req-651937e5-148f-409c-8296-33f200892e48 
c048e887df994f9cb978554008556546 f02ae99c34cf44fd8ab3b1fd1b3be964 - - 
-] Failed to run task 
cinder.scheduler.flows.create_volume.ScheduleCreateVolumeTask;volume:create: 
No valid backend was found. Exceeded max scheduling attempts 3 for 
resource 56fbb645-2c34-477d-9a59-beec78f4fd3f: 
cinder.exception.NoValidBackend: No valid backend was found. Exceeded 
max scheduling attempts 3 for resource 
56fbb645-2c34-477d-9a59-beec78f4fd3f


and

2021-04-26 20:35:24.968 41347 ERROR oslo_messaging.rpc.server 
[req-651937e5-148f-409c-8296-33f200892e48 
c048e887df994f9cb978554008556546 f02ae99c34cf44fd8ab3b1fd
1b3be964 - - -] Exception during message handling: 
rbd.InvalidArgument: [errno 22] RBD invalid argument (error creating 
clone)


in /var/log/cinder/cinder-volume.log


If I disable "show_image_direct_url = False", volume creation from 
image works fine.



I have spent the last four days googling and reading lots of docs, 
old and new ones, unluckily...


Does anybody have a clue, (please)?

Thanks in advance!


Javier.-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Rbd map fails occasionally with module libceph: Relocation (type 6) overflow vs section 4

2021-04-28 Thread huxia...@horebdata.cn
Dear Cephers,

I encountered a strange issue when using rbd map (Luminous 12.2.13): rbd map 
does not always fail, but it does occasionally, with the following dmesg output:

[16818.70] module libceph: Relocation (type 6) overflow vs section 4
[16857.46] module libceph: Relocation (type 6) overflow vs section 4
[16891.85] module libceph: Relocation (type 6) overflow vs section 4 

What could be wrong, and how can I fix it?

Any help or suggestions would be highly appreciated,

thanks a lot in advance,

samuel



huxia...@horebdata.cn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to set bluestore_rocksdb_options_annex

2021-04-28 Thread ceph



Hello,

I have an Octopus cluster and want to change some (multiple) values - but I 
cannot find any documentation on how to set them with

bluestore_rocksdb_options_annex

Could someone give me some examples?
I would like to do this with something like ceph config set ...

Thanks in advance
Mehmet
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io