[ceph-users] Error When Replacing OSD - Please Help

2024-08-01 Thread duluxoz

Hi All,

I'm trying to replace an OSD in our cluster.

This is on Reef 18.2.2 on Rocky 9.4.

I performed the following steps (from this page of the Ceph Doco: 
https://docs.ceph.com/en/reef/rados/operations/add-or-rm-osds/):


1. Make sure that it is safe to destroy the OSD:
   `while ! ceph osd safe-to-destroy osd.{id}; do sleep 10; done`
2. Destroy the OSD: `ceph osd destroy 0 --yes-i-really-mean-it`
3. Replaced the HDD
4. Prepare the disk for replacement by using the ID of the OSD that was
   destroyed in previous steps:
   `ceph-volume lvm prepare --osd-id 0 --data /dev/sd3`

However, at this point I get the following errors:

~~~

Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring 
osd tree -f json
 stderr: 2024-08-02T13:52:58.812+1000 7ff26e904640 -1 auth: unable to 
find a keyring on 
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: 
(2) No such file or directory
 stderr: 2024-08-02T13:52:58.812+1000 7ff26e904640 -1 
AuthRegistry(0x7ff268063e88) no keyring found at 
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, 
disabling cephx
 stderr: 2024-08-02T13:52:58.817+1000 7ff26e904640 -1 auth: unable to 
find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such 
file or directory
 stderr: 2024-08-02T13:52:58.817+1000 7ff26e904640 -1 
AuthRegistry(0x7ff268063e88) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2024-08-02T13:52:58.819+1000 7ff26e904640 -1 auth: unable to 
find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such 
file or directory
 stderr: 2024-08-02T13:52:58.819+1000 7ff26e904640 -1 
AuthRegistry(0x7ff268065b00) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2024-08-02T13:52:58.821+1000 7ff26e904640 -1 auth: unable to 
find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such 
file or directory
 stderr: 2024-08-02T13:52:58.821+1000 7ff26e904640 -1 
AuthRegistry(0x7ff26e9030c0) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx

 stderr: [errno 2] RADOS object not found (error connecting to the cluster)
-->  RuntimeError: Unable check if OSD id exists: 0

~~~

I can see a `client.bootstrap-osd` user in the Ceph Dashboard Ceph User 
List, so I'm not sure what's going on.
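
For reference, a minimal sketch (an untested assumption, and it presumes an 
admin keyring is present on this node) of exporting the bootstrap-osd key to 
the on-disk path that ceph-volume is complaining about:

~~~
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
chmod 600 /var/lib/ceph/bootstrap-osd/ceph.keyring
~~~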


Any help is greatly appreciated - thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch issue: lsblk: /dev/vg_osd/lvm_osd: not a block device

2024-05-27 Thread duluxoz

Hi PWB

Both ways (just to see if both ways would work) - remember, this is a 
brand new box, so I had the luxury of "blowing away" the first iteration 
to test the second:


 * ceph orch daemon add osd ceph1:/dev/vg_osd/lv_osd
 * ceph orch daemon add osd ceph1:vg_osd/lv_osd

Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch issue: lsblk: /dev/vg_osd/lvm_osd: not a block device

2024-05-27 Thread duluxoz

@Eugen, @Cedric

DOH!

Sorry lads, my bad! I had a typo in my lv name - that was the cause of 
my issues.


My apologies for being so stupid - and *thank you* for the help; having 
a couple of fresh brains on things helps to eliminate possibilities and 
narrow down the cause of the issue.


Thanks again for all the help

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch issue: lsblk: /dev/vg_osd/lvm_osd: not a block device

2024-05-27 Thread duluxoz

Nope, tried that, it didn't work (similar error messages).

Thanks for the input  :-)

So, still looking for ideas on this one - thanks in advance
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph orch issue: lsblk: /dev/vg_osd/lvm_osd: not a block device

2024-05-26 Thread duluxoz

Hi All,

Is the following a bug or some other problem? (I can't tell)  :-)

Brand new Ceph (Reef v18.2.3) install on Rocky Linux v9.4 - basically, 
it's a brand new box.


Ran the following commands (in order; no issues until final command):

1. pvcreate /dev/sda6
2. vgcreate vg_osd /dev/sda6
3. lvcreate -l 100%VG -n lv_osd vg_osd
4. cephadm bootstrap --mon-ip 192.168.0.20
5. ceph orch daemon add osd ceph1:/dev/vg_osd/lvm_osd

Received a whole bunch of error info on the console; the two relevant 
lines (as far as I can tell) are:


 * /usr/bin/podman: stderr  stderr: lsblk: /dev/vg_osd/lvm_osd: not a
   block device
 * RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host
   --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume
   --privileged --group-add=disk --init -e
   
CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:257b3f5140c11b51fd710ffdad6213ed53d74146f464a51717262d156daef553
   -e NODE_NAME=ceph1 -e CEPH_USE_RANDOM_NONCE=1 -e
   CEPH_VOLUME_OSDSPEC_AFFINITY=None -e CEPH_VOLUME_SKIP_RESTORECON=yes
   -e CEPH_VOLUME_DEBUG=1 -v
   /var/run/ceph/477045f4-1b34-11ef-9a30-0800274c7359:/var/run/ceph:z
   -v
   /var/log/ceph/477045f4-1b34-11ef-9a30-0800274c7359:/var/log/ceph:z
   -v
   
/var/lib/ceph/477045f4-1b34-11ef-9a30-0800274c7359/crash:/var/lib/ceph/crash:z
   -v /run/systemd/journal:/run/systemd/journal -v /dev:/dev -v
   /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
   /run/lock/lvm:/run/lock/lvm -v
   /var/lib/ceph/477045f4-1b34-11ef-9a30-0800274c7359/selinux:/sys/fs/selinux:ro
   -v /:/rootfs -v /etc/hosts:/etc/hosts:ro -v
   /tmp/ceph-tmpe_krhtt8:/etc/ceph/ceph.conf:z -v
   /tmp/ceph-tmp_47jsxdp:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
   
quay.io/ceph/ceph@sha256:257b3f5140c11b51fd710ffdad6213ed53d74146f464a51717262d156daef553
   lvm batch --no-auto /dev/vg_osd/lvm_osd --yes --no-systemd

I had a look around the Net and couldn't find anything relevant. This 
post (https://github.com/rook/rook/issues/4967) talks about a similar 
issue with Rook, but I'm using cephadm, not Rook.
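
For what it's worth, a quick sanity-check sketch (assuming the VG/LV names 
created above) that flags this kind of path problem before the orchestrator 
does:

~~~
lvs -o lv_name,vg_name,lv_path vg_osd
ls -l /dev/vg_osd/lv_osd
~~~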


Any help in resolving this (or confirming it is a bug) would be greatly 
appreciated - thanks in advance.


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Forcing Posix Permissions On New CephFS Files

2024-05-09 Thread duluxoz

Hi All,

I've gone and gotten myself into a "can't see the forest for the trees" 
state, so I'm hoping someone can take pity on me and answer a really dumb Q.


So I've got a CephFS system happily bubbling along and a bunch of 
(linux) workstations connected to a number of common shares/folders. To 
take a single one of these folders as an example ("music") the 
sub-folders and files of that share all belong to root:music with 
permissions of 2770 (folders) and 0660 (files). The "music" folder is 
then connected to (as per the Ceph Doco: mount.ceph) via each 
workstation's fstab file - all good, all working, everyone's happy.


What I'm trying to achieve is that when a new piece of music (a file) is 
uploaded to the Ceph Cluster the file inherits the music share's default 
ownership (root:music) and permissions (0660). What is happening at the 
moment is I'm getting permissions of 644 (and 755 for new folders).
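
One possible approach, sketched here as an assumption (it presumes the CephFS 
kernel mount honours POSIX ACLs), is a default ACL on the share: it forces the 
group permissions on new entries regardless of the uploading client's umask, 
while the setgid bit on the 2770 folders already handles group inheritance 
(ownership by root would still need a separate step):

~~~
setfacl -R -m g:music:rwX /mnt/music       # existing folders and files
setfacl -R -m d:g:music:rwX /mnt/music     # default ACL applied to new entries
~~~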


I've been looking for a way to do what I want but, as I said, I've gone 
and gotten myself thoroughly mixed-up.


Could someone please point me in the right direction on how to achieve 
what I'm after - thanks


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mysterious Space-Eating Monster

2024-05-06 Thread duluxoz

Thanks Sake,

That recovered just under 4 Gig of space for us

Sorry about the delay getting back to you (been *really* busy) :-)

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-users Digest, Vol 118, Issue 85

2024-04-24 Thread duluxoz

Hi Eugen,

Thank you for a viable solution to our underlying issue - I'll attempt 
to implement it shortly.  :-)


However, with all the respect in the world, I believe you are incorrect when 
you say the doco is correct (but I will be more than happy to be proven 
wrong).  :-)


The relevant text (from the last couple of paragraphs of the documentation 
page) says:


~~~

If a client already has a capability for file-system name |a| and path 
|dir1|, running |fs authorize| again for FS name |a| but path |dir2|, 
instead of modifying the capabilities client already holds, a new cap 
for |dir2| will be granted:


ceph fs authorize a client.x /dir1 rw
ceph auth get client.x

[client.x]
key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA==
caps mds = "allow rw fsname=a path=/dir1"
caps mon = "allow r fsname=a"
caps osd = "allow rw tag cephfs data=a"

ceph fs authorize a client.x /dir2 rw

updated caps for client.x

ceph auth get client.x

[client.x]
key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA==
caps mds = "allow rw fsname=a path=dir1, allow rw fsname=a path=dir2"
caps mon = "allow r fsname=a"
caps osd = "allow rw tag cephfs data=a"

~~~

The above *seems* to me to say (as per the 2nd `ceph auth get client.x` 
example) that a 2nd directory (dir2) *will* be added to the `client.x` 
authorisation.


HOWEVER, this does not work in practice - hence my original query.

This is what we originally attempted to do (word for word, only 
substituting our CephFS name for "a") and we got the error in the 
original post.


So if the doco says that something can be done *and* gives a working 
example, but an end-user (admin) following the exact same commands gets 
an error instead of the same results, then either the doco is incorrect 
*or* there is something else wrong.


BUT your statement ("running 'ceph fs authorize' will overwrite the 
existing caps, it will not add more caps to the client") is in direct 
contradiction to the documentation ("If a client already has a 
capability for file-system name |a| and path |dir1|, running |fs 
authorize| again for FS name |a| but path |dir2|, instead of modifying 
the capabilities client already holds, a new cap for |dir2| will be 
granted").


So there's some sort of "disconnect" there.  :-)
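
For reference, a sketch of the kind of direct cap update that side-steps `fs 
authorize` altogether (an illustration only; the solution Eugen actually 
proposed is not quoted in this thread):

~~~
ceph auth caps client.x \
    mds 'allow rw fsname=a path=/dir1, allow rw fsname=a path=/dir2' \
    mon 'allow r fsname=a' \
    osd 'allow rw tag cephfs data=a'
~~~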

Cheers


On 24/04/2024 17:33, ceph-users-requ...@ceph.io wrote:

Send ceph-users mailing list submissions to
ceph-users@ceph.io

To subscribe or unsubscribe via email, send a message with subject or
body 'help' to
ceph-users-requ...@ceph.io

You can reach the person managing the list at
ceph-users-ow...@ceph.io

When replying, please edit your Subject line so it is more specific
than "Re: Contents of ceph-users digest..."

Today's Topics:

1. Re: Latest Doco Out Of Date? (Eugen Block)
2. Re: stretched cluster new pool and second pool with nvme
   (Eugen Block)
3. Re: Latest Doco Out Of Date? (Frank Schilder)

___
ceph-users mailing list --ceph-users@ceph.io
To unsubscribe send an email toceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Latest Doco Out Of Date?

2024-04-23 Thread duluxoz

Hi Zac,

Any movement on this? We really need to come up with an answer/solution 
- thanks


Dulux-Oz

On 19/04/2024 18:03, duluxoz wrote:


Cool!

Thanks for that  :-)

On 19/04/2024 18:01, Zac Dover wrote:
I think I understand, after more thought. The second command is 
expected to work after the first.


I will ask the cephfs team when they wake up.

Zac Dover
Upstream Docs
Ceph Foundation


On Fri, Apr 19, 2024 at 17:51, duluxoz wrote:

Hi All,

In reference to this page from the Ceph documentation:
https://docs.ceph.com/en/latest/cephfs/client-auth/, down the bottom of
that page it says that you can run the following commands:

~~~
ceph fs authorize a client.x /dir1 rw
ceph fs authorize a client.x /dir2 rw
~~~

This will allow `client.x` to access both `dir1` and `dir2`.

So, having a use case where we need to do this, we are, HOWEVER, getting
the following error on running the 2nd command on a Reef 18.2.2 cluster:

`Error EINVAL: client.x already has fs capabilities that differ from
those supplied. To generate a new auth key for client.x, first remove
client.x from configuration files, execute 'ceph auth rm client.x', then
execute this command again.`

Something we're doing wrong, or is the doco "out of date" (mind you,
that's from the "latest" version of the doco, and the "reef" version),
or is something else going on?

Thanks in advance for the help

Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Mysterious Space-Eating Monster

2024-04-19 Thread duluxoz

Hi All,

*Something* is chewing up a lot of space on our `/var` partition to the 
point where we're getting warnings about the Ceph monitor running out of 
space (ie > 70% full).


I've been looking, but I can't find anything significant (ie log files 
aren't too big, etc) BUT there seem to be a hell of a lot (15) of 
sub-directories (with GUIDs for names) under the 
`/var/lib/containers/storage/overlay/` folder, all ending with `merged` 
- ie `/var/lib/containers/storage/overlay/{{GUID}}/merged`.


Is this normal, or is something going wrong somewhere, or am I looking 
in the wrong place?


Also, if this is the issue, can I delete these folders?
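
For reference, a hedged sketch of commands for inspecting (and reclaiming) 
container storage on the node - the `merged` directories themselves are the 
overlay mount points podman uses for its containers, so they shouldn't be 
removed by hand:

~~~
podman system df        # storage used by images, containers and volumes
podman image prune -a   # remove unused container images
~~~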

Sorry for asking such a noob Q, but the Cephadm/Podman stuff is 
extremely new to me  :-)


Thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Latest Doco Out Of Date?

2024-04-19 Thread duluxoz

Cool!

Thanks for that  :-)

On 19/04/2024 18:01, Zac Dover wrote:
I think I understand, after more thought. The second command is 
expected to work after the first.


I will ask the cephfs team when they wake up.

Zac Dover
Upstream Docs
Ceph Foundation


On Fri, Apr 19, 2024 at 17:51, duluxoz wrote:

Hi All,

In reference to this page from the Ceph documentation:
https://docs.ceph.com/en/latest/cephfs/client-auth/, down the bottom of
that page it says that you can run the following commands:

~~~
ceph fs authorize a client.x /dir1 rw
ceph fs authorize a client.x /dir2 rw
~~~

This will allow `client.x` to access both `dir1` and `dir2`.

So, having a use case where we need to do this, we are, HOWEVER, getting
the following error on running the 2nd command on a Reef 18.2.2 cluster:

`Error EINVAL: client.x already has fs capabilities that differ from
those supplied. To generate a new auth key for client.x, first remove
client.x from configuration files, execute 'ceph auth rm client.x', then
execute this command again.`

Something we're doing wrong, or is the doco "out of date" (mind you,
that's from the "latest" version of the doco, and the "reef" version),
or is something else going on?

Thanks in advance for the help

Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Latest Doco Out Of Date?

2024-04-19 Thread duluxoz

Hi Zac,

Yep, followed the instructions (ie removed the client) and then re-ran 
the commands - same thing.


What in particular do you need to know?  :-)

Cheers

Dulux-Oz

On 19/04/2024 17:58, Zac Dover wrote:

Did you remove client.x from the config?

I need more information about your cluster before I can determine 
whether the documentation is wrong.


Zac Dover
Upstream Docs
Ceph Foundation


On Fri, Apr 19, 2024 at 17:51, duluxoz wrote:

Hi All,

In reference to this page from the Ceph documentation:
https://docs.ceph.com/en/latest/cephfs/client-auth/, down the bottom of
that page it says that you can run the following commands:

~~~
ceph fs authorize a client.x /dir1 rw
ceph fs authorize a client.x /dir2 rw
~~~

This will allow `client.x` to access both `dir1` and `dir2`.

So, having a use case where we need to do this, we are, HOWEVER, getting
the following error on running the 2nd command on a Reef 18.2.2 cluster:

`Error EINVAL: client.x already has fs capabilities that differ from
those supplied. To generate a new auth key for client.x, first remove
client.x from configuration files, execute 'ceph auth rm client.x', then
execute this command again.`

Something we're doing wrong, or is the doco "out of date" (mind you,
that's from the "latest" version of the doco, and the "reef" version),
or is something else going on?

Thanks in advance for the help

Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Latest Doco Out Of Date?

2024-04-19 Thread duluxoz

Hi All,

In reference to this page from the Ceph documentation: 
https://docs.ceph.com/en/latest/cephfs/client-auth/, down the bottom of 
that page it says that you can run the following commands:


~~~
ceph fs authorize a client.x /dir1 rw
ceph fs authorize a client.x /dir2 rw
~~~

This will allow `client.x` to access both `dir1` and `dir2`.

So, having a use case where we need to do this, we are, HOWEVER, getting 
the following error on running the 2nd command on a Reef 18.2.2 cluster:


`Error EINVAL: client.x already has fs capabilities that differ from 
those supplied. To generate a new auth key for client.x, first remove 
client.x from configuration files, execute 'ceph auth rm client.x', then 
execute this command again.`


Something we're doing wrong, or is the doco "out of date" (mind you, 
that's from the "latest" version of the doco, and the "reef" version), 
or is something else going on?


Thanks in advance for the help

Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-26 Thread duluxoz
I don't know Marc, I only know what I had to do to get the thing 
working  :-)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-26 Thread duluxoz

Hi All,

OK, an update for everyone, a note about some (what I believe to be) 
missing information in the Ceph Doco, a success story, and an admission 
on my part that I may have left out some important information.


So to start with, I finally got everything working - I now have my 4T 
RBD Image mapped, mounted, and tested on my host.  YA!


The missing Ceph Doco Info:

What I found in the latest Red Hat documentation 
(https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/7/html/block_device_guide/the-rbd-kernel-module) 
that is not in the Ceph documentation (perhaps because it is 
EL-specific? - but a note should be placed anyway, even if it is 
EL-specific) is that the RBD Image needs to have a partition entry 
created for it - that might be "obvious" to some, but my ongoing belief 
is that most "obvious" things aren't, so it's better to be explicit about 
such things. Just my $0.02 worth.  :-)


The relevant commands, which are performed after a `rbd map 
my_pool.meta/my_image --id my_image_user` are:


[codeblock]

parted /dev/rbd0 mklabel gpt

parted /dev/rbd0 mkpart primary xfs 0% 100%

[/codeblock]

From there the RBD Image needs a file system: `mkfs.xfs /dev/rbd0p1`

And a mount: `mount /dev/rbd0p1 /mnt/my_image`

Now, the omission on my part:

The host I was attempting all this on was an oVirt-managed VM. 
Apparently, an oVirt-Managed VM doesn't like/allow (speculation on my 
part) running the `parted` or `mkfs.xfs` commands on an RBD Image. What 
I had to do to test this and get it working was to run the `rbd map`, 
`parted`, and `mkfs.xfs` commands on a physical host (which I did), THEN 
unmount/unmap the image from the physical host and map / mount it on the VM.
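
A sketch of that hand-over, with the device and user names from above assumed:

~~~
# On the physical host, once the image has been partitioned and formatted:
umount /mnt/my_image 2>/dev/null   # only if it was mounted there
rbd device unmap my_pool.meta/my_image
# On the VM:
rbd device map my_pool.meta/my_image --id my_image_user
mount /dev/rbd0p1 /mnt/my_image
~~~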


So my apologies for not providing all the info - I didn't consider it 
to be relevant - my bad!


So all good in the end. I hope the above helps others if they have 
similar issues.


Thank you all who helped / pitched in with ideas - I really, *really* 
appreciate it.


Thanks too to Wesley Dillingham - although the suggestion wasn't 
relevant to this issue, it did cause me to look at the firewall settings 
on the Ceph Cluster where I found (and corrected) an unrelated issue 
that hadn't reared its ugly head yet. Thanks Wes.


Cheers (until next time)  :-P

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-24 Thread duluxoz

Hi Alexander,

Already set (and confirmed by running the command again) - no good, I'm 
afraid.


So I just restarted with a brand new image and ran the following commands 
on the ceph cluster and the host respectively. Results are below:


On the ceph cluster:

[code]

rbd create --size 4T my_pool.meta/my_image --data-pool my_pool.data 
--image-feature exclusive-lock --image-feature deep-flatten 
--image-feature fast-diff --image-feature layering --image-feature 
object-map --image-feature data-pool


[/code]

On the host:

[code]

rbd device map my_pool.meta/my_image --id ceph_rbd_user --keyring 
/etc/ceph/ceph.client.ceph_rbd_user.keyring


mkfs.xfs /dev/rbd0

[/code]

Results:

[code]

meta-data=/dev/rbd0  isize=512    agcount=32, 
agsize=33554432 blks

 =   sectsz=512   attr=2, projid32bit=1
 =   crc=1    finobt=1, sparse=1, rmapbt=0
 =   reflink=1    bigtime=1 inobtcount=1 
nrext64=0

data =   bsize=4096   blocks=1073741824, imaxpct=5
 =   sunit=16 swidth=16 blks
naming   =version 2  bsize=4096   ascii-ci=0, ftype=1
log  =internal log   bsize=4096   blocks=521728, version=2
 =   sectsz=512   sunit=16 blks, lazy-count=1
realtime =none   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
mkfs.xfs: pwrite failed: Input/output error
libxfs_bwrite: write failed on (unknown) bno 0x1ff00/0x100, err=5
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
mkfs.xfs: pwrite failed: Input/output error
libxfs_bwrite: write failed on (unknown) bno 0x0/0x100, err=5
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
mkfs.xfs: pwrite failed: Input/output error
libxfs_bwrite: write failed on xfs_sb bno 0x0/0x1, err=5
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
mkfs.xfs: pwrite failed: Input/output error
libxfs_bwrite: write failed on (unknown) bno 0x10080/0x80, err=5
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
mkfs.xfs: read failed: Input/output error
mkfs.xfs: data size check failed
mkfs.xfs: filesystem failed to initialize
[/code]

On 25/03/2024 15:17, Alexander E. Patrakov wrote:

Hello Matthew,

Is the overwrite enabled in the erasure-coded pool? If not, here is
how to fix it:

ceph osd pool set my_pool.data allow_ec_overwrites true
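
For reference, a one-line sketch for confirming the flag afterwards:

~~~
ceph osd pool get my_pool.data allow_ec_overwrites
~~~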

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-24 Thread duluxoz

Hi Curt,

blockdev --getbsz: 4096

rbd info my_pool.meta/my_image:

~~~

rbd image 'my_image':
    size 4 TiB in 1048576 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 294519bf21a1af
    data_pool: my_pool.data
    block_name_prefix: rbd_data.30.294519bf21a1af
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, data-pool

    op_features:
    flags:
    create_timestamp: Sun Mar 24 17:44:33 2024
    access_timestamp: Sun Mar 24 17:44:33 2024
    modify_timestamp: Sun Mar 24 17:44:33 2024
~~~

On 24/03/2024 21:10, Curt wrote:

Hey Mathew,

One more thing out of curiosity can you send the output of blockdev 
--getbsz on the rbd dev and rbd info?


I'm using 16TB rbd images without issue, but I haven't updated to reef 
.2 yet.


Cheers,
Curt

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-24 Thread duluxoz

Hi Alwin,

Command (as requested): rbd create --size 4T my_pool.meta/my_image 
--data-pool my_pool.data --image-feature exclusive-lock --image-feature 
deep-flatten --image-feature fast-diff --image-feature layering 
--image-feature object-map --image-feature data-pool


On 24/03/2024 22:53, Alwin Antreich wrote:

Hi,


March 24, 2024 at 8:19 AM, "duluxoz"  wrote:

Hi,

Yeah, I've been testing various configurations since I sent my last 
email - all to no avail.

So I'm back to the start with a brand new 4T image which is rbdmapped to 
/dev/rbd0.

It's not formatted (yet) and so not mounted.

Every time I attempt a mkfs.xfs /dev/rbd0 (or mkfs.xfs 
/dev/rbd/my_pool/my_image) I get the errors I previously mentioned and the 
resulting image then becomes unusable (in every sense of the word).

If I run a fdisk -l (before trying the mkfs.xfs) the rbd image shows up 
in the list - no, I don't actually do a full fdisk on the image.

An rbd info my_pool:my_image shows the same expected values on both the 
host and ceph cluster.

I've tried this with a whole bunch of different sized images from 100G 
to 4T and all fail in exactly the same way. (My previous successful 100G 
test I haven't been able to reproduce).

I've also tried all of the above using an "admin" CephX(sp?) account - I 
can always connect via rbdmap, but as soon as I try an mkfs.xfs it 
fails. This failure also occurs with mkfs.ext4 as well (all size drives).

The Ceph Cluster is good (self reported and there are other hosts 
happily connected via CephFS) and this host also has a CephFS mapping 
which is working.

Between running experiments I've gone over the Ceph Doco (again) and I 
can't work out what's going wrong.

There's also nothing obvious/helpful jumping out at me from the 
logs/journal (sample below):

~~~
Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno 
524773 0~65536 result -1
Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno 
524772 65536~4128768 result -1
Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write result -1
Mar 24 17:38:29 my_host.my_net.local kernel: blk_print_req_error: 119 
callbacks suppressed
Mar 24 17:38:29 my_host.my_net.local kernel: I/O error, dev rbd0, sector 
4298932352 op 0x1:(WRITE) flags 0x4000 phys_seg 1024 prio class 2
Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno 
524774 0~65536 result -1
Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno 
524773 65536~4128768 result -1
Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write result -1
Mar 24 17:38:29 my_host.my_net.local kernel: I/O error, dev rbd0, sector 
4298940544 op 0x1:(WRITE) flags 0x4000 phys_seg 1024 prio class 2
~~~

Any ideas what I should be looking at?

Could you please share the command you've used to create the RBD?

Cheers,
Alwin


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-24 Thread duluxoz

Hi Curt,

Nope, no dropped packets or errors - sorry, wrong tree  :-)

Thanks for chiming in.

On 24/03/2024 20:01, Curt wrote:
I may be barking up the wrong tree, but if you run ip -s link show 
yourNicID on this server or your OSDs do you see any 
errors/dropped/missed?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-24 Thread duluxoz

Hi,

Yeah, I've been testing various configurations since I sent my last 
email - all to no avail.


So I'm back to the start with a brand new 4T image which is rbdmapped to 
/dev/rbd0.


It's not formatted (yet) and so not mounted.

Every time I attempt a mkfs.xfs /dev/rbd0 (or mkfs.xfs 
/dev/rbd/my_pool/my_image) I get the errors I previously mentioned and the 
resulting image then becomes unusable (in every sense of the word).


If I run a fdisk -l (before trying the mkfs.xfs) the rbd image shows up 
in the list - no, I don't actually do a full fdisk on the image.


An rbd info my_pool:my_image shows the same expected values on both the 
host and ceph cluster.


I've tried this with a whole bunch of different sized images from 100G 
to 4T and all fail in exactly the same way. (My previous successful 100G 
test I haven't been able to reproduce).


I've also tried all of the above using an "admin" CephX(sp?) account - I 
can always connect via rbdmap, but as soon as I try an mkfs.xfs it 
fails. This failure also occurs with mkfs.ext4 as well (all size drives).


The Ceph Cluster is good (self reported and there are other hosts 
happily connected via CephFS) and this host also has a CephFS mapping 
which is working.


Between running experiments I've gone over the Ceph Doco (again) and I 
can't work out what's going wrong.


There's also nothing obvious/helpful jumping out at me from the 
logs/journal (sample below):


~~~

Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno 
524773 0~65536 result -1
Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno 
524772 65536~4128768 result -1

Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write result -1
Mar 24 17:38:29 my_host.my_net.local kernel: blk_print_req_error: 119 
callbacks suppressed
Mar 24 17:38:29 my_host.my_net.local kernel: I/O error, dev rbd0, sector 
4298932352 op 0x1:(WRITE) flags 0x4000 phys_seg 1024 prio class 2
Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno 
524774 0~65536 result -1
Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno 
524773 65536~4128768 result -1

Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write result -1
Mar 24 17:38:29 my_host.my_net.local kernel: I/O error, dev rbd0, sector 
4298940544 op 0x1:(WRITE) flags 0x4000 phys_seg 1024 prio class 2

~~~

Any ideas what I should be looking at?

And thank you for the help  :-)

On 24/03/2024 17:50, Alexander E. Patrakov wrote:

Hi,

Please test again, it must have been some network issue. A 10 TB RBD
image is used here without any problems.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernel Modules

2024-03-23 Thread duluxoz

Hi Alexander,

DOH!

Thanks for pointing out my typo - I missed it, and yes, it was my 
issue.  :-)


New issue (sort of): The requirement of the new RBD Image is 2 TB in 
size (it's for a MariaDB Database/Data Warehouse). However, I'm getting 
the following errors:


~~~

mkfs.xfs: pwrite failed: Input/output error
libxfs_bwrite: write failed on (unknown) bno 0x7f00/0x100, err=5
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
~~~

I tested with a 100 GB image in the same pool and was 100% successful, 
so I'm now wondering if there is some sort of Ceph RBD Image size limit 
- although, honestly, that seems to be counter-intuitive to me 
considering CERN uses Ceph for their data storage needs.


Any ideas / thoughts?

Cheers

Dulux-Oz

On 23/03/2024 18:52, Alexander E. Patrakov wrote:

Hello Dulux-Oz,

Please treat the RBD as a normal block device. Therefore, "mkfs" needs
to be run before mounting it.

The mistake is that you run "mkfs xfs" instead of "mkfs.xfs" (space vs
dot). And, you are not limited to xfs, feel free to use ext4 or btrfs
or any other block-based filesystem.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread duluxoz

On 23/03/2024 18:25, Konstantin Shalygin wrote:

Hi,

Yes, this is generic solution for end users mounts - samba gateway


k
Sent from my iPhone


Thanks Konstantin, I really appreciate the help
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread duluxoz

On 23/03/2024 18:22, Alexander E. Patrakov wrote:

On Sat, Mar 23, 2024 at 3:08 PM duluxoz  wrote:
Almost right. Please set up a cluster of two SAMBA servers with CTDB,
for high availability.


Cool - thanks Alex, I really appreciate it :-)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread duluxoz



On 23/03/2024 18:00, Alexander E. Patrakov wrote:

Hi Dulux-Oz,

CephFS is not designed to deal with mobile clients such as laptops
that can lose connectivity at any time. And I am not talking about the
inconveniences on the laptop itself, but about problems that your
laptop would cause to other clients. The problems stem from the fact
that MDSes give out "caps" to clients, which are, essentially,
licenses to do local caching. If another client wants to access the
same file, the MDS would need to contact the laptop and tell it to
release the caps - which is no longer possible. Result: a health
warning and delays/hangs on other clients.

The proper solution here is to use NFSv3 (ideally with a userspace
client instead of a kernel mount). NFSv3, because v4 has leases which
bring the problem back. And this means that you cannot use cephadm to
deploy this NFS server, as cephadm-deployed NFS-Ganesha is hard-coded
to speak only NFSv4.

SAMBA server with oplocks disabled, and, again, a userspace client
could be another solution.

If you decide to disregard this advice, here are some tips.

With systemd, configuring autofs is as easy as adding
"x-systemd.automount,x-systemd.idle-timeout=1min,noauto,nofail,_netdev"
to your /etc/fstab line. This applies both to CephFS and NFS.
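
Applied to the CephFS fstab line from the original post, that would look 
roughly like this (a sketch; option order isn't significant):

~~~
ceph_user@.cephfs=/my_folder  /mnt/my_folder  ceph  noatime,_netdev,x-systemd.automount,x-systemd.idle-timeout=1min,noauto,nofail  0 0
~~~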

For kernel-based NFSv3 mounts, consider adding "nolock".

Another CephFS-specific mount option that somewhat helps with
reconnects is "recover_session=clean".

--
Alexander E. Patrakov
Hi Alex, and thanks for getting back to me so quickly (I really 
appreciate it),


So from what you said it looks like we've got the wrong solution. 
Instead, (if I'm understanding things correctly) we may be better off 
setting up a dedicated Samba server with the CephFS mounts, and then 
using Samba to share those out - is that right?


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Mounting A RBD Via Kernel Modules

2024-03-23 Thread duluxoz

Hi All,

I'm trying to mount a Ceph Reef (v18.2.2 - latest version) RBD Image as 
a 2nd HDD on a Rocky Linux v9.3 (latest version) host.


The EC pool has been created and initialised and the image has been 
created.


The ceph-common package has been installed on the host.

The correct keyring has been added to the host (with a chmod of 600) and 
the host has been configure with an rbdmap file as follows: 
`my_pool.meta/my_image 
id=ceph_user,keyring=/etc/ceph/ceph.client.ceph_user.keyring`.


When running the rbdmap.service the image appears as both `/dev/rbd0` 
and `/dev/rbd/my_pool.meta/my_image`, exactly as the Ceph Doco says it 
should.


So everything *appears* AOK up to this point.

My question now is: Should I run `mkfs xfs` on `/dev/rbd0` *before* or 
*after* I try to mount the image (via fstab: 
`/dev/rbd/my_pool.meta/my_image  /mnt/my_image  xfs  noauto  0 0` - as 
per the Ceph doco)?


The reason I ask is that I've tried this *both* ways and all I get is an 
error message (sorry, can't remember the exact messages and I'm not 
currently in front of the host to confirm it  :-) - but from memory it 
was something about not being able to recognise the 1st block - or 
something like that).


So, I'm obviously doing something wrong, but I can't work out what 
exactly (and the logs don't show any useful info).


Do I, for instance, have the process wrong / don't understand the exact 
process, or is there something else wrong?


All comments/suggestions/etc greatly appreciated - thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Laptop Losing Connectivity To CephFS On Sleep/Hibernation

2024-03-23 Thread duluxoz

Hi All,

I'm looking for some help/advice to solve the issue outlined in the heading.

I'm running CepfFS (name: cephfs) on a Ceph Reef (v18.2.2 - latest 
update) cluster, connecting from a laptop running Rocky Linux v9.3 
(latest update) with KDE v5 (latest update).


I've set up the laptop to connect to a number of directories on CephFS 
via the `/etc/fstab` file; an example of such an entry is: 
`ceph_user@.cephfs=/my_folder  /mnt/my_folder  ceph noatime,_netdev  0 0`.


Everything is working great; the required Ceph Key is on the laptop 
(with a chmod of 600), I can access the files on the Ceph Cluster, etc, 
etc, etc - all good.


However, whenever the laptop goes into sleep or hibernate mode (ie when I 
close the laptop's lid) and I then bring it out of sleep/hibernation (ie I 
open the lid), the CephFS mounts are lost. The only way to bring them back 
is to run `mount -a` as root (or via sudo). This is, as I'm sure you'll 
agree, not a long-term viable option - especially as this is running as a 
pilot project and the eventual end-users won't have access to root/sudo.


So I'm seeking the collective wisdom of the community in how to solve 
this issue.


I've taken a brief look at autofs, and even half-heartedly had a go at 
configuring it, but it didn't seem to work - honestly, it was late and I 
wanted to get home after a long day.  :-)


Is this the solution to my issue, or is there a better way to construct 
the fstab entries, or is there another solution I haven't found yet in 
the doco or via google-foo?


All help and advice greatly appreciated - thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS On Windows 10

2024-03-08 Thread duluxoz

Thanks Lucian, your advice helped uncover the root cause of my issue :-)

(Sorry about the delay in the reply)

On 28/02/2024 21:06, Lucian Petrut wrote:

Hi,

I’d double check that the 3300 port is accessible (e.g. using telnet, which can 
be installed as an optional Windows feature). Make sure that it’s using the 
default port and not a custom one, also be aware the v1 protocol uses 6789 by 
default.

Increasing the messenger log level to 10 might also be useful: debug ms = 10.
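
For reference, a quick reachability sketch from the Windows side (PowerShell, 
using the monitor addresses given below):

~~~
Test-NetConnection 192.168.1.10 -Port 3300   # msgr v2
Test-NetConnection 192.168.1.10 -Port 6789   # msgr v1
~~~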

Regards,
Lucian


On 28 Feb 2024, at 11:05, duluxoz  wrote:

Hi All,

  I'm looking for some pointers/help as to why I can't get my Win10 PC to 
connect to our Ceph Cluster's CephFS Service. Details are as follows:

Ceph Cluster:

- IP Addresses: 192.168.1.10, 192.168.1.11, 192.168.1.12

- Each node above is a monitor & an MDS

- Firewall Ports: open (ie 3300, etc)

- CephFS System Name: my_cephfs

- Log files: nothing jumps out at me

Windows PC:

- Keyring file created and findable: ceph.client.me.keyring

- dokany installed

- ceph-for-windows installed

- Can ping all three ceph nodes

- Connection command: ceph-dokan -l v -o -id me --debug --client_fs my_cephfs 
-c C:\ProgramData\Ceph\ceph.conf

Ceph.conf contents:

~~~

[global]
   mon_host = 192.168.1.10, 192.168.1.11, 192.168.1.12
   log to stderr = true
   log to syslog = true
   run dir = C:/ProgramData/ceph
   crash dir = C:/logs/ceph
   debug client = 2
[client]
   keyring = C:/ProgramData/ceph/ceph.client.me.keyring
   log file = C:/logs/ceph/$name.$pid.log
   admin socket = C:/ProgramData/ceph/$name.$pid.asok
~~~

Windows Logfile contents (ie C:/logs/ceph/client.me..log):

~~~

2024-02-28T18:26:45.201+1100 1  0 monclient(hunting): authenticate timed out 
after 300
2024-02-28T18:31:45.203+1100 1  0 monclient(hunting): authenticate timed out 
after 300
2024-02-28T18:36:45.205+1100 1  0 monclient(hunting): authenticate timed out 
after 300

~~~

Additional info from Windows CLI:

~~~

failed to fetch mon config (--no-mon-config to skip)

~~~

So I've gone through the doco and done some Google-foo and I can't work out 
*why* I'm getting a failure; why I'm getting the authentication failure. I know 
it'll be something simple, something staring me in the face, but I'm at the 
point where I can't see the forest for the trees - please help.

Any help greatly appreciated

Thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS On Windows 10

2024-03-08 Thread duluxoz

Thanks Robert (sorry about the delay in replying)  :-)

On 29/02/2024 01:03, Robert W. Eckert wrote:

I have it working on my machines-  the global configuration for me looks like
[global]
 fsid = fe3a7cb0-69ca-11eb-8d45-c86000d08867
 mon_host = [v2:192.168.2.142:3300/0,v1:192.168.2.142:6789/0] 
[v2:192.168.2.141:3300/0,v1:192.168.2.141:6789/0] 
[v2:192.168.2.199:3300/0,v1:192.168.2.199:6789/0]
 auth_cluster_required = cephx
 auth_service_required = cephx
 auth_client_required = cephx

The important thing for me was to add the FSID, and the auth sections

Also note the port is the 6789 not 3300.



-Original Message-
From: duluxoz  
Sent: Wednesday, February 28, 2024 4:05 AM

To:ceph-users@ceph.io
Subject: [ceph-users] CephFS On Windows 10

Hi All,

   I'm looking for some pointers/help as to why I can't get my Win10 PC to 
connect to our Ceph Cluster's CephFS Service. Details are as follows:

Ceph Cluster:

- IP Addresses: 192.168.1.10, 192.168.1.11, 192.168.1.12

- Each node above is a monitor & an MDS

- Firewall Ports: open (ie 3300, etc)

- CephFS System Name: my_cephfs

- Log files: nothing jumps out at me

Windows PC:

- Keyring file created and findable: ceph.client.me.keyring

- dokany installed

- ceph-for-windows installed

- Can ping all three ceph nodes

- Connection command: ceph-dokan -l v -o -id me --debug --client_fs my_cephfs 
-c C:\ProgramData\Ceph\ceph.conf

Ceph.conf contents:

~~~

[global]
    mon_host = 192.168.1.10, 192.168.1.11, 192.168.1.12
    log to stderr = true
    log to syslog = true
    run dir = C:/ProgramData/ceph
    crash dir = C:/logs/ceph
    debug client = 2
[client]
    keyring = C:/ProgramData/ceph/ceph.client.me.keyring
    log file = C:/logs/ceph/$name.$pid.log
    admin socket = C:/ProgramData/ceph/$name.$pid.asok
~~~

Windows Logfile contents (ie C:/logs/ceph/client.me..log):

~~~

2024-02-28T18:26:45.201+1100 1  0 monclient(hunting): authenticate timed out 
after 300
2024-02-28T18:31:45.203+1100 1  0 monclient(hunting): authenticate timed out 
after 300
2024-02-28T18:36:45.205+1100 1  0 monclient(hunting): authenticate timed out 
after 300

~~~

Additional info from Windows CLI:

~~~

failed to fetch mon config (--no-mon-config to skip)

~~~

So I've gone through the doco and done some Google-foo and I can't work out 
*why* I'm getting a failure; why I'm getting the authentication failure. I know 
it'll be something simple, something staring me in the face, but I'm at the 
point where I can't see the forest for the trees - please help.

Any help greatly appreciated

Thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list --ceph-users@ceph.io  To unsubscribe send an email 
toceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Which RHEL/Fusion/CentOS/Rocky Package Contains cephfs-shell?

2024-03-08 Thread duluxoz

Thanks for the info Kefu - hmm, I wonder who I should raise this with?

On 08/03/2024 19:57, kefu chai wrote:



On Fri, Mar 8, 2024 at 3:54 PM duluxoz  wrote:

Hi All,

The subject pretty much says it all: I need to use cephfs-shell
and its
not installed on my Ceph Node, and I can't seem to locate which
package
contains it - help please.  :-)


Hi Dulux,

cephfs-shell depends on cmd2 and colorama, that's why you can only 
find it on fedora. not sure why it is missing on rhel>=9 though, as 
these two packages are available on el9. but somehow, cephfs-shell is 
missing there.



Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--
Regards
Kefu Chai


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Which RHEL/Fusion/CentOS/Rocky Package Contains cephfs-shell?

2024-03-07 Thread duluxoz

Hi All,

The subject pretty much says it all: I need to use cephfs-shell and it's 
not installed on my Ceph Node, and I can't seem to locate which package 
contains it - help please.  :-)


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Cluster Config File Locations?

2024-03-05 Thread duluxoz

Hi All,

I don't know how it's happened (bad backup/restore, bad config file 
somewhere, I don't know) but my (DEV) Ceph Cluster is in a very bad 
state, and I'm looking for pointers/help in getting it back up and running 
(unfortunately, a complete rebuild/restore is *not* an option).


This is on Ceph Reef (on Rocky 9) which was converted to CephAdm from a 
manual install a few weeks ago (which worked). Five days ago everything 
went "t!ts-up" (an Ozzie technical ICT term meaning nothing works :-)   )


So, my (first?) issue is that I can't get any Managers to come up clean. 
Each one tries to connect on an ip subnet which doesn't exist any longer 
and hasn't for a couple of years.


The second issue is that (possibly because of the first) every `ceph 
orch` command just hangs. Cephadm commands work fine.


I've checked, checked, and checked again that the individual config 
files all point towards the correct ip subnet for the monitors, and I 
cannot find any trace of the old subnet's ip address in any config file 
(that I can find).
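
For reference, a sketch of a few more places worth checking for a stale 
address in a cephadm/podman deployment (<fsid> and OLD_SUBNET are 
placeholders):

~~~
ceph config dump | grep -Ei 'public_network|cluster_network|mon_host'
grep -r 'OLD_SUBNET' /etc/ceph/ceph.conf /var/lib/ceph/<fsid>/mgr.*/config
cephadm ls                 # per-daemon state as cephadm sees it on this host
~~~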


For the record I am *not* a "podman guy" so there may be something there 
that's causing my issue(s?) but I don't know where to even begin to look.


Any/all logs simply state that the Manager(s) try to come up, can't find 
an address in the "old" subnet, and so fail - nothing else helpful (at 
least to me).


I've even pulled a copy of the monmap and it's showing the correct IP 
subnet addresses for the monitors.


The firewalls are all set correctly and an audit2allow shows nothing is 
out of place, as does disabling SELinux (ie no change).


A `ceph -s` shows I've got no active managers (and that a monitor is 
down - that's my third issue), plus a whole bunch of osds and pgs aren't 
happy either. I have, though, got a monitor quorum.


So, what should I be looking at / where should I be looking? Any help is 
greatly *greatly* appreciated.


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CephFS On Windows 10

2024-02-28 Thread duluxoz

Hi All,

 I'm looking for some pointers/help as to why I can't get my Win10 PC 
to connect to our Ceph Cluster's CephFS Service. Details are as follows:


Ceph Cluster:

- IP Addresses: 192.168.1.10, 192.168.1.11, 192.168.1.12

- Each node above is a monitor & an MDS

- Firewall Ports: open (ie 3300, etc)

- CephFS System Name: my_cephfs

- Log files: nothing jumps out at me

Windows PC:

- Keyring file created and findable: ceph.client.me.keyring

- dokany installed

- ceph-for-windows installed

- Can ping all three ceph nodes

- Connection command: ceph-dokan -l v -o -id me --debug --client_fs 
my_cephfs -c C:\ProgramData\Ceph\ceph.conf


Ceph.conf contents:

~~~

[global]
  mon_host = 192.168.1.10, 192.168.1.11, 192.168.1.12
  log to stderr = true
  log to syslog = true
  run dir = C:/ProgramData/ceph
  crash dir = C:/logs/ceph
  debug client = 2
[client]
  keyring = C:/ProgramData/ceph/ceph.client.me.keyring
  log file = C:/logs/ceph/$name.$pid.log
  admin socket = C:/ProgramData/ceph/$name.$pid.asok
~~~

Windows Logfile contents (ie C:/logs/ceph/client.me..log):

~~~

2024-02-28T18:26:45.201+1100 1  0 monclient(hunting): authenticate timed 
out after 300
2024-02-28T18:31:45.203+1100 1  0 monclient(hunting): authenticate timed 
out after 300
2024-02-28T18:36:45.205+1100 1  0 monclient(hunting): authenticate timed 
out after 300


~~~

Additional info from Windows CLI:

~~~

failed to fetch mon config (--no-mon-config to skip)

~~~

So I've gone through the doco and done some Google-foo and I can't work 
out *why* I'm getting a failure; why I'm getting the authentication 
failure. I know it'll be something simple, something staring me in the 
face, but I'm at the point where I can't see the forest for the trees - 
please help.


Any help greatly appreciated

Thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD Image Returning 'Unknown Filesystem LVM2_member' On Mount - Help Please

2024-02-04 Thread duluxoz

Mounting/Mapping commands:

`rbd device map data_pool/data_pool_image_1 --id rbd_user --keyring 
/etc/ceph/ceph.client.rbd_user.keyring`


`mount /dev/rbd0 /mountpoint/rbd_data`

data_pool is showing up in a lsblk command as mapped to /dev/rbd0:

~~~

NAME  MAJ:MIN  RM  SIZE  RO  TYPE  MOUNTPOINTS
rbd0  252:0     0    8T   0  disk

~~~

*Yes, it's an 8 TB "disk" - a lot of space for bulk files - in truth, it's 
probably "overkill" but if you've got that much space...  :-)


On 05/02/2024 17:43, Curt wrote:
Out of curiosity, how are you mapping the rbd? Have you tried using 
guestmount?


I'm just spitballing, I have no experience with your issue, so 
probably not much help or useful.


On Mon, 5 Feb 2024, 10:05 duluxoz,  wrote:

~~~
Hello,
I think that /dev/rbd* devices are flitered "out" or not filter
"in" by the fiter
option in the devices section of /etc/lvm/lvm.conf.
So pvscan (pvs, vgs and lvs) don't look at your device.
~~~

Hi Gilles,

So the lvm filter from the lvm.conf file is set to the default of
`filter = [ "a|.*|" ]`, so that's accept every block device, so no
luck there  :-(


~~~
For Ceph based LVM volumes, you would do this to import:
Map every one of the RBDs to the host
Include this in /etc/lvm/lvm.conf:
types = [ "rbd", 1024 ]
pvscan
vgscan
pvs
vgs
If you see the VG:
vgimportclone -n  /dev/rbd0 /dev/rbd1 ... --import
Now you should be able to vgchange -a y  and see the LVs
~~~

Hi Alex,

Did the above as you suggested - the rbd devices (3 of them, none
of which were originally part of an lvm on the ceph servers - at
least, not set up manually by me) still do not show up using
pvscan, etc.

So I still can't mount any of them (not without re-creating a fs,
anyway, and thus losing the data I'm trying to read/import) - they
all return the same error message (see original post).

Anyone got any other ideas?   :-)

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD Image Returning 'Unknown Filesystem LVM2_member' On Mount - Help Please

2024-02-04 Thread duluxoz

~~~
Hello,
I think that /dev/rbd* devices are filtered "out" or not filtered "in" by the 
filter
option in the devices section of /etc/lvm/lvm.conf.
So pvscan (pvs, vgs and lvs) don't look at your device.
~~~

Hi Gilles,

So the lvm filter from the lvm.conf file is set to the default of `filter = [ 
"a|.*|" ]`, so that's accept every block device, so no luck there  :-(


~~~
For Ceph based LVM volumes, you would do this to import:
Map every one of the RBDs to the host
Include this in /etc/lvm/lvm.conf:
types = [ "rbd", 1024 ]
pvscan
vgscan
pvs
vgs
If you see the VG:
vgimportclone -n  /dev/rbd0 /dev/rbd1 ... --import
Now you should be able to vgchange -a y  and see the LVs
~~~

Hi Alex,

Did the above as you suggested - the rbd devices (3 of them, none of which were 
originally part of an lvm on the ceph servers - at least, not set up manually 
by me) still do not show up using pvscan, etc.

So I still can't mount any of them (not without re-creating a fs, anyway, and 
thus losing the data I'm trying to read/import) - they all return the same 
error message (see original post).

Anyone got any other ideas?   :-)

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD Image Returning 'Unknown Filesystem LVM2_member' On Mount - Help Please

2024-02-04 Thread duluxoz

Hi Jayanth,

Only a couple of glusterfs volumes, ie the glusterfs bricks are sitting 
on LVs which are on a sparse LV on a VG which spans two PVs.


My Google Foo led me to believe that the above set-up would (should?) be 
entirely independent of anything to do with rbd/ceph - was I wrong in this?


Cheers

On 04/02/2024 19:34, Jayanth Reddy wrote:

Hi,
Anything with "pvs" and "vgs" on the client machine where there is 
/dev/rbd0?


Thanks
----
*From:* duluxoz 
*Sent:* Sunday, February 4, 2024 1:59:04 PM
*To:* yipik...@gmail.com ; matt...@peregrineit.net 


*Cc:* ceph-users@ceph.io 
*Subject:* [ceph-users] Re: RBD Image Returning 'Unknown Filesystem 
LVM2_member' On Mount - Help Please

Hi Cedric,

That's what I thought - the access method shouldn't make a difference.

No, no lvs details at all - I mean, yes, the osds show up with the lvs
command on the ceph node(s), but not on the individual pools/images (on
the ceph node or the client) - this is, of course, assuming that I'm doing this
right (and there's no guarantee of that).

To clarify: entering `lvs` on the client (which has the rbd image
"attached" as /dev/rbd0) returns nothing, and `lvs` on any of the ceph
nodes only returns the data for each OSD/HDD.

Full disclosure (as I should have done in the first post): the
pool/image was/is used as block device for oVirt VM disk images - but as
far as I'm aware this should not be the cause of this issue (because we
also use glusterfs and we've got similar VM disk images on gluster
drives/bricks and the VM images show up as "simple" files (yes, I'm
simplifying things a bit with that last statement)).

On 04/02/2024 19:16, Cedric wrote:
> Hello,
>
> Data on a volume should be the same independently on how they are
> being accessed.
>
> I would think the volume was previously initialized with an LVM layer,
> did "lvs" shows any logical volume on the system ?
>
> On Sun, Feb 4, 2024, 08:56 duluxoz  wrote:
>
> Hi All,
>
> All of this is using the latest version of RL and Ceph Reef
>
> I've got an existing RBD Image (with data on it - not "critical"
> as I've
> got a back up, but its rather large so I was hoping to avoid the
> restore
> scenario).
>
> The RBD Image used to be served out via a (Ceph) iSCSI Gateway,
> but we
> are now looking to use the plain old kernel module.
>
> The RBD Image has been RBD Mapped to the client's /dev/rbd0 
location.

>
> So now I'm trying a straight `mount /dev/rbd0 /mount/old_image/`
> as a test
>
> What I'm getting back is `mount: /mount/old_image/: unknown
> filesystem
> type 'LVM2_member'.`
>
> All my Google Foo is telling me that to solve this issue I need to
> reformat the image with a new file system - which would mean 
"losing"

> the data.
>
> So my question is: How can I get to this data using rbd kernel
> modules
> (the iSCSI Gateway is no longer available, so not an option), or 
am I

> stuck with the restore option?
>
> Or is there something I'm missing (which would not surprise me 
in the

> least)?  :-)
>
> Thanks in advance (as always, you guys and gals are really, really
> helpful)
>
> Cheers
>
>
> Dulux-Oz
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD Image Returning 'Unknown Filesystem LVM2_member' On Mount - Help Please

2024-02-04 Thread duluxoz

Hi Cedric,

That's what I thought - the access method shouldn't make a difference.

No, no lvs details at all - I mean, yes, the osds show up with the lvs 
command on the ceph node(s), but not on the individual pools/images (on 
the ceph node or the client) - this is, of course, assuming that I'm doing this 
right (and there's no guarantee of that).


To clarify: entering `lvs` on the client (which has the rbd image 
"attached" as /dev/rbd0) returns nothing, and `lvs` on any of the ceph 
nodes only returns the data for each OSD/HDD.


Full disclosure (as I should have done in the first post): the 
pool/image was/is used as block device for oVirt VM disk images - but as 
far as I'm aware this should not be the cause of this issue (because we 
also use glusterfs and we've got similar VM disk images on gluster 
drives/bricks and the VM images show up as "simple" files (yes, I'm 
simplifying things a bit with that last statement)).


On 04/02/2024 19:16, Cedric wrote:

Hello,

Data on a volume should be the same independently on how they are 
being accessed.


I would think the volume was previously initialized with an LVM layer, 
did "lvs" shows any logical volume on the system ?


On Sun, Feb 4, 2024, 08:56 duluxoz  wrote:

Hi All,

All of this is using the latest version of RL and Ceph Reef

I've got an existing RBD Image (with data on it - not "critical"
as I've
got a back up, but its rather large so I was hoping to avoid the
restore
scenario).

The RBD Image used to be served out via a (Ceph) iSCSI Gateway,
but we
are now looking to use the plain old kernel module.

The RBD Image has been RBD Mapped to the client's /dev/rbd0 location.

So now I'm trying a straight `mount /dev/rbd0 /mount/old_image/`
as a test

What I'm getting back is `mount: /mount/old_image/: unknown
filesystem
type 'LVM2_member'.`

All my Google Foo is telling me that to solve this issue I need to
reformat the image with a new file system - which would mean "losing"
the data.

So my question is: How can I get to this data using rbd kernel
modules
(the iSCSI Gateway is no longer available, so not an option), or am I
stuck with the restore option?

Or is there something I'm missing (which would not surprise me in the
least)?  :-)

Thanks in advance (as always, you guys and gals are really, really
helpful)

Cheers


Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD Image Returning 'Unknown Filesystem LVM2_member' On Mount - Help Please

2024-02-03 Thread duluxoz

Hi All,

All of this is using the latest version of RL and Ceph Reef

I've got an existing RBD Image (with data on it - not "critical" as I've 
got a backup, but it's rather large so I was hoping to avoid the restore 
scenario).


The RBD Image used to be served out via a (Ceph) iSCSI Gateway, but we 
are now looking to use the plain old kernel module.


The RBD Image has been RBD Mapped to the client's /dev/rbd0 location.

So now I'm trying a straight `mount /dev/rbd0 /mount/old_image/` as a test

What I'm getting back is `mount: /mount/old_image/: unknown filesystem 
type 'LVM2_member'.`


All my Google Foo is telling me that to solve this issue I need to 
reformat the image with a new file system - which would mean "losing" 
the data.


So my question is: How can I get to this data using rbd kernel modules 
(the iSCSI Gateway is no longer available, so not an option), or am I 
stuck with the restore option?


Or is there something I'm missing (which would not surprise me in the 
least)?  :-)


Thanks in advance (as always, you guys and gals are really, really helpful)

Cheers


Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Changing A Ceph Cluster's Front- And/Or Back-End Networks IP Address(es)

2024-01-30 Thread duluxoz

Hi All,

Quick Q: How easy/hard is it to change the IP networks of:

1) A Ceph Cluster's "Front-End" Network?

2) A Ceph Cluster's "Back-End" Network?

Is it a "simply" matter of:

a) Placing the Nodes in maintenance mode

b) Changing a config file (I assume it's /etc/ceph/ceph.conf) on each Node

c) Rebooting the Nodes

d) Taking each Node out of Maintenance Mode
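
For context, my understanding (happy to be corrected) is that the front/back-end 
subnets live in the settings sketched below - the monitor IP addresses 
themselves are a separate, more involved monmap change. The subnets here 
are placeholders only:

~~~
# In /etc/ceph/ceph.conf on each node (public_network / cluster_network), or centrally:
ceph config set global public_network 192.168.10.0/24
ceph config set global cluster_network 192.168.20.0/24
ceph config get osd cluster_network     # verify what the daemons will pick up after restart
~~~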

Thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RFI: Prometheus, Etc, Services - Optimum Number To Run

2024-01-19 Thread duluxoz

Hi All,

In regards to the monitoring services on a Ceph Cluster (ie Prometheus, 
Grafana, Alertmanager, Loki, Node-Exporter, Promtail, etc) how many 
instances should/can we run for fault tolerance purposes? I can't seem 
to recall that advice being in the doco anywhere (but of course, I 
probably missed it).


I'm concerned about HA on those services - will they continue to run if 
the Ceph Node they're on fails?


At the moment we're running only 1 instance of each in the cluster, but 
several Ceph Nodes are capable of running each - ie/eg 3 nodes 
configured but only count:1.
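
For reference, my understanding is that with cephadm the instance count is 
just part of each service's placement spec - something like the following 
(counts here are placeholders, not a recommendation):

~~~
ceph orch apply prometheus --placement="count:2"
ceph orch apply alertmanager --placement="count:2"
ceph orch apply grafana --placement="count:1"
ceph orch ls prometheus        # confirm where the instances actually landed
~~~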


This is on the latest version of Reef using cephadm (if it makes a 
huge difference :-) ).


So any advice, etc, would be greatly appreciated, including if we should 
be running any services not mentioned (not Mgr, Mon, OSD, or iSCSI, 
obviously :-) )


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: REST API Endpoint Failure - Request For Where To Look To Resolve

2024-01-06 Thread duluxoz

Hi Niz (may I call you "Niz"?)

So, with the info you provided I was able to find what the issue was in 
the logs (now I know where the darn things are!) and so we have resolved 
our problem - a mis-configured port number - obvious when you think 
about it - and so I'd like to thank you once again for all of your 
patience and help.
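
For the archives, in case it helps the next person, one way to compare the 
two sides (a sketch only - not necessarily how we spotted it): the api_port 
the gateways are actually configured with versus what the dashboard has 
registered:

~~~
ceph orch ls iscsi --export          # shows the iscsi service spec, including api_port / trusted_ip_list
ceph dashboard iscsi-gateway-list    # shows the gateway URLs (and ports) registered with the dashboard
~~~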


Cheers

Dulux-oz

On 05/01/2024 20:39, Nizamudeen A wrote:
ah sorry for that. Outside the cephadm shell, if you do cephadm ls | 
grep "mgr.", that should give you the mgr container name. It should 
look something like this

[root@ceph-node-00 ~]# cephadm ls | grep "mgr."
        "name": "mgr.ceph-node-00.aoxbdg",
        "systemd_unit": 
"ceph-e877a630-abaa-11ee-b7ce-52540097c...@mgr.ceph-node-00.aoxbdg",

        "service_name": "mgr",

and you can use that name to see the logs.

On Fri, Jan 5, 2024 at 3:04 PM duluxoz  wrote:

Yeah, that's what I meant when I said I'm new to podman and
containers - so, stupid Q: What is the "typical" name for a given
container eg if the server is "node1" is the management container
"mgr.node1" of something similar?

And thanks for the help - I really *do* appreciate it. :-)

On 05/01/2024 20:30, Nizamudeen A wrote:

ah yeah, its usually inside the container so you'll need to check
the mgr container for the logs.
cephadm logs -n 

also cephadm has
its own log channel which can be used to get the logs.
    
https://docs.ceph.com/en/quincy/cephadm/operations/#watching-cephadm-log-messages

On Fri, Jan 5, 2024 at 2:54 PM duluxoz  wrote:

Yeap, can do - are the relevant logs in the "usual" place or
buried somewhere inside some sort of container (typically)?  :-)

On 05/01/2024 20:14, Nizamudeen A wrote:

no, the error message is not clear enough to deduce an
error. could you perhaps share the mgr logs at the time of
the error? It could have some tracebacks
which can give more info to debug it further.

Regards,

On Fri, Jan 5, 2024 at 2:00 PM duluxoz 
wrote:

Hi Nizam,

Yeap, done all that - we're now at the point of creating
the iSCSI Target(s) for the gateway (via the Dashboard
and/or the CLI: see the error message in the OP) - any
ideas?  :-)

Cheers

Dulux-Oz

On 05/01/2024 19:10, Nizamudeen A wrote:

Hi,

You can find the APIs associated with the iscsi here:
https://docs.ceph.com/en/reef/mgr/ceph_api/#iscsi

and if you create iscsi service through dashboard or
cephadm, it should add the iscsi gateways to the dashboard.
you can view them by issuing *ceph dashboard
iscsi-gateway-list* and you can add or remove gateways
manually by

ceph dashboard iscsi-gateway-add -i
 []
ceph dashboard iscsi-gateway-rm 

which you can find the documentation here:

https://docs.ceph.com/en/quincy/mgr/dashboard/#enabling-iscsi-management

Regards,
Nizam




On Fri, Jan 5, 2024 at 12:53 PM duluxoz
 wrote:

Hi All,

A little help please.

TL/DR: Please help with error message:
~~~
REST API failure, code : 500
Unable to access the configuration object
Unable to contact the local API endpoint
(https://localhost:5000/api)
~~~

The Issue


I've been through the documentation and can't find
what I'm looking for
- possibly because I'm not really sure what it is I
*am* looking for, so
if someone can point me in the right direction I
would really appreciate it.

I get the above error message when I run the
`gwcli` command from inside
a cephadm shell.

What I'm trying to do is set up a set of iSCSI
Gateways in our Ceph-Reef
18.2.1 Cluster (yes, I know it's being deprecated
as of Nov 22 - or
whatever). We recently migrated / upgraded from a
manual install of
Quincy to a CephAdm install of Reef - everything
went AOK *except* for
the iSCSI Gateways. So we tore them down and then
rebuilt them as per
the latest documentation. So now we've got 3
gateways as per the Service
page of the Dashboard and I'm trying to create the
targets.

I tried via the Dash

[ceph-users] Re: REST API Endpoint Failure - Request For Where To Look To Resolve

2024-01-05 Thread duluxoz
Yeah, that's what I meant when I said I'm new to podman and containers - 
so, stupid Q: What is the "typical" name for a given container eg if the 
server is "node1" is the management container "mgr.node1" of something 
similar?


And thanks for the help - I really *do* appreciate it.  :-)

On 05/01/2024 20:30, Nizamudeen A wrote:
ah yeah, its usually inside the container so you'll need to check the 
mgr container for the logs.

cephadm logs -n 

also cephadm has
its own log channel which can be used to get the logs.
https://docs.ceph.com/en/quincy/cephadm/operations/#watching-cephadm-log-messages

On Fri, Jan 5, 2024 at 2:54 PM duluxoz  wrote:

Yeap, can do - are the relevant logs in the "usual" place or
buried somewhere inside some sort of container (typically)?  :-)

On 05/01/2024 20:14, Nizamudeen A wrote:

no, the error message is not clear enough to deduce an error.
could you perhaps share the mgr logs at the time of the error? It
could have some tracebacks
which can give more info to debug it further.

Regards,

    On Fri, Jan 5, 2024 at 2:00 PM duluxoz  wrote:

Hi Nizam,

Yeap, done all that - we're now at the point of creating the
iSCSI Target(s) for the gateway (via the Dashboard and/or the
CLI: see the error message in the OP) - any ideas?  :-)

Cheers

Dulux-Oz

On 05/01/2024 19:10, Nizamudeen A wrote:

Hi,

You can find the APIs associated with the iscsi here:
https://docs.ceph.com/en/reef/mgr/ceph_api/#iscsi

and if you create iscsi service through dashboard or
cephadm, it should add the iscsi gateways to the dashboard.
you can view them by issuing *ceph dashboard
iscsi-gateway-list* and you can add or remove gateways
manually by

ceph dashboard iscsi-gateway-add -i
 []
ceph dashboard iscsi-gateway-rm 

which you can find the documentation here:
https://docs.ceph.com/en/quincy/mgr/dashboard/#enabling-iscsi-management

Regards,
Nizam




    On Fri, Jan 5, 2024 at 12:53 PM duluxoz 
wrote:

Hi All,

A little help please.

TL/DR: Please help with error message:
~~~
REST API failure, code : 500
Unable to access the configuration object
Unable to contact the local API endpoint
(https://localhost:5000/api)
~~~

The Issue


I've been through the documentation and can't find what
I'm looking for
- possibly because I'm not really sure what it is I *am*
looking for, so
if someone can point me in the right direction I would
really appreciate it.

I get the above error message when I run the `gwcli`
command from inside
a cephadm shell.

What I'm trying to do is set up a set of iSCSI Gateways
in our Ceph-Reef
18.2.1 Cluster (yes, I know it's being deprecated as of
Nov 22 - or
whatever). We recently migrated / upgraded from a manual
install of
Quincy to a CephAdm install of Reef - everything went
AOK *except* for
the iSCSI Gateways. So we tore them down and then
rebuilt them as per
the latest documentation. So now we've got 3 gateways as
per the Service
page of the Dashboard and I'm trying to create the targets.

I tried via the Dashboard but had errors, so instead I
went in to do it
via gwcli and hit the above error (which I now believe to
be the cause of
the GUI creation errors I encountered).

I have absolutely no experience with podman or
containers in general,
and can't work out how to fix the issue. So I'm
requesting some help -
not to solve the problem for me, but to point me in the
right direction
to solve it myself.  :-)

So, anyone?

Cheers
Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




[ceph-users] Re: REST API Endpoint Failure - Request For Where To Look To Resolve

2024-01-05 Thread duluxoz
Yeap, can do - are the relevant logs in the "usual" place or buried 
somewhere inside some sort of container (typically)?  :-)


On 05/01/2024 20:14, Nizamudeen A wrote:
no, the error message is not clear enough to deduce an error. could 
you perhaps share the mgr logs at the time of the error? It could have 
some tracebacks

which can give more info to debug it further.

Regards,

On Fri, Jan 5, 2024 at 2:00 PM duluxoz  wrote:

Hi Nizam,

Yeap, done all that - we're now at the point of creating the iSCSI
Target(s) for the gateway (via the Dashboard and/or the CLI: see
the error message in the OP) - any ideas?  :-)

Cheers

Dulux-Oz

On 05/01/2024 19:10, Nizamudeen A wrote:

Hi,

You can find the APIs associated with the iscsi here:
https://docs.ceph.com/en/reef/mgr/ceph_api/#iscsi

and if you create iscsi service through dashboard or cephadm, it
should add the iscsi gateways to the dashboard.
you can view them by issuing *ceph dashboard iscsi-gateway-list*
and you can add or remove gateways manually by

ceph dashboard iscsi-gateway-add -i 
[]
ceph dashboard iscsi-gateway-rm 

which you can find the documentation here:
https://docs.ceph.com/en/quincy/mgr/dashboard/#enabling-iscsi-management

Regards,
Nizam




On Fri, Jan 5, 2024 at 12:53 PM duluxoz  wrote:

Hi All,

A little help please.

TL/DR: Please help with error message:
~~~
REST API failure, code : 500
Unable to access the configuration object
Unable to contact the local API endpoint
(https://localhost:5000/api)
~~~

The Issue


I've been through the documentation and can't find what I'm
looking for
- possibly because I'm not really sure what it is I *am*
looking for, so
if someone can point me in the right direction I would really
appreciate it.

I get the above error message when I run the `gwcli` command
from inside
a cephadm shell.

What I'm trying to do is set up a set of iSCSI Gateways in
our Ceph-Reef
18.2.1 Cluster (yes, I know it's being deprecated as of Nov
22 - or
whatever). We recently migrated / upgraded from a manual
install of
Quincy to a CephAdm install of Reef - everything went AOK
*except* for
the iSCSI Gateways. So we tore them down and then rebuilt
them as per
the latest documentation. So now we've got 3 gateways as per
the Service
page of the Dashboard and I'm trying to create the targets.

I tried via the Dashboard but had errors, so instead I went
in to do it
via gwcli and hit the above error (which I now believe to be
the cause of
the GUI creation errors I encountered).

I have absolutely no experience with podman or containers in
general,
and can't work out how to fix the issue. So I'm requesting
some help -
not to solve the problem for me, but to point me in the right
direction
to solve it myself.  :-)

So, anyone?

Cheers
Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: REST API Endpoint Failure - Request For Where To Look To Resolve

2024-01-05 Thread duluxoz

Hi Nizam,

Yeap, done all that - we're now at the point of creating the iSCSI 
Target(s) for the gateway (via the Dashboard and/or the CLI: see the 
error message in the OP) - any ideas?  :-)


Cheers

Dulux-Oz

On 05/01/2024 19:10, Nizamudeen A wrote:

Hi,

You can find the APIs associated with the iscsi here: 
https://docs.ceph.com/en/reef/mgr/ceph_api/#iscsi


and if you create iscsi service through dashboard or cephadm, it 
should add the iscsi gateways to the dashboard.
you can view them by issuing *ceph dashboard iscsi-gateway-list* and 
you can add or remove gateways manually by


ceph dashboard iscsi-gateway-add -i  
[]

ceph dashboard iscsi-gateway-rm 

which you can find the documentation here: 
https://docs.ceph.com/en/quincy/mgr/dashboard/#enabling-iscsi-management


Regards,
Nizam




On Fri, Jan 5, 2024 at 12:53 PM duluxoz  wrote:

Hi All,

A little help please.

TL/DR: Please help with error message:
~~~
REST API failure, code : 500
Unable to access the configuration object
Unable to contact the local API endpoint (https://localhost:5000/api)
~~~

The Issue


I've been through the documentation and can't find what I'm
looking for
- possibly because I'm not really sure what it is I *am* looking
for, so
if someone can point me in the right direction I would really
appreciate it.

I get the above error message when I run the `gwcli` command from
inside
a cephadm shell.

What I'm trying to do is set up a set of iSCSI Gateways in our
Ceph-Reef
18.2.1 Cluster (yes, I know it's being deprecated as of Nov 22 - or
whatever). We recently migrated / upgraded from a manual install of
Quincy to a CephAdm install of Reef - everything went AOK *except*
for
the iSCSI Gateways. So we tore them down and then rebuilt them as per
the latest documentation. So now we've got 3 gateways as per the
Service
page of the Dashboard and I'm trying to create the targets.

I tried via the Dashboard but had errors, so instead I went in to
do it
via gwcli and hit the above error (which I now believe to be the
cause of
the GUI creation errors I encountered).

I have absolutely no experience with podman or containers in general,
and can't work out how to fix the issue. So I'm requesting some
help -
not to solve the problem for me, but to point me in the right
direction
to solve it myself.  :-)

So, anyone?

Cheers
Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] REST API Endpoint Failure - Request For Where To Look To Resolve

2024-01-04 Thread duluxoz

Hi All,

A little help please.

TL/DR: Please help with error message:
~~~
REST API failure, code : 500
Unable to access the configuration object
Unable to contact the local API endpoint (https://localhost:5000/api)
~~~

The Issue


I've been through the documentation and can't find what I'm looking for 
- possibly because I'm not really sure what it is I *am* looking for, so 
if someone can point me in the right direction I would really appreciate it.


I get the above error message when I run the `gwcli` command from inside 
a cephadm shell.


What I'm trying to do is set up a set of iSCSI Gateways in our Ceph-Reef 
18.2.1 Cluster (yes, I know it's being deprecated as of Nov 22 - or 
whatever). We recently migrated / upgraded from a manual install of 
Quincy to a CephAdm install of Reef - everything went AOK *except* for 
the iSCSI Gateways. So we tore them down and then rebuilt them as per 
the latest documentation. So now we've got 3 gateways as per the Service 
page of the Dashboard and I'm trying to create the targets.


I tried via the Dashboard but had errors, so instead I went in to do it 
via gwcli and hit the above error (which I now believe to be the cause of 
the GUI creation errors I encountered).


I have absolutely no experience with podman or containers in general, 
and can't work out how to fix the issue. So I'm requesting some help - 
not to solve the problem for me, but to point me in the right direction 
to solve it myself.  :-)


So, anyone?

Cheers
Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-iscsi on RL9

2023-12-27 Thread duluxoz

Hi All,

A follow up: So, I've got all the Ceph Nodes running Reef v18.2.1 on 
RL9.3, and everything is working - YAH!


Except...

The Ceph Dashboard shows 0 of 3 iSCSI Gateways working, and when I click 
on that panel it returns a "Page not Found" message - so I *assume* 
those are the three "original" iSCSI Gateways I had set up under Quincy/RL8.


How do I get rid of them? I think I've removed all references to them 
(ie tcmu-runner, rbd-target-api, rbd-target-gw) but obviously, something 
has been missed - could someone please point me in the correct direction 
to finish "cleaning them up" - thanks.


I've also created (via the Web GUI) three new iSCSI Services, which the 
GUI says are running. `ceph -s`, however, doesn't show them at all - is 
this normal?


Also, it is not clear (to me) from the Reef doco if there is anything 
else that needs to be done to get iSCSI up and running (on the server 
side - obviously I need to create/update the initiators on the client 
side). Under the "old manual" way of doing it (ie 
https://docs.ceph.com/en/reef/rbd/iscsi-target-cli/) there was "extra 
stuff to do" - does that not apply any more?


And finally, during my investigations I discovered a systemd service for 
osd.21 loaded but failed - there is no osd.21, so I must have made a 
typo somewhere in the past (there are only 21 osds in the cluster, so 
the last one is osd.20). The trouble is I can't seem to find *where* 
this is defined (ie none of the typical commands, etc (eg `ceph osd 
destroy osd.21`) can seem to find it and/or get rid of it) - could 
someone please help me out with this as well - thanks.
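
(For what it's worth, since osd.21 doesn't exist in the cluster, my guess is 
it's just a leftover systemd unit on one of the hosts rather than anything 
Ceph itself knows about. A rough clean-up sketch - the exact unit name 
differs between legacy and cephadm deployments:)

~~~
systemctl list-units --all 'ceph*osd*'        # find the exact unit name (legacy: ceph-osd@21, cephadm: ceph-<fsid>@osd.21)
systemctl disable --now ceph-osd@21.service   # adjust to the real unit name reported above
systemctl reset-failed ceph-osd@21.service    # clear the "failed" state so it stops showing up
~~~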


Anything else anyone wants to know, please ask

Cheers

Dulux-oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-iscsi on RL9

2023-12-23 Thread duluxoz

Hi All,

Just successfully(?) completed a "live" update of the first node of a 
Ceph Quincy cluster from RL8 to RL9. Everything "seems" to be working - 
EXCEPT the iSCSI Gateway on that box.


During the update the ceph-iscsi package was removed (ie 
`ceph-iscsi-3.6-2.g97f5b02.el8.noarch.rpm` - this is the latest package 
available from the Ceph Repos). So, obviously, I reinstalled the package.


However, `dnf` is throwing errors (unsurprisingly, as that package is an 
el8 package and this box is now running el9): that package requires 
python 3.6 and el9 runs with python 3.9 (I believe).


So my question(s) is: Can I simply "downgrade" python to 3.6, or is 
there an el9-compatible version of `ceph-iscsi` somewhere, and/or is 
there some process I need to follow to get the iSCSI Gateway back up and 
running?


Some further info: The next step in my 
"happy-happy-fun-time-holiday-ICT-maintenance" was to upgrade the 
current Ceph Cluster to use `cephadm` and to go from Ceph-Quincy to 
Ceph-Reef - is this my ultimate upgrade path to get the iSCSI G/W back?


BTW the Ceph Cluster is used *only* to provide iSCSI LUNS to an oVirt 
(KVM) Cluster front-end. Because it is the holidays I can take the 
entire network down (ie shutdown all the VMs) to facilitate this update 
process, which also means that I can use some other way (ie a non-iSCSI 
way - I think) to connect the Ceph SAN Cluster to the oVirt VM-Hosting 
Cluster - if *this* is the solution (ie no iSCSI) does someone have any 
experience in running oVirt off of Ceph in a non-iSCSI way - and could 
you be so kind as to provide some pointers/documentation/help?


And before anyone says it, let me say it: "I broke it, now I own it" :-)

Thanks in advance, and everyone have a Merry Christmas, Heavenly 
Hanukkah, Quality Kwanzaa, Really-good (upcoming) Ramadan, and/or a 
Happy Holidays.


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Pool Migration / Import/Export

2023-12-10 Thread duluxoz

Hi All,

I find myself in the position of having to change the k/m values on an 
ec-pool. I've discovered that I simply can't change the ec-profile, but 
have to create a "new ec-profile" and a "new ec-pool" using the new 
values, then migrate the "old ec-pool" to the new (see: 
https://ceph.io/en/news/blog/2015/ceph-pool-migration/ => Using Rados 
Export/Import).


(Yes, a PITA, but it has to be done, and better doing it now when the 
data-size isn't that big - yet!)


My only concern is that the "old ec-pool" is a `--data-pool` and part of 
an rbd image (ie the image was created with `rbd create --size 2T 
ec_rbd_pool/disk01 --data-pool ec_pool --image-feature journaling`).


So my Q is: What are the "gotchas" (if any) with this, or is it simpler 
to back up the data (already done), destroy and recreate the rbd image 
from scratch, and restore the data to the re-created pool(s)/image?


FTR: space isn't an issue (I've got plenty to play with) but I'm looking 
for the quickest way to do this as it's a live (but little used) system, 
so I can take it down for a few hours (or more), but don't particularly 
*want* to have it offline for longer than absolutely necessary.
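
One alternative I'm also weighing up (hedging here - I haven't tried it 
against a journaling-enabled image): `rbd migration`, which I believe 
accepts the usual image-create options such as --data-pool, so the image 
could be re-homed onto the new ec-pool without a rados-level export/import. 
Clients need to close the image around the prepare step. Roughly:

~~~
rbd migration prepare --data-pool new_ec_pool ec_rbd_pool/disk01   # "new_ec_pool" is a placeholder name
rbd migration execute ec_rbd_pool/disk01
rbd migration commit ec_rbd_pool/disk01
~~~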


Thanks in advance for any advice/help/warnings/etc  :-)

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-users Digest, Vol 114, Issue 14

2023-12-05 Thread duluxoz

Hi Zitat,

I'm confused - doesn't k4 m2 mean that you can lose any 2 out of the 6 
osds?


Cheers

Dulux-Oz

On 05/12/2023 20:02, ceph-users-requ...@ceph.io wrote:

Send ceph-users mailing list submissions to
ceph-users@ceph.io

To subscribe or unsubscribe via email, send a message with subject or
body 'help' to
ceph-users-requ...@ceph.io

You can reach the person managing the list at
ceph-users-ow...@ceph.io

When replying, please edit your Subject line so it is more specific
than "Re: Contents of ceph-users digest..."

Today's Topics:

1. Re: EC Profiles & DR (David Rivera)
2. Re: EC Profiles & DR (duluxoz)
3. Re: EC Profiles & DR (Eugen Block)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread duluxoz

Thanks David, I knew I had something wrong  :-)

Just for my own edification: Why is k=2, m=1 not recommended for 
production? Considered to "fragile", or something else?


Cheers

Dulux-Oz

On 05/12/2023 19:53, David Rivera wrote:
First problem here is you are using crush-failure-domain=osd when you 
should use crush-failure-domain=host. With three hosts, you should use 
k=2, m=1; this is not recommended in a production environment.


On Mon, Dec 4, 2023, 23:26 duluxoz  wrote:

Hi All,

Looking for some help/explanation around erasure code pools, etc.

I set up a 3-node Ceph (Quincy) cluster with each box holding 7 OSDs
(HDDs) and each box running Monitor, Manager, and iSCSI Gateway.
For the
record the cluster runs beautifully, without resource issues, etc.

I created an Erasure Code Profile, etc:

~~~
ceph osd erasure-code-profile set my_ec_profile plugin=jerasure
k=4 m=2
crush-failure-domain=osd
ceph osd crush rule create-erasure my_ec_rule my_ec_profile
ceph osd crush rule create-replicated my_replicated_rule default host
~~~

My Crush Map is:

~~~
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph_1 {
   id -3    # do not change unnecessarily
   id -4 class hdd  # do not change unnecessarily
   # weight 38.09564
   alg straw2
   hash 0  # rjenkins1
   item osd.0 weight 5.34769
   item osd.1 weight 5.45799
   item osd.2 weight 5.45799
   item osd.3 weight 5.45799
   item osd.4 weight 5.45799
   item osd.5 weight 5.45799
   item osd.6 weight 5.45799
}
host ceph_2 {
   id -5    # do not change unnecessarily
   id -6 class hdd  # do not change unnecessarily
   # weight 38.09564
   alg straw2
   hash 0  # rjenkins1
   item osd.7 weight 5.34769
   item osd.8 weight 5.45799
   item osd.9 weight 5.45799
   item osd.10 weight 5.45799
   item osd.11 weight 5.45799
   item osd.12 weight 5.45799
   item osd.13 weight 5.45799
}
host ceph_3 {
   id -7    # do not change unnecessarily
   id -8 class hdd  # do not change unnecessarily
   # weight 38.09564
   alg straw2
   hash 0  # rjenkins1
   item osd.14 weight 5.34769
   item osd.15 weight 5.45799
   item osd.16 weight 5.45799
   item osd.17 weight 5.45799
   item osd.18 weight 5.45799
   item osd.19 weight 5.45799
   item osd.20 weight 5.45799
}
root default {
   id -1    # do not change unnecessarily
   id -2 class hdd  # do not change unnecessarily
   # weight 114.28693
   alg straw2
   hash 0  # rjenkins1
   item ceph_1 weight 38.09564
   item ceph_2 weight 38.09564
   item ceph_3 weight 38.09564
}

# rules
rule replicated_rule {
   id 0
   type replicated
   step take default
   step chooseleaf firstn 0 type host
   step emit
}
rule my_replicated_rule {
   id 1
   type replicated
   step take default
   step chooseleaf firstn 0 type host
   step emit
}
rule my_ec_rule {
   id 2
   type erasure
   step set_chooseleaf_tries 5
   step set_choose_tries 100
   step take default
   step choose indep 3 type host
   step chooseleaf indep 2 type osd
   step emit
}

# end crush map
~~~

Finally I create a pool:

~~~
ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
ceph osd pool application enable my_meta_pool rbd
rbd pool init my_meta_pool
rbd pool init my_pool
rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool
--image-feature journaling
~~~

So all this is to have some VMs (oVirt VMs, for the record) with
automatic fall-over in the case of a Ceph Node loss - ie I was

[ceph-users] EC Profiles & DR

2023-12-04 Thread duluxoz

Hi All,

Looking for some help/explanation around erasure code pools, etc.

I set up a 3-node Ceph (Quincy) cluster with each box holding 7 OSDs 
(HDDs) and each box running Monitor, Manager, and iSCSI Gateway. For the 
record the cluster runs beautifully, without resource issues, etc.


I created an Erasure Code Profile, etc:

~~~
ceph osd erasure-code-profile set my_ec_profile plugin=jerasure k=4 m=2 
crush-failure-domain=osd

ceph osd crush rule create-erasure my_ec_rule my_ec_profile
ceph osd crush rule create-replicated my_replicated_rule default host
~~~

My Crush Map is:

~~~
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host ceph_1 {
  id -3    # do not change unnecessarily
  id -4 class hdd  # do not change unnecessarily
  # weight 38.09564
  alg straw2
  hash 0  # rjenkins1
  item osd.0 weight 5.34769
  item osd.1 weight 5.45799
  item osd.2 weight 5.45799
  item osd.3 weight 5.45799
  item osd.4 weight 5.45799
  item osd.5 weight 5.45799
  item osd.6 weight 5.45799
}
host ceph_2 {
  id -5    # do not change unnecessarily
  id -6 class hdd  # do not change unnecessarily
  # weight 38.09564
  alg straw2
  hash 0  # rjenkins1
  item osd.7 weight 5.34769
  item osd.8 weight 5.45799
  item osd.9 weight 5.45799
  item osd.10 weight 5.45799
  item osd.11 weight 5.45799
  item osd.12 weight 5.45799
  item osd.13 weight 5.45799
}
host ceph_3 {
  id -7    # do not change unnecessarily
  id -8 class hdd  # do not change unnecessarily
  # weight 38.09564
  alg straw2
  hash 0  # rjenkins1
  item osd.14 weight 5.34769
  item osd.15 weight 5.45799
  item osd.16 weight 5.45799
  item osd.17 weight 5.45799
  item osd.18 weight 5.45799
  item osd.19 weight 5.45799
  item osd.20 weight 5.45799
}
root default {
  id -1    # do not change unnecessarily
  id -2 class hdd  # do not change unnecessarily
  # weight 114.28693
  alg straw2
  hash 0  # rjenkins1
  item ceph_1 weight 38.09564
  item ceph_2 weight 38.09564
  item ceph_3 weight 38.09564
}

# rules
rule replicated_rule {
  id 0
  type replicated
  step take default
  step chooseleaf firstn 0 type host
  step emit
}
rule my_replicated_rule {
  id 1
  type replicated
  step take default
  step chooseleaf firstn 0 type host
  step emit
}
rule my_ec_rule {
  id 2
  type erasure
  step set_chooseleaf_tries 5
  step set_choose_tries 100
  step take default
  step choose indep 3 type host
  step chooseleaf indep 2 type osd
  step emit
}

# end crush map
~~~

Finally I create a pool:

~~~
ceph osd pool create my_pool 32 32 erasure my_ec_profile my_ec_rule
ceph osd pool application enable my_meta_pool rbd
rbd pool init my_meta_pool
rbd pool init my_pool
rbd create --size 16T my_pool/my_disk_1 --data-pool my_pool 
--image-feature journaling

~~~

So all this is to have some VMs (oVirt VMs, for the record) with 
automatic fall-over in the case of a Ceph Node loss - ie I was trying to 
"replicate" a 3-Disk RAID 5 array across the Ceph Nodes, so that I could 
lose a Node and still have a working set of VMs.


However, I took one of the Ceph Nodes down (gracefully) for some 
maintenance the other day and I lost *all* the VMs (ie oVirt complained 
that there was no active pool). As soon as I brought the down node back 
up everything was good again.


So my question is: What did I do wrong with my config?

Should I, for example, change the EC Profile to `k=2, m=1`? And how is 
that practically different from `k=4, m=2` - yes, the latter spreads the 
pool over more disks, but it should still only put 2 disks on each node, 
shouldn't it?
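
One thing I'll also be checking (a guess, not a diagnosis): EC pools default 
to min_size = k+1, so a k=4/m=2 pool wants 5 shards up to keep serving I/O - 
and with 2 shards per host, one host down leaves only 4, which would pause 
the pool even though no data is lost. For example:

~~~
ceph osd pool get my_pool min_size     # defaults to k+1 (5 here)
ceph osd pool set my_pool min_size 4   # k is the floor; running at exactly k during recovery has its own risks
~~~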


Thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Quincy On Rocky 8.x - Upgrade To Rocky 9.1

2023-02-10 Thread duluxoz

Sorry, let me qualify things / try to make them simpler:

When upgrading from a Rocky Linux 8.6 Server running Ceph-Quincy to 
Rocky Linux 9.1 Server running Ceph-Quincy (ie an in-place upgrade of a 
host-node in an existing cluster):


- What is the update procedure?

- Can we use the "standard(?)" update methodology as per numerous blog 
posts available on-line?


- Is this procedure documented anywhere?

- Are there any special actions we need to be aware of?

- Are there any "gotchas", etc that we need to be aware of?

Thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Quincy On Rocky 8.x - Upgrade To Rocky 9.1

2023-02-10 Thread duluxoz
As I said in the initial post the servers are currently Rocky v8.6. 
Obviously there's the migrate2rocky.sh and migrate2rocky9.sh scripts, 
but I was wondering if there was anything "special" that we need to do 
when running them with quincy ie any "gotchas"?  :-)


On 11/02/2023 16:43, Konstantin Shalygin wrote:
You are mentioned that your cluster is Quincy, the el9 package are 
also for Quincy. What exactly upgrade you mean?



k
Sent from my iPhone


On 11 Feb 2023, at 12:29, duluxoz  wrote:



That's great - thanks.

Any idea if there are any upgrade instructions? Any "gotchas", etc?

I mean, having the new rpm is great for a fresh install, but we were 
wanting to upgrade an existing cluster  :-)


Cheers

Dulux-Oz

On 11/02/2023 15:02, Konstantin Shalygin wrote:

Hi,

Seems packages el9_quincy are available [1]
You can try


k
[1] https://download.ceph.com/rpm-quincy/el9/x86_64/


On 10 Feb 2023, at 13:23, duluxoz  wrote:

Sorry if this was mentioned previously (I obviously missed it if it 
was) but can we upgrade a Ceph Quincy Host/Cluster from Rocky Linux 
(RHEL) v8.6/8.7 to v9.1 (yet), and if so, what is / where can I 
find the procedure to do this - ie is there anything "special" that 
needs to be done because of Ceph, or can we just do a "simple" v8.x 
-> v9.1 upgrade?







___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Quincy On Rocky 8.x - Upgrade To Rocky 9.1

2023-02-10 Thread duluxoz

That's great - thanks.

Any idea if there are any upgrade instructions? Any "gotchas", etc?

I mean, having the new rpm is great for a fresh install, but we were 
wanting to upgrade an existing cluster  :-)


Cheers

Dulux-Oz

On 11/02/2023 15:02, Konstantin Shalygin wrote:

Hi,

Seems packages el9_quincy are available [1]
You can try


k
[1] https://download.ceph.com/rpm-quincy/el9/x86_64/


On 10 Feb 2023, at 13:23, duluxoz  wrote:

Sorry if this was mentioned previously (I obviously missed it if it 
was) but can we upgrade a Ceph Quincy Host/Cluster from Rocky Linux 
(RHEL) v8.6/8.7 to v9.1 (yet), and if so, what is / where can I find 
the procedure to do this - ie is there anything "special" that needs 
to be done because of Ceph, or can we just do a "simple" v8.x -> v9.1 
upgrade?




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Quincy On Rocky 8.x - Upgrade To Rocky 9.1

2023-02-09 Thread duluxoz

Hi All,

Sorry if this was mentioned previously (I obviously missed it if it was) 
but can we upgrade a Ceph Quincy Host/Cluster from Rocky Linux (RHEL) 
v8.6/8.7 to v9.1 (yet), and if so, what is / where can I find the 
procedure to do this - ie is there anything "special" that needs to be 
done because of Ceph, or can we just do a "simple" v8.x -> v9.1 upgrade?


Thanks in advance

Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mysterious HDD-Space Eating Issue

2023-01-17 Thread duluxoz

Hi Eneko,

Well, that's the thing: there are a whole bunch of ceph-guest-XX.log 
files in /var/log/ceph/; most of them are empty, a handful are up to 250 
KB in size, and this one () keeps on growing - and we're not sure where 
they're coming from (ie there's nothing that we can see in the conf files).


However, your "debug" comment has sparked a vague memory, so I'll check 
that out tomorrow (its after dinner here in Aus, so I'll tackle it first 
thing tomorrow - now that we've got our "hack" in place and we don't 
have to worry about the cluster in the short term  :-)


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mysterious HDD-Space Eating Issue

2023-01-16 Thread duluxoz

Hi All,

Thanks to Eneko Lacunza, E Taka, and Anthony D'Atri for replying - all 
that advice was really helpful.


So, we finally tracked down our "disk eating monster" (sort of). We've 
got a "runaway" ceph-guest-NN that is filling up its log file 
(/var/log/ceph/ceph-guest-NN.log) and eventually overflowing the /var 
partition.


What we haven't been able to do yet is actually track down the 
"ceph-guest-NN" process so we can kill it. Restarting the monitor 
service on that node "pauses" the offending process, but as soon as the mon 
service restarts the relevant log file is re-created/begins to fill up 
again. What we've done in the meantime is put a cron job in place to run 
once a day to delete the offending log file - that's keeping us online, 
but it's a less-than-optimal solution (ie it's a "hack").


Soo... anyone got any pointers as to how we can go about actually 
finding the offending process?
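
(In case it helps anyone else following along, the file itself should point 
back at whoever has it open - e.g., paths per the above:)

~~~
lsof /var/log/ceph/ceph-guest-NN.log       # lists the PID(s) holding the file open
fuser -v /var/log/ceph/ceph-guest-NN.log   # same idea, different tool
ps -fp <PID>                               # then see what that PID actually is
~~~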


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Mysterious Disk-Space Eater

2023-01-11 Thread duluxoz

Hi All,

Got a funny one, which I'm hoping someone can help us with.

We've got three identical(?) Ceph Quincy Nodes running on Rocky Linux 
8.7. Each Node has 4 OSDs, plus Monitor, Manager, and iSCSI G/W services 
running on them (we're only a small shop). Each Node has a separate 16 
GiB partition mounted as /var. Everything is running well and the Ceph 
Cluster is handling things very well).


However, one of the Nodes (not the one currently acting as the Active 
Manager) is running out of space on /var. Normally, all of the Nodes 
have around 10% space used (via a df -H command), but the problem Node 
only takes 1 to 3 days to run out of space, hence taking it out of 
Quorum. It's currently at 85% and growing.


At first we thought this was caused by an overly large log file, but 
investigations showed that all the logs on all 3 Nodes were of 
comparable size. Also, searching for the 20 largest files on the problem 
Node's /var didn't produce any significant results.
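
Two quick checks worth adding to that list (a sketch - no guarantee they 
explain it): per-directory usage on the filesystem, and files that have been 
deleted but are still held open by a process, since those consume space 
without showing up in any file search (which would also fit the usage 
dropping back to normal after a reboot):

~~~
du -xh /var | sort -h | tail -20    # largest directories on the /var filesystem
lsof +L1 | grep /var                # open-but-deleted files still consuming space under /var
~~~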


Coincidentally, unrelated to this issue, the problem Node (but not the 
other 2 Nodes) was re-booted a couple of days ago and, when the Cluster 
had re-balanced itself and everything was back online and reporting as 
Healthy, the problem Node's /var was back down to around 10%, the same 
as the other two Nodes.


This led us to suspect that there was some sort of "runaway" process 
or journaling/logging/temporary file(s) or whatever that the reboot had 
"cleaned up". So we've been keeping an eye on things but we can't see 
anything causing the issue and now, as I said above, the problem Node's 
/var is back up to 85% and growing.


I've been looking at the log files, trying to determine the issue, but as 
I don't really know what I'm looking for I don't even know if I'm 
looking in the *correct* log files...


Obviously rebooting the problem Node every couple of days is not a 
viable option, and increasing the size of the /var partition is only 
going to postpone the issue, not resolve it. So if anyone has any ideas 
we'd love to hear about it - thanks


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 2-Layer CRUSH Map Rule?

2022-09-27 Thread duluxoz

Thank you Tyler,

That looks like exactly what I was looking for (now to test it in a test 
rig)  :-)


Cheers

Dulux-Oz

On 28/09/2022 07:16, Tyler Brekke wrote:

Hi Matthew,

You just have to take two steps when writing your crush rule. First 
you want to get 3 different hosts, then you need 2 osd from each host.


ceph osd getcrushmap -o /tmp/crush
crushtool -d /tmp/crush -o /tmp/crush.txt

#edit it / make new rule

rule custom-ec-ruleset {
        id 3
        type erasure
        min_size 4
        max_size 6
        step take your-root
        step choose indep 3 type host
        step chooseleaf indep 2 type osd
        step emit
}

crushtool -c /tmp/crush.txt -o /tmp/crush.new

You can use the crushtool to test the mappings and make sure they are 
working as you expect.


crushtool  -i /tmp/crush.new --test --show-mappings --rule 3 --num-rep 6

You can then compare the OSD id and make sure its exactly what you are 
looking for.


You can set the crushmap with

ceph osd setcrushmap -i /tmp/crush.new

Note: If you are overwriting your current rule, your data will need to 
rebalance as soon as you set the crushmap; close to 100% of your 
objects will move. If you create a new rule, you can set your pool to 
use the new rule anytime you are ready.


On Sun, Sep 25, 2022 at 12:49 AM duluxoz  wrote:

Hi Everybody (Hi Dr. Nick),

TL/DR: Is it possible to have a "2-Layer" Crush Map?

I think it is (although I'm not sure about how to set it up).

My issue is that we're using 4-2 Erasure coding on our OSDs, with 7-OSDs
per OSD-Node (yes, the Cluster is handling things AOK - we're running at
about 65-70% utilisation of CPU, RAM, etc, so no problem there).

However, we only have 3 OSD-Nodes, and I'd like to ensure that each Node
has 2 of each pool's OSDs so that if we lose a Node the other 2 can
take up the slack. I know(?) that with 4-2 EC we can lose 2 out of the
6 OSDs, but I'm worried that if we lose a Node it'll take more than 2
OSDs with it, rendering us "stuffed" (stuffed: a technical term which is
used as a substitute for a four-letter word rhyming with "truck") 

Anyone have any pointers?

Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 2-Layer CRUSH Map Rule?

2022-09-25 Thread duluxoz

Hi Everybody (Hi Dr. Nick),

TL/DR: Is it possible to have a "2-Layer" Crush Map?

I think it is (although I'm not sure about how to set it up).

My issue is that we're using 4-2 Erasure coding on our OSDs, with 7-OSDs 
per OSD-Node (yes, the Cluster is handling things AOK - we're running at 
about 65-70% utilisation of CPU, RAM, etc, so no problem there).


However, we only have 3 OSD-Nodes, and I'd like to ensure that each Node 
has 2 of each pool's OSDs so that if we lose a Node the other 2 can 
take up the slack. I know(?) that with 4-2 EC we can lose 2 out of the 
6 OSDs, but I'm worried that if we lose a Node it'll take more than 2 
OSDs with it, rendering us "stuffed" (stuffed: a technical term which is 
used as a substitute for a four-letter word rhyming with "truck") 


Anyone have any pointers?

Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Quince Not Enabling `diskprediction-local` - RESOLVED

2022-09-21 Thread duluxoz

Hi Everybody (Hi Dr. Nick),

Ah, I've just figured it out - it should have been an underscore 
(`_`) not a dash (`-`) in `ceph mgr module enable diskprediction_local`
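
For the archives, the working sequence was roughly the below (the second command is just a sanity check):

~~~
ceph mgr module enable diskprediction_local
ceph mgr module ls | grep -i diskprediction
~~~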


"Sorry about that Chief"

And sorry for the double-post (damn email client).

Cheers

Dulux-Oz


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Quince Not Enabling `diskprediction-local` - Help Please

2022-09-21 Thread duluxoz

Hi Everybody (Hi Dr. Nick),

So, I'm trying to get my Ceph Quincy Cluster to recognise/interact with 
the "diskprediction-local" manager module.


I have the "SMARTMon Tools" and the "ceph-mgr-diskprediction-local" 
package installed on all of the relevant nodes.


Whenever I attempt to enable the local disk-prediction (i.e. `ceph mgr 
module enable diskprediction-local`) I receive the following error:


`Error ENOENT: all mgr daemons do not support module 
'diskprediction-local', pass --force to force enablement`


Anyone have any idea what I'm doing wrong?

Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph iSCSI & oVirt

2022-09-21 Thread duluxoz

Hi Everybody (Hi Dr. Nick),

I'm attacking this issue from both ends (ie from the Ceph-end and from 
the oVirt-end - I've posted questions on both mailing lists to ensure we 
capture the required knowledge-bearer(s)).


We've got a Ceph Cluster set up with three iSCSI Gateways configured, 
and we want to use this Cluster as the back-end storage for an oVirt 
Cluster.  *Somewhere* in the oVirt documentation I read that when using 
oVirt with multiple iSCSI paths (in my case, multiple Ceph iSCSI 
Gateways) we need to set up DM Multipath.


My question is: Has anyone done what we're trying to do, and if so are 
there any "gotchas" we should be aware of


Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph iSCSI rbd-target.api Failed to Load

2022-09-09 Thread duluxoz

Hi Guys,

So, I finally got things sorted :-)

Time to eat some crow-pie :-P

Turns out I had two issues, both of which involved typos (don't they 
always?).


The first was I had transposed two digits of an IP Address in the 
`iscsi-gateway.cfg` -> `trusted_ip_list`.


The second was that I had called the `iscsi-gateway.cfg` file 
`isci-gateway.cfg`.


DOH!

Thanks for all your help - if I hadn't had a couple of people to bounce 
ideas off and point out the blindingly obvious (to confirm I wasn't 
going crazy) then I don't think I would have found these errors so quickly.


Thank you!

Cheers

Dulux-Oz

On 10/09/2022 00:40, Bailey Allison wrote:

Hi Matt,

No problem, looking at the output of gwcli -d there it looks like it's having 
issues getting the api endpoint, are you able to try running:

curl --user admin:admin -X GET http://X.X.X.X:5000/api

or

curl http://X.X.X.X:5000/api

Replacing the IP address with the node hosting the iSCSI gateway?

It should spit out a bunch of stuff, but it would at least let us know if the 
api itself is listening or not.

Also here's the output of gwcli -d from our cluster to compare:

root@ubuntu-gw01:~# gwcli -d
Adding ceph cluster 'ceph' to the UI
Fetching ceph osd information
Querying ceph for state information
Refreshing disk information from the config object
- Scanning will use 8 scan threads
- rbd image scan complete: 0s
Refreshing gateway & client information
- checking iSCSI/API ports on ubuntu-gw01
- checking iSCSI/API ports on ubuntu-gw02

1 gateway is inaccessible - updates will be disabled
Querying ceph for state information
Gathering pool stats for cluster 'ceph'

Regards,

Bailey

-Original Message-
From: duluxoz 
Sent: September 9, 2022 4:11 AM
To: Bailey Allison ; ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph iSCSI rbd-target.api Failed to Load

Hi Bailey,

Sorry for the delay in getting back to you (I had a few non-related issues to 
resolve) - and thanks for replying.

The results from `gwcli -d`:

~~~
Adding ceph cluster 'ceph' to the UI
Fetching ceph osd information
Querying ceph for state information
REST API failure, code : 500
Unable to access the configuration object Traceback (most recent call last):
File "/usr/bin/gwcli", line 194, in 
  main()
File "/usr/bin/gwcli", line 108, in main
  "({})".format(settings.config.api_endpoint))
AttributeError: 'Settings' object has no attribute 'api_endpoint'
~~~

Checked all of the other things you mentioned: all good.

Any further ideas?

Cheers

On 08/09/2022 10:08, Bailey Allison wrote:

Hi Dulux-oz,

Are you able to share the output of "gwcli -d" from your iSCSI node?

Just a few things I can think to check off the top of my head, is port 5000 
accessible/opened on the node running iSCSI?

I think by default the API tries to listen/use a pool called rbd, so does your 
cluster have a pool named that? It looks like it does based on your logs but 
something to check anyways, otherwise I believe you can change the pool it uses 
within the iscsi-gateway.cfg file though.

If there's any blocklisted OSDs on the node you're running iSCSI on it will 
also prevent rbd-target-api from starting I have found from experience, but 
again per your logs it looks like there isn't any.

Just in case it might help I've also attached an iscsi-gateway-cfg file from a 
cluster we've got with it working here:

# This is seed configuration used by the ceph_iscsi_config modules
# when handling configuration tasks for iscsi gateway(s)
#
# Please do not change this file directly since it is managed by Ansible and will be overwritten
[config]
api_password = admin
api_port = 5000
# API settings.
# The API supports a number of options that allow you to tailor it to your
# local environment. If you want to run the API under https, you will need to
# create cert/key files that are compatible for each iSCSI gateway node, that is
# not locked to a specific node. SSL cert and key files *must* be called
# 'iscsi-gateway.crt' and 'iscsi-gateway.key' and placed in the '/etc/ceph/' directory
# on *each* gateway node. With the SSL files in place, you can use 'api_secure = true'
# to switch to https mode.
# To support the API, the bear minimum settings are:
api_secure = False
# Optional settings related to the CLI/API service
api_user = admin
cluster_name = ceph
loop_delay = 1
pool = rbd
trusted_ip_list = X.X.X.X,X.X.X.X,X.X.X.X,X.X.X.X

-Original Message-
From: duluxoz 
Sent: September 7, 2022 6:38 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Ceph iSCSI rbd-target.api Failed to Load

Hi All,

I've followed the instructions on the CEPH Doco website on Configuring the 
iSCSI Target. Everything went AOK up to the point where I try to start the 
rbd-target-api service, which fails (the rbd-target-gw service started OK).

A `systemctl status rbd-target-api` gives:

~~~
rbd-target-api.service - Ceph iscsi target configuration API
  Loaded: loaded 

[ceph-users] Re: Ceph iSCSI rbd-target.api Failed to Load

2022-09-09 Thread duluxoz

Hi Bailey,

Sorry for the delay in getting back to you (I had a few non-related 
issues to resolve) - and thanks for replying.


The results from `gwcli -d`:

~~~
Adding ceph cluster 'ceph' to the UI
Fetching ceph osd information
Querying ceph for state information
REST API failure, code : 500
Unable to access the configuration object
Traceback (most recent call last):
  File "/usr/bin/gwcli", line 194, in 
    main()
  File "/usr/bin/gwcli", line 108, in main
    "({})".format(settings.config.api_endpoint))
AttributeError: 'Settings' object has no attribute 'api_endpoint'
~~~

Checked all of the other things you mentioned: all good.

Any further ideas?

Cheers

On 08/09/2022 10:08, Bailey Allison wrote:

Hi Dulux-oz,

Are you able to share the output of "gwcli -d" from your iSCSI node?

Just a few things I can think to check off the top of my head, is port 5000 
accessible/opened on the node running iSCSI?

I think by default the API tries to listen/use a pool called rbd, so does your 
cluster have a pool named that? It looks like it does based on your logs but 
something to check anyways, otherwise I believe you can change the pool it uses 
within the iscsi-gateway.cfg file though.

If there's any blocklisted OSDs on the node you're running iSCSI on it will 
also prevent rbd-target-api from starting I have found from experience, but 
again per your logs it looks like there isn't any.

Just in case it might help I've also attached an iscsi-gateway-cfg file from a 
cluster we've got with it working here:

# This is seed configuration used by the ceph_iscsi_config modules
# when handling configuration tasks for iscsi gateway(s)
#
# Please do not change this file directly since it is managed by Ansible and 
will be overwritten
[config]
api_password = admin
api_port = 5000
# API settings.
# The API supports a number of options that allow you to tailor it to your
# local environment. If you want to run the API under https, you will need to
# create cert/key files that are compatible for each iSCSI gateway node, that is
# not locked to a specific node. SSL cert and key files *must* be called
# 'iscsi-gateway.crt' and 'iscsi-gateway.key' and placed in the '/etc/ceph/' 
directory
# on *each* gateway node. With the SSL files in place, you can use 'api_secure 
= true'
# to switch to https mode.
# To support the API, the bear minimum settings are:
api_secure = False
# Optional settings related to the CLI/API service
api_user = admin
cluster_name = ceph
loop_delay = 1
pool = rbd
trusted_ip_list = X.X.X.X,X.X.X.X,X.X.X.X,X.X.X.X

-Original Message-
From: duluxoz 
Sent: September 7, 2022 6:38 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Ceph iSCSI rbd-target.api Failed to Load

Hi All,

I've followed the instructions on the CEPH Doco website on Configuring the 
iSCSI Target. Everything went AOK up to the point where I try to start the 
rbd-target-api service, which fails (the rbd-target-gw service started OK).

A `systemctl status rbd-target-api` gives:

~~~
rbd-target-api.service - Ceph iscsi target configuration API
 Loaded: loaded (/usr/lib/systemd/system/rbd-target-api.service;
enabled; vendor preset: disabled)
 Active: failed (Result: exit-code) since Wed 2022-09-07 18:07:26 AEST; 1h 
5min ago
Process: 32547 ExecStart=/usr/bin/rbd-target-api (code=exited, status=16)
   Main PID: 32547 (code=exited, status=16)

Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]:
rbd-target-api.service: Start request repeated too quickly.
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]:
rbd-target-api.service: Failed with result 'exit-code'.
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]: Failed to start Ceph 
iscsi target configuration API.
~~~

A `journalctl -xe` gives:

~~~
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]:
rbd-target-api.service: Start request repeated too quickly.
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]:
rbd-target-api.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit rbd-target-api.service has entered the 'failed' state with result 
'exit-code'.
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]: Failed to start Ceph 
iscsi target configuration API.
-- Subject: Unit rbd-target-api.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rbd-target-api.service has failed.
--
-- The result is failed.
~~~

The `rbd-target-api.log` gives:

~~~
2022-09-07 19:19:01,084DEBUG [common.py:141:_open_ioctx()] -
(_open_ioctx) Opening connection to rbd pool
2022-09-07 19:19:01,086DEBUG [common.py:148:_open_ioctx()] -
(_open_ioctx) connection opened
2022-09-07 19:19:01,105DEBUG [common.py:438:init_config()] -
(init_config) using pre existing config object
2022-09-07 19:19:01,105DEBUG [common.py:141:_open_ioctx()] -
(_open_ioctx) Opening

[ceph-users] Ceph iSCSI rbd-target.api Failed to Load

2022-09-07 Thread duluxoz

Hi All,

I've followed the instructions on the CEPH Doco website on Configuring 
the iSCSI Target. Everything went AOK up to the point where I try to 
start the rbd-target-api service, which fails (the rbd-target-gw service 
started OK).


A `systemctl status rbd-target-api` gives:

~~~
rbd-target-api.service - Ceph iscsi target configuration API
   Loaded: loaded (/usr/lib/systemd/system/rbd-target-api.service; 
enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2022-09-07 18:07:26 
AEST; 1h 5min ago

  Process: 32547 ExecStart=/usr/bin/rbd-target-api (code=exited, status=16)
 Main PID: 32547 (code=exited, status=16)

Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]: 
rbd-target-api.service: Start request repeated too quickly.
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]: 
rbd-target-api.service: Failed with result 'exit-code'.
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]: Failed to start 
Ceph iscsi target configuration API.

~~~

A `journalctl -xe` gives:

~~~
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]: 
rbd-target-api.service: Start request repeated too quickly.
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]: 
rbd-target-api.service: Failed with result 'exit-code'.

-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit rbd-target-api.service has entered the 'failed' state with 
result 'exit-code'.
Sep 07 19:19:03 ceph-host1.mydomain.local systemd[1]: Failed to start 
Ceph iscsi target configuration API.

-- Subject: Unit rbd-target-api.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rbd-target-api.service has failed.
--
-- The result is failed.
~~~

The `rbd-target-api.log` gives:

~~~
2022-09-07 19:19:01,084DEBUG [common.py:141:_open_ioctx()] - 
(_open_ioctx) Opening connection to rbd pool
2022-09-07 19:19:01,086DEBUG [common.py:148:_open_ioctx()] - 
(_open_ioctx) connection opened
2022-09-07 19:19:01,105DEBUG [common.py:438:init_config()] - 
(init_config) using pre existing config object
2022-09-07 19:19:01,105DEBUG [common.py:141:_open_ioctx()] - 
(_open_ioctx) Opening connection to rbd pool
2022-09-07 19:19:01,105DEBUG [common.py:148:_open_ioctx()] - 
(_open_ioctx) connection opened
2022-09-07 19:19:01,106DEBUG [common.py:120:_read_config_object()] - 
_read_config_object reading the config object
2022-09-07 19:19:01,107DEBUG [common.py:170:_get_ceph_config()] - 
(_get_rbd_config) config object contains 'b'{\n"created": "2022/09/07 
07:25:58",\n"discovery_auth": {\n"mutual_password": 
"",\n"mutual_password_encryption_enabled": false,\n"mutual_username": 
"",\n"password": "",\n"password_encryption_enabled": false,\n"username": 
""\n},\n"disks": {},\n"epoch": 0,\n"gateways": {},\n"targets": 
{},\n"updated": "",\n"version": 11\n}''
2022-09-07 19:19:01,107 INFO [rbd-target-api:2810:run()] - Started the 
configuration object watcher
2022-09-07 19:19:01,107 INFO [rbd-target-api:2812:run()] - Checking for 
config object changes every 1s
2022-09-07 19:19:01,109 INFO [gateway.py:82:osd_blocklist_cleanup()] - 
Processing osd blocklist entries for this node
2022-09-07 19:19:01,497 INFO [gateway.py:140:osd_blocklist_cleanup()] - 
No OSD blocklist entries found
2022-09-07 19:19:01,497 INFO [gateway.py:250:define()] - Reading the 
configuration object to update local LIO configuration
2022-09-07 19:19:01,497 INFO [gateway.py:261:define()] - Configuration 
does not have an entry for this host(ceph-host1.mydomain.local) - 
nothing to define to LIO
2022-09-07 19:19:01,507 CRITICAL [rbd-target-api:2942:main()] - Secure 
API requested but the crt/key files missing/incompatible?
2022-09-07 19:19:01,508 CRITICAL [rbd-target-api:2944:main()] - Unable 
to start
2022-09-07 19:19:01,956DEBUG [common.py:141:_open_ioctx()] - 
(_open_ioctx) Opening connection to rbd pool
2022-09-07 19:19:01,958DEBUG [common.py:148:_open_ioctx()] - 
(_open_ioctx) connection opened
2022-09-07 19:19:01,976DEBUG [common.py:438:init_config()] - 
(init_config) using pre existing config object
2022-09-07 19:19:01,976DEBUG [common.py:141:_open_ioctx()] - 
(_open_ioctx) Opening connection to rbd pool
2022-09-07 19:19:01,976DEBUG [common.py:148:_open_ioctx()] - 
(_open_ioctx) connection opened
2022-09-07 19:19:01,977DEBUG [common.py:120:_read_config_object()] - 
_read_config_object reading the config object
2022-09-07 19:19:01,978DEBUG [common.py:170:_get_ceph_config()] - 
(_get_rbd_config) config object contains 'b'{\n"created": "2022/09/07 
07:25:58",\n"discovery_auth": {\n"mutual_password": 
"",\n"mutual_password_encryption_enabled": false,\n"mutual_username": 
"",\n"password": "",\n"password_encryption_enabled": false,\n"username": 
""\n},\n"disks": {},\n"epoch": 0,\n"gateways": {},\n"targets": 
{},\n"updated": "",\n"version": 11\n}''
2022-09-07 19:19:01,979 INFO [rbd-target-api:2810:run()] - Started the 
configuration object 

[ceph-users] ceph -s command hangs with an authentication timeout

2022-08-06 Thread duluxoz

Hi All,

So, I've been researching this for days (including reading this 
mailing-list), and I've had no luck whatsoever in resolving my issue. 
I'm hoping someone here can point be in the correct direction.


This is a brand new (physical) machine, and I've followed the Manual 
Deployment instructions from the Ceph Documentation 
(https://docs.ceph.com/en/latest/install/manual-deployment/) to the 
letter, except I'm using a different UUID and a different IP Address. 
I've even used the sample ceph.conf file (with the proceeding changes, 
of course).


My issue is when I reach Step 19 (sudo ceph -s) the command hangs for 
300 seconds and then returns  'monclient(hunting): authenticate timed 
out after 300'.


This is using Ceph Quincy on el8.

Doing a 'ps aux | grep ceph' tells me that ceph-mon is running, as do 
the logs.


As far as I can tell every file and directory has the correct owner 
(ceph:ceph) and correct file permissions - and even the correct firewall 
settings (although as I'm running on this machine's console directly the 
firewall shouldn't be an issue).


So if the "official" documentation is throwing this error, can someone 
please tell me what's wrong?
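
In case it helps anyone else debugging the same thing, these are the checks I'm running while I wait (hedged - they assume the mon address and keyring paths from the steps above; MON_IP is a placeholder for the monitor's actual address):

~~~
# Is the monitor actually listening on its public address?
ss -tlnp | grep ceph-mon

# Try the admin keyring and monitor address explicitly, with a short timeout
ceph -s -m MON_IP:6789 \
     --keyring /etc/ceph/ceph.client.admin.keyring \
     --connect-timeout 10

# Anything useful in the monitor's own log?
tail -n 50 /var/log/ceph/ceph-mon.*.log
~~~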


Thanks in advance

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: New Issue - Mapping Block Devices

2021-03-23 Thread duluxoz

Hi Ilya,

OK, so I've updated the my-id permissions to include 'profile rbd 
pool=my-pool-data'.
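
(For completeness, the command I used to update the caps was along these lines:)

~~~
ceph auth caps client.my-id \
    mon 'profile rbd' \
    osd 'profile rbd pool=my-pool, profile rbd pool=my-pool-data' \
    mgr 'profile rbd pool=my-pool'
~~~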


Yes, "rbd device map" does succeed (both before and after the my-id update).

The full dmesg from the "rbd device map" command is:

[18538.539416] libceph: mon0 (1):6789 session established
[18538.554143] libceph: client25428 fsid 
[18538.615761] rbd: rbd0: capacity 1099511627776 features 0xbd

The full dmesg from the fdisk command is (which seems to have worked now 
that I've updated the my-id auth):


[18770.784126]  rbd0: p1

There is no dmesg from the mount command. The mount command itself gives:

mount: /my-rbd-bloc-device: special device /dev/rbd0p1 does not exist 
(same as before I updated my-id)


Cheers

Matthew J


On 23/03/2021 17:34, Ilya Dryomov wrote:

On Tue, Mar 23, 2021 at 6:13 AM duluxoz  wrote:

Hi All,

I've got a new issue (hopefully this one will be the last).

I have a working Ceph (Octopus) cluster with a replicated pool
(my-pool), an erasure-coded pool (my-pool-data), and an image (my-image)
created - all *seems* to be working correctly. I also have the correct
Keyring specified (ceph.client.my-id.keyring).

ceph -s is reporting all healthy.

The ec profile (my-ec-profile) was created with: ceph osd
erasure-code-profile set my-ec-profile k=4 m=2 crush-failure-domain=host

The replicated pool was created with: ceph osd pool create my-pool 100
100 replicated

Followed by: rbd pool init my-pool

The ec pool was created with: ceph osd pool create my-pool-data 100 100
erasure my-ec-profile --autoscale-mode=on

Followed by: rbd pool init my-pool-data

The image was created with: rbd create -s 1T --data-pool my-pool-data
my-pool/my-image

The Keyring was created with: ceph auth get-or-create client.my-id mon
'profile rbd' osd 'profile rbd pool=my-pool' mgr 'profile rbd
pool=my-pool' -o /etc/ceph/ceph.client.my-id.keyring

Hi Matthew,

If you are using a separate data pool, you need to give "my-id" access
to it:

   osd 'profile rbd pool=my-pool, profile rbd pool=my-pool-data'


On a centos8 client machine I have installed ceph-common, placed the
Keyring file into /etc/ceph/, and run the command: rbd device map
my-pool/my-image --id my-id

Does "rbd device map" actually succeed?  Can you attach dmesg from that
client machine from when you (attempted to) map, ran fdisk, etc?

Thanks,

 Ilya


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] New Issue - Mapping Block Devices

2021-03-22 Thread duluxoz

Hi All,

I've got a new issue (hopefully this one will be the last).

I have a working Ceph (Octopus) cluster with a replicated pool 
(my-pool), an erasure-coded pool (my-pool-data), and an image (my-image) 
created - all *seems* to be working correctly. I also have the correct 
Keyring specified (ceph.client.my-id.keyring).


ceph -s is reporting all healthy.

The ec profile (my-ec-profile) was created with: ceph osd 
erasure-code-profile set my-ec-profile k=4 m=2 crush-failure-domain=host


The replicated pool was created with: ceph osd pool create my-pool 100 
100 replicated


Followed by: rbd pool init my-pool

The ec pool was created with: ceph osd pool create my-pool-data 100 100 
erasure my-ec-profile --autoscale-mode=on


Followed by: rbd pool init my-pool-data

The image was created with: rbd create -s 1T --data-pool my-pool-data 
my-pool/my-image


The Keyring was created with: ceph auth get-or-create client.my-id mon 
'profile rbd' osd 'profile rbd pool=my-pool' mgr 'profile rbd 
pool=my-pool' -o /etc/ceph/ceph.client.my-id.keyring


On a centos8 client machine I have installed ceph-common, placed the 
Keyring file into /etc/ceph/, and run the command: rbd device map 
my-pool/my-image --id my-id


All *seems* AOK.

However - and here's my issue - when I try to create a partition on 
/dev/rbd0 and/or try to mount it, the client reports: fdisk: cannot open 
/dev/rbd0: Input/output error  OR mount: /my-rbd-bloc-device: special 
device /dev/rbd0 does not exist (respectively).


What am I doing wrong?

Thanks in advance for the help

Matthew J



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Cluster Taking An Awful Long Time To Rebalance

2021-03-16 Thread duluxoz

Yeap - that was the issue: an incorrect CRUSH rule
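
For the archives: on a small single-host test cluster, the sort of change that resolves this is replacing a host-level failure domain with an osd-level one - something like the below (a sketch, assuming the default "default" root; adjust pool and rule names to suit):

~~~
# Create a replicated rule that only requires distinct OSDs, not distinct hosts
ceph osd crush rule create-replicated replicated-by-osd default osd

# Point the stuck pool(s) at it
ceph osd pool set my-pool crush_rule replicated-by-osd
~~~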

Thanks for the help

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Erasure-coded Block Device Image Creation With qemu-img - Help

2021-03-16 Thread duluxoz

Hi Guys,

So, new issue (I'm gonna get the hang of this if it kills me :-) ).

I have a working/healthy Ceph (Octopus) Cluster (with qemu-img, libvert, 
etc, installed), and an erasure-coded pool called "my_pool". I now need 
to create a "my_data" image within the "my_pool" pool. As this is for a 
KVM host / block device (hence the qemu-img et.al.) I'm attempting to 
use qemu-img, so the command I am using is:


```

qemu-img create -f rbd rbd:my_pool/my_data 1T

```

The error message I received was:

```

qemu-img: rbd:my_pool/my_data: error rbd create: Operation not supported

```

So, I tried the 'raw' rbd command:

```

rbd create -s 1T my_pool/my_data

```

and got the error:

```

_add_image_to_directory: error adding image to directory: (95) Operation 
not supported

rbd: create error: (95) Operation not supported

```

So I don't believe the issue is with the 'qemu-img' command - but I may 
be wrong.


After doing some research I *think* I need to specify a replicated (as 
opposed to erasure-coded) pool for my_pool's metadata (eg 
'my_pool_metadata'), and thus use the command:


```

rbd create -s 1T --data-pool my_pool my_pool_metadata/my_data

```

First Question: Is this correct?

Second Question: What is the qemu-img equivalent command - is it:

```

qemu-img create -f rbd rbd:--data-pool my_pool my_pool_metadata/my_data 1T

```

or something similar?
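
(For anyone following the thread: the two hedged approaches I'm planning to test are below. I'm not claiming either is the "official" qemu-img syntax - corrections welcome.)

```

# Approach 1: create the image with rbd (data in the EC pool, metadata in
# the replicated pool), then let qemu-img/libvirt use the existing image:
rbd create --size 1T --data-pool my_pool my_pool_metadata/my_data
qemu-img info rbd:my_pool_metadata/my_data

# Approach 2 (untested assumption): set a client-side default data pool in
# ceph.conf, so that plain image creates place their data in the EC pool:
#   [client]
#   rbd default data pool = my_pool

```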

Thanks in advance

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Cluster Taking An Awful Long Time To Rebalance

2021-03-16 Thread duluxoz

Ah, right, that makes sense - I'll have a go at that

Thank you

On 16/03/2021 19:12, Janne Johansson wrote:

  pgs: 88.889% pgs not active
   6/21 objects misplaced (28.571%)
   256 creating+incomplete

For new clusters, "creating+incomplete" sounds like you created a pool
(with 256 PGs) with some crush rule that doesn't allow it to find
suitable placements, like "replication = 3" and "failure domain =
host" but only having 2 hosts, or something to that effect. Unless you
add hosts (in my example), this will not "fix itself" until you either
add hosts, or change the crush rules to something less reliable.




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Cluster Taking An Awful Long Time To Rebalance

2021-03-15 Thread duluxoz
OK, so I set autoscaling to off for all five pools, and the "ceph -s" 
has not changed:


~~~

 cluster:
    id: [REDACTED]
    health: HEALTH_WARN
    Reduced data availability: 256 pgs inactive, 256 pgs incomplete
    Degraded data redundancy: 12 pgs undersized

  services:
    mon: 1 daemons, quorum [REDACTED] (age 23h)
    mgr: [REDACTED](active, since 23h)
    osd: 7 osds: 7 up (since 22h), 7 in (since 22h); 32 remapped pgs

  data:
    pools:   5 pools, 288 pgs
    objects: 7 objects, 0 B
    usage:   7.1 GiB used, 38 TiB / 38 TiB avail
    pgs: 88.889% pgs not active
 6/21 objects misplaced (28.571%)
 256 creating+incomplete
 18  active+clean
 12  active+undersized+remapped
 2   active+clean+remapped

  progress:
    Rebalancing after osd.1 marked in (23h)
  []
    PG autoscaler decreasing pool 1 PGs from 32 to 1 (21h)
  []

~~~

Any ideas - or is this normal, i.e. does it normally take this long?

(I'm wondering if I shouldn't tear down the cluster and start again?)

Cheers

Matthew J

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Cluster Taking An Awful Long Time To Rebalance

2021-03-15 Thread duluxoz
Thanks for that - I'll disable the autoscaler on all 5 pools and see 
what happens
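
(For the archives, the one-liner I'm about to run to do that - nothing clever:)

~~~
for p in $(ceph osd pool ls); do ceph osd pool set "$p" pg_autoscale_mode off; done
~~~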


Cheers

Matthew J

On 16/03/2021 13:53, ash...@amerrick.co.uk wrote:

I think your issue is you also have the PG scaler trying to change to 1PG.

Due to the size of the OSD/no data it thinks you only need 1PG, I 
would suggest disabling the PG Auto Scaler on small test clusters.


Thanks
On 16 Mar 2021, 10:50 +0800, duluxoz , wrote:

Hi Guys,

Is the below "ceph -s" normal?

This is a brand new cluster with (at the moment) a single Monitor and 7
OSDs (each 6 GiB) that has no data in it (yet), and yet it's taking
almost a day to "heal itself" from adding in the 2nd OSD.

~~~

  cluster:
    id:     [REDACTED]
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive, 256 pgs incomplete
            Degraded data redundancy: 12 pgs undersized

  services:
    mon: 1 daemons, quorum [REDACTED] (age 22h)
    mgr: [REDACTED](active, since 22h)
    osd: 7 osds: 7 up (since 21h), 7 in (since 21h); 32 remapped pgs

  data:
    pools:   5 pools, 288 pgs
    objects: 7 objects, 0 B
    usage:   7.1 GiB used, 38 TiB / 38 TiB avail
    pgs:     88.889% pgs not active
             6/21 objects misplaced (28.571%)
             256 creating+incomplete
             18  active+clean
             12  active+undersized+remapped
             2   active+clean+remapped

  progress:
    Rebalancing after osd.1 marked in (22h)
      []
    PG autoscaler decreasing pool 1 PGs from 32 to 1 (19h)
      []

~~~

Thanks in advance

Matthew J






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Cluster Taking An Awful Long Time To Rebalance

2021-03-15 Thread duluxoz

Hi Guys,

Is the below "ceph -s" normal?

This is a brand new cluster with (at the moment) a single Monitor and 7 
OSDs (each 6 GiB) that has no data in it (yet), and yet it's taking 
almost a day to "heal itself" from adding in the 2nd OSD.


~~~

  cluster:
    id: [REDACTED]
    health: HEALTH_WARN
    Reduced data availability: 256 pgs inactive, 256 pgs incomplete
    Degraded data redundancy: 12 pgs undersized

  services:
    mon: 1 daemons, quorum [REDACTED] (age 22h)
    mgr: [REDACTED](active, since 22h)
    osd: 7 osds: 7 up (since 21h), 7 in (since 21h); 32 remapped pgs

  data:
    pools:   5 pools, 288 pgs
    objects: 7 objects, 0 B
    usage:   7.1 GiB used, 38 TiB / 38 TiB avail
    pgs: 88.889% pgs not active
 6/21 objects misplaced (28.571%)
 256 creating+incomplete
 18  active+clean
 12  active+undersized+remapped
 2   active+clean+remapped

  progress:
    Rebalancing after osd.1 marked in (22h)
  []
    PG autoscaler decreasing pool 1 PGs from 32 to 1 (19h)
  []

~~~

Thanks in advance

Matthew J



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Newbie Help With ceph-mgr

2021-02-25 Thread duluxoz

Hi All,

My ceph-mgr keeps stopping (for some unknown reason) after about an hour 
or so (but has run for up to 2-3 hours before stopping). Up till now 
I've simply restarted it with 'ceph-mgr -i ceph01'.


Is this normal behaviour, or if it isn't, what should I be looking for 
in the logs?


I was thinking of writing a quick cron script (with 'ceph-mgr -i 
ceph01') to run on the hour every hour to restart it, but figured that 
there had to be a better way - especially if ceph-mgr was crashing 
instead of being a "feature". Any ideas/advice?
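
(What I'm trying in the meantime, instead of cron - this assumes the packaged systemd unit, so treat it as a sketch:)

~~~
# Run the mgr under its systemd unit (which restarts it on failure)
# rather than starting ceph-mgr by hand
systemctl enable --now ceph-mgr@ceph01

# And watch why it stops
journalctl -u ceph-mgr@ceph01 -f
~~~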


Thanks in advance

Dulux-Oz



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Newbie Requesting Help - Please, This Is Driving Me Mad/Crazy! - A Follow Up

2021-02-25 Thread duluxoz

Hi Everyone,

Thanks to all for both the online and PM help - once it was pointed out 
that the existing (Octopus) Documentation was... less than current I 
ended up using the ceph-volume command.


A couple of follow-up questions:

When using ceph-volume lvm create:

1. Can you specify an OSD number, or are you stuck with the
   system-assigned one?
2. How do you use the command with only part of a HDD - i.e. something
   along the lines of 'ceph-volume lvm create --data /dev/sda4'
   (because sda1-3 contain the os/boot/swap systems)?
3. Where the hell do the drives get mapped/mounted/whatever to?
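
In case it helps the next person searching: for question 2, the approach I'm planning to try is to carve an LVM logical volume out of the spare partition and hand that to ceph-volume (a hedged sketch - the names are made up, and I'm not claiming it's the "official" way):

~~~
# Turn the spare partition into an LVM PV/VG/LV...
pvcreate /dev/sda4
vgcreate ceph-vg /dev/sda4
lvcreate -l 100%FREE -n osd-block-0 ceph-vg

# ...and give the LV to ceph-volume
ceph-volume lvm create --data ceph-vg/osd-block-0

# The resulting OSD's (tmpfs) working directory shows up under
# /var/lib/ceph/osd/ceph-<id>
~~~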

Thanks for the help

Regards

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Newbie Requesting Help - Please, This Is Driving Me Mad/Crazy!

2021-02-24 Thread duluxoz
Yes, the OSD Key is in the correct folder (or, at least, I think it is). 
The line in the steps I did is:


sudo -u ceph ceph auth get-or-create osd.0 osd 'allow *' mon 'allow 
profile osd' mgr 'allow profile osd' -o /var/lib/ceph/osd/ceph-0/keyring

This places the osd-0 key in the file 'keyring' in the 
'var/lib/ceph/osd/ceph-0' folder.


Now, I *assume* (ie made an @ss out of... well, me) that this is the 
correct location for that key (based on my understanding of the Ceph 
Doco), but obviously, I could be wrong.


And as far as the start-up and shutdown-log is concerned: there ain't 
none - or at least, I can't find them (unless you mean the 'systemctl 
start' log, etc?)


Any other ideas :-)

Cheers

Dulux-Oz

On 25/02/2021 07:04, Frank Schilder wrote:

I'm not running Octopus and I don't use the hard-core bare metal deployment 
method. I use ceph-volume and things work smoothly. Hence, my input might be 
useless.

Now looking at your text, you should always include the start-up and shut-down 
log of the OSD. As a wild guess, did you copy the OSD auth key to the required 
directory? Its somewhere in the instructions and I can't seem to find the copy 
command in your description.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: matt...@peregrineit.net 
Sent: 23 February 2021 06:09:52
To: ceph-users@ceph.io
Subject: [ceph-users] Newbie Requesting Help - Please, This Is Driving Me 
Mad/Crazy!

Hi Everyone,

Let me apologise upfront:

 If this isn't the correct List to post to
 If this has been answered already (& I've missed it in my searching)
 If this has ended up double posted
 If I've in any way given (or about to give) offence to anyone

I really need some help.

I'm trying to get a simple single host Pilot/Test Cluster up and running. I'm 
using CentOS 8 (fully updated), and Ceph-Octopus (latest version from the Ceph 
Repo). I have both ceph-mon and ceph-mgr working/running (although ceph-mge 
keeps stopping/crashing after about 1-3 hours or so - but that's another 
issue), and my first osd (and only osd at this point) *appears* to be working, 
but when I issue the command 'systemctl start ceph-osd@0' the ceph-osd daemon 
won't spin up and thus when I issue 'ceph -s' the result says the 'osd: 1 osds: 
0 up, 0 in'.

I've gone through the relevant logs but I can't seem to find the issue.

I'm doing this as a Manual Install because I want to actually *learn* what's going on 
during the install/etc. I know I can use cephadm (in a production environment), but as 
I said, I'm trying to learn how everything "fits together".

I've read and re-read the official Ceph Documentation and followed the 
following steps/commands to get Ceph installed and running:

Ran the following commands:
 su -
 useradd -d /home/ceph -m ceph -p 
 mkdir /home/ceph/.ssh

Added a public SSH Key to /home/ceph/.ssh/authorized_keys.

Ran the following commands:
 chmod 600 /home/ceph/.ssh/*
 chown ceph:ceph -R /home/ceph/.ssh

Added the ceph.repo details to /etc/yum.repos.d/ceph.repo (as per the Ceph 
Documentation).

Ran the following command:
 dnf -y install qemu-kvm qemu-guest-agent libvirt gdisk ceph

Created the /etc/ceph/ceph.conf file (see listing below).

Ran the following commands:
 ceph-authtool --create-keyring /etc/ceph/ceph.mon.keyring --gen-key -n 
mon. --cap mon 'allow *'
 ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring 
--gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 
'allow *' --cap mgr 'allow *'
 ceph-authtool --create-keyring /var/lib/ceph/bootstrap-osd/keyring 
--gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 
'allow r'
 ceph-authtool /etc/ceph/ceph.mon.keyring --import-keyring 
/etc/ceph/ceph.client.admin.keyring
 ceph-authtool /etc/ceph/ceph.mon.keyring --import-keyring 
/var/lib/ceph/bootstrap-osd/keyring
 chown -R ceph:ceph /etc/ceph/
 chown -R ceph:ceph /var/lib/ceph/
 monmaptool --create --add ceph01 192.168.0.10 --fsid 
98e84f97-031f-4958-bd54-22305f6bc738 /etc/ceph/monmap
 mkdir /var/lib/ceph/mon/ceph-ceph01
 chown -R ceph:ceph /var/lib/ceph
 sudo -u ceph ceph-mon --mkfs -i ceph01 --monmap /etc/ceph/monmap 
--keyring /etc/ceph/ceph.mon.keyring
 firewall-cmd --add-service=http --permanent
 firewall-cmd --add-service=ceph --permanent
 firewall-cmd --add-service=ceph-mon --permanent
 firewall-cmd --reload
 chmod -R 750 /var/lib/ceph/
 systemctl start ceph-mon@ceph01
 ceph mon enable-msgr2
 mkdir /var/lib/ceph/mgr/ceph-ceph01
 chown ceph:ceph /var/lib/ceph/mgr/ceph-ceph01
 ceph auth get-or-create mgr.ceph01 mon 'allow profile mgr' mds 'allow 
*' osd 'allow *' -o /var/lib/ceph/mgr/ceph-ceph01/keyring