[ceph-users] error: _ASSERT_H not a pointer

2022-06-13 Thread renjianxinlover




renjianxinlover
renjianxinlo...@163.com
On 6/14/2022 13:21, renjianxinlover wrote:
Ceph version: v12.2.10
OS Distribution: Debian 9
Kernel Release & Version: 4.9.0-18-amd64 #1 SMP Debian 4.9.303-1 (2022-03-07) 
x86_64 GNU/Linux


But building Ceph failed; the error snippet looks like:
...
[ 33%] Built target osdc
Scanning dependencies of target librados_api_obj
[ 33%] Building CXX object 
src/librados/CMakeFiles/librados_api_obj.dir/librados.cc.o
[ 33%] Building CXX object 
src/CMakeFiles/common-objs.dir/msg/async/EventSelect.cc.o
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/async/Stack.cc.o
[ 33%] Building CXX object 
src/CMakeFiles/common-objs.dir/msg/async/PosixStack.cc.o
[ 33%] Building CXX object 
src/CMakeFiles/common-objs.dir/msg/async/net_handler.cc.o
/mnt/ceph-source-code/ceph/src/librados/librados.cc: In static member function 
‘static librados::AioCompletion* librados::Rados::aio_create_completion(void*, 
librados::callback_t, librados::callback_t)’:
/mnt/ceph-source-code/ceph/src/librados/librados.cc:2747:7: warning: unused 
variable ‘r’ [-Wunused-variable]
  int r = rados_aio_create_completion(cb_arg, cb_complete, cb_safe, (void**));
  ^
In file included from /mnt/ceph-source-code/ceph/src/include/Context.h:19:0,
from /mnt/ceph-source-code/ceph/src/common/Cond.h:19,
from 
/mnt/ceph-source-code/ceph/src/librados/AioCompletionImpl.h:18,
from /mnt/ceph-source-code/ceph/src/librados/librados.cc:29:
/mnt/ceph-source-code/ceph/src/librados/librados.cc: In function ‘int 
rados_conf_read_file(rados_t, const char*)’:
/mnt/ceph-source-code/ceph/src/common/dout.h:80:12: error: base operand of ‘->’ 
is not a pointer
  _ASSERT_H->_log->submit_entry(_dout_e);  \
   ^
/mnt/ceph-source-code/ceph/src/common/dout.h:80:12: note: in definition of 
macro ‘dendl_impl’
  _ASSERT_H->_log->submit_entry(_dout_e);  \
   ^~
/mnt/ceph-source-code/ceph/src/librados/librados.cc:2897:47: note: in expansion 
of macro ‘dendl’
  lderr(client->cct) cct) << warnings.str() << dendl;
  ^
[ 33%] Building CXX object src/CMakeFiles/common-objs.dir/msg/QueueStrategy.cc.o
[ 33%] Building CXX object 
src/CMakeFiles/common-objs.dir/msg/async/rdma/Infiniband.cc.o
[ 33%] Building CXX object 
src/CMakeFiles/common-objs.dir/msg/async/rdma/RDMAConnectedSocketImpl.cc.o
src/librados/CMakeFiles/librados_api_obj.dir/build.make:62: recipe for target 
'src/librados/CMakeFiles/librados_api_obj.dir/librados.cc.o' failed
make[3]: *** [src/librados/CMakeFiles/librados_api_obj.dir/librados.cc.o] Error 
1
CMakeFiles/Makefile2:3814: recipe for target 
'src/librados/CMakeFiles/librados_api_obj.dir/all' failed
make[2]: *** [src/librados/CMakeFiles/librados_api_obj.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs
[ 33%] Building CXX object 
src/CMakeFiles/common-objs.dir/msg/async/rdma/RDMAServerSocketImpl.cc.o
...
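A side note on the error itself: the source line the compiler quotes,
"lderr(client->cct) cct) << warnings.str() << dendl;", has an extra "cct)"
compared with the usual "lderr(cct) << ... << dendl;" pattern, and a malformed
lderr/dendl macro expansion is one plausible way to end up with "base operand
of '->' is not a pointer" inside dendl_impl. A quick first check (assuming
/mnt/ceph-source-code/ceph is a git checkout of the v12.2.10 tag) is to diff
the file against the pristine tag before digging deeper:

# compare the local librados.cc with the upstream v12.2.10 tag
cd /mnt/ceph-source-code/ceph
git diff v12.2.10 -- src/librados/librados.cc
# any local edits around line 2897 are the first suspect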


Brs
renjianxinlover
renjianxinlo...@163.com
On 6/14/2022 06:06, wrote:

Today's Topics:

1. Re: something wrong with my monitor database ? (Stefan Kooman)
2. Changes to Crush Weight Causing Degraded PGs instead of Remapped
(Wesley Dillingham)
3. Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped
(Eugen Block)
4. Re: My cluster is down. Two osd:s on different hosts uses all memory on boot 
and then crashes.
(Stefan)
5. Copying and renaming pools (Pardhiv Karri)
6. Ceph Octopus RGW - files vanished from rados while still in bucket index
(Boris Behrens)


--

Message: 1
Date: Mon, 13 Jun 2022 18:37:59 +0200
From: Stefan Kooman 
Subject: [ceph-users] Re: something wrong with my monitor database ?
To: Eric Le Lay , ceph-users@ceph.io
Message-ID: 
Content-Type: text/plain; charset=UTF-8; format=flowed

On 6/13/22 18:21, Eric Le Lay wrote:


Those objects are deleted but have snapshots, even if the pool itself
doesn't have snapshots.
What could cause that?


root@hpc1a:~# rados -p storage stat
rbd_data.5b423b48a4643f.0006a4e5
 error stat-ing storage/rbd_data.5b423b48a4643f.0006a4e5: (2)
No such file or directory
root@hpc1a:~# rados -p storage lssnap
0 snaps
root@hpc1a:~# rados -p storage listsnaps
rbd_data.5b423b48a4643f.0006a4e5
rbd_data.5b423b48a4643f.0006a4e5:
cloneid    snaps    size    overlap
1160    1160    4194304
[1048576~32768,1097728~16384,1228800~16384,1409024~16384,1441792~16384,1572864~16384,1720320~16384,1900544~16384,2310144~16384]

1364    1364    4194304    []

Do the OSDs still need to trim the snapshots? Does data usage decline
over time?


[ceph-users] Re: Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-13 Thread Boris Behrens
Hmm.. I will check what the user is deleting. Maybe this is it.
Do you know if this bug is new in 15.2.16?

I can't share the data, but I can share the metadata:
https://pastebin.com/raw/T1YYLuec

For the missing files I have, the multipart file is not available in rados,
but the 0 byte file is.
The rest is more or less identical.

They seem to use the aws-sdk-dotnet (aws-sdk-dotnet-coreclr/3.3.110.57,
aws-sdk-dotnet-core/3.3.106.11), but such small multiparts are very strange.
I guess you can really screw up configs, but who am I to judge.
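One more thing that might help with the comparison (sketch only; BUCKET and
OBJECT are placeholders, and the pool name must match this bucket's actual
placement pool): dumping the object's manifest shows exactly which rados
tail/multipart pieces RGW expects, and those can then be stat-ed directly.

# dump the RGW object metadata/manifest for one affected object
radosgw-admin object stat --bucket BUCKET --object OBJECT
# then stat each multipart/shadow object named in the manifest, e.g.
# rados -p default.rgw.buckets.data stat <name taken from the manifest>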

Am Di., 14. Juni 2022 um 00:29 Uhr schrieb J. Eric Ivancich <
ivanc...@redhat.com>:

> There is no known bug that would cause the rados objects underlying an RGW
> object to be removed without a user requesting the RGW object be deleted.
>
> There is a known bug where the bucket index might not get updated
> correctly after user-requested operations. So perhaps the user removed the
> rgw object, but it still incorrectly shows up in the bucket index. The PR
> for the fix for that bug merged into the octopus branch, but after 15.2.16.
> See:
>
> https://github.com/ceph/ceph/pull/45902
>
> So it should be in the next octopus release.
>
> I also find it odd that a 250KB file gets a multipart object. What do we
> know about the original object? Do we know its size? Could the multipart
> upload never have completed? In that case there could be incomplete
> multipart entries in the bucket index, but they should never have been
> finalized into a regular bucket index entry.
>
> Are you willing to share all the bucket index entries related to this
> object?
>
> Eric
> (he/him)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-13 Thread Wesley Dillingham
Thanks for the reply. I believe regarding "0" vs "0.0" it's the same
difference. I will note it's not just changing crush weights that induces
this situation. Introducing upmaps manually or via the balancer also causes
the PGs to be degraded instead of the expected remapped PG state.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Jun 13, 2022 at 9:27 PM Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com> wrote:

> Isn’t it the correct syntax like this?
>
> ceph osd crush reweight osd.1 0.0 ?
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> On 2022. Jun 14., at 0:38, Wesley Dillingham 
> wrote:
>
> ceph osd crush reweight osd.1 0
>
>
> --
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-13 Thread Boris Behrens
Hi everybody,

are there other ways for rados objects to get removed, other than "rados -p
POOL rm OBJECT"?
We have a customer who got objects in the bucket index, but can't download
it. After checking it seems like the rados object is gone.

Ceph cluster is running ceph octopus 15.2.16

"radosgw-admin bi list --bucket BUCKET" shows the object available.
"radosgw-admin bucket radoslist --bucket BUCKET" shows the object and a
corresponding multipart file.
"rados -p POOL ls" only shows the object, but not the multipart file.

Exporting the rados object hands me an empty file.
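In case it helps others doing the same kind of check, here is a rough bulk
version of it (a sketch, not verified on this cluster; the data pool name is
an assumption and needs to match the bucket's placement): list every rados
object RGW expects for the bucket versus what is actually present, and keep
only the missing ones.

radosgw-admin bucket radoslist --bucket BUCKET | sort > /tmp/expected
rados -p default.rgw.buckets.data ls | sort > /tmp/present
# lines only in /tmp/expected are objects the index/manifest expects but rados no longer has
comm -23 /tmp/expected /tmp/present > /tmp/missing
wc -l /tmp/missing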

I find it very strange that a 250KB file gets a multipart object, but what
do I know about how the customer uploaded the file and how they work with the
RGW API.

What grinds my gears is that we lost customer data, and I need to know what
ways there are that lead to this problem.

I know there is no recovery, but I am not satisfied with "well, it just
happened. No idea why".
As I am the only one working on the ceph cluster, I would remove
"removed via rados command" from the list of possibilities, as the last
orphan objects cleanup was performed a month before the files' last MTIME.

Is there ANY way this could happen in some correlation with the GC,
restarting/adding/removing OSDs, sharding bucket indexes, OSD crashes and
other? Anything that isn't "rados -p POOL rm OBJECT"?

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Copying and renaming pools

2022-06-13 Thread Pardhiv Karri
Hi,

Our Ceph is used as backend storage for OpenStack. We use the "images" pool
for glance and the "compute" pool for instances. We need to migrate our
images pool which is on HDD drives to SSD drives.

I copied all the data from the "images" pool that is on HDD disks to an
"ssdimages" pool that is on SSD disks, made sure the crush rules are all
good. I used "rbd deep copy" to migrate all the objects. Then I renamed the
pools, "images" to "hddimages" and "ssdimages" to "images".

Our OpenStack instances are on the "compute" pool. All the instances that
were created from an image show the parent as an image from the "images"
pool. I thought renaming would make them point to the new pool on SSD disks,
now named "images", but interestingly the rbd info of all the instances
now points to the parent in "hddimages". How do I make sure the parent
pointers stay on "images" instead of changing to "hddimages"?

Before renaming pools:

lab [root@ctl01 /]# rbd info
compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
size 100GiB in 12800 objects
order 23 (8MiB objects)
block_name_prefix: rbd_data.8f51c347398c89
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Tue Mar 15 21:36:55 2022
parent: images/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
overlap: 10GiB
lab [root@ctl01 /]#



After renaming the pools, the parent value automatically gets modified:
lab [root@ctl01 /]# rbd info
compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
size 100GiB in 12800 objects
order 23 (8MiB objects)
block_name_prefix: rbd_data.8f51c347398c89
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Tue Mar 15 21:36:55 2022
parent: hddimages/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
overlap: 10GiB
lab [root@ctl01 /]#
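As far as I know the clone's parent reference is stored by pool ID rather than
by name, so renaming the pools only changes the name rbd displays; the children
still point at the original (now "hddimages") pool. One way around it, sketched
below for one of the instances above (untested here; flattening copies the
remaining parent data into each child, so it costs space and I/O), is to
flatten the children so they no longer have a parent at all:

rbd flatten compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
# afterwards "rbd info" should no longer show a parent line
rbd info compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk | grep parent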


Thanks,
Pardhiv
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-13 Thread Eugen Block
I remember someone reporting the same thing but I can’t find the  
thread right now. I’ll try again tomorrow.


Zitat von Wesley Dillingham :


I have a brand new Cluster 16.2.9 running bluestore with 0 client activity.
I am modifying some crush weights to move PGs off of a host for testing
purposes but the result is that the PGs go into a degraded+remapped state
instead of simply a remapped state. This is a strange result to me as in
previous releases (nautilus) this would cause only Remapped PGs. Are there
any known issues around this? Are others running Pacific seeing similar
behavior? Thanks.

"ceph osd crush reweight osd.1 0"

^ Causes degraded PGs which then go into recovery. Expect only remapped PGs

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-13 Thread Wesley Dillingham
I have a brand new Cluster 16.2.9 running bluestore with 0 client activity.
I am modifying some crush weights to move PGs off of a host for testing
purposes but the result is that the PGs go into a degraded+remapped state
instead of simply a remapped state. This is a strange result to me as in
previous releases (nautilus) this would cause only Remapped PGs. Are there
any known issues around this? Are others running Pacific seeing similar
behavior? Thanks.

"ceph osd crush reweight osd.1 0"

^ Causes degraded PGs which then go into recovery. Expect only remapped PGs
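For anyone trying to reproduce this, a small observation sketch (read-only
apart from the reweight itself; osd.1 is just the example OSD from above, and
the counts are rough because "ceph pg ls" prints headers too):

ceph pg ls remapped | wc -l
ceph pg ls degraded | wc -l
ceph osd crush reweight osd.1 0
sleep 30
# on Nautilus one would expect only the remapped count to move
ceph pg ls remapped | wc -l
ceph pg ls degraded | wc -l
ceph -s | grep -E 'degraded|misplaced'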

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: something wrong with my monitor database ?

2022-06-13 Thread Eric Le Lay


On 13/06/2022 at 17:54, Eric Le Lay wrote:

On 10/06/2022 at 11:58, Stefan Kooman wrote:
CAUTION: This email originated from outside the organization. Do not 
click links or open attachments unless you recognize the sender and 
know the content is safe.


On 6/10/22 11:41, Eric Le Lay wrote:

Hello list,

my ceph cluster was upgraded from nautilus to octopus last October,
causing snaptrims
to overload OSDs so I had to disable them (bluefs_buffered_io=false|true
didn't help).

Now I've copied data elsewhere and removed all clients and try to fix
the cluster.
Scraping it and starting over is possible, but it would be wonderful if
we could
figure out what's wrong with it...


FYI: osd snap trim sleep <- adding some sleep might help alleviate the
impact on the cluster.

If HEALTH is OK I would not expect anything wrong with your cluster.

Does " ceph osd dump |grep require_osd_release" give you
require_osd_release octopus?

Gr. Stefan

Hi Stefan,

thank you for your answer.
Even osd_snap_trim_sleep=10 was not sustainable with normal cluster load.
Following your email I've tested bluefs_buffered_io=true again and
indeed it dramatically reduces disk load, but not CPU nor slow Ceph I/O.


Yes, require_osd_release=octopus.

What worries me is the pool is now void of rbd images, but still has 
14TiB of object data.

Here is my pool contents. rbd_directory, rbd_trash are empty.

   rados -p storage ls | sed 's/\(.*\..*\)\..*/\1/'|sort|uniq -c
        1 rbd_children
        6 rbd_data.13fc0d1d63c52b
     2634 rbd_data.15ab844f62d5
      258 rbd_data.15f1f2e2398dc7
      133 rbd_data.17d93e1c5a4855
      258 rbd_data.1af03e352ec460
     2987 rbd_data.236cfc2474b020
   206872 rbd_data.31c55ee49f0abb
   604593 rbd_data.5b423b48a4643f
       90 rbd_data.7b06b7abcc9441
    81576 rbd_data.913b398f28d1
       18 rbd_data.9662ade11235a
    16051 rbd_data.e01609a7a07e20
      278 rbd_data.e6b6f855b5172c
       90 rbd_data.e85da37e044922
        1 rbd_directory
        1 rbd_info
        1 rbd_trash

Eric


Those objects are deleted but have snapshots, even if the pool itself 
doesn't have snapshots.

What could cause that?


root@hpc1a:~# rados -p storage stat rbd_data.5b423b48a4643f.0006a4e5
 error stat-ing storage/rbd_data.5b423b48a4643f.0006a4e5: (2) 
No such file or directory

root@hpc1a:~# rados -p storage lssnap
0 snaps
root@hpc1a:~# rados -p storage listsnaps 
rbd_data.5b423b48a4643f.0006a4e5

rbd_data.5b423b48a4643f.0006a4e5:
cloneid    snaps    size    overlap
1160    1160    4194304 
[1048576~32768,1097728~16384,1228800~16384,1409024~16384,1441792~16384,1572864~16384,1720320~16384,1900544~16384,2310144~16384]

1364    1364    4194304    []
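One thing to keep in mind when reading the above: "rados lssnap" only lists
pool-level snapshots, while RBD uses self-managed snapshots, so "0 snaps" does
not rule out clones that are still waiting to be trimmed. A rough way to see
whether the OSDs are still working through deleted snapshots (exact output
varies by release) is:

# count PGs currently in a snaptrim / snaptrim_wait state
ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim
# and look at the pool's removed-snapshot queue
ceph osd pool ls detail | grep "'storage'"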


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: something wrong with my monitor database ?

2022-06-13 Thread Eric Le Lay

On 10/06/2022 at 11:58, Stefan Kooman wrote:
CAUTION: This email originated from outside the organization. Do not 
click links or open attachments unless you recognize the sender and 
know the content is safe.


On 6/10/22 11:41, Eric Le Lay wrote:

Hello list,

my ceph cluster was upgraded from nautilus to octopus last October,
causing snaptrims
to overload OSDs so I had to disable them (bluefs_buffered_io=false|true
didn't help).

Now I've copied data elsewhere and removed all clients and try to fix
the cluster.
Scraping it and starting over is possible, but it would be wonderful if
we could
figure out what's wrong with it...


FYI: osd snap trim sleep <- adding some sleep might help alleviate the
impact on the cluster.

If HEALTH is OK I would not expect anything wrong with your cluster.

Does " ceph osd dump |grep require_osd_release" give you
require_osd_release octopus?

Gr. Stefan

Hi Stefan,

thank you for your answer.

Even osd_snap_trim_sleep=10 was not sustainable with normal cluster load.

Following your email I've tested bluefs_buffered_io=true again and
indeed it dramatically reduces disk load, but not CPU nor slow Ceph I/O.


Yes, require_osd_release=octopus.

What worries me is the pool is now void of rbd images, but still has 
14TiB of object data.

Here is my pool contents. rbd_directory, rbd_trash are empty.

   rados -p storage ls | sed 's/\(.*\..*\)\..*/\1/'|sort|uniq -c
        1 rbd_children
        6 rbd_data.13fc0d1d63c52b
     2634 rbd_data.15ab844f62d5
      258 rbd_data.15f1f2e2398dc7
      133 rbd_data.17d93e1c5a4855
      258 rbd_data.1af03e352ec460
     2987 rbd_data.236cfc2474b020
   206872 rbd_data.31c55ee49f0abb
   604593 rbd_data.5b423b48a4643f
       90 rbd_data.7b06b7abcc9441
    81576 rbd_data.913b398f28d1
       18 rbd_data.9662ade11235a
    16051 rbd_data.e01609a7a07e20
      278 rbd_data.e6b6f855b5172c
       90 rbd_data.e85da37e044922
        1 rbd_directory
        1 rbd_info
        1 rbd_trash

Eric



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph add-repo Unable to find a match epel-release

2022-06-13 Thread Kostadin Bukov

Hello ceph users,
I'm trying to set up a ceph client on one of my ceph-cluster hosts.
My setup is following:
- 3 bare-metal HP synergy servers
- installed latest ceph release quincy (17.2.0) using curl/cephadm
- RHEL 8.6
- ceph-cluster is working fine and health status is OK

compute1 is the deployment/admin/client node
compute2 is part of the ceph cluster
compute3 is part of the ceph cluster

Two weeks ago I was able to install the ceph-common package perfectly fine, so
I could use all the needed ceph-client tools.

[root@compute1 ceph]# cephadm install ceph-common
Installing packages ['ceph-common']...

Now I'm trying to install ceph-common on my second cluster node (compute2)
so it becomes a ceph client.
Unfortunately I hit the error below (tried it with root and with cephuser;
still the same):


---
[root@compute2 ~]# curl --silent --remote-name --location 
https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm

[root@compute2 ~]# chmod +x cephadm
[root@compute2 ~]# ./cephadm add-repo --release quincy
Writing repo to /etc/yum.repos.d/ceph.repo...
Enabling EPEL...
Non-zero exit code 1 from yum install -y epel-release
yum: stdout Updating Subscription Management repositories.
yum: stdout Ceph x86_64   3.0 kB/s | 1.5 kB   00:00
yum: stdout Ceph noarch   3.1 kB/s | 1.5 kB   00:00
yum: stdout Ceph SRPMS    3.1 kB/s | 1.5 kB   00:00

yum: stdout No match for argument: epel-release
yum: stderr Error: Unable to find a match: epel-release
Traceback (most recent call last):
  File "./cephadm", line 9281, in 
main()
  File "./cephadm", line 9269, in main
r = ctx.func(ctx)
  File "./cephadm", line 7819, in command_add_repo
pkg.add_repo()
  File "./cephadm", line 7668, in add_repo
call_throws(self.ctx, [self.tool, 'install', '-y', 'epel-release'])
  File "./cephadm", line 1738, in call_throws
raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
RuntimeError: Failed command: yum install -y epel-release: Error: Unable to 
find a match: epel-release


[root@compute2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.6 (Ootpa) 


[root@compute2 ~]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph $basearch
baseurl=https://download.ceph.com/rpm-quincy/el8/$basearch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.gpg

[Ceph-noarch]
name=Ceph noarch
baseurl=https://download.ceph.com/rpm-quincy/el8/noarch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.gpg

[Ceph-source]
name=Ceph SRPMS
baseurl=https://download.ceph.com/rpm-quincy/el8/SRPMS
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.gpg
---

It seems it can't find the epel-release rpm to install it...
Did I miss something, or..?
Please can you share your experience if you hit/saw a similar error?
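In case it is useful: as the traceback shows, "cephadm add-repo" simply runs
"yum install -y epel-release", and on plain RHEL 8 that package is not shipped
in the Red Hat repos. A possible workaround (the standard EPEL-for-RHEL-8
steps, not verified on this exact Synergy setup) is to enable CodeReady
Builder, install epel-release from the Fedora mirrors, and re-run cephadm:

subscription-manager repos --enable codeready-builder-for-rhel-8-x86_64-rpms
dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
./cephadm add-repo --release quincy
./cephadm install ceph-common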

Regards,
Kosta
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] snap-schedule reappearing

2022-06-13 Thread Stolte, Felix
Hi folks,

I removed snapshot scheduling on a CephFS path (Pacific), but it reappears the
next day. I didn't remove the retention for this path, though. Does the
retention on a path trigger the recreation of the snap schedule if it was
removed? Is this intended?
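In case it helps the debugging, an untested sketch (Pacific syntax; the path
and retention spec are placeholders) to check what is still stored for the
path and to remove the retention entry as well as the schedule, to rule out
the leftover retention being what brings the schedule back:

ceph fs snap-schedule status /some/path
ceph fs snap-schedule list /some/path
ceph fs snap-schedule remove /some/path
ceph fs snap-schedule retention remove /some/path h 24
ceph fs snap-schedule status /some/path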

regards
Felix

-
-
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
-
-

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io