[ceph-users] RGW Bucket Notifications and http push-endpoint

2022-06-14 Thread Mark Selby
I am experimenting with RGW bucket notifications and the simple https endpoint.

 

I have it mostly working, except for the part where I tell RGW that I have 
received the notification.

 

If I set persistent = False, then when I issue a put from an S3-compatible client the 
command “hangs”; I think RGW is waiting for an ack for the notification 
event it has just sent. In this test case I am specifying a simple https 
endpoint. I am thinking that I need to “ack back” so that RGW tells the client 
that the request is complete.

 

In async mode, RGW keeps hammering my endpoint with notifications. I imagine it 
will not stop until it gets an ack.

 

How do you send acks back to RGW when you are using simple https notification 
endpoints?
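
My current working theory is that RGW treats any HTTP 2xx response from the 
push-endpoint as the ack, so there is no separate "ack API" to call; the 
endpoint just has to answer the POST it receives. This is the crude always-ack 
endpoint I am testing with (a sketch; netcat flags differ between variants, so 
adjust for your distro):

# answer every connection with an empty 200, which should be enough to ack
while true; do
    printf 'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n' | nc -l 8080
done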

 

Thanks!

 

-- 

Mark Selby

Sr Linux Administrator, The Voleon Group

mse...@voleon.com 

 

 This email is subject to important conditions and disclosures that are listed 
on this web page: https://voleon.com/disclaimer/.

 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Announcing go-ceph v0.16.0

2022-06-14 Thread John Mulligan
On Tuesday, June 14, 2022 4:29:59 PM EDT John Mulligan wrote:
> I'm happy to announce another release of the go-ceph API library. This is a
> regular release following our every-two-months release cadence.
> 
> https://github.com/ceph/go-ceph/releases/tag/v0.64.0
> 

Eventually I was bound to typo that link. The correct link is 
https://github.com/ceph/go-ceph/releases/tag/v0.16.0

Apologies for any confusion.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Announcing go-ceph v0.16.0

2022-06-14 Thread John Mulligan
I'm happy to announce another release of the go-ceph API library. This is a 
regular release following our every-two-months release cadence.

https://github.com/ceph/go-ceph/releases/tag/v0.64.0

Changes include additions to the cephfs admin package and rbd package. 
More details are available at the link above.

The library includes bindings that aim to play a similar role to the "pybind" 
python bindings in the ceph tree but for the Go language. The library also 
includes additional APIs that can be used to administer cephfs, rbd, and rgw 
subsystems.
There are already a few consumers of this library in the wild, including the 
ceph-csi project.


-- 
John Mulligan

phlogistonj...@asynchrono.us
jmulli...@redhat.com



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph on RHEL 9

2022-06-14 Thread Robert W. Eckert
I enabled CentOS EPEL Next and downloaded the Ceph RPMs from 
https://cbs.centos.org/koji/buildinfo?buildID=39161. There were a few 
additional dependencies I had to pull in for libarrow, and my history also 
shows installing thrift.

This was everything I downloaded and installed:

ceph-17.2.0-2.el9s.x86_64.rpm
ceph-base-17.2.0-2.el9s.x86_64.rpm
ceph-common-17.2.0-2.el9s.x86_64.rpm
ceph-fuse-17.2.0-2.el9s.x86_64.rpm
ceph-grafana-dashboards-17.2.0-2.el9s.noarch.rpm
ceph-immutable-object-cache-17.2.0-2.el9s.x86_64.rpm
ceph-mds-17.2.0-2.el9s.x86_64.rpm
ceph-mgr-17.2.0-2.el9s.x86_64.rpm
ceph-mgr-cephadm-17.2.0-2.el9s.noarch.rpm
ceph-mgr-dashboard-17.2.0-2.el9s.noarch.rpm
ceph-mgr-modules-core-17.2.0-2.el9s.noarch.rpm
ceph-mon-17.2.0-2.el9s.x86_64.rpm
ceph-osd-17.2.0-2.el9s.x86_64.rpm
ceph-prometheus-alerts-17.2.0-2.el9s.noarch.rpm
ceph-radosgw-17.2.0-2.el9s.x86_64.rpm
ceph-selinux-17.2.0-2.el9s.x86_64.rpm
ceph-volume-17.2.0-2.el9s.noarch.rpm
cephadm-17.2.0-2.el9s.noarch.rpm
cephfs-mirror-17.2.0-2.el9s.x86_64.rpm
cephfs-top-17.2.0-2.el9s.noarch.rpm
libarrow-7.0.0-3.el9s.x86_64.rpm
libarrow-doc-7.0.0-3.el9s.noarch.rpm
libcephfs2-17.2.0-2.el9s.x86_64.rpm
libcephsqlite-17.2.0-2.el9s.x86_64.rpm
libcephsqlite-devel-17.2.0-2.el9s.x86_64.rpm
librados2-17.2.0-2.el9s.x86_64.rpm
librados-devel-17.2.0-2.el9s.x86_64.rpm
libradospp-devel-17.2.0-2.el9s.x86_64.rpm
libradosstriper1-17.2.0-2.el9s.x86_64.rpm
librbd1-17.2.0-2.el9s.x86_64.rpm
librbd-devel-17.2.0-2.el9s.x86_64.rpm
librgw2-17.2.0-2.el9s.x86_64.rpm
librgw-devel-17.2.0-2.el9s.x86_64.rpm
parquet-libs-7.0.0-3.el9s.x86_64.rpm
python3-ceph-argparse-17.2.0-2.el9s.x86_64.rpm
python3-ceph-common-17.2.0-2.el9s.x86_64.rpm
python3-cephfs-17.2.0-2.el9s.x86_64.rpm
python3-rados-17.2.0-2.el9s.x86_64.rpm
python3-rbd-17.2.0-2.el9s.x86_64.rpm
python3-rgw-17.2.0-2.el9s.x86_64.rpm
rados-objclass-devel-17.2.0-2.el9s.x86_64.rpm
rbd-fuse-17.2.0-2.el9s.x86_64.rpm
rbd-mirror-17.2.0-2.el9s.x86_64.rpm
rbd-nbd-17.2.0-2.el9s.x86_64.rpm
thrift-0.14.0-7.el9s.x86_64.rpm
thrift-glib-0.14.0-7.el9s.x86_64.rpm



Once I did that and built the repo, the install went smoothly on 3 different 
servers.
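
For reference, the local-repo step was roughly this (a sketch; the download 
directory /root/ceph-el9s is made up, use wherever you put the RPMs):

dnf install -y createrepo_c
createrepo_c /root/ceph-el9s
cat > /etc/yum.repos.d/ceph-local.repo <<'EOF'
[ceph-local]
name=Ceph 17.2.0 el9s local build
baseurl=file:///root/ceph-el9s
enabled=1
gpgcheck=0
EOF
dnf install -y ceph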



I probably pulled more than I needed to, but since the install works, I'll take 
that as a win.







-Original Message-
From: Gregory Farnum 
Sent: Friday, June 10, 2022 12:46 PM
To: Robert W. Eckert 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Ceph on RHEL 9



We aren't building for CentOS 9 yet, so I guess the Python dependency 
declarations don't work with the versions in that release.

I've put updating to 9 on the agenda for the next CLT.



(Do note that we don't test upstream packages against RHEL, so if CentOS Stream 
does something that doesn't match the RHEL release, it still might get busted.) 
-Greg
-Greg



On Thu, Jun 9, 2022 at 6:57 PM Robert W. Eckert 
<r...@rob.eckert.name> wrote:

>

> Does anyone have any pointers for installing Ceph on RHEL 9?

>

> -Original Message-

> From: Robert W. Eckert <r...@rob.eckert.name>

> Sent: Saturday, May 28, 2022 8:28 PM

> To: ceph-users@ceph.io

> Subject: [ceph-users] Ceph on RHEL 9

>

> Hi- I started to update my 3 host cluster to RHEL 9, but came across a bit of 
> a stumbling block.

>

> The upgrade process uses the RHEL leapp process, which ran through a few 
> simple things to clean up and told me everything was hunky dory. But when I 
> kicked off the first server, it wouldn't boot because I had a ceph 
> filesystem mounted in /etc/fstab; commenting that out let the upgrade happen.

>

> Then I went to check on the ceph client, which appears to have been uninstalled.

>

> When I tried to install ceph,  I got:

>

> [root@story ~]# dnf install ceph

> Updating Subscription Management repositories.

> Last metadata expiration check: 0:07:58 ago on Sat 28 May 2022 08:06:52 PM 
> EDT.

> Error:

> Problem: package ceph-2:17.2.0-0.el8.x86_64 requires ceph-mgr = 
> 2:17.2.0-0.el8, but none of the providers can be installed

>   - conflicting requests

>   - nothing provides libpython3.6m.so.1.0()(64bit) needed by

> ceph-mgr-2:17.2.0-0.el8.x86_64 (try to add '--skip-broken' to skip

> uninstallable packages or '--nobest' to use not only best candidate

> packages)

>

> This is the content of my /etc/yum.repos.d/ceph.conf

>

> [ceph]

> name=Ceph packages for $basearch

> baseurl=https://download.ceph.com/rpm-quincy/el8/$basearch

> enabled=1

> priority=2

> gpgcheck=1

> gpgkey=https://download.ceph.com/keys/release.asc

>

> 

[ceph-users] Re: Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-14 Thread Boris Behrens
Hi Eric,
Hi Eric,
oh, the pastebin shows how an available file looks (a very small file that
got uploaded as multipart). The unavailable file is missing its "_multipart"
rados object; everything else is very close. I could have phrased that better.

The customer has now checked all their files; only three files are missing.
They've got backups and are checking whether the files are in the backup.

Around the creation date of these missing files, we had problems with two
OSDs that were marked down multiple times, got killed, and were restarted by
systemd. Could this be a factor?
We have a lot of OSD restarts, which is explained here:
https://tracker.ceph.com/issues/54434 (we are now in the process of adding
more and larger SSDs for the block.db devices).

I am still not sure how these files could go missing. And if they exist
in the backup, and their logs don't show any deletes of those files, I am
even more clueless.
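
For anyone who wants to run the same comparison of bucket index vs. rados,
this is roughly the check (a sketch; the bucket and object names are
placeholders, and I am assuming the usual default.rgw.buckets.data data pool):

radosgw-admin object stat --bucket=mybucket --object=available_file1.pdf
# take the manifest prefix from the JSON output, then look for the pieces:
rados -p default.rgw.buckets.data ls | grep ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.17600
# or stat one piece directly (faster than a full ls on a big pool):
rados -p default.rgw.buckets.data stat ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.17600__multipart_available_file1.pdf.2~ve8VhAEvaRSzAPfacz9rI-aLMpLY_Yw.1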

Thank you for your time and your reply.

Cheers
 Boris

On Tue, Jun 14, 2022 at 18:38, J. Eric Ivancich <ivanc...@redhat.com> wrote:

> Hi Boris,
>
> I’m a little confused. The pastebin seems to show that you can stat "
> ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.17600__multipart_available_file1.pdf.2~ve8VhAEvaRSzAPfacz9rI-aLMpLY_Yw.1”,
> but I thought it was missing. Can you clarify?
>
> The bug has been in RGW for quite a while, well before Octopus. It
> involves a race condition with a very narrow window, so normally only
> encountered in large, busy clusters.
>
> Also, I think it’s up to the s3 client whether to use multipart upload. Do
> you know which s3 client the user was using?
>
> Eric
> (he/him)
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible to recover deleted files from CephFS?

2022-06-14 Thread Michael Sherman
Thank you, this is extremely helpful!

Unfortunately, none of the inodes mentioned in the `stray` journal entries are 
present in the output of `rados -p cephfs_data ls`.

Am I correct in assuming that this means they’re gone for good?

I did shut down the MDS with `ceph fs fail cephfs` when we noticed the issue, 
but that appears to have been too slow.

-Mike

From: Gregory Farnum 
Date: Tuesday, June 14, 2022 at 12:15 PM
To: Michael Sherman 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Possible to recover deleted files from CephFS?
On Tue, Jun 14, 2022 at 8:50 AM Michael Sherman  wrote:
>
> Hi,
>
> We discovered that a number of files were deleted from our cephfs filesystem, 
> and haven’t been able to find current backups or snapshots.
>
> Is it possible to “undelete” a file by modifying metadata? Using 
> `cephfs-journal-tool`, I am able to find the `unlink` event for each file, 
> looking like the following:
>
> $ cephfs-journal-tool --rank cephfs:all event get 
> --path="images/060862a9-a648-4e7e-96e3-5ba3dea29eab" list
> …
> 2022-06-09 17:09:20.123155 0x170da7fc UPDATE:  (unlink_local)
>   stray5/1001fee
>   images/060862a9-a648-4e7e-96e3-5ba3dea29eab
>
> I saw the disaster-recovery-tools mentioned 
> here,
>  but didn’t know if they would be helpful in the case of a deletion.
>
> Thank you in advance for any help.

Once files are unlinked they get moved into the stray directory, and
then into the purge queue when they are truly unused.

The purge queue processes them and deletes the backing objects.

So the first thing you should do is turn off the MDS, as that is what
performs the actual deletions.

If you've already found the unlink events, you know the inode numbers
you want. You can look in rados for the backing objects and just copy
them out (and reassemble them if the file was >4MB). CephFS files are
stored in RADOS as objects named <inode number in hex>.<object offset in hex>.
If your cluster isn't too big, you can just:
rados -p <data pool> ls | grep 1001fee
for the example file you referenced above. (Or more probably, dump the
listing into a file and search that for the inode numbers).

If listing all the objects takes too long, you can construct the
object names in the other direction, which is simple enough but I
can't recall offhand the number of digits you start out with for the
<object offset> portion of the object name, so you'll have to
look at one and figure that out yourself. ;)
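
Putting that together, a rough sketch for one file (the pool name comes from
Mike's earlier command; it assumes the default 4MB object size and that the
hex offset suffix is fixed-width, so a plain sort orders the chunks correctly):

# collect the chunks for inode 0x1001fee
rados -p cephfs_data ls > /tmp/objects.txt
grep '^1001fee\.' /tmp/objects.txt | sort > /tmp/parts.txt
# fetch each chunk, then concatenate them in offset order
while read -r obj; do
    rados -p cephfs_data get "$obj" "/tmp/$obj"
done < /tmp/parts.txt
cat $(sed 's|^|/tmp/|' /tmp/parts.txt) > /tmp/recovered_file

One caveat: if the file was sparse, some offsets may simply not exist as
objects, and a plain concatenation will silently close the gaps.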


The disaster recovery tooling is really meant to recover a broken
filesystem; massaging it to get erroneously-deleted files back into
the tree would be rough. The only way I can think of doing that is
using the procedure to recover into a new metadata pool, and
performing just the cephfs-data-scan bits (because recovering the
metadata would obviously delete all the files again). But then your
tree (while self-consistent) would look strange again with files that
are in old locations and things, so I wouldn't recommend it.
-Greg

> -Mike Sherman
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Possible to recover deleted files from CephFS?

2022-06-14 Thread Gregory Farnum
On Tue, Jun 14, 2022 at 8:50 AM Michael Sherman  wrote:
>
> Hi,
>
> We discovered that a number of files were deleted from our cephfs filesystem, 
> and haven’t been able to find current backups or snapshots.
>
> Is it possible to “undelete” a file by modifying metadata? Using 
> `cephfs-journal-tool`, I am able to find the `unlink` event for each file, 
> looking like the following:
>
> $ cephfs-journal-tool --rank cephfs:all event get 
> --path="images/060862a9-a648-4e7e-96e3-5ba3dea29eab" list
> …
> 2022-06-09 17:09:20.123155 0x170da7fc UPDATE:  (unlink_local)
>   stray5/1001fee
>   images/060862a9-a648-4e7e-96e3-5ba3dea29eab
>
> I saw the disaster-recovery-tools mentioned 
> here,
>  but didn’t know if they would be helpful in the case of a deletion.
>
> Thank you in advance for any help.

Once files are unlinked they get moved into the stray directory, and
then into the purge queue when they are truly unused.

The purge queue processes them and deletes the backing objects.

So the first thing you should do is turn off the MDS, as that is what
performs the actual deletions.

If you've already found the unlink events, you know the inode numbers
you want. You can look in rados for the backing objects and just copy
them out (and reassemble them if the file was >4MB). CephFS files are
stored in RADOS as objects named <inode number in hex>.<object offset in hex>.
If your cluster isn't too big, you can just:
rados -p <data pool> ls | grep 1001fee
for the example file you referenced above. (Or more probably, dump the
listing into a file and search that for the inode numbers).

If listing all the objects takes too long, you can construct the
object names in the other direction, which is simple enough but I
can't recall offhand the number of digits you start out with for the
<object offset> portion of the object name, so you'll have to
look at one and figure that out yourself. ;)


The disaster recovery tooling is really meant to recover a broken
filesystem; massaging it to get erroneously-deleted files back into
the tree would be rough. The only way I can think of doing that is
using the procedure to recover into a new metadata pool, and
performing just the cephfs-data-scan bits (because recovering the
metadata would obviously delete all the files again). But then your
tree (while self-consistent) would look strange again with files that
are in old locations and things, so I wouldn't recommend it.
-Greg

> -Mike Sherman
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-14 Thread J. Eric Ivancich
Hi Boris,

I’m a little confused. The pastebin seems to show that you can stat 
"ff7a8b0c-07e6-463a-861b-78f0adeba8ad.81095307.17600__multipart_available_file1.pdf.2~ve8VhAEvaRSzAPfacz9rI-aLMpLY_Yw.1”,
 but I thought it was missing. Can you clarify?

The bug has been in RGW for quite a while, well before Octopus. It involves a 
race condition with a very narrow window, so normally only encountered in 
large, busy clusters.

Also, I think it’s up to the s3 client whether to use multipart upload. Do you 
know which s3 client the user was using?

Eric
(he/him)

> On Jun 14, 2022, at 1:02 AM, Boris Behrens  wrote:
> 
> Hmm.. I will check what the user is deleting. Maybe this is it. 
> Do you know if this bug is new in 15.2.16?
> 
> I can't share the data, but I can share the metadata:
> https://pastebin.com/raw/T1YYLuec 
> 
> For the missing files I have, the multipart file is not available in rados, 
> but the 0 byte file is. 
> The rest is more or less identical.
> 
> They seem to use the aws-sdk-dotnet (aws-sdk-dotnet-coreclr/3.3.110.57, 
> aws-sdk-dotnet-core/3.3.106.11), but such small multiparts are very strange. 
> I guess you can really screw up configs, but who am I to judge.
> 
> On Tue, Jun 14, 2022 at 00:29, J. Eric Ivancich <ivanc...@redhat.com> wrote:
> There is no known bug that would cause the rados objects underlying an RGW 
> object to be removed without a user requesting the RGW object be deleted.
> 
> There is a known bug where the bucket index might not get updated correctly 
> after user-requested operations. So perhaps the user removed the rgw object, 
> but it still incorrectly shows up in the bucket index. The PR for the fix for 
> that bug merged into the octopus branch, but after 15.2.16. See:
> 
> https://github.com/ceph/ceph/pull/45902 
> 
> 
> So it should be in the next octopus release.
> 
> I also find it odd that a 250KB file gets a multipart object. What do we know 
> about the original object? Do we know it’s size? Could the multipart upload 
> never have completed? In that case there could be incomplete multipart 
> entries in the bucket index, but they should never have been finalized into a 
> regular bucket index entry.
> 
> Are you willing to share all the bucket index entries related to this object?
> 
> Eric
> (he/him)

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How suitable is CEPH for....

2022-06-14 Thread Mark Lehrer
> I'm reading and trying to figure out how crazy
> is using Ceph for all of the above targets [MySQL]

Not crazy at all, it just depends on your performance needs.  16K I/O
is not the best Ceph use case, but the snapshot/qcow2 features may
justify it.

The biggest problem I have with MySQL is that each connection uses a
single CPU core.  Combine this with poor 16K performance, and it's
tough to get good performance unless there are a lot of users.

Mass loading of data is particularly agonizing on MySQL, especially on
Ceph.  The last time I had to do a mass import, it was much faster to
copy the rbd to a local drive partition and run my VM there for the
import and then copy the block device back to the rbd.  This is
because you can use qemu-img to copy the block device with a large
block size and up to 16 threads, which can move multiple terabytes an
hour.
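
The commands look something like this (a sketch; the image and path names
are made up):

# copy the image out with 16 coroutines and out-of-order writes
qemu-img convert -p -m 16 -W -O raw rbd:compute/mysql-disk /mnt/local/mysql-disk.raw
# run the VM / do the import against the local copy, then push it back:
qemu-img convert -p -m 16 -W -O raw /mnt/local/mysql-disk.raw rbd:compute/mysql-disk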

My MySQL database is almost always CPU bound and never more than ~20%
iowait, so it can run on Ceph fairly well.

Mark





On Tue, Jun 14, 2022 at 8:14 AM Kostadin Bukov
 wrote:
>
> Greetings to all great people from Ceph community,
> I'm currently digging and trying to collect pros and cons of using CEPH for
> below purposes:
>
> - for MySQL server datastore (InnoDB) using Cephfs or rbd. Let's say we
> have 1 running Mysql server (active) and in case it fails the same InnoDB
> datastore is accessed from a MySQL server 2 (started, access the InnoDB
> from MySQL server 1 and become the new active). Or is it better to use
> old-school 2 MySQL servers with replication and avoid Ceph altogether?
> - storing application log files from different nodes (something like a
> central place for logs from different bare-metal servers or VMs or
> containers). By the way our applications under heavy load could generate
> gigabytes of log files per hour...
> - for configuration files (for different applications)
> - for etcd
> - for storing backup files from different nodes
>
> I'm reading and trying to figure out how crazy it is to use Ceph for all of
> the above targets.
> Can you kindly share your opinion on whether this is too complex, and
> whether I could end up with a lot of trouble if the Ceph cluster goes down.
> The applications and MySQL server are for a production/critical platform
> which needs high availability, redundancy and performance (sometimes apps
> and MySQL are quite hungry when writing to the disk).
> Log files and backup files are not so critical so maybe putting them on
> Ceph with replica x3 would just generate unnecessary ceph traffic between
> the ceph nodes.
> Application configurations are needed only when start/restart application.
> The most critical data from the whole is the MySQL InnoDB data
>
> It would be interesting to me if you could share your thoughts/experience,
> or tell me whether I should look elsewhere.
>
> Regards,
> Kosta
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Possible to recover deleted files from CephFS?

2022-06-14 Thread Michael Sherman
Hi,

We discovered that a number of files were deleted from our cephfs filesystem, 
and haven’t been able to find current backups or snapshots.

Is it possible to “undelete” a file by modifying metadata? Using 
`cephfs-journal-tool`, I am able to find the `unlink` event for each file, 
looking like the following:

$ cephfs-journal-tool --rank cephfs:all event get 
--path="images/060862a9-a648-4e7e-96e3-5ba3dea29eab" list
…
2022-06-09 17:09:20.123155 0x170da7fc UPDATE:  (unlink_local)
  stray5/1001fee
  images/060862a9-a648-4e7e-96e3-5ba3dea29eab

I saw the disaster-recovery-tools mentioned 
here,
 but didn’t know if they would be helpful in the case of a deletion.

Thank you in advance for any help.
-Mike Sherman
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-14 Thread Wesley Dillingham
I have made https://tracker.ceph.com/issues/56046 regarding the issue I am
observing.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Jun 14, 2022 at 5:32 AM Eugen Block  wrote:

> I found the thread I was referring to [1]. The report was very similar
> to yours, apparently the balancer seems to cause the "degraded"
> messages, but the thread was not concluded. Maybe a tracker ticket
> should be created if it doesn't already exist, I didn't find a ticket
> related to that in a quick search.
>
> [1]
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/H4L5VNQJKIDXXNY2TINEGUGOYLUTT5UL/
>
> Zitat von Wesley Dillingham :
>
> > Thanks for the reply. I believe regarding "0" vs "0.0" it's the same
> > difference. I will note it's not just changing crush weights that induces
> > this situation. Introducing upmaps manually or via the balancer also
> > causes the PGs to be degraded instead of the expected remapped PG state.
> >
> > Respectfully,
> >
> > *Wes Dillingham*
> > w...@wesdillingham.com
> > LinkedIn 
> >
> >
> > On Mon, Jun 13, 2022 at 9:27 PM Szabo, Istvan (Agoda) <
> > istvan.sz...@agoda.com> wrote:
> >
> >> Isn’t it the correct syntax like this?
> >>
> >> ceph osd crush reweight osd.1 0.0 ?
> >>
> >> Istvan Szabo
> >> Senior Infrastructure Engineer
> >> ---
> >> Agoda Services Co., Ltd.
> >> e: istvan.sz...@agoda.com
> >> ---
> >>
> >> On 2022. Jun 14., at 0:38, Wesley Dillingham 
> >> wrote:
> >>
> >> ceph osd crush reweight osd.1 0
> >>
> >>
> >> --
> >> This message is confidential and is for the sole use of the intended
> >> recipient(s). It may also be privileged or otherwise protected by
> copyright
> >> or other legal rules. If you have received it by mistake please let us
> know
> >> by reply email and delete it from your system. It is prohibited to copy
> >> this message or disclose its content to anyone. Any confidentiality or
> >> privilege is not waived or lost by any mistaken delivery or unauthorized
> >> disclosure of the message. All messages sent to and from Agoda may be
> >> monitored to ensure compliance with company policies, to protect the
> >> company's interests and to remove potential malware. Electronic messages
> >> may be intercepted, amended, lost or deleted, or contain viruses.
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How suitable is CEPH for....

2022-06-14 Thread Kostadin Bukov

Greetings to all great people from Ceph community,
I'm currently digging and trying to collect pros and cons of using CEPH for 
below purposes:


- for MySQL server datastore (InnoDB) using Cephfs or rbd. Let's say we 
have 1 running Mysql server (active) and in case it fails the same InnoDB 
datastore is accessed from a MySQL server 2 (started, access the InnoDB 
from MySQL server 1 and become the new active). Or is it better to use 
old-school 2 MySQL servers with replication and avoid Ceph altogether?
- storing application log files from different nodes (something like a 
central place for logs from different bare-metal servers or VMs or 
containers). By the way our applications under heavy load could generate 
gigabytes of log files per hour...

- for configuration files (for different applications)
- for etcd
- for storing backup files from different nodes

I'm reading and trying to figure out how crazy it is to use Ceph for all of 
the above targets.
Can you kindly share your opinion on whether this is too complex, and 
whether I could end up with a lot of trouble if the Ceph cluster goes down.
The applications and MySQL server are for a production/critical platform 
which needs high availability, redundancy and performance (sometimes apps 
and MySQL are quite hungry when writing to the disk).
Log files and backup files are not so critical so maybe putting them on 
Ceph with replica x3 would just generate unnecessary ceph traffic between 
the ceph nodes.

Application configurations are needed only when start/restart application.
The most critical data from the whole is the MySQL InnoDB data

It would be interesting to me if you could share your thoughts/experience, 
or tell me whether I should look elsewhere.


Regards,
Kosta
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] set configuration options in the cephadm age

2022-06-14 Thread Thomas Roth

https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/

talks about changing 'osd_crush_chooseleaf_type' before creating monitors or OSDs, for the special 
case of a 1-node-cluster.


However, the documentation fails to explain how/where to set this option, seeing that with 'cephadm', 
there is (almost) no /etc/ceph/ceph.conf anymore.



If you search the web for various errors in Ceph, you will come across clever people explaining 
how to manipulate the DB on the fly, for example "ceph tell mon.* injectargs...".
There should be a paragraph in the documentation mentioning this, along with a corresponding 
paragraph on setting options permanently...



In fact, I would just like to have the failure domain 'OSD' instead of 'host'.
Any clever way of doing that?
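
For the record, the closest I have found so far is the mon config database
plus a dedicated crush rule, something like this (a sketch; the pool name is
a placeholder):

# persistent settings now live in the mon config database:
ceph config set global osd_crush_chooseleaf_type 0
ceph config get osd osd_crush_chooseleaf_type
# on an existing cluster the failure domain is really a property of the
# crush rule, so create one with 'osd' and point the pool at it:
ceph osd crush rule create-replicated replicated-osd default osd
ceph osd pool set <pool> crush_rule replicated-osd

(My understanding is that osd_crush_chooseleaf_type only matters when the
initial crush map is created, hence the crush-rule route for anything
already running.)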


Regards,
Thomas

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: something wrong with my monitor database ?

2022-06-14 Thread Eric Le Lay

On 13/06/2022 at 18:37, Stefan Kooman wrote:


On 6/13/22 18:21, Eric Le Lay wrote:



Those objects are deleted but have snapshots, even if the pool itself
doesn't have snapshots.
What could cause that?


root@hpc1a:~# rados -p storage stat rbd_data.5b423b48a4643f.0006a4e5
  error stat-ing storage/rbd_data.5b423b48a4643f.0006a4e5: (2) No such file or directory
root@hpc1a:~# rados -p storage lssnap
0 snaps
root@hpc1a:~# rados -p storage listsnaps rbd_data.5b423b48a4643f.0006a4e5
rbd_data.5b423b48a4643f.0006a4e5:
cloneid    snaps    size    overlap
1160    1160    4194304    [1048576~32768,1097728~16384,1228800~16384,1409024~16384,1441792~16384,1572864~16384,1720320~16384,1900544~16384,2310144~16384]
1364    1364    4194304    []


Do the OSDs still need to trim the snapshots? Does data usage decline
over time?

Gr. Stefan



thanks Stefan for your time!

Snaptrims were re-enabled a week ago but the OSDs only snaptrim newly 
deleted snapshots.

restarting or outing an OSD doesn't trigger them either.

Crush-reweighting an OSD to 0 indeed results in more storage being used!
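
(For anyone wanting to check the same thing, the queries I find useful, as a
sketch:

ceph pg ls snaptrim snaptrim_wait
ceph osd dump | grep removed_snaps

The first lists PGs currently trimming or queued to trim; the second shows
the snap intervals the cluster still tracks as removed. Depending on the
release, the osd dump line is removed_snaps or removed_snaps_queue.)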

I'll drop the cluster and start again from scratch.

Best,
Eric

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-14 Thread Eugen Block
I found the thread I was referring to [1]. The report was very similar  
to yours; apparently the balancer seems to cause the "degraded"  
messages, but the thread was not concluded. Maybe a tracker ticket  
should be created if one doesn't already exist; I didn't find a  
related ticket in a quick search.


[1]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/H4L5VNQJKIDXXNY2TINEGUGOYLUTT5UL/


Zitat von Wesley Dillingham :


Thanks for the reply. I believe regarding "0" vs "0.0" it's the same
difference. I will note it's not just changing crush weights that induces
this situation. Introducing upmaps manually or via the balancer also causes
the PGs to be degraded instead of the expected remapped PG state.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Jun 13, 2022 at 9:27 PM Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com> wrote:


Isn’t it the correct syntax like this?

ceph osd crush reweight osd.1 0.0 ?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2022. Jun 14., at 0:38, Wesley Dillingham 
wrote:

ceph osd crush reweight osd.1 0


--
This message is confidential and is for the sole use of the intended
recipient(s). It may also be privileged or otherwise protected by copyright
or other legal rules. If you have received it by mistake please let us know
by reply email and delete it from your system. It is prohibited to copy
this message or disclose its content to anyone. Any confidentiality or
privilege is not waived or lost by any mistaken delivery or unauthorized
disclosure of the message. All messages sent to and from Agoda may be
monitored to ensure compliance with company policies, to protect the
company's interests and to remove potential malware. Electronic messages
may be intercepted, amended, lost or deleted, or contain viruses.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD crash with "no available blob id" and check for Zombie blobs

2022-06-14 Thread tao song
  Thanks, we have backported some PRs to 12.2.12, but the problem remains. Are
there any other fixes? e.g.:

> os/bluestore: apply garbage collection against excessive blob count growth

 https://github.com/ceph/ceph/pull/28229

and the ceph-bluestore-tool fsck/repair:

https://github.com/ceph/ceph/pull/38050



On Tue, Jun 14, 2022 at 15:37, Konstantin Shalygin wrote:

> Hi,
>
> Many of fixes for "zombie blobs" landed in last Nautilus release
> I suggest to upgrade to last Nautilus version
>
>
> k
>
> > On 14 Jun 2022, at 10:23, tao song  wrote:
> >
> > I have a old Cluster 12.2.12 running bluestore ,use iscsi + RBD in EC
> > pools(k:m=2:1) with ec_overwrites flags. Multiple OSD crashes occurred
> due
> > to assert (0 == "no available blob id").
> > The problems occur periodically when the RBD volume is cyclically
> > overwritten.
> >
> > 2022-05-24 22:08:19.950550 7fcb41894700  1 osd.171 pg_epoch: 47676
> >> pg[4.1467s2( v 44365'8455207 (44045'8453660,44365'8455207]
> >> local-lis/les=44415/44416 n=21866 ec=16123/349 lis/c 47665/44415 les/c/f
> >> 47666/44416/4714 47676/47676/39511)
> >> [115,16,171]/[115,2147483647,2147483647]p115(0) r=-1 lpr=47676
> >> pi=[44415,47676)/2 crt=44260'8455206 lcod 0'0 remapped NOTIFY mbc={}]
> >> state: transitioning to Stray
> >> 2022-05-24 22:08:20.834007 7fcb41894700  1 osd.171 pg_epoch: 47677
> >> pg[4.1467s2( v 44365'8455207 (44045'8453660,44365'8455207]
> >> local-lis/les=44415/44416 n=21866 ec=16123/349 lis/c 47665/44415 les/c/f
> >> 47666/44416/4714 47676/47677/39511) [115,16,171]p115(0) r=2 lpr=47677
> >> pi=[44415,47677)/2 crt=44260'8455206 lcod 0'0 unknown NOTIFY mbc={}]
> >> start_peering_interval up [115,16,171] -> [115,16,171], acting
> >> [115,2147483647,2147483647] -> [115,16,171], acting_primary 115(0) ->
> 115,
> >> up_primary 115(0) -> 115, role -1 -> 2, features acting
> 4611087853746454523
> >> upacting 4611087853746454523
> >> 2022-05-24 22:08:20.834073 7fcb41894700  1 osd.171 pg_epoch: 47677
> >> pg[4.1467s2( v 44365'8455207 (44045'8453660,44365'8455207]
> >> local-lis/les=44415/44416 n=21866 ec=16123/349 lis/c 47665/44415 les/c/f
> >> 47666/44416/4714 47676/47677/39511) [115,16,171]p115(0) r=2 lpr=47677
> >> pi=[44415,47677)/2 crt=44260'8455206 lcod 0'0 unknown NOTIFY mbc={}]
> >> state: transitioning to Stray
> >> 2022-05-24 22:08:22.097055 7fcb3a085700 -1
> >> /ceph-12.2.12/src/os/bluestore/BlueStore.cc: In function 'bid_t
> >> BlueStore::ExtentMap::allocate_spanning_blob_id()' thread 7fcb3a085700
> time
> >> 2022-05-24 22:08:22.091806
> >> /ceph-12.2.12/src/os/bluestore/BlueStore.cc: 2083: FAILED assert(0 ==
> "no
> >> available blob id")
> >>
> >> ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
> >> (stable)
> >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> const*)+0x110) [0x560e41dbd520]
> >> 2: (()+0x8fce4e) [0x560e41c15e4e]
> >> 3: (BlueStore::ExtentMap::reshard(KeyValueDB*,
> >> std::shared_ptr)+0x13da) [0x560e41c6fc6a]
> >> 4: (BlueStore::_txc_write_nodes(BlueStore::TransContext*,
> >> std::shared_ptr)+0x1ab) [0x560e41c7131b]
> >> 5: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> >> std::vector >> std::allocator >&,
> >> boost::intrusive_ptr, ThreadPool::TPHandle*)+0x3fd)
> >> [0x560e41c8cc4d]
> >> 6:
> >> (PrimaryLogPG::queue_transactions(std::vector >> std::allocator >&,
> >> boost::intrusive_ptr)+0x65) [0x560e419efac5]
> >> 7: (ECBackend::handle_sub_write(pg_shard_t,
> >> boost::intrusive_ptr, ECSubWrite&, ZTracer::Trace const&,
> >> Context*)+0x631) [0x560e41b18331]
> >> 8: (ECBackend::_handle_message(boost::intrusive_ptr)+0x349)
> >> [0x560e41b29ba9]
> >> 9: (PGBackend::handle_message(boost::intrusive_ptr)+0x50)
> >> [0x560e41a255f0]
> >> 10: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
> >> ThreadPool::TPHandle&)+0x59c) [0x560e4198f97c]
> >> 11: (OSD::dequeue_op(boost::intrusive_ptr,
> >> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f9)
> >> [0x560e4180af59]
> >> 12: (PGQueueable::RunVis::operator()(boost::intrusive_ptr
> >> const&)+0x57) [0x560e41a9ac27]
> >> 13: (OSD::ShardedOpWQ::_process(unsigned int,
> >> ceph::heartbeat_handle_d*)+0xfce) [0x560e4183a20e]
> >> 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x83f)
> >> [0x560e41dc304f]
> >> 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10)
> [0x560e41dc4fe0]
> >> 16: (()+0x7dd5) [0x7fcb5913fdd5]
> >> 17: (clone()+0x6d) [0x7fcb5822fead]
> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> >> to interpret this.
> >>
> >
> > Some would not restart, hitting the "no available blob id" assertion. We
> > adjusted the following parameters to ensure that the OSDs could be started:
> >  bluestore_extent_map_shard_target_size=2000 (default 500)
> >  bluestore_extent_map_shard_target_size_slop=0.30 (default 0.20)
> >
> >
> >> We found several related bugs :
> >> https://tracker.ceph.com/issues/48216
> >> https://tracker.ceph.com/issues/38272
> >
> > The PR :
> >
> > 

[ceph-users] Re: OSD crash with "no available blob id" and check for Zombie blobs

2022-06-14 Thread Konstantin Shalygin
Hi,

Many of fixes for "zombie blobs" landed in last Nautilus release
I suggest to upgrade to last Nautilus version


k

> On 14 Jun 2022, at 10:23, tao song  wrote:
> 
> I have a old Cluster 12.2.12 running bluestore ,use iscsi + RBD in EC
> pools(k:m=2:1) with ec_overwrites flags. Multiple OSD crashes occurred due
> to assert (0 == "no available blob id").
> The problems occur periodically when the RBD volume is cyclically
> overwritten.
> 
> 2022-05-24 22:08:19.950550 7fcb41894700  1 osd.171 pg_epoch: 47676
>> pg[4.1467s2( v 44365'8455207 (44045'8453660,44365'8455207]
>> local-lis/les=44415/44416 n=21866 ec=16123/349 lis/c 47665/44415 les/c/f
>> 47666/44416/4714 47676/47676/39511)
>> [115,16,171]/[115,2147483647,2147483647]p115(0) r=-1 lpr=47676
>> pi=[44415,47676)/2 crt=44260'8455206 lcod 0'0 remapped NOTIFY mbc={}]
>> state: transitioning to Stray
>> 2022-05-24 22:08:20.834007 7fcb41894700  1 osd.171 pg_epoch: 47677
>> pg[4.1467s2( v 44365'8455207 (44045'8453660,44365'8455207]
>> local-lis/les=44415/44416 n=21866 ec=16123/349 lis/c 47665/44415 les/c/f
>> 47666/44416/4714 47676/47677/39511) [115,16,171]p115(0) r=2 lpr=47677
>> pi=[44415,47677)/2 crt=44260'8455206 lcod 0'0 unknown NOTIFY mbc={}]
>> start_peering_interval up [115,16,171] -> [115,16,171], acting
>> [115,2147483647,2147483647] -> [115,16,171], acting_primary 115(0) -> 115,
>> up_primary 115(0) -> 115, role -1 -> 2, features acting 4611087853746454523
>> upacting 4611087853746454523
>> 2022-05-24 22:08:20.834073 7fcb41894700  1 osd.171 pg_epoch: 47677
>> pg[4.1467s2( v 44365'8455207 (44045'8453660,44365'8455207]
>> local-lis/les=44415/44416 n=21866 ec=16123/349 lis/c 47665/44415 les/c/f
>> 47666/44416/4714 47676/47677/39511) [115,16,171]p115(0) r=2 lpr=47677
>> pi=[44415,47677)/2 crt=44260'8455206 lcod 0'0 unknown NOTIFY mbc={}]
>> state: transitioning to Stray
>> 2022-05-24 22:08:22.097055 7fcb3a085700 -1
>> /ceph-12.2.12/src/os/bluestore/BlueStore.cc: In function 'bid_t
>> BlueStore::ExtentMap::allocate_spanning_blob_id()' thread 7fcb3a085700 time
>> 2022-05-24 22:08:22.091806
>> /ceph-12.2.12/src/os/bluestore/BlueStore.cc: 2083: FAILED assert(0 == "no
>> available blob id")
>> 
>> ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
>> (stable)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x110) [0x560e41dbd520]
>> 2: (()+0x8fce4e) [0x560e41c15e4e]
>> 3: (BlueStore::ExtentMap::reshard(KeyValueDB*,
>> std::shared_ptr)+0x13da) [0x560e41c6fc6a]
>> 4: (BlueStore::_txc_write_nodes(BlueStore::TransContext*,
>> std::shared_ptr)+0x1ab) [0x560e41c7131b]
>> 5: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>> std::vector> std::allocator >&,
>> boost::intrusive_ptr, ThreadPool::TPHandle*)+0x3fd)
>> [0x560e41c8cc4d]
>> 6:
>> (PrimaryLogPG::queue_transactions(std::vector> std::allocator >&,
>> boost::intrusive_ptr)+0x65) [0x560e419efac5]
>> 7: (ECBackend::handle_sub_write(pg_shard_t,
>> boost::intrusive_ptr, ECSubWrite&, ZTracer::Trace const&,
>> Context*)+0x631) [0x560e41b18331]
>> 8: (ECBackend::_handle_message(boost::intrusive_ptr)+0x349)
>> [0x560e41b29ba9]
>> 9: (PGBackend::handle_message(boost::intrusive_ptr)+0x50)
>> [0x560e41a255f0]
>> 10: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
>> ThreadPool::TPHandle&)+0x59c) [0x560e4198f97c]
>> 11: (OSD::dequeue_op(boost::intrusive_ptr,
>> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f9)
>> [0x560e4180af59]
>> 12: (PGQueueable::RunVis::operator()(boost::intrusive_ptr
>> const&)+0x57) [0x560e41a9ac27]
>> 13: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0xfce) [0x560e4183a20e]
>> 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x83f)
>> [0x560e41dc304f]
>> 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x560e41dc4fe0]
>> 16: (()+0x7dd5) [0x7fcb5913fdd5]
>> 17: (clone()+0x6d) [0x7fcb5822fead]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>> to interpret this.
>> 
> 
> Some would not restart, hitting the "no available blob id" assertion. We
> adjusted the following parameters to ensure that the OSDs could be started:
>  bluestore_extent_map_shard_target_size=2000 (default 500)
>  bluestore_extent_map_shard_target_size_slop=0.30 (default 0.20)
> 
> 
>> We found several related bugs :
>> https://tracker.ceph.com/issues/48216
>> https://tracker.ceph.com/issues/38272
> 
> The PR :
> 
> os/bluestore: apply garbage collection against excessive blob count growth
> 
> https://github.com/ceph/ceph/pull/28229
>> we have backported the PR to 12.2.12, but it didn't solve the problem.
> 
> The workaround that works is to fsck / repair the stopped OSD :
>> ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<osd-id> --command repair
>> 
> But it's not a long term solution.
>> I have seen a PR merged in 2019 here :
>> https://github.com/ceph/ceph/pull/28229
> 
> The fsck log:
> 
>> 2022-06-11 14:33:00.524108 7ff94ce7eec0 -1
>> bluestore(/var/lib/ceph/osd/ceph-162/) fsck error:
>> 

[ceph-users] OSD crash with "no available blob id" and check for Zombie blobs

2022-06-14 Thread tao song
I have a old Cluster 12.2.12 running bluestore ,use iscsi + RBD in EC
pools(k:m=2:1) with ec_overwrites flags. Multiple OSD crashes occurred due
to assert (0 == "no available blob id").
The problems occur periodically when the RBD volume is cyclically
overwritten.

2022-05-24 22:08:19.950550 7fcb41894700  1 osd.171 pg_epoch: 47676
> pg[4.1467s2( v 44365'8455207 (44045'8453660,44365'8455207]
> local-lis/les=44415/44416 n=21866 ec=16123/349 lis/c 47665/44415 les/c/f
> 47666/44416/4714 47676/47676/39511)
> [115,16,171]/[115,2147483647,2147483647]p115(0) r=-1 lpr=47676
> pi=[44415,47676)/2 crt=44260'8455206 lcod 0'0 remapped NOTIFY mbc={}]
> state: transitioning to Stray
> 2022-05-24 22:08:20.834007 7fcb41894700  1 osd.171 pg_epoch: 47677
> pg[4.1467s2( v 44365'8455207 (44045'8453660,44365'8455207]
> local-lis/les=44415/44416 n=21866 ec=16123/349 lis/c 47665/44415 les/c/f
> 47666/44416/4714 47676/47677/39511) [115,16,171]p115(0) r=2 lpr=47677
> pi=[44415,47677)/2 crt=44260'8455206 lcod 0'0 unknown NOTIFY mbc={}]
> start_peering_interval up [115,16,171] -> [115,16,171], acting
> [115,2147483647,2147483647] -> [115,16,171], acting_primary 115(0) -> 115,
> up_primary 115(0) -> 115, role -1 -> 2, features acting 4611087853746454523
> upacting 4611087853746454523
> 2022-05-24 22:08:20.834073 7fcb41894700  1 osd.171 pg_epoch: 47677
> pg[4.1467s2( v 44365'8455207 (44045'8453660,44365'8455207]
> local-lis/les=44415/44416 n=21866 ec=16123/349 lis/c 47665/44415 les/c/f
> 47666/44416/4714 47676/47677/39511) [115,16,171]p115(0) r=2 lpr=47677
> pi=[44415,47677)/2 crt=44260'8455206 lcod 0'0 unknown NOTIFY mbc={}]
> state: transitioning to Stray
> 2022-05-24 22:08:22.097055 7fcb3a085700 -1
> /ceph-12.2.12/src/os/bluestore/BlueStore.cc: In function 'bid_t
> BlueStore::ExtentMap::allocate_spanning_blob_id()' thread 7fcb3a085700 time
> 2022-05-24 22:08:22.091806
> /ceph-12.2.12/src/os/bluestore/BlueStore.cc: 2083: FAILED assert(0 == "no
> available blob id")
>
>  ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x110) [0x560e41dbd520]
>  2: (()+0x8fce4e) [0x560e41c15e4e]
>  3: (BlueStore::ExtentMap::reshard(KeyValueDB*,
> std::shared_ptr)+0x13da) [0x560e41c6fc6a]
>  4: (BlueStore::_txc_write_nodes(BlueStore::TransContext*,
> std::shared_ptr)+0x1ab) [0x560e41c7131b]
>  5: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> std::vector std::allocator >&,
> boost::intrusive_ptr, ThreadPool::TPHandle*)+0x3fd)
> [0x560e41c8cc4d]
>  6:
> (PrimaryLogPG::queue_transactions(std::vector std::allocator >&,
> boost::intrusive_ptr)+0x65) [0x560e419efac5]
>  7: (ECBackend::handle_sub_write(pg_shard_t,
> boost::intrusive_ptr, ECSubWrite&, ZTracer::Trace const&,
> Context*)+0x631) [0x560e41b18331]
>  8: (ECBackend::_handle_message(boost::intrusive_ptr)+0x349)
> [0x560e41b29ba9]
>  9: (PGBackend::handle_message(boost::intrusive_ptr)+0x50)
> [0x560e41a255f0]
>  10: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
> ThreadPool::TPHandle&)+0x59c) [0x560e4198f97c]
>  11: (OSD::dequeue_op(boost::intrusive_ptr,
> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f9)
> [0x560e4180af59]
>  12: (PGQueueable::RunVis::operator()(boost::intrusive_ptr
> const&)+0x57) [0x560e41a9ac27]
>  13: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0xfce) [0x560e4183a20e]
>  14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x83f)
> [0x560e41dc304f]
>  15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x560e41dc4fe0]
>  16: (()+0x7dd5) [0x7fcb5913fdd5]
>  17: (clone()+0x6d) [0x7fcb5822fead]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>

Some would not restart, hitting the "no available blob id" assertion. We adjusted the
following parameters to ensure that the OSDs could be started:
  bluestore_extent_map_shard_target_size=2000 (default 500)
  bluestore_extent_map_shard_target_size_slop=0.30 (default 0.20)


> We found several related bugs :
> https://tracker.ceph.com/issues/48216
> https://tracker.ceph.com/issues/38272

 The PR :

 os/bluestore: apply garbage collection against excessive blob count growth

 https://github.com/ceph/ceph/pull/28229
>  we have backported the PR to 12.2.12, but it didn't solve the problem.

 The workaround that works is to fsck / repair the stopped OSD:
>  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-<osd-id> --command repair
>
 But it's not a long term solution.
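
For anyone scripting that stopgap, per OSD it is roughly this (a sketch;
osd.171 is taken from the crash log above, and a systemd deployment is
assumed):

ceph osd set noout
systemctl stop ceph-osd@171
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-171 --command repair
systemctl start ceph-osd@171
ceph osd unset noout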
>  I have seen a PR merged in 2019 here :
> https://github.com/ceph/ceph/pull/28229

 The fsck log:

>  2022-06-11 14:33:00.524108 7ff94ce7eec0 -1
> bluestore(/var/lib/ceph/osd/ceph-162/) fsck error:
> 2#4:fbdd648a:::rbd_data.3.3404c86b8b4567.01977896:head# - 1 zombie
> spanning blob(s) found, the first one: Blob(0x5567bd482690 spanning 7
> blob([!~4] csum crc32c/0x1000) use_tracker(0x4*0x1 0x[0,0,0,0])
> SharedBlob(0x5567bd482150 sbid 0x0))
> 2022-06-11 14:33:00.620716 

[ceph-users] Re: Copying and renaming pools

2022-06-14 Thread Eugen Block
You asked for advice in your earlier thread [1] and the recommendation  
was to simply use a different rule for the images pool to point to the  
SSDs. That would have prevented what you're now seeing. IIRC the rbd  
children refer to their parents via pool id (not name) so you would  
need to fiddle with omapvals. There have been several threads about  
this, e.g. [2]. I'm not sure if anything has changed since then but as  
I already wrote, the easiest way would have been to simply use a  
different crush rule.



[1]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LPFO2SMQTHTWFB4HENAWWSWLIFPBQXME/#5TN34WE2HYXQSYPMYLEIBK72JDF3RZ7C
[2]  
https://ceph-users.ceph.narkive.com/88v7Zjx7/rbd-lost-parents-after-rados-cppool
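
If fiddling with omapvals sounds too risky, one alternative is to flatten
the clones so they stop referencing any parent at all, at the cost of
duplicating the formerly shared extents. A sketch, using the image from
your output below:

rbd flatten compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd info compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk | grep parent
# after the flatten, the parent line should be gone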



Zitat von Pardhiv Karri :


Hi,

Our Ceph is used as backend storage for Openstack. We use the "images" pool
for glance and the "compute" pool for instances. We need to migrate our
images pool which is on HDD drives to SSD drives.

I copied all the data from the "images" pool that is on HDD disks to an
"ssdimages" pool that is on SSD disks, made sure the crush rules are all
good. I used "rbd deep copy" to migrate all the objects. Then I renamed the
pools, "images" to "hddimages" and "ssdimages" to "images".

Our Openstack instances are on the "compute" pool. All the instances that
are created using the image show the parent as an image from the images
pool. I thought renaming would point to the new pool that is on SSD disks
with renamed as "images" but now interestingly all the instances rbd
info are now pointing to the parent "hddimages". How to make sure the
parent pointers stay as "images" only instead of modifying to "hddimages"?

Before renaming pools:

lab [root@ctl01 /]# rbd info
compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
size 100GiB in 12800 objects
order 23 (8MiB objects)
block_name_prefix: rbd_data.8f51c347398c89
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Tue Mar 15 21:36:55 2022
parent: images/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
overlap: 10GiB
lab [root@ctl01 /]#



After renaming the pools, the parent value automatically gets modified:
lab [root@ctl01 /]# rbd info
compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
size 100GiB in 12800 objects
order 23 (8MiB objects)
block_name_prefix: rbd_data.8f51c347398c89
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Tue Mar 15 21:36:55 2022
parent: hddimages/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
overlap: 10GiB
lab [root@ctl01 /]#


Thanks,
Pardhiv
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io