[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-13 Thread Szabo, Istvan (Agoda)
Is it possible to extend the block.db LV of that specific OSD with the lvextend
command, or does it need some special BlueStore extend step?
I want to extend that LV by the size of the spillover, compact it, and migrate
afterwards.
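
Roughly what I have in mind (OSD id, path and LV names are placeholders, and I'm
not sure whether the second step below is even needed - hence the question):

lvextend -L +30G /dev/ceph-db-vg/db-48                                    # grow the DB LV by about the spilled-over amount
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-48   # with the OSD stopped, in case the new size isn't picked up automatically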

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Tuesday, October 12, 2021 7:15 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io; 胡 玮文 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



Istvan,

So things with migrations are clear at the moment, right? As I mentioned, the
migrate command in 15.2.14 has a bug which corrupts the OSD if a db->slow
migration occurs on a spilled-over OSD. To work around that you might want to
migrate slow to db first, or try manual compaction. Please make sure there is no
spilled-over data left after either of them (via ceph-bluestore-tool's
bluefs-bdev-sizes command) before proceeding with the db->slow migrate...
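
In terms of commands, with the OSD stopped, that would be roughly (OSD id, fsid
and volume names are just placeholders):

ceph-volume lvm migrate --osd-id 48 --osd-fsid <osd-fsid> --from data --target ceph-db-vg/db-48       # slow -> db, pulls spilled-over RocksDB data back onto the DB volume
#   ...or, instead of that, an offline compaction:
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-48 compact
ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-48                                # confirm nothing is left spilled over on the slow device
ceph-volume lvm migrate --osd-id 48 --osd-fsid <osd-fsid> --from db wal --target ceph-block-vg/block-48   # only then db -> slow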

Just a side note - IMO it sounds a bit contradictory that you're
expecting/experiencing better performance without a standalone DB while at the
same time spillovers cause performance issues... Spillover means some data goes
to the main device (which is what you're trying to achieve by migrating as
well), hence it should rather improve things... Or the root cause of your
performance issues is different... Just want to share my thoughts - I don't
have any better ideas about that so far...



Thanks,

Igor
On 10/12/2021 2:54 PM, Szabo, Istvan (Agoda) wrote:
I have 1 billion objects in the cluster, we are still growing, and we have faced
spillovers all over the clusters.
After 15-18 spilled-over OSDs (out of the 42-50) the OSDs started to die and
flap.
I tried to compact the spilled-over ones manually, but it didn't help; the
not-spilled OSDs crashed less frequently, though.
In our design 3 SSDs share 1 NVMe for db+wal, but this NVMe does 30k IOPS on
random write while the SSDs behind it individually do 67k, so the SSDs are
actually faster at writes than the NVMe, which means our config is suboptimal.

I’ve decided to update the cluster to 15.2.14 to be able to run this 
ceph-volume lvm migrate command and started to use it.
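
(For reference, the invocation I'm using looks roughly like this; the OSD id,
fsid and volume name are placeholders, and each OSD is stopped while it runs:

ceph-volume lvm migrate --osd-id 48 --osd-fsid <osd-fsid> --from db wal --target ceph-block-vg/block-48   # fold db+wal back into the block device

)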

At the moment 10-20% of the migrations fail and 80-90% succeed.
I want to avoid this spillover in the future, so I'll use bare SSDs as OSDs
without a separate wal+db. My iowait has already decreased a lot without the
NVMe drives; I just hope I didn't do anything wrong with this migration, right?

The failed ones I'm removing from the cluster and adding back after they are
cleaned up.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov <mailto:igor.fedo...@croit.io>
Sent: Tuesday, October 12, 2021 6:45 PM
To: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>; 胡 玮文 
<mailto:huw...@outlook.com>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



You mean you ran migrate for these 72 OSDs and all of them aren't starting any
more? Or did you just upgrade them to Octopus and are experiencing performance
issues?

In the latter case, and if you have enough space on the DB device, you might
want to try to migrate data from slow to db first. Run fsck (just in case) and
then migrate from DB/WAL back to slow.
Theoretically this should help in avoiding the before-mentioned bug, but I
haven't tried that personally...

And this wouldn't fix the corrupted OSDs if any though...



Thanks,

Igor
On 10/12/2021 2:36 PM, Szabo, Istvan (Agoda) wrote:
Omg, I've already migrated 24 OSDs in each DC (altogether 72).
What should I do then? 12 are left in each (altogether 36). In my case the slow
device is faster in random-write IOPS than the device that is serving its db+wal.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---



On 2021. Oct 12., at 13:21, Igor Fedotov 
<mailto:igor.fedo...@croit.io> wrote:


Istvan,

you're bitten by

It's not fixed in 15.2.14. This has got a backport to upcoming Oct

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-13 Thread Igor Fedotov

Yes. For the DB volume, expanding the underlying device/LV should be enough...
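
E.g. something along these lines (LV name, size and OSD path are placeholders):

lvextend -L +20G /dev/ceph-db-vg/db-48
# if the OSD doesn't pick the new size up on restart, this can be run with the OSD stopped:
ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-48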

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

On 10/13/2021 12:03 PM, Szabo, Istvan (Agoda) wrote:


Is it possible to extend the block.db LV of that specific OSD with the
lvextend command, or does it need some special BlueStore extend step?


I want to extend that LV by the size of the spillover, compact it,
and migrate afterwards.


Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com <mailto:istvan.sz...@agoda.com>
---

*From:* Igor Fedotov 
*Sent:* Tuesday, October 12, 2021 7:15 PM
*To:* Szabo, Istvan (Agoda) 
*Cc:* ceph-users@ceph.io; 胡 玮文 
*Subject:* Re: [ceph-users] Re: is it possible to remove the db+wal 
from an external device (nvme)






Istvan,

So things with migrations are clear at the moment, right? As I
mentioned, the migrate command in 15.2.14 has a bug which corrupts the
OSD if a db->slow migration occurs on a spilled-over OSD. To work
around that you might want to migrate slow to db first, or try manual
compaction. Please make sure there is no spilled-over data left after
either of them (via ceph-bluestore-tool's bluefs-bdev-sizes command)
before proceeding with the db->slow migrate...


Just a side note - IMO it sounds a bit contradictory that you're
expecting/experiencing better performance without a standalone DB while
at the same time spillovers cause performance issues... Spillover means
some data goes to the main device (which is what you're trying to
achieve by migrating as well), hence it should rather improve things...
Or the root cause of your performance issues is different... Just want
to share my thoughts - I don't have any better ideas about that so far...


Thanks,

Igor

On 10/12/2021 2:54 PM, Szabo, Istvan (Agoda) wrote:

I have 1 billion objects in the cluster, we are still growing, and we
have faced spillovers all over the clusters.

After 15-18 spilled-over OSDs (out of the 42-50) the OSDs started to
die and flap.

I tried to compact the spilled-over ones manually, but it didn't help;
the not-spilled OSDs crashed less frequently, though.

In our design 3 SSDs share 1 NVMe for db+wal, but this NVMe does 30k
IOPS on random write while the SSDs behind it individually do 67k, so
the SSDs are actually faster at writes than the NVMe, which means our
config is suboptimal.

I’ve decided to update the cluster to 15.2.14 to be able to run
this ceph-volume lvm migrate command and started to use it.

At the moment 10-20% of the migrations fail and 80-90% succeed.

I want to avoid this spillover in the future, so I'll use bare SSDs as
OSDs without a separate wal+db. My iowait has already decreased a lot
without the NVMe drives; I just hope I didn't do anything wrong with
this migration, right?

The failed ones I'm removing from the cluster and adding back after
they are cleaned up.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com <mailto:istvan.sz...@agoda.com>
---

*From:* Igor Fedotov 
<mailto:igor.fedo...@croit.io>
*Sent:* Tuesday, October 12, 2021 6:45 PM
*To:* Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>
*Cc:* ceph-users@ceph.io <mailto:ceph-users@ceph.io>; 胡 玮文
     <mailto:huw...@outlook.com>
    *Subject:* Re: [ceph-users] Re: is it possible to remove the
db+wal from an external device (nvme)




You mean you ran migrate for these 72 OSDs and all of them aren't
starting any more? Or did you just upgrade them to Octopus and are
experiencing performance issues?

In the latter case, and if you have enough space on the DB device, you
might want to try to migrate data from slow to db first. Run fsck
(just in case) and then migrate from DB/WAL back to slow.

Theoretically this should help in avoiding the before-mentioned bug,
but I haven't tried that personally...

And this wouldn't fix the corrupted OSDs, if any, though...

Thanks,

Igor

On 10/12/2021 2:36 PM, Szabo, Istvan (Agoda) wrote:

Omg, I’ve already migrated 24x osds in

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Szabo, Istvan (Agoda)
One more thing, what I’m doing at the moment:

Noout norebalance on 1 host
Stop all osd
Compact all the osds
Migrate the db 1 by 1
Start the osds 1 by 1
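
In commands, that is roughly (OSD id, fsid and volume names are placeholders):

ceph osd set noout
ceph osd set norebalance
systemctl stop ceph-osd@48                                            # repeat for each OSD on the host
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-48 compact      # offline compaction
ceph-volume lvm migrate --osd-id 48 --osd-fsid <osd-fsid> --from db wal --target ceph-block-vg/block-48
systemctl start ceph-osd@48
ceph osd unset norebalance
ceph osd unset noout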

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Szabo, Istvan (Agoda)
Sent: Tuesday, October 12, 2021 6:54 PM
To: Igor Fedotov 
Cc: ceph-users@ceph.io; 胡 玮文 
Subject: RE: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)

I have 1 billion objects in the cluster, we are still growing, and we have faced
spillovers all over the clusters.
After 15-18 spilled-over OSDs (out of the 42-50) the OSDs started to die and
flap.
I tried to compact the spilled-over ones manually, but it didn't help; the
not-spilled OSDs crashed less frequently, though.
In our design 3 SSDs share 1 NVMe for db+wal, but this NVMe does 30k IOPS on
random write while the SSDs behind it individually do 67k, so the SSDs are
actually faster at writes than the NVMe, which means our config is suboptimal.

I’ve decided to update the cluster to 15.2.14 to be able to run this 
ceph-volume lvm migrate command and started to use it.

At the moment 10-20% of the migrations fail and 80-90% succeed.
I want to avoid this spillover in the future, so I'll use bare SSDs as OSDs
without a separate wal+db. My iowait has already decreased a lot without the
NVMe drives; I just hope I didn't do anything wrong with this migration, right?

The failed ones I'm removing from the cluster and adding back after they are
cleaned up.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov <mailto:igor.fedo...@croit.io>
Sent: Tuesday, October 12, 2021 6:45 PM
To: Szabo, Istvan (Agoda) <mailto:istvan.sz...@agoda.com>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>; 胡 玮文 <mailto:huw...@outlook.com>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



You mean you ran migrate for these 72 OSDs and all of them aren't starting any
more? Or did you just upgrade them to Octopus and are experiencing performance
issues?

In the latter case, and if you have enough space on the DB device, you might
want to try to migrate data from slow to db first. Run fsck (just in case) and
then migrate from DB/WAL back to slow.
Theoretically this should help in avoiding the before-mentioned bug, but I
haven't tried that personally...

And this wouldn't fix the corrupted OSDs, if any, though...



Thanks,

Igor
On 10/12/2021 2:36 PM, Szabo, Istvan (Agoda) wrote:
Omg, I've already migrated 24 OSDs in each DC (altogether 72).
What should I do then? 12 are left in each (altogether 36). In my case the slow
device is faster in random-write IOPS than the device that is serving its db+wal.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

On 2021. Oct 12., at 13:21, Igor Fedotov 
<mailto:igor.fedo...@croit.io> wrote:


Istvan,

you're bitten by

It's not fixed in 15.2.14. This has got a backport to upcoming Octopus
minor release. Please do not use 'migrate' command from WAL/DB to slow
volume if some data is already present there...

Thanks,

Igor


On 10/12/2021 12:13 PM, Szabo, Istvan (Agoda) wrote:
Hi Igor,

I’ve attached here, thank you in advance.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Monday, October 11, 2021 10:40 PM
To: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>; Eugen Block 
<mailto:ebl...@nde.ag>; 胡 玮文 
<mailto:huw...@outlook.com>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



No,

that's just backtrace of the crash - I'd like to see the full OSD log from the 
process startup till the crash instead...

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Szabo, Istvan (Agoda)
I have 1 billion objects in the cluster, we are still growing, and we have faced
spillovers all over the clusters.
After 15-18 spilled-over OSDs (out of the 42-50) the OSDs started to die and
flap.
I tried to compact the spilled-over ones manually, but it didn't help; the
not-spilled OSDs crashed less frequently, though.
In our design 3 SSDs share 1 NVMe for db+wal, but this NVMe does 30k IOPS on
random write while the SSDs behind it individually do 67k, so the SSDs are
actually faster at writes than the NVMe, which means our config is suboptimal.

I’ve decided to update the cluster to 15.2.14 to be able to run this 
ceph-volume lvm migrate command and started to use it.

At the moment 10-20% of the migrations fail and 80-90% succeed.
I want to avoid this spillover in the future, so I'll use bare SSDs as OSDs
without a separate wal+db. My iowait has already decreased a lot without the
NVMe drives; I just hope I didn't do anything wrong with this migration, right?

The failed ones I'm removing from the cluster and adding back after they are
cleaned up.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Tuesday, October 12, 2021 6:45 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io; 胡 玮文 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



You mean you ran migrate for these 72 OSDs and all of them aren't starting any
more? Or did you just upgrade them to Octopus and are experiencing performance
issues?

In the latter case, and if you have enough space on the DB device, you might
want to try to migrate data from slow to db first. Run fsck (just in case) and
then migrate from DB/WAL back to slow.
Theoretically this should help in avoiding the before-mentioned bug, but I
haven't tried that personally...

And this wouldn't fix the corrupted OSDs, if any, though...



Thanks,

Igor
On 10/12/2021 2:36 PM, Szabo, Istvan (Agoda) wrote:
Omg, I've already migrated 24 OSDs in each DC (altogether 72).
What should I do then? 12 are left in each (altogether 36). In my case the slow
device is faster in random-write IOPS than the device that is serving its db+wal.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---


On 2021. Oct 12., at 13:21, Igor Fedotov 
<mailto:igor.fedo...@croit.io> wrote:


Istvan,

you're bitten by

It's not fixed in 15.2.14. This has got a backport to upcoming Octopus
minor release. Please do not use 'migrate' command from WAL/DB to slow
volume if some data is already present there...

Thanks,

Igor


On 10/12/2021 12:13 PM, Szabo, Istvan (Agoda) wrote:

Hi Igor,

I’ve attached here, thank you in advance.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Monday, October 11, 2021 10:40 PM
To: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>; Eugen Block 
<mailto:ebl...@nde.ag>; 胡 玮文 
<mailto:huw...@outlook.com>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



No,

that's just backtrace of the crash - I'd like to see the full OSD log from the 
process startup till the crash instead...
On 10/8/2021 4:02 PM, Szabo, Istvan (Agoda) wrote:
Hi Igor,

Here is a bluestore tool fsck output:
https://justpaste.it/7igrb

Is this what you are looking for?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Tuesday, October 5, 2021 10:02 PM
To: Szabo, Istvan (Agoda) <mailto:istvan.sz...@agoda.com>

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Szabo, Istvan (Agoda)
Omg, I've already migrated 24 OSDs in each DC (altogether 72).
What should I do then? 12 are left in each (altogether 36). In my case the slow
device is faster in random-write IOPS than the device that is serving its db+wal.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

On 2021. Oct 12., at 13:21, Igor Fedotov  wrote:



Istvan,

you're bitten by https://github.com/ceph/ceph/pull/43140

It's not fixed in 15.2.14. This has got a backport to upcoming Octopus
minor release. Please do not use 'migrate' command from WAL/DB to slow
volume if some data is already present there...

Thanks,

Igor


On 10/12/2021 12:13 PM, Szabo, Istvan (Agoda) wrote:
Hi Igor,

I’ve attached here, thank you in advance.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Monday, October 11, 2021 10:40 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io; Eugen Block ; 胡 玮文 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



No,

that's just backtrace of the crash - I'd like to see the full OSD log from the 
process startup till the crash instead...
On 10/8/2021 4:02 PM, Szabo, Istvan (Agoda) wrote:
Hi Igor,

Here is a bluestore tool fsck output:
https://justpaste.it/7igrb

Is this what you are looking for?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Tuesday, October 5, 2021 10:02 PM
To: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>; 胡 玮文 
<mailto:huw...@outlook.com>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>; Eugen Block 
<mailto:ebl...@nde.ag>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



Not sure dmcrypt is a culprit here.

Could you please set debug-bluefs to 20 and collect an OSD startup log.
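
Something like this should do it (osd.48 is just a placeholder; the startup log
then ends up in /var/log/ceph/ceph-osd.48.log):

ceph config set osd.48 debug_bluefs 20
systemctl restart ceph-osd@48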


On 10/5/2021 4:43 PM, Szabo, Istvan (Agoda) wrote:
Hmm, I tried another one whose disk hasn't spilled over, and it still coredumped ☹
Is there anything special that we need to do before we migrate the db next to
the block device? Our OSDs are using dmcrypt, is that an issue?

{
"backtrace": [
"(()+0x12b20) [0x7f310aa49b20]",
"(gsignal()+0x10f) [0x7f31096aa37f]",
"(abort()+0x127) [0x7f3109694db5]",
"(()+0x9009b) [0x7f310a06209b]",
"(()+0x9653c) [0x7f310a06853c]",
"(()+0x95559) [0x7f310a067559]",
"(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
"(()+0x10b03) [0x7f3109a48b03]",
"(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
"(__cxa_throw()+0x3b) [0x7f310a0687eb]",
"(()+0x19fa4) [0x7f310b7b6fa4]",
"(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) 
[0x7f310b7d8c96]",
"(()+0x10d0f8e) [0x55ffa520df8e]",
"(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",
"(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",
"(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) 
[0x55ffa52efcca]",
"(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) [0x55ffa52f0568]",
"(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",
"(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",
"(rocksdb::DBImpl::CloseHelper()+0x616) [0x55ffa5155ed6]",
"(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55ffa515c35b]",
"(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55ffa51a3bc1]",
"(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, 
std::__cxx11::basic_string, std::allocator > 
const&, std::vector > const&, 
std::vector >*, rocksdb::DB**, bool)+0x1089) 
[0x55ffa51a57e9]",
"(RocksDBStore::do_open(std::ostream&, bool, bool, 
std::vector 
> const*)+0x14ca) [0x55ffa51285ca]",
"(BlueStore::_op

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Szabo, Istvan (Agoda)
Hi Igor,

I’ve attached here, thank you in advance.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Monday, October 11, 2021 10:40 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io; Eugen Block ; 胡 玮文 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



No,

that's just backtrace of the crash - I'd like to see the full OSD log from the 
process startup till the crash instead...
On 10/8/2021 4:02 PM, Szabo, Istvan (Agoda) wrote:
Hi Igor,

Here is a bluestore tool fsck output:
https://justpaste.it/7igrb

Is this what you are looking for?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Tuesday, October 5, 2021 10:02 PM
To: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>; 胡 玮文 
<mailto:huw...@outlook.com>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>; Eugen Block 
<mailto:ebl...@nde.ag>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



Not sure dmcrypt is a culprit here.

Could you please set debug-bluefs to 20 and collect an OSD startup log.


On 10/5/2021 4:43 PM, Szabo, Istvan (Agoda) wrote:
Hmm, I tried another one whose disk hasn't spilled over, and it still coredumped ☹
Is there anything special that we need to do before we migrate the db next to
the block device? Our OSDs are using dmcrypt, is that an issue?

{
"backtrace": [
"(()+0x12b20) [0x7f310aa49b20]",
"(gsignal()+0x10f) [0x7f31096aa37f]",
"(abort()+0x127) [0x7f3109694db5]",
"(()+0x9009b) [0x7f310a06209b]",
"(()+0x9653c) [0x7f310a06853c]",
"(()+0x95559) [0x7f310a067559]",
"(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
"(()+0x10b03) [0x7f3109a48b03]",
"(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
"(__cxa_throw()+0x3b) [0x7f310a0687eb]",
"(()+0x19fa4) [0x7f310b7b6fa4]",
"(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) 
[0x7f310b7d8c96]",
"(()+0x10d0f8e) [0x55ffa520df8e]",
"(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",
"(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",
"(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) 
[0x55ffa52efcca]",
"(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) [0x55ffa52f0568]",
"(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",
"(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",
"(rocksdb::DBImpl::CloseHelper()+0x616) [0x55ffa5155ed6]",
"(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55ffa515c35b]",
"(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55ffa51a3bc1]",
"(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, 
std::__cxx11::basic_string, std::allocator > 
const&, std::vector > const&, 
std::vector >*, rocksdb::DB**, bool)+0x1089) 
[0x55ffa51a57e9]",
"(RocksDBStore::do_open(std::ostream&, bool, bool, 
std::vector 
> const*)+0x14ca) [0x55ffa51285ca]",
"(BlueStore::_open_db(bool, bool, bool)+0x1314) [0x55ffa4bc27e4]",
"(BlueStore::_open_db_and_around(bool)+0x4c) [0x55ffa4bd4c5c]",
"(BlueStore::_mount(bool, bool)+0x847) [0x55ffa4c2e047]",
"(OSD::init()+0x380) [0x55ffa4753a70]",
"(main()+0x47f1) [0x55ffa46a6901]",
"(__libc_start_main()+0xf3) [0x7f3109696493]",
"(_start()+0x2e) [0x55ffa46d4e3e]"
],
"ceph_version": "15.2.14",
"crash_id": 
"2021-10-05T13:31:28.513463Z_b6818598-4960-4ed6-942a-d4a7ff37a758",
"entity_name": "osd.48",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-osd",
"stack_sig": 
"6a43b6c219adac393b239fbea4a53ff87c4185bcd213724f0d721b452b81

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Igor Fedotov

Istvan,

So things with migrations are clear at the moment, right? As I mentioned,
the migrate command in 15.2.14 has a bug which corrupts the OSD if a
db->slow migration occurs on a spilled-over OSD. To work around that you
might want to migrate slow to db first, or try manual compaction. Please
make sure there is no spilled-over data left after either of them (via
ceph-bluestore-tool's bluefs-bdev-sizes command) before proceeding with
the db->slow migrate...


Just a side note - IMO it sounds a bit contradictory that you're
expecting/experiencing better performance without a standalone DB while
at the same time spillovers cause performance issues... Spillover means
some data goes to the main device (which is what you're trying to
achieve by migrating as well), hence it should rather improve things...
Or the root cause of your performance issues is different... Just want
to share my thoughts - I don't have any better ideas about that so far...



Thanks,

Igor

On 10/12/2021 2:54 PM, Szabo, Istvan (Agoda) wrote:


I have 1 billion objects in the cluster, we are still growing, and we
have faced spillovers all over the clusters.

After 15-18 spilled-over OSDs (out of the 42-50) the OSDs started to
die and flap.

I tried to compact the spilled-over ones manually, but it didn't help;
the not-spilled OSDs crashed less frequently, though.

In our design 3 SSDs share 1 NVMe for db+wal, but this NVMe does 30k
IOPS on random write while the SSDs behind it individually do 67k, so
the SSDs are actually faster at writes than the NVMe, which means our
config is suboptimal.


I’ve decided to update the cluster to 15.2.14 to be able to run this 
ceph-volume lvm migrate command and started to use it.


At the moment 10-20% of the migrations fail and 80-90% succeed.

I want to avoid this spillover in the future, so I'll use bare SSDs as
OSDs without a separate wal+db. My iowait has already decreased a lot
without the NVMe drives; I just hope I didn't do anything wrong with
this migration, right?

The failed ones I'm removing from the cluster and adding back after
they are cleaned up.


Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com <mailto:istvan.sz...@agoda.com>
---

*From:* Igor Fedotov 
*Sent:* Tuesday, October 12, 2021 6:45 PM
*To:* Szabo, Istvan (Agoda) 
*Cc:* ceph-users@ceph.io; 胡 玮文 
*Subject:* Re: [ceph-users] Re: is it possible to remove the db+wal 
from an external device (nvme)






You mean you ran migrate for these 72 OSDs and all of them aren't
starting any more? Or did you just upgrade them to Octopus and are
experiencing performance issues?

In the latter case, and if you have enough space on the DB device, you
might want to try to migrate data from slow to db first. Run fsck
(just in case) and then migrate from DB/WAL back to slow.

Theoretically this should help in avoiding the before-mentioned bug,
but I haven't tried that personally...

And this wouldn't fix the corrupted OSDs, if any, though...

Thanks,

Igor

On 10/12/2021 2:36 PM, Szabo, Istvan (Agoda) wrote:

Omg, I've already migrated 24 OSDs in each DC (altogether 72).

What should I do then? 12 are left in each (altogether 36). In my case
the slow device is faster in random-write IOPS than the device that is
serving its db+wal.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com <mailto:istvan.sz...@agoda.com>
---



On 2021. Oct 12., at 13:21, Igor Fedotov
 <mailto:igor.fedo...@croit.io> wrote:



Istvan,

you're bitten by

It's not fixed in 15.2.14. This has got a backport to upcoming
Octopus
minor release. Please do not use 'migrate' command from WAL/DB
to slow
volume if some data is already present there...

Thanks,

Igor


On 10/12/2021 12:13 PM, Szabo, Istvan (Agoda) wrote:

Hi Igor,

I’ve attached here, thank you in advance.

Istvan Szabo

Senior Infrastructure Engineer

---

Agoda Services Co., Ltd.

e: istvan.sz...@agoda.com
<mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com>
<mailto:istvan.sz...@agoda.com>

---


[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Igor Fedotov
You mean you ran migrate for these 72 OSDs and all of them aren't
starting any more? Or did you just upgrade them to Octopus and are
experiencing performance issues?

In the latter case, and if you have enough space on the DB device, you
might want to try to migrate data from slow to db first. Run fsck
(just in case) and then migrate from DB/WAL back to slow.

Theoretically this should help in avoiding the before-mentioned bug,
but I haven't tried that personally...

And this wouldn't fix the corrupted OSDs, if any, though...


Thanks,

Igor

On 10/12/2021 2:36 PM, Szabo, Istvan (Agoda) wrote:

Omg, I've already migrated 24 OSDs in each DC (altogether 72).
What should I do then? 12 are left in each (altogether 36). In my case the slow
device is faster in random-write IOPS than the device that is serving its db+wal.


Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com <mailto:istvan.sz...@agoda.com>
---


On 2021. Oct 12., at 13:21, Igor Fedotov  wrote:




Istvan,

you're bitten by

It's not fixed in 15.2.14. This has got a backport to upcoming Octopus
minor release. Please do not use 'migrate' command from WAL/DB to slow
volume if some data is already present there...

Thanks,

Igor


On 10/12/2021 12:13 PM, Szabo, Istvan (Agoda) wrote:

Hi Igor,

I’ve attached here, thank you in advance.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Monday, October 11, 2021 10:40 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io; Eugen Block ; 胡 玮文 

Subject: Re: [ceph-users] Re: is it possible to remove the db+wal 
from an external device (nvme)





No,

that's just backtrace of the crash - I'd like to see the full OSD 
log from the process startup till the crash instead...

On 10/8/2021 4:02 PM, Szabo, Istvan (Agoda) wrote:
Hi Igor,

Here is a bluestore tool fsck output:
https://justpaste.it/7igrb

Is this what you are looking for?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Tuesday, October 5, 2021 10:02 PM
To: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>; 胡 玮文 
<mailto:huw...@outlook.com>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>; Eugen Block 
<mailto:ebl...@nde.ag>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal 
from an external device (nvme)





Not sure dmcrypt is a culprit here.

Could you please set debug-bluefs to 20 and collect an OSD startup log.


On 10/5/2021 4:43 PM, Szabo, Istvan (Agoda) wrote:
Hmm, I tried another one whose disk hasn't spilled over, and it still
coredumped ☹
Is there anything special that we need to do before we migrate the db
next to the block device? Our OSDs are using dmcrypt, is that an issue?


{
"backtrace": [
"(()+0x12b20) [0x7f310aa49b20]",
"(gsignal()+0x10f) [0x7f31096aa37f]",
"(abort()+0x127) [0x7f3109694db5]",
"(()+0x9009b) [0x7f310a06209b]",
"(()+0x9653c) [0x7f310a06853c]",
"(()+0x95559) [0x7f310a067559]",
"(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
"(()+0x10b03) [0x7f3109a48b03]",
"(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
"(__cxa_throw()+0x3b) [0x7f310a0687eb]",
"(()+0x19fa4) [0x7f310b7b6fa4]",
"(tcmalloc::allocate_full_cpp_throw_oom(unsigned 
long)+0x146) [0x7f310b7d8c96]",

"(()+0x10d0f8e) [0x55ffa520df8e]",
"(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",
"(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",
"(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) 
[0x55ffa52efcca]",
"(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) 
[0x55ffa52f0568]",

"(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",
"(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",
"(rocksdb::DBImpl::CloseHelper()+0x616) [0x

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-12 Thread Igor Fedotov

Istvan,

you're bitten by https://github.com/ceph/ceph/pull/43140

It's not fixed in 15.2.14. This has got a backport to upcoming Octopus 
minor release. Please do not use 'migrate' command from WAL/DB to slow 
volume if some data is already present there...


Thanks,

Igor


On 10/12/2021 12:13 PM, Szabo, Istvan (Agoda) wrote:

Hi Igor,

I’ve attached here, thank you in advance.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Monday, October 11, 2021 10:40 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io; Eugen Block ; 胡 玮文 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



No,

that's just backtrace of the crash - I'd like to see the full OSD log from the 
process startup till the crash instead...
On 10/8/2021 4:02 PM, Szabo, Istvan (Agoda) wrote:
Hi Igor,

Here is a bluestore tool fsck output:
https://justpaste.it/7igrb

Is this what you are looking for?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Tuesday, October 5, 2021 10:02 PM
To: Szabo, Istvan (Agoda) <mailto:istvan.sz...@agoda.com>; 胡 玮文 
<mailto:huw...@outlook.com>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>; Eugen Block 
<mailto:ebl...@nde.ag>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



Not sure dmcrypt is a culprit here.

Could you please set debug-bluefs to 20 and collect an OSD startup log.


On 10/5/2021 4:43 PM, Szabo, Istvan (Agoda) wrote:
Hmm, tried another one which hasn’t been spilledover disk, still coredumped ☹
Is there any special thing that we need to do before we migrate db next to the 
block? Our osds are using dmcrypt, is it an issue?

{
 "backtrace": [
 "(()+0x12b20) [0x7f310aa49b20]",
 "(gsignal()+0x10f) [0x7f31096aa37f]",
 "(abort()+0x127) [0x7f3109694db5]",
 "(()+0x9009b) [0x7f310a06209b]",
 "(()+0x9653c) [0x7f310a06853c]",
 "(()+0x95559) [0x7f310a067559]",
 "(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
 "(()+0x10b03) [0x7f3109a48b03]",
 "(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
 "(__cxa_throw()+0x3b) [0x7f310a0687eb]",
 "(()+0x19fa4) [0x7f310b7b6fa4]",
 "(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) 
[0x7f310b7d8c96]",
 "(()+0x10d0f8e) [0x55ffa520df8e]",
 "(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",
 "(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",
 "(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) 
[0x55ffa52efcca]",
 "(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) [0x55ffa52f0568]",
 "(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",
 "(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",
 "(rocksdb::DBImpl::CloseHelper()+0x616) [0x55ffa5155ed6]",
 "(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55ffa515c35b]",
 "(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55ffa51a3bc1]",
 "(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, std::__cxx11::basic_string, 
std::allocator > const&, std::vector > const&, std::vector >*, rocksdb::DB**, bool)+0x1089) [0x55ffa51a57e9]",
 "(RocksDBStore::do_open(std::ostream&, bool, bool, 
std::vector > 
const*)+0x14ca) [0x55ffa51285ca]",
 "(BlueStore::_open_db(bool, bool, bool)+0x1314) [0x55ffa4bc27e4]",
 "(BlueStore::_open_db_and_around(bool)+0x4c) [0x55ffa4bd4c5c]",
 "(BlueStore::_mount(bool, bool)+0x847) [0x55ffa4c2e047]",
 "(OSD::init()+0x380) [0x55ffa4753a70]",
 "(main()+0x47f1) [0x55ffa46a6901]",
 "(__libc_start_main()+0xf3) [0x7f3109696493]",
 "(_start()+0x2e) [0x55ffa46d4e3e]"
 ],
 "ceph_version": "15.2.14",
 "crash_id": 
"2021-10-05T13:31:28.513463Z_b6818598-4960-4ed6-942a-d4a7ff3

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-11 Thread Igor Fedotov

No,

that's just backtrace of the crash - I'd like to see the full OSD log 
from the process startup till the crash instead...


On 10/8/2021 4:02 PM, Szabo, Istvan (Agoda) wrote:


Hi Igor,

Here is a bluestore tool fsck output:

https://justpaste.it/7igrb <https://justpaste.it/7igrb>

Is this what you are looking for?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com <mailto:istvan.sz...@agoda.com>
---

*From:*Igor Fedotov 
*Sent:* Tuesday, October 5, 2021 10:02 PM
*To:* Szabo, Istvan (Agoda) ; 
胡玮文

*Cc:* ceph-users@ceph.io; Eugen Block 
*Subject:* Re: [ceph-users] Re: is it possible to remove the db+wal 
from an external device (nvme)






Not sure dmcrypt is a culprit here.

Could you please set debug-bluefs to 20 and collect an OSD startup log.

On 10/5/2021 4:43 PM, Szabo, Istvan (Agoda) wrote:

Hmm, I tried another one whose disk hasn't spilled over, and it still
coredumped ☹

Is there anything special that we need to do before we migrate the db
next to the block device? Our OSDs are using dmcrypt, is that an issue?

{

"backtrace": [

"(()+0x12b20) [0x7f310aa49b20]",

"(gsignal()+0x10f) [0x7f31096aa37f]",

"(abort()+0x127) [0x7f3109694db5]",

"(()+0x9009b) [0x7f310a06209b]",

"(()+0x9653c) [0x7f310a06853c]",

"(()+0x95559) [0x7f310a067559]",

"(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",

"(()+0x10b03) [0x7f3109a48b03]",

"(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",

"(__cxa_throw()+0x3b) [0x7f310a0687eb]",

"(()+0x19fa4) [0x7f310b7b6fa4]",

"(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146)
[0x7f310b7d8c96]",

"(()+0x10d0f8e) [0x55ffa520df8e]",

  "(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",

"(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",

"(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a)
[0x55ffa52efcca]",

"(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88)
[0x55ffa52f0568]",

"(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",

"(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",

"(rocksdb::DBImpl::CloseHelper()+0x616) [0x55ffa5155ed6]",

"(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55ffa515c35b]",

"(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55ffa51a3bc1]",

"(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&,
std::__cxx11::basic_string,
std::allocator > const&,
std::vector > const&,
std::vector >*, rocksdb::DB**,
bool)+0x1089) [0x55ffa51a57e9]",

"(RocksDBStore::do_open(std::ostream&, bool, bool,
std::vector > const*)+0x14ca)
[0x55ffa51285ca]",

"(BlueStore::_open_db(bool, bool, bool)+0x1314) [0x55ffa4bc27e4]",

"(BlueStore::_open_db_and_around(bool)+0x4c) [0x55ffa4bd4c5c]",

"(BlueStore::_mount(bool, bool)+0x847) [0x55ffa4c2e047]",

"(OSD::init()+0x380) [0x55ffa4753a70]",

"(main()+0x47f1) [0x55ffa46a6901]",

"(__libc_start_main()+0xf3) [0x7f3109696493]",

"(_start()+0x2e) [0x55ffa46d4e3e]"

],

"ceph_version": "15.2.14",

"crash_id":
"2021-10-05T13:31:28.513463Z_b6818598-4960-4ed6-942a-d4a7ff37a758",

"entity_name": "osd.48",

"os_id": "centos",

"os_name": "CentOS Linux",

"os_version": "8",

"os_version_id": "8",

"process_name": "ceph-osd",

"stack_sig":
"6a43b6c219adac393b239fbea4a53ff87c4185bcd213724f0d721b452b81ddbf",

"timestamp": "2021-10-05T13:31:28.513463Z",

"utsname_hostname": "server-2s07",

"utsname_machine": "x86_64",

"utsname_release": "4.18.0-305.19.1.el8_4.x86_64",

    "utsname_sysname": "Linux",

    "utsname_version": "#1 SMP Wed Sep 15 15:39:39 UTC 2021"

}

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com <mailto:istvan.sz...@agoda.com>
---

*From:*胡玮文 <mailto:huw...@outlook.com>
*Sent:* Monday

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-08 Thread Szabo, Istvan (Agoda)
Hi Igor,

Here is a bluestore tool fsck output:
https://justpaste.it/7igrb

Is this what you are looking for?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Tuesday, October 5, 2021 10:02 PM
To: Szabo, Istvan (Agoda) ; 胡 玮文 
Cc: ceph-users@ceph.io; Eugen Block 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



Not sure dmcrypt is a culprit here.

Could you please set debug-bluefs to 20 and collect an OSD startup log.


On 10/5/2021 4:43 PM, Szabo, Istvan (Agoda) wrote:
Hmm, tried another one which hasn’t been spilledover disk, still coredumped ☹
Is there any special thing that we need to do before we migrate db next to the 
block? Our osds are using dmcrypt, is it an issue?

{
"backtrace": [
"(()+0x12b20) [0x7f310aa49b20]",
"(gsignal()+0x10f) [0x7f31096aa37f]",
"(abort()+0x127) [0x7f3109694db5]",
"(()+0x9009b) [0x7f310a06209b]",
"(()+0x9653c) [0x7f310a06853c]",
"(()+0x95559) [0x7f310a067559]",
"(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
"(()+0x10b03) [0x7f3109a48b03]",
"(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
"(__cxa_throw()+0x3b) [0x7f310a0687eb]",
"(()+0x19fa4) [0x7f310b7b6fa4]",
"(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) 
[0x7f310b7d8c96]",
"(()+0x10d0f8e) [0x55ffa520df8e]",
"(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",
"(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",
"(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) 
[0x55ffa52efcca]",
"(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) [0x55ffa52f0568]",
"(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",
"(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",
"(rocksdb::DBImpl::CloseHelper()+0x616) [0x55ffa5155ed6]",
"(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55ffa515c35b]",
"(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55ffa51a3bc1]",
"(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, 
std::__cxx11::basic_string, std::allocator > 
const&, std::vector > const&, 
std::vector >*, rocksdb::DB**, bool)+0x1089) 
[0x55ffa51a57e9]",
"(RocksDBStore::do_open(std::ostream&, bool, bool, 
std::vector 
> const*)+0x14ca) [0x55ffa51285ca]",
"(BlueStore::_open_db(bool, bool, bool)+0x1314) [0x55ffa4bc27e4]",
"(BlueStore::_open_db_and_around(bool)+0x4c) [0x55ffa4bd4c5c]",
"(BlueStore::_mount(bool, bool)+0x847) [0x55ffa4c2e047]",
"(OSD::init()+0x380) [0x55ffa4753a70]",
"(main()+0x47f1) [0x55ffa46a6901]",
"(__libc_start_main()+0xf3) [0x7f3109696493]",
"(_start()+0x2e) [0x55ffa46d4e3e]"
],
"ceph_version": "15.2.14",
"crash_id": 
"2021-10-05T13:31:28.513463Z_b6818598-4960-4ed6-942a-d4a7ff37a758",
"entity_name": "osd.48",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-osd",
"stack_sig": 
"6a43b6c219adac393b239fbea4a53ff87c4185bcd213724f0d721b452b81ddbf",
"timestamp": "2021-10-05T13:31:28.513463Z",
"utsname_hostname": "server-2s07",
"utsname_machine": "x86_64",
"utsname_release": "4.18.0-305.19.1.el8_4.x86_64",
    "utsname_sysname": "Linux",
"utsname_version": "#1 SMP Wed Sep 15 15:39:39 UTC 2021"
}
Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: 胡 玮文 <mailto:huw...@outlook.com>
Sent: Monday, October 4, 2021 12:13 AM
To: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>; Igor Fedotov 
<mailto:ifedo...@suse.de>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)


[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Szabo, Istvan (Agoda)
This "Unable to load table properties" message is also interesting, right before the caught signal:

  -16> 2021-10-05T20:31:28.484+0700 7f310cce5f00 2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 247222 --- 
NotFound:


   -15> 2021-10-05T20:31:28.484+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 251966 --- 
NotFound:

   -14> 2021-10-05T20:31:28.484+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 247508 --- 
NotFound:

   -13> 2021-10-05T20:31:28.484+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 252237 --- 
NotFound:

   -12> 2021-10-05T20:31:28.486+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 249610 --- 
NotFound:

   -11> 2021-10-05T20:31:28.486+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 251798 --- 
NotFound:

   -10> 2021-10-05T20:31:28.486+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 251799 --- 
NotFound:

-9> 2021-10-05T20:31:28.486+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 252235 --- 
NotFound:

-8> 2021-10-05T20:31:28.486+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 252236 --- 
NotFound:

-7> 2021-10-05T20:31:28.486+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 244769 --- 
NotFound:

-6> 2021-10-05T20:31:28.486+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 242684 --- 
NotFound:

-5> 2021-10-05T20:31:28.486+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 241854 --- 
NotFound:

-4> 2021-10-05T20:31:28.486+0700 7f310cce5f00  2 rocksdb: 
[db/version_set.cc:1362] Unable to load table properties for file 241191 --- 
NotFound:

-3> 2021-10-05T20:31:28.492+0700 7f310cce5f00  4 rocksdb: 
[db/version_set.cc:3757] Recovered from manifest file:db/MANIFEST-241072 
succeeded,manifest_file_number is 241072, next_file_number is 252389, 
last_sequence is 5847989279, log_number is 252336,prev_log_number is 
0,max_column_family is 0,min_log_number_to_keep is 0

-2> 2021-10-05T20:31:28.492+0700 7f310cce5f00  4 rocksdb: 
[db/version_set.cc:3766] Column family [default] (ID 0), log number is 252336

-1> 2021-10-05T20:31:28.501+0700 7f310cce5f00  4 rocksdb: 
[db/db_impl.cc:390] Shutdown: canceling all background work
 0> 2021-10-05T20:31:28.512+0700 7f310cce5f00 -1 *** Caught signal 
(Aborted) **
 in thread 7f310cce5f00 thread_name:ceph-osd




Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

On 2021. Oct 5., at 17:19, Szabo, Istvan (Agoda)  wrote:


Hmm, I've already removed that one from the cluster and the data is rebalancing
now; I'll do it with the next one ☹

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Tuesday, October 5, 2021 10:02 PM
To: Szabo, Istvan (Agoda) ; 胡 玮文 
Cc: ceph-users@ceph.io; Eugen Block 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



Not sure dmcrypt is a culprit here.

Could you please set debug-bluefs to 20 and collect an OSD startup log.


On 10/5/2021 4:43 PM, Szabo, Istvan (Agoda) wrote:
Hmm, I tried another one whose disk hasn't spilled over, and it still coredumped ☹
Is there anything special that we need to do before we migrate the db next to
the block device? Our OSDs are using dmcrypt, is that an issue?

{
"backtrace": [
"(()+0x12b20) [0x7f310aa49b20]",
"(gsignal()+0x10f) [0x7f31096aa37f]",
"(abort()+0x127) [0x7f3109694db5]",
"(()+0x9009b) [0x7f310a06209b]",
"(()+0x9653c) [0x7f310a06853c]",
"(()+0x95559) [0x7f310a067559]",
"(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
"(()+0x10b03) [0x7f3109a48b03]",
"(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
"(__cxa_throw()+0x3b) [0x7f310a0687eb]",
"(()+0x19fa4) [0x7f310b7b6fa4]",
"(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) 
[0x7f310b7d8c96]",
"(()+0x10d0f8e) [0x55ffa520df8e]"

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Szabo, Istvan (Agoda)
Hmm, I've already removed that one from the cluster and the data is rebalancing
now; I'll do it with the next one ☹

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Tuesday, October 5, 2021 10:02 PM
To: Szabo, Istvan (Agoda) ; 胡 玮文 
Cc: ceph-users@ceph.io; Eugen Block 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



Not sure dmcrypt is a culprit here.

Could you please set debug-bluefs to 20 and collect an OSD startup log.


On 10/5/2021 4:43 PM, Szabo, Istvan (Agoda) wrote:
Hmm, I tried another one whose disk hasn't spilled over, and it still coredumped ☹
Is there anything special that we need to do before we migrate the db next to
the block device? Our OSDs are using dmcrypt, is that an issue?

{
"backtrace": [
"(()+0x12b20) [0x7f310aa49b20]",
"(gsignal()+0x10f) [0x7f31096aa37f]",
"(abort()+0x127) [0x7f3109694db5]",
"(()+0x9009b) [0x7f310a06209b]",
"(()+0x9653c) [0x7f310a06853c]",
"(()+0x95559) [0x7f310a067559]",
"(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
"(()+0x10b03) [0x7f3109a48b03]",
"(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
"(__cxa_throw()+0x3b) [0x7f310a0687eb]",
"(()+0x19fa4) [0x7f310b7b6fa4]",
"(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) 
[0x7f310b7d8c96]",
"(()+0x10d0f8e) [0x55ffa520df8e]",
"(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",
"(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",
"(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) 
[0x55ffa52efcca]",
"(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) [0x55ffa52f0568]",
"(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",
"(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",
"(rocksdb::DBImpl::CloseHelper()+0x616) [0x55ffa5155ed6]",
"(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55ffa515c35b]",
"(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55ffa51a3bc1]",
"(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, 
std::__cxx11::basic_string, std::allocator > 
const&, std::vector > const&, 
std::vector >*, rocksdb::DB**, bool)+0x1089) 
[0x55ffa51a57e9]",
"(RocksDBStore::do_open(std::ostream&, bool, bool, 
std::vector 
> const*)+0x14ca) [0x55ffa51285ca]",
"(BlueStore::_open_db(bool, bool, bool)+0x1314) [0x55ffa4bc27e4]",
"(BlueStore::_open_db_and_around(bool)+0x4c) [0x55ffa4bd4c5c]",
"(BlueStore::_mount(bool, bool)+0x847) [0x55ffa4c2e047]",
"(OSD::init()+0x380) [0x55ffa4753a70]",
"(main()+0x47f1) [0x55ffa46a6901]",
"(__libc_start_main()+0xf3) [0x7f3109696493]",
"(_start()+0x2e) [0x55ffa46d4e3e]"
],
"ceph_version": "15.2.14",
"crash_id": 
"2021-10-05T13:31:28.513463Z_b6818598-4960-4ed6-942a-d4a7ff37a758",
"entity_name": "osd.48",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-osd",
"stack_sig": 
"6a43b6c219adac393b239fbea4a53ff87c4185bcd213724f0d721b452b81ddbf",
"timestamp": "2021-10-05T13:31:28.513463Z",
"utsname_hostname": "server-2s07",
"utsname_machine": "x86_64",
"utsname_release": "4.18.0-305.19.1.el8_4.x86_64",
    "utsname_sysname": "Linux",
"utsname_version": "#1 SMP Wed Sep 15 15:39:39 UTC 2021"
}
Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: 胡 玮文 <mailto:huw...@outlook.com>
Sent: Monday, October 4, 2021 12:13 AM
To: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>; Igor Fedotov 
<mailto:ifedo...@suse.de>
Cc: ceph-users@ceph.io<mailto:ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)


[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Szabo, Istvan (Agoda)
This one is from the messages log: https://justpaste.it/3x08z

Buffered_io is turned on by default in 15.2.14 octopus FYI.
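
(I assume this refers to the bluefs_buffered_io option; if so, the effective
value can be checked with something like:

ceph config get osd bluefs_buffered_io
ceph daemon osd.48 config get bluefs_buffered_io    # osd.48 is a placeholder; run on that OSD's host

)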


Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Eugen Block  
Sent: Tuesday, October 5, 2021 9:52 PM
To: Szabo, Istvan (Agoda) 
Cc: 胡 玮文 ; Igor Fedotov ; 
ceph-users@ceph.io
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)



Do you see oom killers in dmesg on this host? This line indicates it:

  "(tcmalloc::allocate_full_cpp_throw_oom(unsigned
long)+0x146) [0x7f310b7d8c96]",
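
A quick way to check for that on the OSD host would be something like:

dmesg -T | egrep -i 'out of memory|oom-killer|killed process'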


Zitat von "Szabo, Istvan (Agoda)" :

> Hmm, I tried another one whose disk hasn't spilled over, and it still
> coredumped ☹ Is there anything special that we need to do before we
> migrate the db next to the block device? Our OSDs are using dmcrypt, is that an issue?
>
> {
> "backtrace": [
> "(()+0x12b20) [0x7f310aa49b20]",
> "(gsignal()+0x10f) [0x7f31096aa37f]",
> "(abort()+0x127) [0x7f3109694db5]",
> "(()+0x9009b) [0x7f310a06209b]",
> "(()+0x9653c) [0x7f310a06853c]",
> "(()+0x95559) [0x7f310a067559]",
> "(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
> "(()+0x10b03) [0x7f3109a48b03]",
> "(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
> "(__cxa_throw()+0x3b) [0x7f310a0687eb]",
> "(()+0x19fa4) [0x7f310b7b6fa4]",
> "(tcmalloc::allocate_full_cpp_throw_oom(unsigned
> long)+0x146) [0x7f310b7d8c96]",
> "(()+0x10d0f8e) [0x55ffa520df8e]",
> "(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",
> "(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",
> "(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a)
> [0x55ffa52efcca]",
> "(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88)
> [0x55ffa52f0568]",
> "(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",
> "(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",
> "(rocksdb::DBImpl::CloseHelper()+0x616) [0x55ffa5155ed6]",
> "(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55ffa515c35b]",
> "(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55ffa51a3bc1]",
> "(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, 
> std::__cxx11::basic_string, 
> std::allocator > const&, 
> std::vector std::allocator > const&, 
> std::vector std::allocator >*, rocksdb::DB**,
> bool)+0x1089) [0x55ffa51a57e9]",
> "(RocksDBStore::do_open(std::ostream&, bool, bool, 
> std::vector std::allocator > const*)+0x14ca) 
> [0x55ffa51285ca]",
> "(BlueStore::_open_db(bool, bool, bool)+0x1314) [0x55ffa4bc27e4]",
> "(BlueStore::_open_db_and_around(bool)+0x4c) [0x55ffa4bd4c5c]",
> "(BlueStore::_mount(bool, bool)+0x847) [0x55ffa4c2e047]",
> "(OSD::init()+0x380) [0x55ffa4753a70]",
> "(main()+0x47f1) [0x55ffa46a6901]",
> "(__libc_start_main()+0xf3) [0x7f3109696493]",
> "(_start()+0x2e) [0x55ffa46d4e3e]"
> ],
> "ceph_version": "15.2.14",
> "crash_id":
> "2021-10-05T13:31:28.513463Z_b6818598-4960-4ed6-942a-d4a7ff37a758",
> "entity_name": "osd.48",
> "os_id": "centos",
> "os_name": "CentOS Linux",
> "os_version": "8",
> "os_version_id": "8",
> "process_name": "ceph-osd",
> "stack_sig":
> "6a43b6c219adac393b239fbea4a53ff87c4185bcd213724f0d721b452b81ddbf",
> "timestamp": "2021-10-05T13:31:28.513463Z",
> "utsname_hostname": "server-2s07",
> "utsname_machine": "x86_64",
> "utsname_release": "4.18.0-305.19.1.el8_4.x86_64",
> "utsname_sysname": "Linux",
> "utsname_version": "#1 SMP Wed Sep 15 15:39:39 UTC 2021"
> }
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com<m

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Szabo, Istvan (Agoda)
Hmm, I tried another one whose disk hasn't spilled over, and it still core dumped ☹
Is there anything special we need to do before we migrate the db next to the
block device? Our OSDs are using dmcrypt; is that an issue?

{
"backtrace": [
"(()+0x12b20) [0x7f310aa49b20]",
"(gsignal()+0x10f) [0x7f31096aa37f]",
"(abort()+0x127) [0x7f3109694db5]",
"(()+0x9009b) [0x7f310a06209b]",
"(()+0x9653c) [0x7f310a06853c]",
"(()+0x95559) [0x7f310a067559]",
"(__gxx_personality_v0()+0x2a8) [0x7f310a067ed8]",
"(()+0x10b03) [0x7f3109a48b03]",
"(_Unwind_RaiseException()+0x2b1) [0x7f3109a49071]",
"(__cxa_throw()+0x3b) [0x7f310a0687eb]",
"(()+0x19fa4) [0x7f310b7b6fa4]",
"(tcmalloc::allocate_full_cpp_throw_oom(unsigned long)+0x146) 
[0x7f310b7d8c96]",
"(()+0x10d0f8e) [0x55ffa520df8e]",
"(rocksdb::Version::~Version()+0x104) [0x55ffa521d174]",
"(rocksdb::Version::Unref()+0x21) [0x55ffa521d221]",
"(rocksdb::ColumnFamilyData::~ColumnFamilyData()+0x5a) 
[0x55ffa52efcca]",
"(rocksdb::ColumnFamilySet::~ColumnFamilySet()+0x88) [0x55ffa52f0568]",
"(rocksdb::VersionSet::~VersionSet()+0x5e) [0x55ffa520e01e]",
"(rocksdb::VersionSet::~VersionSet()+0x11) [0x55ffa520e261]",
"(rocksdb::DBImpl::CloseHelper()+0x616) [0x55ffa5155ed6]",
"(rocksdb::DBImpl::~DBImpl()+0x83b) [0x55ffa515c35b]",
"(rocksdb::DBImplReadOnly::~DBImplReadOnly()+0x11) [0x55ffa51a3bc1]",
"(rocksdb::DB::OpenForReadOnly(rocksdb::DBOptions const&, 
std::__cxx11::basic_string, std::allocator > 
const&, std::vector > const&, 
std::vector >*, rocksdb::DB**, bool)+0x1089) 
[0x55ffa51a57e9]",
"(RocksDBStore::do_open(std::ostream&, bool, bool, 
std::vector 
> const*)+0x14ca) [0x55ffa51285ca]",
"(BlueStore::_open_db(bool, bool, bool)+0x1314) [0x55ffa4bc27e4]",
"(BlueStore::_open_db_and_around(bool)+0x4c) [0x55ffa4bd4c5c]",
"(BlueStore::_mount(bool, bool)+0x847) [0x55ffa4c2e047]",
"(OSD::init()+0x380) [0x55ffa4753a70]",
"(main()+0x47f1) [0x55ffa46a6901]",
"(__libc_start_main()+0xf3) [0x7f3109696493]",
"(_start()+0x2e) [0x55ffa46d4e3e]"
],
"ceph_version": "15.2.14",
"crash_id": 
"2021-10-05T13:31:28.513463Z_b6818598-4960-4ed6-942a-d4a7ff37a758",
"entity_name": "osd.48",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-osd",
"stack_sig": 
"6a43b6c219adac393b239fbea4a53ff87c4185bcd213724f0d721b452b81ddbf",
"timestamp": "2021-10-05T13:31:28.513463Z",
"utsname_hostname": "server-2s07",
"utsname_machine": "x86_64",
    "utsname_release": "4.18.0-305.19.1.el8_4.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Wed Sep 15 15:39:39 UTC 2021"
}
Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
-------------------

From: 胡 玮文 
Sent: Monday, October 4, 2021 12:13 AM
To: Szabo, Istvan (Agoda) ; Igor Fedotov 

Cc: ceph-users@ceph.io
Subject: 回复: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !

The stack trace (tcmalloc::allocate_full_cpp_throw_oom) seems to indicate that you
don't have enough memory.
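A couple of quick checks along those lines (a sketch; osd.48 is just the example id from the crash dump):

  # free memory on the host
  free -h
  # memory target the OSDs are configured for
  ceph config get osd osd_memory_target
  # value in effect on the running daemon
  ceph daemon osd.48 config get osd_memory_target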

发件人: Szabo, Istvan (Agoda)<mailto:istvan.sz...@agoda.com>
发送时间: 2021年10月4日 0:46
收件人: Igor Fedotov<mailto:ifedo...@suse.de>
抄送: ceph-users@ceph.io<mailto:ceph-users@ceph.io>
主题: [ceph-users] Re: is it possible to remove the db+wal from an external 
device (nvme)

Seems like it cannot start anymore once migrated ☹

https://justpaste.it/5hkot

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: 
istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com%3cmailto:istvan.sz...@agoda.com>>
---

From: Igor Fedotov mailto:ifedo...@suse.de>>
Sent: Saturday, October 2, 2021 5:22 AM
To: Szabo, Istvan (Agoda) 
m

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Igor Fedotov

I'm not sure dmcrypt is the culprit here.

Could you please set debug-bluefs to 20 and collect an OSD startup log.
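For a non-containerized OSD that could look roughly like this (a sketch; osd.48 is just the example id from the crash above, and the log path is the usual default):

  ceph config set osd.48 debug_bluefs 20/20
  systemctl restart ceph-osd@48
  # the startup log then ends up in /var/log/ceph/ceph-osd.48.log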


On 10/5/2021 4:43 PM, Szabo, Istvan (Agoda) wrote:


Hmm, tried another one which hasn’t been spilledover disk, still 
coredumped ☹


Is there any special thing that we need to do before we migrate db 
next to the block? Our osds are using dmcrypt, is it an issue?



Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com <mailto:istvan.sz...@agoda.com>
---

*From:*胡玮文
*Sent:* Monday, October 4, 2021 12:13 AM
*To:* Szabo, Istvan (Agoda) ; Igor Fedotov 


*Cc:* ceph-users@ceph.io
*Subject:* 回复: [ceph-users] Re: is it possible to remove the db+wal 
from an external device (nvme)


Email received from the internet. If in doubt, don't click any link 
nor open any attachment !




The stack trace (tcmalloc::allocate_full_cpp_throw_oom) seems 
indicating you don’t have enough memory.


*发件人**: *Szabo, Istvan (Agoda) <mailto:istvan.sz...@agoda.com>
*发送时间: *2021年10月4日 0:46
*收件人: *Igor Fedotov <mailto:ifedo...@suse.de>
*抄送: *ceph-users@ceph.io <mailto:ceph-users@ceph.io>
*主题: *[ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)


Seems like it cannot start anymore once migrated ☹

https://justpaste.it/5hkot <https://justpaste.it/5hkot>

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com 
<mailto:istvan.sz...@agoda.com%3cmailto:istvan.sz...@agoda.com>>

---

From: Igor Fedotov mailto:ifedo...@suse.de>>
S

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-05 Thread Eugen Block

Do you see oom killers in dmesg on this host? This line indicates it:

 "(tcmalloc::allocate_full_cpp_throw_oom(unsigned  
long)+0x146) [0x7f310b7d8c96]",
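A quick way to check for that on the OSD host, nothing cluster-specific assumed:

  # kernel log with human-readable timestamps
  dmesg -T | grep -iE 'out of memory|oom-kill'
  # same information via the journal
  journalctl -k | grep -i oom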



Zitat von "Szabo, Istvan (Agoda)" :


Hmm, tried another one which hasn’t been spilledover disk, still coredumped ☹
Is there any special thing that we need to do before we migrate db  
next to the block? Our osds are using dmcrypt, is it an issue?


Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
-----------------------

From: 胡 玮文 
Sent: Monday, October 4, 2021 12:13 AM
To: Szabo, Istvan (Agoda) ; Igor Fedotov  


Cc: ceph-users@ceph.io
Subject: 回复: [ceph-users] Re: is it possible to remove the db+wal  
from an external device (nvme)


Email received from the internet. If in doubt, don't click any link  
nor open any attachment !


The stack trace (tcmalloc::allocate_full_cpp_throw_oom) seems  
indicating you don’t have enough memory.


发件人: Szabo, Istvan (Agoda)<mailto:istvan.sz...@agoda.com>
发送时间: 2021年10月4日 0:46
收件人: Igor Fedotov<mailto:ifedo...@suse.de>
抄送: ceph-users@ceph.io<mailto:ceph-users@ceph.io>
主题: [ceph-users] Re: is it possible to remove the db+wal from an  
external device (nvme)


Seems like it cannot start anymore once migrated ☹

https://justpaste.it/5hkot

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e:  

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-03 Thread Szabo, Istvan (Agoda)
Seems like it cannot start anymore once migrated ☹

https://justpaste.it/5hkot

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Saturday, October 2, 2021 5:22 AM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io; Eugen Block ; Christian Wuerdig 

Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


Hi Istvan,

yeah both db and wal to slow migration are supported. And spillover state isn't 
a show stopper for that.


On 10/2/2021 1:16 AM, Szabo, Istvan (Agoda) wrote:
Dear Igor,

Is the ceph-volume lvm migrate command smart enough in octopus 15.2.14 to be 
able to remove the db (included the wall) from the nvme even if it is 
spilledover? I can’t compact back to normal many disk to not show spillover 
warning.

I think Christian has the truth of the issue, my
Nvme with 30k rand write iops backing 3x ssd with 67k rand write iops each …

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---


On 2021. Oct 1., at 11:47, Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com> wrote:
3x SSD osd /nvme

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

-Original Message-
From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Friday, October 1, 2021 4:35 PM
To: ceph-users@ceph.io<mailto:ceph-users@ceph.io>
Subject: [ceph-users] Re: is it possible to remove the db+wal from an external 
device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


And how many OSDs are per single NVMe do you have?

On 10/1/2021 9:55 AM, Szabo, Istvan (Agoda) wrote:

I have my dashboards and I can see that the db nvmes are always running on 100% 
utilization (you can monitor with iostat -x 1)  and it generates all the time 
iowaits which is between 1-3.

I’m using nvme in front of the ssds.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: 
istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com>
---

From: Victor Hooi <mailto:victorh...@yahoo.com>
Sent: Friday, October 1, 2021 5:30 AM
To: Eugen Block <mailto:ebl...@nde.ag>
Cc: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>; 胡 玮文
<mailto:huw...@outlook.com>; ceph-users 
<mailto:ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from
an external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !

Hi,

I'm curious - how did you tell that the separate WAL+DB volume was slowing 
things down? I assume you did some benchmarking - is there any chance you'd be 
willing to share results? (Or anybody else that's been in a similar situation).

What sorts of devices are you using for the WAL+DB, versus the data disks?

We're using NAND SSDs, with Optanes for the WAL+DB, and on some
systems I am seeing slower than expected behaviour - need to dive
deeper into it

In my case, I was running with 4 or 2 OSDs per Optane volume:

https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partition
s_can_you_run_per_optane/

but I couldn't seem to get the results I'd expected - so curious what people 
are seeing in the real world - and of course, we might need to follow the steps 
here to remove them as well.

Thanks,
Victor

On Thu, 30 Sept 2021 at 16:10, Eugen Block 
mailto:ebl...@nde.ag><mailto:ebl...@nde.ag><mailto:ebl...@nde.ag>>
 wrote:
Yes, I believe for you it should work without containers although I
haven't tried the migrate command in a non-containerized cluster yet.
But I believe this is a general issue for containerized clusters with
regards to maintenance. I haven't checked yet if there are existing
tracker issues for this, but maybe this should be worth creating one?


Zitat von "Szabo, Istvan (Agoda)" 
mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com>>:

Actually I don't have containerized deployment, my is normal one. So
it

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-03 Thread 胡 玮文
Sorry, I read it again and found “tcmalloc: large alloc 94477368950784 bytes == 
(nil)”. This unrealistically large allocation seems to indicate a bug, but I didn't 
find a matching one in the tracker.
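If it helps to collect details for a tracker report, the crash module keeps the full metadata; the id below is the one from the dump earlier in this thread:

  # list recorded crashes
  ceph crash ls
  # full metadata for this particular crash
  ceph crash info 2021-10-05T13:31:28.513463Z_b6818598-4960-4ed6-942a-d4a7ff37a758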


发件人: Szabo, Istvan (Agoda) 
发送时间: Monday, October 4, 2021 12:45:20 AM
收件人: Igor Fedotov 
抄送: ceph-users@ceph.io 
主题: [ceph-users] Re: is it possible to remove the db+wal from an external 
device (nvme)

Seems like it cannot start anymore once migrated ☹

https://justpaste.it/5hkot

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Igor Fedotov 
Sent: Saturday, October 2, 2021 5:22 AM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users@ceph.io; Eugen Block ; Christian Wuerdig 

Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


Hi Istvan,

yeah both db and wal to slow migration are supported. And spillover state isn't 
a show stopper for that.


On 10/2/2021 1:16 AM, Szabo, Istvan (Agoda) wrote:
Dear Igor,

Is the ceph-volume lvm migrate command smart enough in octopus 15.2.14 to be 
able to remove the db (included the wall) from the nvme even if it is 
spilledover? I can’t compact back to normal many disk to not show spillover 
warning.

I think Christian has the truth of the issue, my
Nvme with 30k rand write iops backing 3x ssd with 67k rand write iops each …

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---


On 2021. Oct 1., at 11:47, Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com> wrote:
3x SSD osd /nvme

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

-Original Message-
From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Friday, October 1, 2021 4:35 PM
To: ceph-users@ceph.io<mailto:ceph-users@ceph.io>
Subject: [ceph-users] Re: is it possible to remove the db+wal from an external 
device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


And how many OSDs are per single NVMe do you have?

On 10/1/2021 9:55 AM, Szabo, Istvan (Agoda) wrote:

I have my dashboards and I can see that the db nvmes are always running on 100% 
utilization (you can monitor with iostat -x 1)  and it generates all the time 
iowaits which is between 1-3.

I’m using nvme in front of the ssds.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: 
istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com>
---

From: Victor Hooi <mailto:victorh...@yahoo.com>
Sent: Friday, October 1, 2021 5:30 AM
To: Eugen Block <mailto:ebl...@nde.ag>
Cc: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>; 胡 玮文
<mailto:huw...@outlook.com>; ceph-users 
<mailto:ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from
an external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !

Hi,

I'm curious - how did you tell that the separate WAL+DB volume was slowing 
things down? I assume you did some benchmarking - is there any chance you'd be 
willing to share results? (Or anybody else that's been in a similar situation).

What sorts of devices are you using for the WAL+DB, versus the data disks?

We're using NAND SSDs, with Optanes for the WAL+DB, and on some
systems I am seeing slower than expected behaviour - need to dive
deeper into it

In my case, I was running with 4 or 2 OSDs per Optane volume:

https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partition
s_can_you_run_per_optane/

but I couldn't seem to get the results I'd expected - so curious what people 
are seeing in the real world - and of course, we might need to follow the steps 
here to remove them as well.

Thanks,
Victor

On Thu, 30 Sept 2021 at 16:10, Eugen Block 
mailto:ebl...@nde.ag><mailto:ebl...@nde.ag><mailto:ebl...@nde.ag>>
 wrote:
Yes, I believe for you it should work without containers although I
haven't tried the migrate command in a non-containerized cluster yet.
But I believe this is a general issu

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-02 Thread Szabo, Istvan (Agoda)
Ok, so spillover and fewer than 5 PGs not (deep-)scrubbed shouldn't be an 
issue in the case of a minor update, right? Fewer than 5 PGs are not scrubbed. I will 
also update the complete OS (kernel, python, ...) together with Ceph, coming from 
15.2.10; I usually never update Ceph alone.
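Before the update I would just sanity-check the current warnings, something like (the OSD id is only an example):

  # shows the spillover and not-scrubbed details
  ceph health detail
  # confirm which daemons are still on 15.2.10
  ceph versions
  # check that stopping a given OSD won't make PGs unavailable
  ceph osd ok-to-stop 48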

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

On 2021. Oct 2., at 0:22, Igor Fedotov  wrote:


Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


Hi Istvan,

yeah both db and wal to slow migration are supported. And spillover state isn't 
a show stopper for that.


On 10/2/2021 1:16 AM, Szabo, Istvan (Agoda) wrote:
Dear Igor,

Is the ceph-volume lvm migrate command smart enough in octopus 15.2.14 to be 
able to remove the db (included the wall) from the nvme even if it is 
spilledover? I can’t compact back to normal many disk to not show spillover 
warning.

I think Christian has the truth of the issue, my
Nvme with 30k rand write iops backing 3x ssd with 67k rand write iops each …

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

On 2021. Oct 1., at 11:47, Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com> wrote:

3x SSD osd /nvme

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

-Original Message-
From: Igor Fedotov <mailto:ifedo...@suse.de>
Sent: Friday, October 1, 2021 4:35 PM
To: ceph-users@ceph.io<mailto:ceph-users@ceph.io>
Subject: [ceph-users] Re: is it possible to remove the db+wal from an external 
device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


And how many OSDs are per single NVMe do you have?

On 10/1/2021 9:55 AM, Szabo, Istvan (Agoda) wrote:
I have my dashboards and I can see that the db nvmes are always running on 100% 
utilization (you can monitor with iostat -x 1)  and it generates all the time 
iowaits which is between 1-3.

I’m using nvme in front of the ssds.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: 
istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com>
---

From: Victor Hooi <mailto:victorh...@yahoo.com>
Sent: Friday, October 1, 2021 5:30 AM
To: Eugen Block <mailto:ebl...@nde.ag>
Cc: Szabo, Istvan (Agoda) 
<mailto:istvan.sz...@agoda.com>; 胡 玮文
<mailto:huw...@outlook.com>; ceph-users 
<mailto:ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from
an external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !

Hi,

I'm curious - how did you tell that the separate WAL+DB volume was slowing 
things down? I assume you did some benchmarking - is there any chance you'd be 
willing to share results? (Or anybody else that's been in a similar situation).

What sorts of devices are you using for the WAL+DB, versus the data disks?

We're using NAND SSDs, with Optanes for the WAL+DB, and on some
systems I am seeing slower than expected behaviour - need to dive
deeper into it

In my case, I was running with 4 or 2 OSDs per Optane volume:

https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partition
s_can_you_run_per_optane/

but I couldn't seem to get the results I'd expected - so curious what people 
are seeing in the real world - and of course, we might need to follow the steps 
here to remove them as well.

Thanks,
Victor

On Thu, 30 Sept 2021 at 16:10, Eugen Block 
mailto:ebl...@nde.ag><mailto:ebl...@nde.ag><mailto:ebl...@nde.ag>>
 wrote:
Yes, I believe for you it should work without containers although I
haven't tried the migrate command in a non-containerized cluster yet.
But I believe this is a general issue for containerized clusters with
regards to maintenance. I haven't checked yet if there are existing
tracker issues for this, but maybe this should be worth creating one?


Zitat von "Szabo, Istvan (Agoda)" 
mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com><mailto:istvan.sz...@agoda.com>>:

Actually I don't have containerized deployment, my is normal one. So
it should work 

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Szabo, Istvan (Agoda)
Dear Igor,

Is the ceph-volume lvm migrate command in Octopus 15.2.14 smart enough to 
remove the db (including the wal) from the nvme even if it has spilled over? On many 
disks I can't compact enough to get back to normal and clear the spillover warning.

I think Christian has hit on the root of the issue: my
nvme does 30k random-write iops while each of the 3 ssds behind it does 67k …
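For anyone who wants to reproduce that comparison, a minimal fio sketch for 4k random writes on a raw device (assumptions: /dev/nvme0n1 is an unused test device; this is destructive, so never point it at a device that carries data):

  fio --name=randwrite --filename=/dev/nvme0n1 --ioengine=libaio \
      --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
      --runtime=60 --time_based --group_reporting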

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

On 2021. Oct 1., at 11:47, Szabo, Istvan (Agoda)  wrote:

3x SSD osd /nvme

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Igor Fedotov 
Sent: Friday, October 1, 2021 4:35 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: is it possible to remove the db+wal from an external 
device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


And how many OSDs are per single NVMe do you have?

On 10/1/2021 9:55 AM, Szabo, Istvan (Agoda) wrote:
I have my dashboards and I can see that the db nvmes are always running on 100% 
utilization (you can monitor with iostat -x 1)  and it generates all the time 
iowaits which is between 1-3.

I’m using nvme in front of the ssds.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Victor Hooi 
Sent: Friday, October 1, 2021 5:30 AM
To: Eugen Block 
Cc: Szabo, Istvan (Agoda) ; 胡 玮文
; ceph-users 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from
an external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !

Hi,

I'm curious - how did you tell that the separate WAL+DB volume was slowing 
things down? I assume you did some benchmarking - is there any chance you'd be 
willing to share results? (Or anybody else that's been in a similar situation).

What sorts of devices are you using for the WAL+DB, versus the data disks?

We're using NAND SSDs, with Optanes for the WAL+DB, and on some
systems I am seeing slower than expected behaviour - need to dive
deeper into it

In my case, I was running with 4 or 2 OSDs per Optane volume:

https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partition
s_can_you_run_per_optane/

but I couldn't seem to get the results I'd expected - so curious what people 
are seeing in the real world - and of course, we might need to follow the steps 
here to remove them as well.

Thanks,
Victor

On Thu, 30 Sept 2021 at 16:10, Eugen Block 
mailto:ebl...@nde.ag>> wrote:
Yes, I believe for you it should work without containers although I
haven't tried the migrate command in a non-containerized cluster yet.
But I believe this is a general issue for containerized clusters with
regards to maintenance. I haven't checked yet if there are existing
tracker issues for this, but maybe this should be worth creating one?


Zitat von "Szabo, Istvan (Agoda)" 
mailto:istvan.sz...@agoda.com>>:

Actually I don't have a containerized deployment, mine is a normal one, so
the lvm migrate should work.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

-Original Message-
From: Eugen Block mailto:ebl...@nde.ag>>
Sent: Wednesday, September 29, 2021 8:49 PM
To: 胡 玮文 mailto:huw...@outlook.com>>
Cc: Igor Fedotov mailto:ifedo...@suse.de>>; Szabo,
Istvan (Agoda)
mailto:istvan.sz...@agoda.com>>;
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
Subject: Re: is it possible to remove the db+wal from an external
device (nvme)

Email received from the internet. If in doubt, don't click any link
nor open any attachment !


That's what I did and pasted the results in my previous comments.


Zitat von 胡 玮文 mailto:huw...@outlook.com>>:

Yes. And “cephadm shell” command does not depend on the running
daemon, it will start a new container. So I think it is perfectly
fine to stop the OSD first then run the “cephadm shell” command, and
run ceph-volume in the new shell.

发件人: Eugen Block<mailto:ebl...@nde.ag<mailto:ebl...@nde.ag>>
发送时间: 2021年9月29日 21:40
收件人: 胡 玮文<mailto:huw...@outlook.com<mailto:huw...@outlook.com>>
抄送: Igor Fedotov<mailto:ifedo...@suse.de<mailto:if

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Igor Fedotov

Hi Istvan,

Yeah, both db-to-slow and wal-to-slow migrations are supported, and a spilled-over 
state isn't a show stopper for that.
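For the archives, the general shape of the command (the OSD must be stopped first; the id, fsid and LV names below are placeholders, not values from this cluster):

  ceph-volume lvm migrate --osd-id <id> --osd-fsid <osd-fsid> \
      --from db --target <vg>/<osd-block-lv>

If the WAL lives on its own LV rather than sharing the DB LV, list it too, e.g. --from db wal.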



On 10/2/2021 1:16 AM, Szabo, Istvan (Agoda) wrote:

Dear Igor,

Is the ceph-volume lvm migrate command smart enough in octopus 15.2.14 
to be able to remove the db (included the wall) from the nvme even if 
it is spilledover? I can’t compact back to normal many disk to not 
show spillover warning.


I think Christian has the truth of the issue, my
Nvme with 30k rand write iops backing 3x ssd with 67k rand write iops 
each …


Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com <mailto:istvan.sz...@agoda.com>
---

On 2021. Oct 1., at 11:47, Szabo, Istvan (Agoda) 
 wrote:


3x SSD osd /nvme

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Igor Fedotov 
Sent: Friday, October 1, 2021 4:35 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)


Email received from the internet. If in doubt, don't click any link 
nor open any attachment !



And how many OSDs are per single NVMe do you have?

On 10/1/2021 9:55 AM, Szabo, Istvan (Agoda) wrote:
I have my dashboards and I can see that the db nvmes are always 
running on 100% utilization (you can monitor with iostat -x 1)  and 
it generates all the time iowaits which is between 1-3.


I’m using nvme in front of the ssds.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Victor Hooi 
Sent: Friday, October 1, 2021 5:30 AM
To: Eugen Block 
Cc: Szabo, Istvan (Agoda) ; 胡 玮文
; ceph-users 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from
an external device (nvme)

Email received from the internet. If in doubt, don't click any link 
nor open any attachment !


Hi,

I'm curious - how did you tell that the separate WAL+DB volume was 
slowing things down? I assume you did some benchmarking - is there 
any chance you'd be willing to share results? (Or anybody else 
that's been in a similar situation).


What sorts of devices are you using for the WAL+DB, versus the data 
disks?


We're using NAND SSDs, with Optanes for the WAL+DB, and on some
systems I am seeing slower than expected behaviour - need to dive
deeper into it

In my case, I was running with 4 or 2 OSDs per Optane volume:

https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partition
s_can_you_run_per_optane/

but I couldn't seem to get the results I'd expected - so curious 
what people are seeing in the real world - and of course, we might 
need to follow the steps here to remove them as well.


Thanks,
Victor

On Thu, 30 Sept 2021 at 16:10, Eugen Block 
mailto:ebl...@nde.ag>> wrote:

Yes, I believe for you it should work without containers although I
haven't tried the migrate command in a non-containerized cluster yet.
But I believe this is a general issue for containerized clusters with
regards to maintenance. I haven't checked yet if there are existing
tracker issues for this, but maybe this should be worth creating one?


Zitat von "Szabo, Istvan (Agoda)" 
mailto:istvan.sz...@agoda.com>>:



Actually I don't have a containerized deployment, mine is a normal one, so
the lvm migrate should work.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

-Original Message-
From: Eugen Block mailto:ebl...@nde.ag>>
Sent: Wednesday, September 29, 2021 8:49 PM
To: 胡 玮文 mailto:huw...@outlook.com>>
Cc: Igor Fedotov mailto:ifedo...@suse.de>>; Szabo,
Istvan (Agoda)
mailto:istvan.sz...@agoda.com>>;
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
Subject: Re: is it possible to remove the db+wal from an external
device (nvme)

Email received from the internet. If in doubt, don't click any link
nor open any attachment !


That's what I did and pasted the results in my previous comments.


Zitat von 胡 玮文 mailto:huw...@outlook.com>>:


Yes. And “cephadm shell” command does not depend on the running
daemon, it will start a new container. So I think it is perfectly
fine to stop the OSD first then run the “cephadm shell” command, and
run ceph-volume in the new shell.

发件人: Eugen Block&l

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Szabo, Istvan (Agoda)
3x SSD OSDs per NVMe.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Igor Fedotov  
Sent: Friday, October 1, 2021 4:35 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: is it possible to remove the db+wal from an external 
device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


And how many OSDs are per single NVMe do you have?

On 10/1/2021 9:55 AM, Szabo, Istvan (Agoda) wrote:
> I have my dashboards and I can see that the db nvmes are always running on 
> 100% utilization (you can monitor with iostat -x 1)  and it generates all the 
> time iowaits which is between 1-3.
>
> I’m using nvme in front of the ssds.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
> ---
>
> From: Victor Hooi 
> Sent: Friday, October 1, 2021 5:30 AM
> To: Eugen Block 
> Cc: Szabo, Istvan (Agoda) ; 胡 玮文 
> ; ceph-users 
> Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from 
> an external device (nvme)
>
> Email received from the internet. If in doubt, don't click any link nor open 
> any attachment !
> 
> Hi,
>
> I'm curious - how did you tell that the separate WAL+DB volume was slowing 
> things down? I assume you did some benchmarking - is there any chance you'd 
> be willing to share results? (Or anybody else that's been in a similar 
> situation).
>
> What sorts of devices are you using for the WAL+DB, versus the data disks?
>
> We're using NAND SSDs, with Optanes for the WAL+DB, and on some 
> systems I am seeing slowly than expected behaviour - need to dive 
> deeper into it
>
> In my case, I was running with 4 or 2 OSDs per Optane volume:
>
> https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partition
> s_can_you_run_per_optane/
>
> but I couldn't seem to get the results I'd expected - so curious what people 
> are seeing in the real world - and of course, we might need to follow the 
> steps here to remove them as well.
>
> Thanks,
> Victor
>
> On Thu, 30 Sept 2021 at 16:10, Eugen Block 
> mailto:ebl...@nde.ag>> wrote:
> Yes, I believe for you it should work without containers although I 
> haven't tried the migrate command in a non-containerized cluster yet.
> But I believe this is a general issue for containerized clusters with 
> regards to maintenance. I haven't checked yet if there are existing 
> tracker issues for this, but maybe this should be worth creating one?
>
>
> Zitat von "Szabo, Istvan (Agoda)" 
> mailto:istvan.sz...@agoda.com>>:
>
>> Actually I don't have containerized deployment, my is normal one. So 
>> it should work the lvm migrate.
>>
>> Istvan Szabo
>> Senior Infrastructure Engineer
>> ---
>> Agoda Services Co., Ltd.
>> e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
>> ---
>>
>> -Original Message-
>> From: Eugen Block mailto:ebl...@nde.ag>>
>> Sent: Wednesday, September 29, 2021 8:49 PM
>> To: 胡 玮文 mailto:huw...@outlook.com>>
>> Cc: Igor Fedotov mailto:ifedo...@suse.de>>; Szabo, 
>> Istvan (Agoda) 
>> mailto:istvan.sz...@agoda.com>>; 
>> ceph-users@ceph.io<mailto:ceph-users@ceph.io>
>> Subject: Re: is it possible to remove the db+wal from an external 
>> device (nvme)
>>
>> Email received from the internet. If in doubt, don't click any link 
>> nor open any attachment !
>> 
>>
>> That's what I did and pasted the results in my previous comments.
>>
>>
>> Zitat von 胡 玮文 mailto:huw...@outlook.com>>:
>>
>>> Yes. And “cephadm shell” command does not depend on the running 
>>> daemon, it will start a new container. So I think it is perfectly 
>>> fine to stop the OSD first then run the “cephadm shell” command, and 
>>> run ceph-volume in the new shell.
>>>
>>> 发件人: Eugen Block<mailto:ebl...@nde.ag<mailto:ebl...@nde.ag>>
>>> 发送时间: 2021年9月29日 21:40
>>> 收件人: 胡 玮文<mailto:huw...@outlook.com<mailto:huw...@outlook.com>>
>>> 抄送: Igor Fedotov<mailto:ifedo...@s

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Szabo, Istvan (Agoda)
I have my dashboards and I can see that the DB NVMes are constantly running at 100% 
utilization (you can monitor this with iostat -x 1), and that generates iowait of 
between 1 and 3 all the time.

I'm using NVMe in front of the SSDs.
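For example (device names are placeholders for the DB NVMe and the SSDs behind it; watch the %util and w_await columns):

  iostat -x nvme0n1 sda sdb sdc 1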

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Victor Hooi 
Sent: Friday, October 1, 2021 5:30 AM
To: Eugen Block 
Cc: Szabo, Istvan (Agoda) ; 胡 玮文 ; 
ceph-users 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !

Hi,

I'm curious - how did you tell that the separate WAL+DB volume was slowing 
things down? I assume you did some benchmarking - is there any chance you'd be 
willing to share results? (Or anybody else that's been in a similar situation).

What sorts of devices are you using for the WAL+DB, versus the data disks?

We're using NAND SSDs, with Optanes for the WAL+DB, and on some systems I am 
seeing slower than expected behaviour - need to dive deeper into it

In my case, I was running with 4 or 2 OSDs per Optane volume:

https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partitions_can_you_run_per_optane/

but I couldn't seem to get the results I'd expected - so curious what people 
are seeing in the real world - and of course, we might need to follow the steps 
here to remove them as well.

Thanks,
Victor

On Thu, 30 Sept 2021 at 16:10, Eugen Block 
mailto:ebl...@nde.ag>> wrote:
Yes, I believe for you it should work without containers although I
haven't tried the migrate command in a non-containerized cluster yet.
But I believe this is a general issue for containerized clusters with
regards to maintenance. I haven't checked yet if there are existing
tracker issues for this, but maybe this should be worth creating one?


Zitat von "Szabo, Istvan (Agoda)" 
mailto:istvan.sz...@agoda.com>>:

> Actually I don't have containerized deployment, my is normal one. So
> it should work the lvm migrate.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
> ---
>
> -Original Message-
> From: Eugen Block mailto:ebl...@nde.ag>>
> Sent: Wednesday, September 29, 2021 8:49 PM
> To: 胡 玮文 mailto:huw...@outlook.com>>
> Cc: Igor Fedotov mailto:ifedo...@suse.de>>; Szabo, Istvan 
> (Agoda)
> mailto:istvan.sz...@agoda.com>>; 
> ceph-users@ceph.io<mailto:ceph-users@ceph.io>
> Subject: Re: is it possible to remove the db+wal from an external
> device (nvme)
>
> Email received from the internet. If in doubt, don't click any link
> nor open any attachment !
> 
>
> That's what I did and pasted the results in my previous comments.
>
>
> Zitat von 胡 玮文 mailto:huw...@outlook.com>>:
>
>> Yes. And “cephadm shell” command does not depend on the running
>> daemon, it will start a new container. So I think it is perfectly fine
>> to stop the OSD first then run the “cephadm shell” command, and run
>> ceph-volume in the new shell.
>>
>> 发件人: Eugen Block<mailto:ebl...@nde.ag<mailto:ebl...@nde.ag>>
>> 发送时间: 2021年9月29日 21:40
>> 收件人: 胡 玮文<mailto:huw...@outlook.com<mailto:huw...@outlook.com>>
>> 抄送: Igor Fedotov<mailto:ifedo...@suse.de<mailto:ifedo...@suse.de>>; Szabo, 
>> Istvan
>> (Agoda)<mailto:istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>>;
>> ceph-users@ceph.io<mailto:ceph-users@ceph.io><mailto:ceph-users@ceph.io<mailto:ceph-users@ceph.io>>
>> 主题: Re: is it possible to remove the db+wal from an external device
>> (nvme)
>>
>> The OSD has to be stopped in order to migrate DB/WAL, it can't be done
>> live. ceph-volume requires a lock on the device.
>>
>>
>> Zitat von 胡 玮文 mailto:huw...@outlook.com>>:
>>
>>> I’ve not tried it, but how about:
>>>
>>> cephadm shell -n osd.0
>>>
>>> then run “ceph-volume” commands in the newly opened shell. The
>>> directory structure seems fine.
>>>
>>> $ sudo cephadm shell -n osd.0
>>> Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
>>> Inferring config
>>> /var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
>>> Using recent ceph image
>>&

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-01 Thread Igor Fedotov

And how many OSDs per single NVMe do you have?

On 10/1/2021 9:55 AM, Szabo, Istvan (Agoda) wrote:

I have my dashboards and I can see that the db nvmes are always running on 100% 
utilization (you can monitor with iostat -x 1)  and it generates all the time 
iowaits which is between 1-3.

I’m using nvme in front of the ssds.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

From: Victor Hooi 
Sent: Friday, October 1, 2021 5:30 AM
To: Eugen Block 
Cc: Szabo, Istvan (Agoda) ; 胡 玮文 ; 
ceph-users 
Subject: Re: [ceph-users] Re: is it possible to remove the db+wal from an 
external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !

Hi,

I'm curious - how did you tell that the separate WAL+DB volume was slowing 
things down? I assume you did some benchmarking - is there any chance you'd be 
willing to share results? (Or anybody else that's been in a similar situation).

What sorts of devices are you using for the WAL+DB, versus the data disks?

We're using NAND SSDs, with Optanes for the WAL+DB, and on some systems I am 
seeing slower than expected behaviour - need to dive deeper into it

In my case, I was running with 4 or 2 OSDs per Optane volume:

https://www.reddit.com/r/ceph/comments/k2lef1/how_many_waldb_partitions_can_you_run_per_optane/

but I couldn't seem to get the results I'd expected - so curious what people 
are seeing in the real world - and of course, we might need to follow the steps 
here to remove them as well.

Thanks,
Victor

On Thu, 30 Sept 2021 at 16:10, Eugen Block 
mailto:ebl...@nde.ag>> wrote:
Yes, I believe for you it should work without containers although I
haven't tried the migrate command in a non-containerized cluster yet.
But I believe this is a general issue for containerized clusters with
regards to maintenance. I haven't checked yet if there are existing
tracker issues for this, but maybe this should be worth creating one?


Zitat von "Szabo, Istvan (Agoda)" 
mailto:istvan.sz...@agoda.com>>:


Actually I don't have a containerized deployment, mine is a normal one, so
the lvm migrate should work.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>
---

-Original Message-
From: Eugen Block mailto:ebl...@nde.ag>>
Sent: Wednesday, September 29, 2021 8:49 PM
To: 胡 玮文 mailto:huw...@outlook.com>>
Cc: Igor Fedotov mailto:ifedo...@suse.de>>; Szabo, Istvan 
(Agoda)
mailto:istvan.sz...@agoda.com>>; 
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
Subject: Re: is it possible to remove the db+wal from an external
device (nvme)

Email received from the internet. If in doubt, don't click any link
nor open any attachment !


That's what I did and pasted the results in my previous comments.


Zitat von 胡 玮文 mailto:huw...@outlook.com>>:


Yes. And “cephadm shell” command does not depend on the running
daemon, it will start a new container. So I think it is perfectly fine
to stop the OSD first then run the “cephadm shell” command, and run
ceph-volume in the new shell.

发件人: Eugen Block<mailto:ebl...@nde.ag<mailto:ebl...@nde.ag>>
发送时间: 2021年9月29日 21:40
收件人: 胡 玮文<mailto:huw...@outlook.com<mailto:huw...@outlook.com>>
抄送: Igor Fedotov<mailto:ifedo...@suse.de<mailto:ifedo...@suse.de>>; Szabo, 
Istvan
(Agoda)<mailto:istvan.sz...@agoda.com<mailto:istvan.sz...@agoda.com>>;
ceph-users@ceph.io<mailto:ceph-users@ceph.io><mailto:ceph-users@ceph.io<mailto:ceph-users@ceph.io>>
主题: Re: is it possible to remove the db+wal from an external device
(nvme)

The OSD has to be stopped in order to migrate DB/WAL, it can't be done
live. ceph-volume requires a lock on the device.


Zitat von 胡 玮文 mailto:huw...@outlook.com>>:


I’ve not tried it, but how about:

cephadm shell -n osd.0

then run “ceph-volume” commands in the newly opened shell. The
directory structure seems fine.

$ sudo cephadm shell -n osd.0
Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
Inferring config
/var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
Using recent ceph image
cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37<http://cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37>
d7a9b37db1e0ff6691aae6466530 root@host0:/# ll
/var/lib/ceph/osd/ceph-0/ total 68
drwx-- 2 ceph ceph 4096 Sep 20 04:15 ./
drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /d

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-30 Thread Victor Hooi
an: stderr Failed to migrate to :
> >>> ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483
> >>> d-ae58-0ab97b8d0cc4
> >>> Traceback (most recent call last):
> >>>File "/usr/sbin/cephadm", line 6225, in 
> >>>  r = args.func()
> >>>File "/usr/sbin/cephadm", line 1363, in _infer_fsid
> >>>  return func()
> >>>File "/usr/sbin/cephadm", line 1422, in _infer_image
> >>>  return func()
> >>>File "/usr/sbin/cephadm", line 3687, in command_ceph_volume
> >>>  out, err, code = call_throws(c.run_cmd(),
> >>> verbosity=CallVerbosity.VERBOSE)
> >>>File "/usr/sbin/cephadm", line 1101, in call_throws
> >>>  raise RuntimeError('Failed command: %s' % ' '.join(command))
> >>> [...]
> >>>
> >>>
> >>> I could install the package ceph-osd (where ceph-volume is packaged
> >>> in) but it's not available by default (as you see this is a SES 7
> >>> environment).
> >>>
> >>> I'm not sure what the design is here, it feels like the ceph-volume
> >>> migrate command is not applicable to containers yet.
> >>>
> >>> Regards,
> >>> Eugen
> >>>
> >>>
> >>> Zitat von Igor Fedotov :
> >>>
> >>>> Hi Eugen,
> >>>>
> >>>> indeed this looks like an issue related to containerized deployment,
> >>>> "ceph-volume lvm migrate" expects osd folder to be under
> >>>> /var/lib/ceph/osd:
> >>>>
> >>>>> stderr: 2021-09-29T06:56:24.787+ 7fde05b96180 -1
> >>>>> bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock
> >>>>> /var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still
> >>>>> running?)(11) Resource temporarily unavailable
> >>>>
> >>>> As a workaround you might want to try to create a symlink to your
> >>>> actual location before issuing the migrate command:
> >>>> /var/lib/ceph/osd ->
> >>>> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
> >>>>
> >>>> More complicated (and more general IMO) way would be to run the
> >>>> migrate command from within a container deployed similarly (i.e.
> >>>> with all the proper subfolder mappings) to ceph-osd one. Just
> >>>> speculating - not a big expert in containers and never tried that
> >>>> with properly deployed production cluster...
> >>>>
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Igor
> >>>>
> >>>> On 9/29/2021 10:07 AM, Eugen Block wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I just tried with 'ceph-volume lvm migrate' in Octopus but it
> >>>>> doesn't really work. I'm not sure if I'm missing something here,
> >>>>> but I believe it's again the already discussed containers issue. To
> >>>>> be able to run the command for an OSD the OSD has to be offline,
> >>>>> but then you don't have access to the block.db because the path is
> >>>>> different from outside the container:
> >>>>>
> >>>>> ---snip---
> >>>>> [ceph: root@host1 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid
> >>>>> b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target
> >>>>> ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-4
> >>>>> 83d-ae58-0ab97b8d0cc4 --> Migrate to existing, Source:
> >>>>> ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db']
> >>>>> Target:
> >>>>> /var/lib/ceph/osd/ceph-1/block
> >>>>>  stdout: inferring bluefs devices from bluestore path
> >>>>>  stderr:
> >>>>> /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/blu
> >>>>> estore/BlueStore.cc: In function 'int
> >>>>> BlueStore::_mount_for_bluefs()' thread
> >>>>> 7fde05b96180
> >>>>> time
> >>>>> 2021-09-29T06:56:24.790161+
> >>>>>  stderr:
> >>>>> /home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/blu
> >>>>> e

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-30 Thread Szabo, Istvan (Agoda)
>>> -rw--- 1 ceph ceph   48 Nov  9  2020 unit.created
>>> -rw--- 1 ceph ceph   35 Sep 17 14:26 unit.image
>>> -rw--- 1 ceph ceph  306 Sep 17 14:26 unit.meta
>>> -rw--- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
>>> -rw--- 1 ceph ceph 3021 Sep 17 14:26 unit.run
>>> -rw--- 1 ceph ceph  142 Sep 17 14:26 unit.stop
>>> -rw--- 1 ceph ceph2 Sep 20 04:15 whoami
>>>
>>> 发件人: Eugen Block<mailto:ebl...@nde.ag>
>>> 发送时间: 2021年9月29日 21:29
>>> 收件人: Igor Fedotov<mailto:ifedo...@suse.de>
>>> 抄送: 胡 玮文<mailto:huw...@outlook.com>; Szabo, Istvan 
>>> (Agoda)<mailto:istvan.sz...@agoda.com>;
>>> ceph-users@ceph.io<mailto:ceph-users@ceph.io>
>>> 主题: Re: [ceph-users] Re: 回复: [ceph-users] Re: is it possible to 
>>> remove the db+wal from an external device (nvme)
>>>
>>> Hi Igor,
>>>
>>> thanks for your input. I haven't done this in a prod env yet either, 
>>> still playing around in a virtual lab env.
>>> I tried the symlink suggestion but it's not that easy, because it 
>>> looks different underneath the ceph directory than ceph-volume 
>>> expects it. These are the services underneath:
>>>
>>> ses7-host1:~ # ll 
>>> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
>>> insgesamt 48
>>> drwx-- 3 root   root   4096 16. Sep 16:11 alertmanager.ses7-host1
>>> drwx-- 3 ceph   ceph   4096 29. Sep 09:03 crash
>>> drwx-- 2 ceph   ceph   4096 16. Sep 16:39 crash.ses7-host1
>>> drwx-- 4 messagebus lp 4096 16. Sep 16:23 grafana.ses7-host1
>>> drw-rw 2 root   root   4096 24. Aug 10:00 home
>>> drwx-- 2 ceph   ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
>>> drwx-- 3 ceph   ceph   4096 16. Sep 16:37 mon.ses7-host1
>>> drwx-- 2 nobody nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
>>> drwx-- 2 ceph   ceph   4096 29. Sep 08:43 osd.0
>>> drwx-- 2 ceph   ceph   4096 29. Sep 15:11 osd.1
>>> drwx-- 4 root   root   4096 16. Sep 16:12 prometheus.ses7-host1
>>>
>>>
>>> While the directory in a non-containerized deployment looks like this:
>>>
>>> nautilus:~ # ll /var/lib/ceph/osd/ceph-0/ insgesamt 24 lrwxrwxrwx 1 
>>> ceph ceph 93 29. Sep 12:21 block -> 
>>> /dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54
>>> b
>>> 3-4689-9896-f54d005c535d
>>> -rw--- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
>>> -rw--- 1 ceph ceph 37 29. Sep 12:21 fsid
>>> -rw--- 1 ceph ceph 55 29. Sep 12:21 keyring
>>> -rw--- 1 ceph ceph  6 29. Sep 12:21 ready
>>> -rw--- 1 ceph ceph 10 29. Sep 12:21 type
>>> -rw--- 1 ceph ceph  2 29. Sep 12:21 whoami
>>>
>>>
>>> But even if I create the symlink to the osd directory it fails 
>>> because I only have ceph-volume within the containers where the 
>>> symlink is not visible to cephadm.
>>>
>>>
>>> ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1 lrwxrwxrwx 1 root root 57 
>>> 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 -> 
>>> /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
>>>
>>> ses7-host1:~ # cephadm ceph-volume lvm migrate --osd-id 1 --osd-fsid
>>> b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target
>>> ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-48
>>> 3
>>> d-ae58-0ab97b8d0cc4 Inferring fsid
>>> 152fd738-01bc-11ec-a7fd-fa163e672db2
>>> [...]
>>> /usr/bin/podman: stderr --> Migrate to existing, Source:
>>> ['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target:
>>> /var/lib/ceph/osd/ceph-1/block
>>> /usr/bin/podman: stderr  stdout: inferring bluefs devices from 
>>> bluestore path
>>> /usr/bin/podman: stderr  stderr: can't migrate 
>>> /var/lib/ceph/osd/ceph-1/block.db, not a valid bluefs volume
>>> /usr/bin/podman: stderr --> Failed to migrate device, error code:1
>>> /usr/bin/podman: stderr --> Undoing lv tag set
>>> /usr/bin/podman: stderr Failed to migrate to :
>>> ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-48
>>> 3
>>> d-ae58-0ab97b8d0cc4
>>> Traceback (most recent call last):
>>>File "/usr/sbin/cephadm", line 6225, in 
>>>  r = args.func()
>>>File "/usr/sbin/cephadm", line 1363, in _infer_fsid
>>>  re

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
Yes, I believe for you it should work without containers, although I  
haven't tried the migrate command in a non-containerized cluster yet.  
But I believe this is a general issue for containerized clusters with  
regard to maintenance. I haven't checked yet whether there are existing  
tracker issues for this, but maybe it would be worth creating one?



Zitat von "Szabo, Istvan (Agoda)" :

Actually I don't have a containerized deployment, mine is a normal one. So  
the lvm migrate should work.


Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Eugen Block 
Sent: Wednesday, September 29, 2021 8:49 PM
To: 胡 玮文 
Cc: Igor Fedotov ; Szabo, Istvan (Agoda)  
; ceph-users@ceph.io
Subject: Re: is it possible to remove the db+wal from an external  
device (nvme)


Email received from the internet. If in doubt, don't click any link  
nor open any attachment !



That's what I did and pasted the results in my previous comments.


Zitat von 胡 玮文 :


Yes. And “cephadm shell” command does not depend on the running
daemon, it will start a new container. So I think it is perfectly fine
to stop the OSD first then run the “cephadm shell” command, and run
ceph-volume in the new shell.

发件人: Eugen Block<mailto:ebl...@nde.ag>
发送时间: 2021年9月29日 21:40
收件人: 胡 玮文<mailto:huw...@outlook.com>
抄送: Igor Fedotov<mailto:ifedo...@suse.de>; Szabo, Istvan
(Agoda)<mailto:istvan.sz...@agoda.com>;
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
主题: Re: is it possible to remove the db+wal from an external device
(nvme)

The OSD has to be stopped in order to migrate DB/WAL, it can't be done
live. ceph-volume requires a lock on the device.


Zitat von 胡 玮文 :


I’ve not tried it, but how about:

cephadm shell -n osd.0

then run “ceph-volume” commands in the newly opened shell. The
directory structure seems fine.

$ sudo cephadm shell -n osd.0
Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
Inferring config
/var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
Using recent ceph image
cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530
root@host0:/# ll /var/lib/ceph/osd/ceph-0/
total 68
drwx-- 2 ceph ceph 4096 Sep 20 04:15 ./
drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db ->  
/dev/ubuntu-vg/osd.0.db

-rw--- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
-rw--- 1 ceph ceph  387 Jun 21 13:24 config
-rw--- 1 ceph ceph   37 Sep 20 04:15 fsid
-rw--- 1 ceph ceph   55 Sep 20 04:15 keyring
-rw--- 1 ceph ceph6 Sep 20 04:15 ready
-rw--- 1 ceph ceph3 Apr  2 01:46 require_osd_release
-rw--- 1 ceph ceph   10 Sep 20 04:15 type
-rw--- 1 ceph ceph   38 Sep 17 14:26 unit.configured
-rw--- 1 ceph ceph   48 Nov  9  2020 unit.created
-rw--- 1 ceph ceph   35 Sep 17 14:26 unit.image
-rw--- 1 ceph ceph  306 Sep 17 14:26 unit.meta
-rw--- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
-rw--- 1 ceph ceph 3021 Sep 17 14:26 unit.run
-rw--- 1 ceph ceph  142 Sep 17 14:26 unit.stop
-rw--- 1 ceph ceph2 Sep 20 04:15 whoami

发件人: Eugen Block<mailto:ebl...@nde.ag>
发送时间: 2021年9月29日 21:29
收件人: Igor Fedotov<mailto:ifedo...@suse.de>
抄送: 胡 玮文<mailto:huw...@outlook.com>; Szabo, Istvan
(Agoda)<mailto:istvan.sz...@agoda.com>;
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
主题: Re: [ceph-users] Re: 回复: [ceph-users] Re: is it possible to
remove the db+wal from an external device (nvme)

Hi Igor,

thanks for your input. I haven't done this in a prod env yet either,
still playing around in a virtual lab env.
I tried the symlink suggestion but it's not that easy, because it
looks different underneath the ceph directory than ceph-volume
expects it. These are the services underneath:

ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
insgesamt 48
drwx-- 3 root   root   4096 16. Sep 16:11 alertmanager.ses7-host1
drwx-- 3 ceph   ceph   4096 29. Sep 09:03 crash
drwx-- 2 ceph   ceph   4096 16. Sep 16:39 crash.ses7-host1
drwx-- 4 messagebus lp 4096 16. Sep 16:23 grafana.ses7-host1
drw-rw 2 root   root   4096 24. Aug 10:00 home
drwx-- 2 ceph   ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
drwx-- 3 ceph   ceph   4096 16. Sep 16:37 mon.ses7-host1
drwx-- 2 nobody nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
drwx-- 2 ceph   ceph   4096 29. Sep 08:43 osd.0
drwx-- 2 ceph   ceph   4096 29. Sep 15:11 osd.1
drwx-- 4 root   root   4096 16. Sep 16:12 prometheus.ses7-host1


While the directory in a non-containerized deployment lo

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Szabo, Istvan (Agoda)
Actually I don't have a containerized deployment, mine is a normal one. So the 
lvm migrate should work.
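
For reference, a minimal sketch of that sequence on a non-containerized OSD (osd.8, the osd fsid and the target LV are placeholders, not the exact names from this cluster):

# keep the cluster from rebalancing while the OSD is down
ceph osd set noout
systemctl stop ceph-osd@8

# fold the separate DB back into the main block device
# (add "wal" to --from if the WAL sits on its own LV)
ceph-volume lvm migrate --osd-id 8 --osd-fsid <osd-fsid> --from db \
    --target <vg>/<osd-block-lv>

systemctl start ceph-osd@8
ceph osd unset noout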

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Eugen Block  
Sent: Wednesday, September 29, 2021 8:49 PM
To: 胡 玮文 
Cc: Igor Fedotov ; Szabo, Istvan (Agoda) 
; ceph-users@ceph.io
Subject: Re: is it possible to remove the db+wal from an external device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


That's what I did and pasted the results in my previous comments.


Zitat von 胡 玮文 :

> Yes. And “cephadm shell” command does not depend on the running 
> daemon, it will start a new container. So I think it is perfectly fine 
> to stop the OSD first then run the “cephadm shell” command, and run 
> ceph-volume in the new shell.
>
> 发件人: Eugen Block<mailto:ebl...@nde.ag>
> 发送时间: 2021年9月29日 21:40
> 收件人: 胡 玮文<mailto:huw...@outlook.com>
> 抄送: Igor Fedotov<mailto:ifedo...@suse.de>; Szabo, Istvan 
> (Agoda)<mailto:istvan.sz...@agoda.com>;
> ceph-users@ceph.io<mailto:ceph-users@ceph.io>
> 主题: Re: is it possible to remove the db+wal from an external device 
> (nvme)
>
> The OSD has to be stopped in order to migrate DB/WAL, it can't be done 
> live. ceph-volume requires a lock on the device.
>
>
> Zitat von 胡 玮文 :
>
>> I’ve not tried it, but how about:
>>
>> cephadm shell -n osd.0
>>
>> then run “ceph-volume” commands in the newly opened shell. The 
>> directory structure seems fine.
>>
>> $ sudo cephadm shell -n osd.0
>> Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
>> Inferring config
>> /var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
>> Using recent ceph image
>> cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37
>> d7a9b37db1e0ff6691aae6466530 root@host0:/# ll 
>> /var/lib/ceph/osd/ceph-0/ total 68
>> drwx-- 2 ceph ceph 4096 Sep 20 04:15 ./
>> drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
>> lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
>> lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db -> /dev/ubuntu-vg/osd.0.db
>> -rw--- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
>> -rw--- 1 ceph ceph  387 Jun 21 13:24 config
>> -rw--- 1 ceph ceph   37 Sep 20 04:15 fsid
>> -rw--- 1 ceph ceph   55 Sep 20 04:15 keyring
>> -rw--- 1 ceph ceph6 Sep 20 04:15 ready
>> -rw--- 1 ceph ceph3 Apr  2 01:46 require_osd_release
>> -rw--- 1 ceph ceph   10 Sep 20 04:15 type
>> -rw--- 1 ceph ceph   38 Sep 17 14:26 unit.configured
>> -rw--- 1 ceph ceph   48 Nov  9  2020 unit.created
>> -rw--- 1 ceph ceph   35 Sep 17 14:26 unit.image
>> -rw--- 1 ceph ceph  306 Sep 17 14:26 unit.meta
>> -rw--- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
>> -rw--- 1 ceph ceph 3021 Sep 17 14:26 unit.run
>> -rw--- 1 ceph ceph  142 Sep 17 14:26 unit.stop
>> -rw--- 1 ceph ceph    2 Sep 20 04:15 whoami
>>
>> 发件人: Eugen Block<mailto:ebl...@nde.ag>
>> 发送时间: 2021年9月29日 21:29
>> 收件人: Igor Fedotov<mailto:ifedo...@suse.de>
>> 抄送: 胡 玮文<mailto:huw...@outlook.com>; Szabo, Istvan 
>> (Agoda)<mailto:istvan.sz...@agoda.com>;
>> ceph-users@ceph.io<mailto:ceph-users@ceph.io>
>> 主题: Re: [ceph-users] Re: 回复: [ceph-users] Re: is it possible to 
>> remove the db+wal from an external device (nvme)
>>
>> Hi Igor,
>>
>> thanks for your input. I haven't done this in a prod env yet either, 
>> still playing around in a virtual lab env.
>> I tried the symlink suggestion but it's not that easy, because it 
>> looks different underneath the ceph directory than ceph-volume 
>> expects it. These are the services underneath:
>>
>> ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
>> insgesamt 48
>> drwx-- 3 root   root   4096 16. Sep 16:11 alertmanager.ses7-host1
>> drwx-- 3 ceph   ceph   4096 29. Sep 09:03 crash
>> drwx-- 2 ceph   ceph   4096 16. Sep 16:39 crash.ses7-host1
>> drwx-- 4 messagebus lp 4096 16. Sep 16:23 grafana.ses7-host1
>> drw-rw 2 root   root   4096 24. Aug 10:00 home
>> drwx-- 2 ceph   ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
>> drwx-- 3 ceph   ceph   4096 16. Sep 16:37 mon.ses7-host1
>> drwx-- 2 nobody nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
>> drwx

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block

That's what I did and pasted the results in my previous comments.


Zitat von 胡 玮文 :

Yes. And “cephadm shell” command does not depend on the running  
daemon, it will start a new container. So I think it is perfectly  
fine to stop the OSD first then run the “cephadm shell” command, and  
run ceph-volume in the new shell.


发件人: Eugen Block<mailto:ebl...@nde.ag>
发送时间: 2021年9月29日 21:40
收件人: 胡 玮文<mailto:huw...@outlook.com>
抄送: Igor Fedotov<mailto:ifedo...@suse.de>; Szabo, Istvan  
(Agoda)<mailto:istvan.sz...@agoda.com>;  
ceph-users@ceph.io<mailto:ceph-users@ceph.io>

主题: Re: is it possible to remove the db+wal from an external device (nvme)

The OSD has to be stopped in order to migrate DB/WAL, it can't be done
live. ceph-volume requires a lock on the device.


Zitat von 胡 玮文 :


I’ve not tried it, but how about:

cephadm shell -n osd.0

then run “ceph-volume” commands in the newly opened shell. The
directory structure seems fine.

$ sudo cephadm shell -n osd.0
Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
Inferring config
/var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
Using recent ceph image
cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530
root@host0:/# ll /var/lib/ceph/osd/ceph-0/
total 68
drwx-- 2 ceph ceph 4096 Sep 20 04:15 ./
drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db -> /dev/ubuntu-vg/osd.0.db
-rw--- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
-rw--- 1 ceph ceph  387 Jun 21 13:24 config
-rw--- 1 ceph ceph   37 Sep 20 04:15 fsid
-rw--- 1 ceph ceph   55 Sep 20 04:15 keyring
-rw--- 1 ceph ceph6 Sep 20 04:15 ready
-rw--- 1 ceph ceph3 Apr  2 01:46 require_osd_release
-rw--- 1 ceph ceph   10 Sep 20 04:15 type
-rw--- 1 ceph ceph   38 Sep 17 14:26 unit.configured
-rw--- 1 ceph ceph   48 Nov  9  2020 unit.created
-rw--- 1 ceph ceph   35 Sep 17 14:26 unit.image
-rw--- 1 ceph ceph  306 Sep 17 14:26 unit.meta
-rw--- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
-rw--- 1 ceph ceph 3021 Sep 17 14:26 unit.run
-rw--- 1 ceph ceph  142 Sep 17 14:26 unit.stop
-rw--- 1 ceph ceph2 Sep 20 04:15 whoami

发件人: Eugen Block<mailto:ebl...@nde.ag>
发送时间: 2021年9月29日 21:29
收件人: Igor Fedotov<mailto:ifedo...@suse.de>
抄送: 胡 玮文<mailto:huw...@outlook.com>; Szabo, Istvan
(Agoda)<mailto:istvan.sz...@agoda.com>;
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
主题: Re: [ceph-users] Re: 回复: [ceph-users] Re: is it possible to
remove the db+wal from an external device (nvme)

Hi Igor,

thanks for your input. I haven't done this in a prod env yet either,
still playing around in a virtual lab env.
I tried the symlink suggestion but it's not that easy, because it
looks different underneath the ceph directory than ceph-volume expects
it. These are the services underneath:

ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
insgesamt 48
drwx-- 3 root   root   4096 16. Sep 16:11 alertmanager.ses7-host1
drwx-- 3 ceph   ceph   4096 29. Sep 09:03 crash
drwx-- 2 ceph   ceph   4096 16. Sep 16:39 crash.ses7-host1
drwx-- 4 messagebus lp 4096 16. Sep 16:23 grafana.ses7-host1
drw-rw 2 root   root   4096 24. Aug 10:00 home
drwx-- 2 ceph   ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
drwx-- 3 ceph   ceph   4096 16. Sep 16:37 mon.ses7-host1
drwx-- 2 nobody nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
drwx-- 2 ceph   ceph   4096 29. Sep 08:43 osd.0
drwx-- 2 ceph   ceph   4096 29. Sep 15:11 osd.1
drwx-- 4 root   root   4096 16. Sep 16:12 prometheus.ses7-host1


While the directory in a non-containerized deployment looks like this:

nautilus:~ # ll /var/lib/ceph/osd/ceph-0/
insgesamt 24
lrwxrwxrwx 1 ceph ceph 93 29. Sep 12:21 block ->
/dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54b3-4689-9896-f54d005c535d
-rw--- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
-rw--- 1 ceph ceph 37 29. Sep 12:21 fsid
-rw--- 1 ceph ceph 55 29. Sep 12:21 keyring
-rw--- 1 ceph ceph  6 29. Sep 12:21 ready
-rw--- 1 ceph ceph 10 29. Sep 12:21 type
-rw--- 1 ceph ceph  2 29. Sep 12:21 whoami


But even if I create the symlink to the osd directory it fails because
I only have ceph-volume within the containers where the symlink is not
visible to cephadm.


ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1
lrwxrwxrwx 1 root root 57 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 ->
/var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/

ses7-host1:~ # cephadm ceph-volume lvm migrate --osd-id 1 --osd-fsid
b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target
ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
Inferring fsid 152fd738-01bc-11ec-a7fd-fa163e672db2
[..

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread 胡 玮文
Yes. And the “cephadm shell” command does not depend on the running daemon; it will 
start a new container. So I think it is perfectly fine to stop the OSD first, 
then run the “cephadm shell” command, and run ceph-volume in the new shell.
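
A rough sketch of that sequence on a cephadm host, reusing the osd.1 / fsid / LV names quoted below (the systemd unit name is an assumption based on the usual ceph-<fsid>@osd.<id> pattern):

# on the host: stop only this OSD's container
systemctl stop ceph-152fd738-01bc-11ec-a7fd-fa163e672db2@osd.1.service

# open a shell container with the osd.1 data dir mounted at /var/lib/ceph/osd/ceph-1
cephadm shell -n osd.1

# inside the shell: merge the DB back into the block device
ceph-volume lvm migrate --osd-id 1 \
    --osd-fsid b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db \
    --target ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4

# back on the host: start the OSD again
systemctl start ceph-152fd738-01bc-11ec-a7fd-fa163e672db2@osd.1.service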

发件人: Eugen Block<mailto:ebl...@nde.ag>
发送时间: 2021年9月29日 21:40
收件人: 胡 玮文<mailto:huw...@outlook.com>
抄送: Igor Fedotov<mailto:ifedo...@suse.de>; Szabo, Istvan 
(Agoda)<mailto:istvan.sz...@agoda.com>; 
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
主题: Re: is it possible to remove the db+wal from an external device (nvme)

The OSD has to be stopped in order to migrate DB/WAL, it can't be done
live. ceph-volume requires a lock on the device.


Zitat von 胡 玮文 :

> I’ve not tried it, but how about:
>
> cephadm shell -n osd.0
>
> then run “ceph-volume” commands in the newly opened shell. The
> directory structure seems fine.
>
> $ sudo cephadm shell -n osd.0
> Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
> Inferring config
> /var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
> Using recent ceph image
> cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530
> root@host0:/# ll /var/lib/ceph/osd/ceph-0/
> total 68
> drwx-- 2 ceph ceph 4096 Sep 20 04:15 ./
> drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
> lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
> lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db -> /dev/ubuntu-vg/osd.0.db
> -rw--- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
> -rw--- 1 ceph ceph  387 Jun 21 13:24 config
> -rw--- 1 ceph ceph   37 Sep 20 04:15 fsid
> -rw--- 1 ceph ceph   55 Sep 20 04:15 keyring
> -rw--- 1 ceph ceph6 Sep 20 04:15 ready
> -rw--- 1 ceph ceph3 Apr  2 01:46 require_osd_release
> -rw--- 1 ceph ceph   10 Sep 20 04:15 type
> -rw--- 1 ceph ceph   38 Sep 17 14:26 unit.configured
> -rw--- 1 ceph ceph   48 Nov  9  2020 unit.created
> -rw--- 1 ceph ceph   35 Sep 17 14:26 unit.image
> -rw--- 1 ceph ceph  306 Sep 17 14:26 unit.meta
> -rw--- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
> -rw--- 1 ceph ceph 3021 Sep 17 14:26 unit.run
> -rw--- 1 ceph ceph  142 Sep 17 14:26 unit.stop
> -rw--- 1 ceph ceph2 Sep 20 04:15 whoami
>
> 发件人: Eugen Block<mailto:ebl...@nde.ag>
> 发送时间: 2021年9月29日 21:29
> 收件人: Igor Fedotov<mailto:ifedo...@suse.de>
> 抄送: 胡 玮文<mailto:huw...@outlook.com>; Szabo, Istvan
> (Agoda)<mailto:istvan.sz...@agoda.com>;
> ceph-users@ceph.io<mailto:ceph-users@ceph.io>
> 主题: Re: [ceph-users] Re: 回复: [ceph-users] Re: is it possible to
> remove the db+wal from an external device (nvme)
>
> Hi Igor,
>
> thanks for your input. I haven't done this in a prod env yet either,
> still playing around in a virtual lab env.
> I tried the symlink suggestion but it's not that easy, because it
> looks different underneath the ceph directory than ceph-volume expects
> it. These are the services underneath:
>
> ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
> insgesamt 48
> drwx-- 3 root   root   4096 16. Sep 16:11 alertmanager.ses7-host1
> drwx-- 3 ceph   ceph   4096 29. Sep 09:03 crash
> drwx-- 2 ceph   ceph   4096 16. Sep 16:39 crash.ses7-host1
> drwx-- 4 messagebus lp 4096 16. Sep 16:23 grafana.ses7-host1
> drw-rw 2 root   root   4096 24. Aug 10:00 home
> drwx-- 2 ceph   ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
> drwx-- 3 ceph   ceph   4096 16. Sep 16:37 mon.ses7-host1
> drwx-- 2 nobody nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
> drwx-- 2 ceph   ceph   4096 29. Sep 08:43 osd.0
> drwx-- 2 ceph   ceph   4096 29. Sep 15:11 osd.1
> drwx-- 4 root   root   4096 16. Sep 16:12 prometheus.ses7-host1
>
>
> While the directory in a non-containerized deployment looks like this:
>
> nautilus:~ # ll /var/lib/ceph/osd/ceph-0/
> insgesamt 24
> lrwxrwxrwx 1 ceph ceph 93 29. Sep 12:21 block ->
> /dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54b3-4689-9896-f54d005c535d
> -rw--- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
> -rw--- 1 ceph ceph 37 29. Sep 12:21 fsid
> -rw--- 1 ceph ceph 55 29. Sep 12:21 keyring
> -rw--- 1 ceph ceph  6 29. Sep 12:21 ready
> -rw--- 1 ceph ceph 10 29. Sep 12:21 type
> -rw--- 1 ceph ceph  2 29. Sep 12:21 whoami
>
>
> But even if I create the symlink to the osd directory it fails because
> I only have ceph-volume within the containers where the symlink is not
> visible to cephadm.
>
>
> ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1
> lrwxrwxrwx 1 root root 57 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 ->
>

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
The OSD has to be stopped in order to migrate DB/WAL, it can't be done  
live. ceph-volume requires a lock on the device.



Zitat von 胡 玮文 :


I’ve not tried it, but how about:

cephadm shell -n osd.0

then run “ceph-volume” commands in the newly opened shell. The  
directory structure seems fine.


$ sudo cephadm shell -n osd.0
Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
Inferring config  
/var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
Using recent ceph image  
cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530

root@host0:/# ll /var/lib/ceph/osd/ceph-0/
total 68
drwx-- 2 ceph ceph 4096 Sep 20 04:15 ./
drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db -> /dev/ubuntu-vg/osd.0.db
-rw--- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
-rw--- 1 ceph ceph  387 Jun 21 13:24 config
-rw--- 1 ceph ceph   37 Sep 20 04:15 fsid
-rw--- 1 ceph ceph   55 Sep 20 04:15 keyring
-rw--- 1 ceph ceph6 Sep 20 04:15 ready
-rw--- 1 ceph ceph3 Apr  2 01:46 require_osd_release
-rw--- 1 ceph ceph   10 Sep 20 04:15 type
-rw--- 1 ceph ceph   38 Sep 17 14:26 unit.configured
-rw--- 1 ceph ceph   48 Nov  9  2020 unit.created
-rw--- 1 ceph ceph   35 Sep 17 14:26 unit.image
-rw--- 1 ceph ceph  306 Sep 17 14:26 unit.meta
-rw--- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
-rw--- 1 ceph ceph 3021 Sep 17 14:26 unit.run
-rw--- 1 ceph ceph  142 Sep 17 14:26 unit.stop
-rw--- 1 ceph ceph2 Sep 20 04:15 whoami

发件人: Eugen Block<mailto:ebl...@nde.ag>
发送时间: 2021年9月29日 21:29
收件人: Igor Fedotov<mailto:ifedo...@suse.de>
抄送: 胡 玮文<mailto:huw...@outlook.com>; Szabo, Istvan  
(Agoda)<mailto:istvan.sz...@agoda.com>;  
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
主题: Re: [ceph-users] Re: 回复: [ceph-users] Re: is it possible to  
remove the db+wal from an external device (nvme)


Hi Igor,

thanks for your input. I haven't done this in a prod env yet either,
still playing around in a virtual lab env.
I tried the symlink suggestion but it's not that easy, because it
looks different underneath the ceph directory than ceph-volume expects
it. These are the services underneath:

ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
insgesamt 48
drwx-- 3 root   root   4096 16. Sep 16:11 alertmanager.ses7-host1
drwx-- 3 ceph   ceph   4096 29. Sep 09:03 crash
drwx-- 2 ceph   ceph   4096 16. Sep 16:39 crash.ses7-host1
drwx-- 4 messagebus lp 4096 16. Sep 16:23 grafana.ses7-host1
drw-rw 2 root   root   4096 24. Aug 10:00 home
drwx-- 2 ceph   ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
drwx-- 3 ceph   ceph   4096 16. Sep 16:37 mon.ses7-host1
drwx-- 2 nobody nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
drwx-- 2 ceph   ceph   4096 29. Sep 08:43 osd.0
drwx-- 2 ceph   ceph   4096 29. Sep 15:11 osd.1
drwx-- 4 root   root   4096 16. Sep 16:12 prometheus.ses7-host1


While the directory in a non-containerized deployment looks like this:

nautilus:~ # ll /var/lib/ceph/osd/ceph-0/
insgesamt 24
lrwxrwxrwx 1 ceph ceph 93 29. Sep 12:21 block ->
/dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54b3-4689-9896-f54d005c535d
-rw--- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
-rw--- 1 ceph ceph 37 29. Sep 12:21 fsid
-rw--- 1 ceph ceph 55 29. Sep 12:21 keyring
-rw--- 1 ceph ceph  6 29. Sep 12:21 ready
-rw--- 1 ceph ceph 10 29. Sep 12:21 type
-rw--- 1 ceph ceph  2 29. Sep 12:21 whoami


But even if I create the symlink to the osd directory it fails because
I only have ceph-volume within the containers where the symlink is not
visible to cephadm.


ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1
lrwxrwxrwx 1 root root 57 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 ->
/var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/

ses7-host1:~ # cephadm ceph-volume lvm migrate --osd-id 1 --osd-fsid
b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target
ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
Inferring fsid 152fd738-01bc-11ec-a7fd-fa163e672db2
[...]
/usr/bin/podman: stderr --> Migrate to existing, Source:
['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target:
/var/lib/ceph/osd/ceph-1/block
/usr/bin/podman: stderr  stdout: inferring bluefs devices from bluestore path
/usr/bin/podman: stderr  stderr: can't migrate
/var/lib/ceph/osd/ceph-1/block.db, not a valid bluefs volume
/usr/bin/podman: stderr --> Failed to migrate device, error code:1
/usr/bin/podman: stderr --> Undoing lv tag set
/usr/bin/podman: stderr Failed to migrate to :
ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
Traceback (most recent call last):
   File

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread 胡 玮文
I’ve not tried it, but how about:

cephadm shell -n osd.0

then run “ceph-volume” commands in the newly opened shell. The directory 
structure seems fine.

$ sudo cephadm shell -n osd.0
Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864
Inferring config /var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864/osd.0/config
Using recent ceph image 
cr.example.com/infra/ceph@sha256:8a0f6f285edcd6488e2c91d3f9fa43534d37d7a9b37db1e0ff6691aae6466530
root@host0:/# ll /var/lib/ceph/osd/ceph-0/
total 68
drwx------ 2 ceph ceph 4096 Sep 20 04:15 ./
drwxr-x--- 1 ceph ceph 4096 Sep 29 13:32 ../
lrwxrwxrwx 1 ceph ceph   24 Sep 20 04:15 block -> /dev/ceph-hdd/osd.0.data
lrwxrwxrwx 1 ceph ceph   23 Sep 20 04:15 block.db -> /dev/ubuntu-vg/osd.0.db
-rw------- 1 ceph ceph   37 Sep 20 04:15 ceph_fsid
-rw------- 1 ceph ceph  387 Jun 21 13:24 config
-rw------- 1 ceph ceph   37 Sep 20 04:15 fsid
-rw------- 1 ceph ceph   55 Sep 20 04:15 keyring
-rw------- 1 ceph ceph    6 Sep 20 04:15 ready
-rw------- 1 ceph ceph    3 Apr  2 01:46 require_osd_release
-rw------- 1 ceph ceph   10 Sep 20 04:15 type
-rw------- 1 ceph ceph   38 Sep 17 14:26 unit.configured
-rw------- 1 ceph ceph   48 Nov  9  2020 unit.created
-rw------- 1 ceph ceph   35 Sep 17 14:26 unit.image
-rw------- 1 ceph ceph  306 Sep 17 14:26 unit.meta
-rw------- 1 ceph ceph 1317 Sep 17 14:26 unit.poststop
-rw------- 1 ceph ceph 3021 Sep 17 14:26 unit.run
-rw------- 1 ceph ceph  142 Sep 17 14:26 unit.stop
-rw------- 1 ceph ceph    2 Sep 20 04:15 whoami
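
If in doubt, the LVs behind an OSD can be cross-checked from inside that same shell before migrating (just an illustration, osd.0 as above):

# show the data/db LVs and their ceph.* tags for osd.0
ceph-volume lvm list 0

# print the bluestore label of the DB device to confirm it belongs to this OSD
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db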

发件人: Eugen Block<mailto:ebl...@nde.ag>
发送时间: 2021年9月29日 21:29
收件人: Igor Fedotov<mailto:ifedo...@suse.de>
抄送: 胡 玮文<mailto:huw...@outlook.com>; Szabo, Istvan 
(Agoda)<mailto:istvan.sz...@agoda.com>; 
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
主题: Re: [ceph-users] Re: 回复: [ceph-users] Re: is it possible to remove the 
db+wal from an external device (nvme)

Hi Igor,

thanks for your input. I haven't done this in a prod env yet either,
still playing around in a virtual lab env.
I tried the symlink suggestion but it's not that easy, because it
looks different underneath the ceph directory than ceph-volume expects
it. These are the services underneath:

ses7-host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/
insgesamt 48
drwx-- 3 root   root   4096 16. Sep 16:11 alertmanager.ses7-host1
drwx-- 3 ceph   ceph   4096 29. Sep 09:03 crash
drwx-- 2 ceph   ceph   4096 16. Sep 16:39 crash.ses7-host1
drwx-- 4 messagebus lp 4096 16. Sep 16:23 grafana.ses7-host1
drw-rw 2 root   root   4096 24. Aug 10:00 home
drwx-- 2 ceph   ceph   4096 16. Sep 16:37 mgr.ses7-host1.wmgyit
drwx-- 3 ceph   ceph   4096 16. Sep 16:37 mon.ses7-host1
drwx-- 2 nobody nobody 4096 16. Sep 16:37 node-exporter.ses7-host1
drwx-- 2 ceph   ceph   4096 29. Sep 08:43 osd.0
drwx-- 2 ceph   ceph   4096 29. Sep 15:11 osd.1
drwx-- 4 root   root   4096 16. Sep 16:12 prometheus.ses7-host1


While the directory in a non-containerized deployment looks like this:

nautilus:~ # ll /var/lib/ceph/osd/ceph-0/
insgesamt 24
lrwxrwxrwx 1 ceph ceph 93 29. Sep 12:21 block ->
/dev/ceph-a6d78a29-637f-494b-a839-76251fcff67e/osd-block-39340a48-54b3-4689-9896-f54d005c535d
-rw--- 1 ceph ceph 37 29. Sep 12:21 ceph_fsid
-rw--- 1 ceph ceph 37 29. Sep 12:21 fsid
-rw--- 1 ceph ceph 55 29. Sep 12:21 keyring
-rw--- 1 ceph ceph  6 29. Sep 12:21 ready
-rw--- 1 ceph ceph 10 29. Sep 12:21 type
-rw--- 1 ceph ceph  2 29. Sep 12:21 whoami


But even if I create the symlink to the osd directory it fails because
I only have ceph-volume within the containers where the symlink is not
visible to cephadm.


ses7-host1:~ # ll /var/lib/ceph/osd/ceph-1
lrwxrwxrwx 1 root root 57 29. Sep 15:08 /var/lib/ceph/osd/ceph-1 ->
/var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/

ses7-host1:~ # cephadm ceph-volume lvm migrate --osd-id 1 --osd-fsid
b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target
ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
Inferring fsid 152fd738-01bc-11ec-a7fd-fa163e672db2
[...]
/usr/bin/podman: stderr --> Migrate to existing, Source:
['--devs-source', '/var/lib/ceph/osd/ceph-1/block.db'] Target:
/var/lib/ceph/osd/ceph-1/block
/usr/bin/podman: stderr  stdout: inferring bluefs devices from bluestore path
/usr/bin/podman: stderr  stderr: can't migrate
/var/lib/ceph/osd/ceph-1/block.db, not a valid bluefs volume
/usr/bin/podman: stderr --> Failed to migrate device, error code:1
/usr/bin/podman: stderr --> Undoing lv tag set
/usr/bin/podman: stderr Failed to migrate to :
ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
Traceback (most recent call last):
   File "/usr/sbin/cephadm", line 6225, in 
 r = args.func()
   File "/usr/sbin/cephadm", line 1363, in _infer_fsid
 return func()
   File

[ceph-users] Re: 回复: [ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block

Hi,

I just tried 'ceph-volume lvm migrate' in Octopus but it doesn't  
really work. I'm not sure if I'm missing something here, but I believe  
it's again the already discussed containers issue. To be able to run  
the command for an OSD, the OSD has to be offline, but then you don't  
have access to the block.db because the path is different from outside  
the container:


---snip---
[ceph: root@host1 /]# ceph-volume lvm migrate --osd-id 1 --osd-fsid  
b4c772aa-07f8-483d-ae58-0ab97b8d0cc4 --from db --target  
ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
--> Migrate to existing, Source: ['--devs-source',  
'/var/lib/ceph/osd/ceph-1/block.db'] Target:  
/var/lib/ceph/osd/ceph-1/block

 stdout: inferring bluefs devices from bluestore path
 stderr:  
/home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_mount_for_bluefs()' thread 7fde05b96180 time  
2021-09-29T06:56:24.790161+
 stderr:  
/home/abuild/rpmbuild/BUILD/ceph-15.2.14-84-gb6e5642e260/src/os/bluestore/BlueStore.cc: 6876: FAILED ceph_assert(r ==  
0)
 stderr: 2021-09-29T06:56:24.787+ 7fde05b96180 -1  
bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock  
/var/lib/ceph/osd/ceph-1/fsid (is another ceph-osd still running?)(11)  
Resource temporarily unavailable



# path outside
host1:~ # ll /var/lib/ceph/152fd738-01bc-11ec-a7fd-fa163e672db2/osd.1/
insgesamt 60
lrwxrwxrwx 1 ceph ceph   93 29. Sep 08:43 block ->  
/dev/ceph-b1ddff4b-95e8-4b91-b451-a3ea35d16ec0/osd-block-b4c772aa-07f8-483d-ae58-0ab97b8d0cc4
lrwxrwxrwx 1 ceph ceph   90 29. Sep 08:43 block.db ->  
/dev/ceph-6f1b8f49-daf2-4631-a2ef-12e9452b01ea/osd-db-69b11aa0-af96-443e-8f03-5afa5272131f

---snip---


But if I shut down the OSD I can't access the block and block.db  
devices. I'm not even sure how this is supposed to work with cephadm.  
Maybe I'm misunderstanding, though. Or is there a way to provide the  
offline block.db path to 'ceph-volume lvm migrate'?




Zitat von 胡 玮文 :

You may need to use 'ceph-volume lvm migrate' [1] instead of  
ceph-bluestore-tool. If I recall correctly, this is a pretty new  
feature; I'm not sure whether it is available in your version.


If you use ceph-bluestore-tool, then you need to modify the LVM tags  
manually. Please refer to the previous threads, e.g. [2] and some  
more.


[1]: https://docs.ceph.com/en/latest/man/8/ceph-volume/#migrate
[2]:  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VX23NQ66P3PPEX36T3PYYMHPLBSFLMYA/#JLNDFGXR4ZLY27DHD3RJTTZEDHRZJO4Q
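
For illustration, the manual tag changes referred to here boil down to something like the following once the DB has been copied back (VG/LV names and the old device path are placeholders):

# remove the stale DB references from the data LV so ceph-volume stops expecting a block.db
lvchange --deltag "ceph.db_device=<old-db-device-path>" <vg>/<osd-block-lv>
lvchange --deltag "ceph.db_uuid=<old-db-lv-uuid>" <vg>/<osd-block-lv>

# verify what is left
lvs -o lv_tags <vg>/<osd-block-lv>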


发件人: Szabo, Istvan (Agoda)<mailto:istvan.sz...@agoda.com>
发送时间: 2021年9月28日 18:20
收件人: Eugen Block<mailto:ebl...@nde.ag>;  
ceph-users@ceph.io<mailto:ceph-users@ceph.io>
主题: [ceph-users] Re: is it possible to remove the db+wal from an  
external device (nvme)


Gave a try of it, so all the 3 osds finally failed :/ Not sure what  
went wrong.


Do the normal maintenance things, ceph osd set noout, ceph osd set  
norebalance, stop the osd and run this command:
ceph-bluestore-tool bluefs-bdev-migrate --dev-target  
/var/lib/ceph/osd/ceph-0/block --devs-source  
/var/lib/ceph/osd/ceph-8/block.db --path /var/lib/ceph/osd/ceph-8/

Output:
device removed:1 /var/lib/ceph/osd/ceph-8/block.db
device added: 1 /dev/dm-2

When tried to start I got this in the log:
osd.8 0 OSD:init: unable to mount object store
 ** ERROR: osd init failed: (13) Permission denied
set uid:gid to 167:167 (ceph:ceph)
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2)  
octopus (stable), process ceph-osd, pid 1512261

pidfile_write: ignore empty --pid-file

From the another 2 osds the block.db removed and I can start it back.
I've zapped the db drive just to be removed from the device  
completely and after machine restart none of these 2 osds came back,  
I guess missing the db device.


Is there any steps missing?
1.Noout+norebalance
2. Stop osd
3. migrate with the above command the block.db to the block.
4. do on the other osds which is sharing the same db device that  
want to remove.

5. zap the db device
6. start back the osds.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-----Original Message-
From: Eugen Block 
Sent: Monday, September 27, 2021 7:42 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: is it possible to remove the db+wal from  
an external device (nvme)


Email received from the internet. If in doubt, don't click any link  
nor open any attachment !



Hi,

I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use  
here. I haven't tried it in a production environment yet, only in  
virtual labs.


Regards,
Eugen


Z

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-28 Thread Eugen Block
I tried this in my lab again with Nautilus and it worked as expected;  
I could start the new OSD immediately. I'll try again with Octopus  
tomorrow.
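
A quick sanity check after such a migration is the OSD metadata (the exact field names may differ slightly between releases):

ceph osd metadata 0 | grep -E 'bluefs_dedicated_(db|wal)'

Both should report 0 once DB and WAL live on the main device.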



Zitat von "Szabo, Istvan (Agoda)" :

Gave a try of it, so all the 3 osds finally failed :/ Not sure what  
went wrong.


Do the normal maintenance things, ceph osd set noout, ceph osd set  
norebalance, stop the osd and run this command:
ceph-bluestore-tool bluefs-bdev-migrate --dev-target  
/var/lib/ceph/osd/ceph-0/block --devs-source  
/var/lib/ceph/osd/ceph-8/block.db --path /var/lib/ceph/osd/ceph-8/

Output:
device removed:1 /var/lib/ceph/osd/ceph-8/block.db
device added: 1 /dev/dm-2

When tried to start I got this in the log:
osd.8 0 OSD:init: unable to mount object store
 ** ERROR: osd init failed: (13) Permission denied
set uid:gid to 167:167 (ceph:ceph)
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2)  
octopus (stable), process ceph-osd, pid 1512261

pidfile_write: ignore empty --pid-file

From the another 2 osds the block.db removed and I can start it back.
I've zapped the db drive just to be removed from the device  
completely and after machine restart none of these 2 osds came back,  
I guess missing the db device.


Is there any steps missing?
1.Noout+norebalance
2. Stop osd
3. migrate with the above command the block.db to the block.
4. do on the other osds which is sharing the same db device that  
want to remove.

5. zap the db device
6. start back the osds.

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Eugen Block 
Sent: Monday, September 27, 2021 7:42 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: is it possible to remove the db+wal from  
an external device (nvme)


Email received from the internet. If in doubt, don't click any link  
nor open any attachment !



Hi,

I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use  
here. I haven't tried it in a production environment yet, only in  
virtual labs.


Regards,
Eugen


Zitat von "Szabo, Istvan (Agoda)" :


Hi,

Seems like in our config the nvme device  as a wal+db in front of the
ssd slowing down the ssds osds.
I'd like to avoid to rebuild all the osd-, is there a way somehow
migrate to the "slower device" the wal+db without reinstall?

Ty
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an  
email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-28 Thread Szabo, Istvan (Agoda)
Gave it a try, and all 3 osds finally failed :/ Not sure what went wrong.

Do the normal maintenance things, ceph osd set noout, ceph osd set norebalance, 
stop the osd and run this command:
ceph-bluestore-tool bluefs-bdev-migrate --dev-target 
/var/lib/ceph/osd/ceph-0/block --devs-source /var/lib/ceph/osd/ceph-8/block.db 
--path /var/lib/ceph/osd/ceph-8/
Output:
device removed:1 /var/lib/ceph/osd/ceph-8/block.db
device added: 1 /dev/dm-2

When tried to start I got this in the log:
osd.8 0 OSD:init: unable to mount object store
 ** ERROR: osd init failed: (13) Permission denied
set uid:gid to 167:167 (ceph:ceph)
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus 
(stable), process ceph-osd, pid 1512261
pidfile_write: ignore empty --pid-file

From the other 2 osds the block.db was removed and I could start them back.
I've zapped the db drive just to remove it from the device completely, and 
after a machine restart none of these 2 osds came back; I guess they're missing 
the db device.

Are there any steps missing?
1. Noout+norebalance
2. Stop the osd
3. Migrate the block.db to the block with the above command.
4. Do the same on the other osds sharing the same db device that I want to remove.
5. Zap the db device
6. Start the osds back up (see the ownership check sketched below).
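
Regarding the "(13) Permission denied": after a bluefs-bdev-migrate the block device node sometimes ends up owned by root again, so a check like this before starting the OSD may be worth adding (osd.8 as in the commands above; just a guess, not a verified fix):

# the symlink target must be writable by the ceph user
ls -lL /var/lib/ceph/osd/ceph-8/block
chown ceph:ceph "$(readlink -f /var/lib/ceph/osd/ceph-8/block)"
chown -R ceph:ceph /var/lib/ceph/osd/ceph-8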

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Eugen Block  
Sent: Monday, September 27, 2021 7:42 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: is it possible to remove the db+wal from an external 
device (nvme)

Email received from the internet. If in doubt, don't click any link nor open 
any attachment !


Hi,

I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use here. I 
haven't tried it in a production environment yet, only in virtual labs.

Regards,
Eugen


Zitat von "Szabo, Istvan (Agoda)" :

> Hi,
>
> Seems like in our config the nvme device as a wal+db in front of the 
> ssds is slowing down the ssd osds.
> I'd like to avoid rebuilding all the osds - is there a way to somehow 
> migrate the wal+db to the "slower device" without reinstalling?
>
> Ty
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-27 Thread Eugen Block

Hi,

I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use  
here. I haven't tried it in a production environment yet, only in  
virtual labs.
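
In its generic form that would be run with the OSD stopped, roughly like this (osd id and paths are placeholders):

ceph-bluestore-tool bluefs-bdev-migrate \
    --path /var/lib/ceph/osd/ceph-<id> \
    --devs-source /var/lib/ceph/osd/ceph-<id>/block.db \
    --dev-target /var/lib/ceph/osd/ceph-<id>/block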


Regards,
Eugen


Zitat von "Szabo, Istvan (Agoda)" :


Hi,

Seems like in our config the nvme device as a wal+db in front of  
the ssds is slowing down the ssd osds.
I'd like to avoid rebuilding all the osds - is there a way to somehow  
migrate the wal+db to the "slower device" without reinstalling?


Ty
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io