[ceph-users] Re: cephfs vs rbd vs rgw

2021-05-25 Thread Fox, Kevin M
The quick answer is that they are optimized for different use cases.

Things like relational databases (MySQL, PostgreSQL) benefit from the 
performance that a dedicated filesystem (on rbd) can provide. Shared filesystems 
are usually contraindicated for such software.

Shared filesystems like cephfs are nice, but they can't scale quite as well in number 
of filesystems as something like rbd, and latency on certain operations can be 
worse. POSIX network filesystems have their drawbacks; POSIX wasn't really 
designed around network filesystems. But they are super useful when you need to share a 
filesystem across nodes. A lot of existing software assumes a shared 
filesystem, and you can get pretty good scaling out of some software with it quite easily.

rgw is a very different, HTTP-based protocol. A lot of existing software doesn't 
work with it, so compatibility is not as good, but that's changing. It also has 
some assumptions around how data is read/written. It can be scaled quite large, 
and HTTP clients are very easy to come by, so for new 
software it's pretty nice.

So it's not necessarily a question of "which one should I support". One of Ceph's great 
features is that you can support all three with the same storage and use them all as 
needed.
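
To make the contrast concrete, here is a rough sketch of what consuming each
interface looks like from a Linux client (pool, filesystem, bucket and endpoint
names below are invented purely for illustration):

# rbd: a block device, mapped on a single host, with a local filesystem on top
rbd create mypool/vol1 --size 100G
rbd map mypool/vol1                      # appears as e.g. /dev/rbd0 on this host
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /mnt/vol1

# cephfs: a shared POSIX filesystem, mountable on many hosts at once
mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=myclient,secretfile=/etc/ceph/myclient.secret

# rgw: object storage over HTTP (S3 API), no mount involved
aws --endpoint-url http://rgw.example.com:8080 s3 cp ./report.csv s3://mybucket/report.csv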


From: Jorge Garcia 
Sent: Tuesday, May 25, 2021 4:43 PM
To: ceph-users@ceph.io
Subject: [ceph-users] cephfs vs rbd vs rgw


This may be too broad of a topic, or opening a can of worms, but we are
running a Ceph environment and I was wondering if there's any guidance
about this question:

Given that some group would like to store 50-100 TB of data on Ceph and
use it from a Linux environment, are there any advantages or
disadvantages in terms of performance/ease of use/learning curve to
using cephfs vs using a block device through rbd vs using object storage
through rgw? Here are my general thoughts:

cephfs - Until recently, you were not allowed to have multiple
filesystems. Not sure about performance.

rbd - Can only be mounted on one system at a time, but I guess that
filesystem could then be served using NFS.

rgw - A different usage model from regular linux file/directory
structure. Are there advantages to forcing people to use this interface?

I'm tempted to set up 3 separate areas and try them and compare the
results, but I'm wondering if somebody has done some similar experiment
in the past.

Thanks for any help you can provide!

Jorge
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs vs rbd vs rgw

2021-05-25 Thread Cory Hawkvelt
Yeah, agreed. My first question would be: how is your user going to consume
the storage?
You'll struggle to run VMs on RadosGW, and if they are doing archival
backups then RBD is likely not the best solution.

Each has very different requirements at the hardware level. For example, if
you are talking about running dozens of VMs, then an SSD/NVMe-based cluster
exposing RBD is a good solution; if you want to store large amounts of
video files for a security system, then a SATA-based cluster with some NVMe
cache exposing S3 via RadosGW could be a good solution.
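
If both kinds of workload end up on one cluster, CRUSH device classes are the
usual way to keep each pool on the right media. A minimal sketch, assuming the
OSDs already report ssd/hdd device classes, and using invented pool/rule names:

# one replicated rule per device class
ceph osd crush rule create-replicated fast-rule default host ssd
ceph osd crush rule create-replicated slow-rule default host hdd

# rbd pool for VM disks on the SSD/NVMe OSDs
ceph osd pool create vms 128 128 replicated fast-rule
rbd pool init vms

# keep the RGW bucket data on the big SATA drives
ceph osd pool set default.rgw.buckets.data crush_rule slow-rule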

On Wed, May 26, 2021 at 9:21 AM Matt Benjamin  wrote:

> Hi Jorge,
>
> I think it depends on your workload.
>
> On Tue, May 25, 2021 at 7:43 PM Jorge Garcia  wrote:
> >
> > This may be too broad of a topic, or opening a can of worms, but we are
> > running a CEPH environment and I was wondering if there's any guidance
> > about this question:
> >
> > Given that some group would like to store 50-100 TBs of data on CEPH and
> > use it from a linux environment, are there any advantages or
> > disadvantages in terms of performance/ease of use/learning curve to
> > using cephfs vs using a block device thru rbd vs using object storage
> > thru rgw? Here are my general thoughts:
> >
> > cephfs - Until recently, you were not allowed to have multiple
> > filesystems. Not sure about performance.
> >
>
> I/O performance can be /very/ good.  Metadata performance can vary.
> If you need shared POSIX access ("native" or NFS or SMB), you
> need cephfs.
>
> > rbd - Can only be mounted on one system at a time, but I guess that
> > filesystem could then be served using NFS.
>
> Yes, but it's single attach.
>
> >
> > rgw - A different usage model from regular linux file/directory
> > structure. Are there advantages to forcing people to use this interface?
>
> There are advantages.  S3 has become a preferred interface for some
> applications, especially analytics (e.g., Hadoop, Spark, PrestoSQL).
>
> >
> > I'm tempted to set up 3 separate areas and try them and compare the
> > results, but I'm wondering if somebody has done some similar experiment
> > in the past.
>
> Not sure, good question.
>
> Matt
>
> >
> > Thanks for any help you can provide!
> >
> > Jorge
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs vs rbd vs rgw

2021-05-25 Thread Matt Benjamin
Hi Jorge,

I think it depends on your workload.

On Tue, May 25, 2021 at 7:43 PM Jorge Garcia  wrote:
>
> This may be too broad of a topic, or opening a can of worms, but we are
> running a CEPH environment and I was wondering if there's any guidance
> about this question:
>
> Given that some group would like to store 50-100 TBs of data on CEPH and
> use it from a linux environment, are there any advantages or
> disadvantages in terms of performance/ease of use/learning curve to
> using cephfs vs using a block device thru rbd vs using object storage
> thru rgw? Here are my general thoughts:
>
> cephfs - Until recently, you were not allowed to have multiple
> filesystems. Not sure about performance.
>

I/O performance can be /very/ good.  Metadata performance can vary.
If you need shared POSIX access ("native" or NFS or SMB), you
need cephfs.
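
As a minimal sketch of that shared-access case (filesystem, client and path
names are illustrative, not from this thread):

# create a filesystem and a client key scoped to one directory
ceph fs volume create groupfs
ceph fs authorize groupfs client.group1 /project rw

# kernel mount on any number of Linux clients
# (the /project directory must already exist; put the key from the authorize
#  output into /etc/ceph/group1.secret)
mount -t ceph 10.0.0.1:6789:/project /mnt/project -o name=group1,secretfile=/etc/ceph/group1.secret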

> rbd - Can only be mounted on one system at a time, but I guess that
> filesystem could then be served using NFS.

Yes, but it's single attach.

>
> rgw - A different usage model from regular linux file/directory
> structure. Are there advantages to forcing people to use this interface?

There are advantages.  S3 has become a preferred interface for some
applications, especially analytics (e.g., Hadoop, Spark, PrestoSQL).
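
As a hedged illustration of that consumption model (user, endpoint and bucket
names are invented), an application only needs an RGW user and any S3 client:

radosgw-admin user create --uid=analytics --display-name="Analytics group"
# use the access/secret keys printed by the command above with any S3 SDK or CLI
aws configure set aws_access_key_id AKIAEXAMPLEKEY
aws configure set aws_secret_access_key examplesecretkey
aws --endpoint-url http://rgw.example.com:8080 s3 mb s3://datasets
aws --endpoint-url http://rgw.example.com:8080 s3 cp results.parquet s3://datasets/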

>
> I'm tempted to set up 3 separate areas and try them and compare the
> results, but I'm wondering if somebody has done some similar experiment
> in the past.

Not sure, good question.

Matt

>
> Thanks for any help you can provide!
>
> Jorge
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephfs vs rbd vs rgw

2021-05-25 Thread Jorge Garcia
This may be too broad of a topic, or opening a can of worms, but we are 
running a Ceph environment and I was wondering if there's any guidance 
about this question:


Given that some group would like to store 50-100 TB of data on Ceph and 
use it from a Linux environment, are there any advantages or 
disadvantages in terms of performance/ease of use/learning curve to 
using cephfs vs using a block device through rbd vs using object storage 
through rgw? Here are my general thoughts:


cephfs - Until recently, you were not allowed to have multiple 
filesystems. Not sure about performance.


rbd - Can only be mounted on one system at a time, but I guess that 
filesystem could then be served using NFS.


rgw - A different usage model from regular linux file/directory 
structure. Are there advantages to forcing people to use this interface?


I'm tempted to set up 3 separate areas and try them and compare the 
results, but I'm wondering if somebody has done some similar experiment 
in the past.


Thanks for any help you can provide!

Jorge
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-05-25 Thread Adrian Nicolae

Hi,

On my setup I didn't enable a stretch cluster. It's just a 3 x VM setup 
running on the same Proxmox node, and all the nodes use a single 
network. I installed Ceph using the documented cephadm flow.


> Thanks for the confirmation, Greg! I'll try with a newer release then.
> That's why we're testing, isn't it? ;-)
> Then the OP's issue is probably not resolved yet since he didn't
> mention a stretch cluster. Sorry for hijacking the thread.




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Month June Schedule Now Available

2021-05-25 Thread Mike Perez
Hi everyone,

The Ceph Month June schedule is now available:

https://pad.ceph.com/p/ceph-month-june-2021

We have great sessions from component updates, performance best
practices, Ceph on different architectures, BoF sessions to get more
involved with working groups in the community, and more! You may also
leave open discussion topics for the listed talks, which we'll get to in
each Q/A portion.

I will provide the video stream link on this thread and etherpad once
it's available. You can also add the Ceph community calendar, which
will have the Ceph Month sessions prefixed with "Ceph Month" to get
local timezone conversions.

https://calendar.google.com/calendar/embed?src=9ts9c7lt7u1vic2ijvvqqlfpo0%40group.calendar.google.com

Thank you to our speakers for taking the time to share with us all the
latest best practices and usage with Ceph!

--
Mike Perez
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-05-25 Thread Eugen Block
Thanks for the confirmation, Greg! I'll try with a newer release then.  
That's why we're testing, isn't it? ;-)
Then the OP's issue is probably not resolved yet since he didn't  
mention a stretch cluster. Sorry for hijacking the thread.


Zitat von Gregory Farnum :


On Tue, May 25, 2021 at 7:17 AM Eugen Block  wrote:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7ff3b1aa1700 time 2021-05-25T13:44:26.732857+
2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >= 9)
2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]:  ceph version
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]:  1:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x158) [0x7ff3bf61a59c]
2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]:  2:
/usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]:  3:
(OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]:  4:
(OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]:  5:
(OSDMonitor::get_version(unsigned long, unsigned long,
ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]:  6:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]:  7:
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
boost::intrusive_ptr)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]:  8:
(OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]:  9:
(Monitor::handle_subscribe(boost::intrusive_ptr)+0xe82)
[0x55e3779da402]
2021-05-25T15:44:26.989967+02:00 pacific1 conmon[5132]:  10:
(Monitor::dispatch_op(boost::intrusive_ptr)+0x78d)
[0x55e377a002ed]
2021-05-25T15:44:26.990046+02:00 pacific1 conmon[5132]:  11:
(Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
2021-05-25T15:44:26.990113+02:00 pacific1 conmon[5132]:  12:
(Dispatcher::ms_dispatch2(boost::intrusive_ptr const&)+0x5c)
[0x55e377a2ffdc]
2021-05-25T15:44:26.990179+02:00 pacific1 conmon[5132]:  13:
(DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
2021-05-25T15:44:26.990255+02:00 pacific1 conmon[5132]:  14:
(DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
2021-05-25T15:44:26.990330+02:00 pacific1 conmon[5132]:  15:
/lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
2021-05-25T15:44:26.990420+02:00 pacific1 conmon[5132]:  16: clone()
2021-05-25T15:44:26.990497+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990573+02:00 pacific1 conmon[5132]: debug  0>
2021-05-25T13:44:26.742+ 7ff3b1aa1700 -1 *** Caught signal
(Aborted) **
2021-05-25T15:44:26.990648+02:00 pacific1 conmon[5132]:  in thread
7ff3b1aa1700 thread_name:ms_dispatch
2021-05-25T15:44:26.990723+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.990806+02:00 pacific1 conmon[5132]:  ceph version
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.990883+02:00 pacific1 conmon[5132]:  1:
/lib64/libpthread.so.0(+0x12b20) [0x7ff3bd114b20]
2021-05-25T15:44:26.990958+02:00 pacific1 conmon[5132]:  2: gsignal()
2021-05-25T15:44:26.991034+02:00 pacific1 conmon[5132]:  3: abort()
2021-05-25T15:44:26.991110+02:00 pacific1 conmon[5132]:  4:
(ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a9) [0x7ff3bf61a5ed]
2021-05-25T15:44:26.991176+02:00 pacific1 conmon[5132]:  5:
/usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.991251+02:00 pacific1 conmon[5132]:  6:
(OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.991326+02:00 pacific1 conmon[5132]:  7:
(OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.991393+02:00 pacific1 conmon[5132]:  8:
(OSDMonitor::get_version(unsigned long, unsigned long,
ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.991460+02:00 pacific1 conmon[5132]:  9:
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
long)+0x301) 

[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-05-25 Thread Gregory Farnum
On Tue, May 25, 2021 at 7:17 AM Eugen Block  wrote:
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc:
>  In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, 
> uint64_t) const' thread 7ff3b1aa1700 time
> 2021-05-25T13:44:26.732857+
> 2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]:
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc:
>  658: FAILED ceph_assert(target_v >=
> 9)
> 2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
> 2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]:  ceph version
> 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
> 2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]:  1:
> (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x158) [0x7ff3bf61a59c]
> 2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]:  2:
> /usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
> 2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]:  3:
> (OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
> long) const+0x539) [0x7ff3bfa529f9]
> 2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]:  4:
> (OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
> unsigned long)+0x1c9) [0x55e377b36df9]
> 2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]:  5:
> (OSDMonitor::get_version(unsigned long, unsigned long,
> ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
> 2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]:  6:
> (OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
> long)+0x301) [0x55e377b3a3c1]
> 2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]:  7:
> (OSDMonitor::send_incremental(unsigned int, MonSession*, bool,
> boost::intrusive_ptr)+0x104) [0x55e377b3b094]
> 2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]:  8:
> (OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
> 2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]:  9:
> (Monitor::handle_subscribe(boost::intrusive_ptr)+0xe82)
> [0x55e3779da402]
> 2021-05-25T15:44:26.989967+02:00 pacific1 conmon[5132]:  10:
> (Monitor::dispatch_op(boost::intrusive_ptr)+0x78d)
> [0x55e377a002ed]
> 2021-05-25T15:44:26.990046+02:00 pacific1 conmon[5132]:  11:
> (Monitor::_ms_dispatch(Message*)+0x670) [0x55e377a01910]
> 2021-05-25T15:44:26.990113+02:00 pacific1 conmon[5132]:  12:
> (Dispatcher::ms_dispatch2(boost::intrusive_ptr const&)+0x5c)
> [0x55e377a2ffdc]
> 2021-05-25T15:44:26.990179+02:00 pacific1 conmon[5132]:  13:
> (DispatchQueue::entry()+0x126a) [0x7ff3bf854b1a]
> 2021-05-25T15:44:26.990255+02:00 pacific1 conmon[5132]:  14:
> (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff3bf904b71]
> 2021-05-25T15:44:26.990330+02:00 pacific1 conmon[5132]:  15:
> /lib64/libpthread.so.0(+0x814a) [0x7ff3bd10a14a]
> 2021-05-25T15:44:26.990420+02:00 pacific1 conmon[5132]:  16: clone()
> 2021-05-25T15:44:26.990497+02:00 pacific1 conmon[5132]:
> 2021-05-25T15:44:26.990573+02:00 pacific1 conmon[5132]: debug  0>
> 2021-05-25T13:44:26.742+ 7ff3b1aa1700 -1 *** Caught signal
> (Aborted) **
> 2021-05-25T15:44:26.990648+02:00 pacific1 conmon[5132]:  in thread
> 7ff3b1aa1700 thread_name:ms_dispatch
> 2021-05-25T15:44:26.990723+02:00 pacific1 conmon[5132]:
> 2021-05-25T15:44:26.990806+02:00 pacific1 conmon[5132]:  ceph version
> 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
> 2021-05-25T15:44:26.990883+02:00 pacific1 conmon[5132]:  1:
> /lib64/libpthread.so.0(+0x12b20) [0x7ff3bd114b20]
> 2021-05-25T15:44:26.990958+02:00 pacific1 conmon[5132]:  2: gsignal()
> 2021-05-25T15:44:26.991034+02:00 pacific1 conmon[5132]:  3: abort()
> 2021-05-25T15:44:26.991110+02:00 pacific1 conmon[5132]:  4:
> (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1a9) [0x7ff3bf61a5ed]
> 2021-05-25T15:44:26.991176+02:00 pacific1 conmon[5132]:  5:
> /usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
> 2021-05-25T15:44:26.991251+02:00 pacific1 conmon[5132]:  6:
> (OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned
> long) const+0x539) [0x7ff3bfa529f9]
> 2021-05-25T15:44:26.991326+02:00 pacific1 conmon[5132]:  7:
> (OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,
> unsigned long)+0x1c9) [0x55e377b36df9]
> 2021-05-25T15:44:26.991393+02:00 pacific1 conmon[5132]:  8:
> (OSDMonitor::get_version(unsigned long, unsigned long,
> ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
> 2021-05-25T15:44:26.991460+02:00 pacific1 conmon[5132]:  9:
> (OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned
> long)+0x301) [0x55e377b3a3c1]
> 2021-05-25T15:44:26.991557+02:00 pacific1 conmon[5132]:  10:
> 

[ceph-users] Re: Very uneven OSD utilization

2021-05-25 Thread Sergei Genchev
Thank you Janne,
I will give upmap a shot. I need to try it first in some non-prod
cluster. Non-prod clusters are doing much better for me even though
they have a lot fewer OSDs.
Thanks everyone!
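
For reference, enabling the upmap balancer is a short sequence, roughly as
follows (it assumes all clients are at least Luminous):

ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status          # shows the plans being executed in the background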

On Tue, May 25, 2021 at 12:48 AM Janne Johansson  wrote:
>
> I would suggest enabling the upmap balancer if you haven't done that,
> it should help even data out. Even if it would not do better than some
> manual rebalancing scheme, it will at least do it nicely in the
> background some 8 PGs at a time so it doesn't impact client traffic.
>
> It looks very weird to have such an uneven distribution even while having
> lots of PGs (which was my first guess =)
>
> Den tis 25 maj 2021 kl 03:47 skrev Sergei Genchev :
> >
> > Hello,
> > I am running a nautilus cluster with 5 OSD nodes/90 disks that is
> > exclusively used for S3. My disks are identical, but utilization
> > ranges from 9% to 82%, and I am starting to get backfill_toofull
> > errors even though I have only used 150TB out of 650TB of data.
> >  - Other than manually crush reweighting OSDs, is there any other
> > option for me ?
> >  - what would cause this uneven distribution? Is there some
> > documentation on how to track down what's going on?
> > output of 'ceph osd df" is at https://pastebin.com/17HWFR12
> >  Thank you!
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-05-25 Thread Eugen Block

Hi,

I wanted to explore the stretch mode in pacific (16.2.4) and see how  
it behaves with a DC failure. It seems as if I'm hitting the same or  
at least a similar issue here. To verify if it's the stretch mode I  
removed the cluster and rebuilt it without stretch mode, three hosts  
in three DCs and started to reboot. First I rebooted one node, the  
cluster came back to HEALTH_OK. Then I rebooted two of the three nodes  
and again everything recovered successfully.
Then I rebuilt a 5 node cluster, two DCs in stretch mode with three  
MONs, one being a tiebreaker in a virtual third DC. The stretch rule  
was applied (4 replicas across all 4 nodes).
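
For context, the stretch-mode setup described here boils down to something like
the following sketch (monitor names and the pre-created 4-replica CRUSH rule
"stretch_rule" are illustrative, not copied from this cluster):

ceph mon set election_strategy connectivity
ceph mon set_location pacific1 datacenter=dc1
ceph mon set_location pacific2 datacenter=dc2
ceph mon set_location pacific5 datacenter=dc3      # the tiebreaker in the virtual third DC
# stretch_rule is assumed to be a CRUSH rule placing 2 replicas in each DC
ceph mon enable_stretch_mode pacific5 stretch_rule datacenter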


To test a DC failure I simply shut down two nodes from DC2. Although  
the pool's min_size was reduced to 1 by Ceph, I couldn't read or write  
anything to a mapped rbd, although Ceph was still responsive with two  
active MONs.
When I booted the other two nodes again, the cluster was not able to  
recover; it ends up in a loop of restarting the MON containers (the  
OSDs recover eventually) until systemd shuts them down due to too many  
restarts.
For a couple of seconds I get a ceph status, but I never get all three  
MONs up. When there are two MONs up and I restart the missing one, a  
different MON is shut down.


I also see the error message mentioned here in this thread

heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7ff3b3aa5700'  
had timed out after 0.0s


I'll add some more information, a stack trace from MON failure:

---snip---
2021-05-25T15:44:26.988562+02:00 pacific1 conmon[5132]: 5  
mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable  
= 1 - now=2021-05-25T13:44:26.730359+  
lease_expire=2021-05-25T13:44:30.270907+ has v0 lc 9839
2021-05-25T15:44:26.988638+02:00 pacific1 conmon[5132]: debug -5>  
2021-05-25T13:44:26.726+ 7ff3b1aa1700  2 mon.pacific1@0(leader)  
e13 send_reply 0x55e37aae3860 0x55e37affa9c0 auth_reply(proto 2 0 (0)  
Success) v1
2021-05-25T15:44:26.988714+02:00 pacific1 conmon[5132]: debug -4>  
2021-05-25T13:44:26.726+ 7ff3b1aa1700  5  
mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable  
= 1 - now=2021-05-25T13:44:26.731084+  
lease_expire=2021-05-25T13:44:30.270907+ has v0 lc 9839
2021-05-25T15:44:26.988790+02:00 pacific1 conmon[5132]: debug -3>  
2021-05-25T13:44:26.726+ 7ff3b1aa1700  2 mon.pacific1@0(leader)  
e13 send_reply 0x55e37b14def0 0x55e37ab11ba0 auth_reply(proto 2 0 (0)  
Success) v1
2021-05-25T15:44:26.988929+02:00 pacific1 conmon[5132]: debug -2>  
2021-05-25T13:44:26.730+ 7ff3b1aa1700  5  
mon.pacific1@0(leader).osd e117 send_incremental [105..117] to  
client.84146
2021-05-25T15:44:26.989012+02:00 pacific1 conmon[5132]: debug -1>  
2021-05-25T13:44:26.734+ 7ff3b1aa1700 -1  
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7ff3b1aa1700 time 2021-05-25T13:44:26.732857+
2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]:  
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >= 9)

2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]:  ceph version  
16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]:  1:  
(ceph::__ceph_assert_fail(char const*, char const*, int, char  
const*)+0x158) [0x7ff3bf61a59c]
2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]:  2:  
/usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]:  3:  
(OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned  
long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]:  4:  
(OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&,  
unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]:  5:  
(OSDMonitor::get_version(unsigned long, unsigned long,  
ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]:  6:  
(OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned  
long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]:  7:  
(OSDMonitor::send_incremental(unsigned int, MonSession*, bool,  
boost::intrusive_ptr)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]:  8:  
(OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]:  9:  

[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
On Tue, May 25, 2021 at 09:23, Boris Behrens  wrote:
>
> Hi,
> I am still searching for a reason why these two values differ so much.
>
> I am currently deleting a giant amount of orphan objects (43 million, most
> of them under 64 KB), but the difference gets larger instead of smaller.
>
> This was the state two days ago:
> >
> > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk 
> > '{ print $2 }' | tr -d , | paste -sd+ - | bc
> > 175977343264
> >
> > [root@s3db1 ~]# rados df
> > POOL_NAME  USED   OBJECTS CLONESCOPIES 
> > MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS  RDWR_OPS  WR 
> > USED COMPR UNDER COMPR
> > ...
> > eu-central-1.rgw.buckets.data   766 TiB 134632397  0 403897191  
> > 0   00 1076480853  45 TiB 532045864 551 TiB0 B  
> >0 B
> > ...
> > total_objects135866676
> >
> > [root@s3db1 ~]# ceph df...
> > eu-central-1.rgw.buckets.data   11 2048 253 TiB 134.63M 
> > 766 TiB 90.3227 TiB
>
> And this is todays state:
> >
> > [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk 
> > '{ print $2 }' | tr -d , | paste -sd+ - | bc
> > 177144806812
> >
> > [root@s3db1 ~]# rados df
> > ...
> > eu-central-1.rgw.buckets.data   786 TiB 120025590  0 360076770
> > ...
> > total_objects121261889
> >
> > [root@s3db1 ~]# ceph df
> > ...
> > eu-central-1.rgw.buckets.data   11 2048 260 TiB 120.02M 
> > 786 TiB 92.5921 TiB
>
> I would love to free up the missing 80TB :)
> Any suggestions?

As Konstantin mentioned, maybe it was the GC, but I just processed all
objects (with --include-all) and the situation did not change.
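
For reference, checking and forcing the GC by hand looks roughly like this
(the --include-all flag is assumed to be available for gc process here, as the
message above implies):

radosgw-admin gc list --include-all | grep oid | wc -l    # everything queued, due or not
radosgw-admin gc process --include-all                    # force a pass over all queued entries
ceph df detail | grep buckets.data                        # re-check pool usage afterwards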

-- 
The self-help group "UTF-8 Problems" will, as an exception, meet in the
large hall this time.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd cp versus deep cp?

2021-05-25 Thread Jan Kasprzak
Eugen,

Eugen Block wrote:
: Mykola explained it in this thread [1] a couple of months ago:
: 
: `rbd cp` will copy only one image snapshot (or the image head) to the
: destination.
: 
: `rbd deep cp` will copy all image snapshots and the image head.

Thanks for the explanation. I have created a pull request with the docs
update:

https://github.com/ceph/ceph/pull/41529
https://github.com/ceph/ceph/pull/41529/commits/87bb4917de2eda847479e0bae38cade5af79cc37

Is it OK?

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm: How to replace failed HDD where DB is on SSD

2021-05-25 Thread Kai Stian Olstad

Hi

The server run 15.2.9 and has 15 HDD and 3 SSD.
The OSDs was created with this YAML file

hdd.yml

service_type: osd
service_id: hdd
placement:
  host_pattern: 'pech-hd-*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0


The result was that the 3 SSD is added to 1 VG with 15 LV on it.

# vgs | egrep "VG|dbs"
  VG  #PV #LV #SN Attr   
VSize  VFree
  ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b   3  15   0 wz--n- 
<5.24t 48.00m



One of the osd failed and I run rm with replace

# ceph orch osd rm 178 --replace

and the result is

# ceph osd tree | grep "ID|destroyed"
ID   CLASS  WEIGHT  TYPE NAME STATUS REWEIGHT  
PRI-AFF
178hdd12.82390  osd.178   destroyed 0  
1.0



But I'm not able to replace the disk with the same YAML file as shown 
above.



# ceph orch apply osd -i hdd.yml --dry-run

OSDSPEC PREVIEWS

+-+--+--+--++-+
|SERVICE  |NAME  |HOST  |DATA  |DB  |WAL  |
+-+--+--+--++-+
+-+--+--+--++-+

I guess this is the wrong way to do it, but I can't find the answer in 
the documentation.

So how can I replace this failed disk in Cephadm?


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
On Tue, May 25, 2021 at 09:39, Konstantin Shalygin  wrote:
>
> Hi,
>
> On 25 May 2021, at 10:23, Boris Behrens  wrote:
>
> I am still searching for a reason why these two values differ so much.
>
> I am currently deleting a giant amount of orphan objects (43 million, most
> of them under 64 KB), but the difference gets larger instead of smaller.
>
>
> When a user deletes through the API, objects are just marked as deleted; then
> the ceph-radosgw GC performs the actual delete. You can see the queue via
> `radosgw-admin gc list`.
> I think you can speed up the process via the rgw_gc_* options.
>
>
> Cheers,
> k

Hi K,

I thought about the GC, but it doesn't look like this is the issue:
>
> [root@s3db1 ~]# radosgw-admin gc list --include-all | grep oid | wc -l
> 563598
> [root@s3db1 ~]# radosgw-admin gc list | grep oid | wc -l
> 43768


-- 
The self-help group "UTF-8 Problems" will, as an exception, meet in the
large hall this time.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph osd will not start.

2021-05-25 Thread Peter Childs
Not sure what I'm doing wrong; I suspect it's the way I'm running
ceph-volume.

root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda --dmcrypt
Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
Using recent ceph image ceph/ceph@sha256
:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
/usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
/usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
/usr/bin/docker: -->  RuntimeError: No valid ceph configuration file was
loaded.
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 8029, in 
main()
  File "/usr/sbin/cephadm", line 8017, in main
r = ctx.func(ctx)
  File "/usr/sbin/cephadm", line 1678, in _infer_fsid
return func(ctx)
  File "/usr/sbin/cephadm", line 1738, in _infer_image
return func(ctx)
  File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
out, err, code = call_throws(ctx, c.run_cmd(), verbosity=verbosity)
  File "/usr/sbin/cephadm", line 1464, in call_throws
raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
--net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
--init -e CONTAINER_IMAGE=ceph/ceph@sha256:54e95ae1e11404157d7b329d0t

root@drywood12:~# cephadm shell
Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
Inferring config
/var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
Using recent ceph image ceph/ceph@sha256
:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
70054a5c-c176-463a-a0ac-b44c5db0987c
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 auth: unable to find
a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or
directory
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1
AuthRegistry(0x7fdef405b378) no keyring found at
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 auth: unable to find
a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or
directory
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1
AuthRegistry(0x7fdef405ef20) no keyring found at
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 auth: unable to find
a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or
directory
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1
AuthRegistry(0x7fdef8f0bea0) no keyring found at
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-05-25T07:46:18.188+ 7fdef2d9d700 -1 monclient(hunting):
handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2021-05-25T07:46:18.188+ 7fdef259c700 -1 monclient(hunting):
handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2021-05-25T07:46:18.188+ 7fdef1d9b700 -1 monclient(hunting):
handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 monclient:
authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to the
cluster)
-->  RuntimeError: Unable to create a new OSD id
root@drywood12:/# lsblk /dev/sda
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda8:00  7.3T  0 disk

As far as I can see, cephadm gets a little further than this, as the disks
have LVM volumes on them; it's just that the OSD daemons are never created or started.
So maybe I'm invoking ceph-volume incorrectly.
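
One thing the errors above point at is the missing bootstrap-osd keyring inside
the shell. A possible, unverified sketch of providing it before retrying (paths
taken from the error output, not checked against this setup):

# inside `cephadm shell` on the OSD host
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-volume lvm create --data /dev/sda --dmcrypt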


On Tue, 25 May 2021 at 06:57, Peter Childs  wrote:

>
>
> On Mon, 24 May 2021, 21:08 Marc,  wrote:
>
>> >
>> > I'm attempting to use cephadm and Pacific, currently on debian buster,
>> > mostly because centos7 ain't supported any more and centos8 ain't
>> > supported by some of my hardware.
>>
>> Who says centos7 is not supported any more? Afaik centos7/el7 is being
>> supported till its EOL 2024. By then maybe a good alternative for
>> el8/stream has surfaced.
>>
>
> Not supported by Ceph Pacific; it's our OS of choice otherwise.
>
> My testing says the versions of podman, docker and python3 that are available do
> not work with Pacific.
>
> Given I've needed to upgrade docker on buster, can we please have a list of
> versions that work with cephadm, and maybe even have cephadm say "no, please
> upgrade" unless you're running the right version or better.
>
>
>
>> > Anyway I have a few nodes with 59x 7.2TB disks, but for some reason the
>> > OSD daemons don't start; the disks get formatted and the OSDs are created, but
>> > the daemons never come up.
>>
>> what if you try with

[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Konstantin Shalygin
Hi,

> On 25 May 2021, at 10:23, Boris Behrens  wrote:
> 
> I am still searching for a reason why these two values differ so much.
> 
> I am currently deleting a giant amount of orphan objects (43 million, most
> of them under 64 KB), but the difference gets larger instead of smaller.

When a user deletes through the API, objects are just marked as deleted; then
the ceph-radosgw GC performs the actual delete. You can see the queue via
`radosgw-admin gc list`.
I think you can speed up the process via the rgw_gc_* options.
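
As a hedged sketch of the rgw_gc_* tuning mentioned above (the option names
exist in recent releases; the values here are purely illustrative and should be
adapted to the cluster):

ceph config set client.rgw rgw_gc_max_concurrent_io 20
ceph config set client.rgw rgw_gc_max_trim_chunk 64
ceph config set client.rgw rgw_gc_processor_period 600    # run GC cycles more often
ceph config set client.rgw rgw_gc_obj_min_wait 300        # shorten the wait before actual deletion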


Cheers,
k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] summarized radosgw size_kb_actual vs pool stored value doesn't add up

2021-05-25 Thread Boris Behrens
Hi,
I am still searching for a reason why these two values differ so much.

I am currently deleting a giant amount of orphan objects (43 million, most
of them under 64 KB), but the difference gets larger instead of smaller.

This was the state two days ago:
>
> [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk '{ 
> print $2 }' | tr -d , | paste -sd+ - | bc
> 175977343264
>
> [root@s3db1 ~]# rados df
> POOL_NAME  USED   OBJECTS CLONESCOPIES 
> MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS  RDWR_OPS  WR USED 
> COMPR UNDER COMPR
> ...
> eu-central-1.rgw.buckets.data   766 TiB 134632397  0 403897191
>   0   00 1076480853  45 TiB 532045864 551 TiB0 B  
>0 B
> ...
> total_objects135866676
>
> [root@s3db1 ~]# ceph df...
> eu-central-1.rgw.buckets.data   11 2048 253 TiB 134.63M   
>   766 TiB 90.3227 TiB

And this is todays state:
>
> [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk '{ 
> print $2 }' | tr -d , | paste -sd+ - | bc
> 177144806812
>
> [root@s3db1 ~]# rados df
> ...
> eu-central-1.rgw.buckets.data   786 TiB 120025590  0 360076770
> ...
> total_objects121261889
>
> [root@s3db1 ~]# ceph df
> ...
> eu-central-1.rgw.buckets.data   11 2048 260 TiB 120.02M   
>   786 TiB 92.5921 TiB

I would love to free up the missing 80TB :)
Any suggestions?

-- 
The self-help group "UTF-8 Problems" will, as an exception, meet in the
large hall this time.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd cp versus deep cp?

2021-05-25 Thread Eugen Block

Hi,

Mykola explained it in this thread [1] a couple of months ago:

`rbd cp` will copy only one image snapshot (or the image head) to the
destination.

`rbd deep cp` will copy all image snapshots and the image head.

It depends on the number of snapshots that need to be copied; if there  
are none, you'd probably be fine with `rbd cp`.


[1]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3CLRRBX25OGO7ZJYL34Y5WZ6U4OZBUG2/
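
Applied to Jan's case, the flow would be roughly as follows (pool and image
names are invented; verify the data on the copy before removing the original):

rbd cp mypool/big-image mypool/big-image-copy           # copies only the image head, no snapshots
rbd resize mypool/big-image-copy --size 500G --allow-shrink
# mount and verify the copy, then:
rbd rm mypool/big-image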



Zitat von Jan Kasprzak :


Hello, Ceph users,

what is the difference between "rbd cp" and "rbd deep cp"?

What I need to do is to make a copy of the rbd volume that one of our users
inadvertently resized to too big a size, shrink the copied image to the
expected size, verify that everything is OK, and then delete the original
image. Would this work with rbd cp?

Thanks,

-Yenya

--
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on
when it's necessary to compromise. --Larry Wall
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io