[ceph-users] Re: cephfs vs rbd vs rgw
The quick answer is that they are optimized for different use cases. Things like relational databases (MySQL, PostgreSQL) benefit from the performance that a dedicated filesystem on a block device (rbd) can provide; shared filesystems are usually contraindicated for such software. Shared filesystems like cephfs are nice, but they can't scale quite as well in the number of filesystems as something like rbd, and latency in certain operations can be worse. POSIX network filesystems have their drawbacks (POSIX wasn't really designed around network filesystems), but they are super useful when you need to share a filesystem across nodes: a lot of existing software assumes a shared filesystem, and you can get pretty good scaling easily out of some software with it. rgw speaks a very different protocol (HTTP-based), so a lot of existing software doesn't work with it and compatibility is not as good, though that's changing. It also has some assumptions around how data is read/written, and it can be scaled quite large. HTTP clients are very easy to come by, so for new software it's pretty nice. So it's not necessarily a question of "which one should I support": one of Ceph's great features is that you can support all 3 with the same storage and use them all as needed.
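To make the "all three from the same cluster" point concrete, here is a rough sketch of exposing all three interfaces side by side. The pool, image, filesystem, and user names are made up for illustration, and exact commands can vary by release and deployment tooling:

  # block device via rbd
  ceph osd pool create rbd-demo
  rbd pool init rbd-demo
  rbd create rbd-demo/vol01 --size 100G

  # shared POSIX filesystem via cephfs
  ceph fs volume create demo-fs

  # object storage via rgw (S3 API), once an rgw daemon is deployed
  radosgw-admin user create --uid=demo --display-name="Demo User"

All three then draw on the same underlying RADOS cluster and OSDs.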
[ceph-users] Re: cephfs vs rbd vs rgw
Yeah, agreed. My first question would be: how is your user going to consume the storage? You'll struggle to run VMs on RadosGW, and if they are doing archival backups then RBD is likely not the best solution. Each has very different requirements at the hardware level. For example, if you are talking about running dozens of VMs, then an SSD/NVMe-based cluster exposing RBD is a good solution; if you want to store large amounts of video files for a security system, then a SATA-based cluster with some NVMe cache exposing S3 via RadosGW could be a good solution.
[ceph-users] Re: cephfs vs rbd vs rgw
Hi Jorge,

I think it depends on your workload.

On Tue, May 25, 2021 at 7:43 PM Jorge Garcia wrote:
> cephfs - Until recently, you were not allowed to have multiple
> filesystems. Not sure about performance.

I/O performance can be /very/ good. Metadata performance can vary. If you need shared POSIX access ("native" or NFS or SMB), you need cephfs.

> rbd - Can only be mounted on one system at a time, but I guess that
> filesystem could then be served using NFS.

Yes, but it's single attach.

> rgw - A different usage model from regular linux file/directory
> structure. Are there advantages to forcing people to use this interface?

There are advantages. S3 has become a preferred interface for some applications, especially analytics (e.g., Hadoop, Spark, PrestoSQL).

> I'm tempted to set up 3 separate areas and try them and compare the
> results, but I'm wondering if somebody has done some similar experiment
> in the past.

Not sure, good question.

Matt

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
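To illustrate the point about S3 as an application interface: once rgw is running, any stock S3 client can talk to it. A small sketch using the AWS CLI; the endpoint URL and bucket name are placeholders, not from this thread:

  aws --endpoint-url http://rgw.example.com:8080 s3 mb s3://analytics-data
  aws --endpoint-url http://rgw.example.com:8080 s3 cp results.parquet s3://analytics-data/
  aws --endpoint-url http://rgw.example.com:8080 s3 ls s3://analytics-data

Analytics stacks such as Hadoop and Spark can point their s3a:// connectors at the same endpoint.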
[ceph-users] cephfs vs rbd vs rgw
This may be too broad of a topic, or opening a can of worms, but we are running a Ceph environment and I was wondering if there's any guidance about this question: Given that some group would like to store 50-100 TB of data on Ceph and use it from a Linux environment, are there any advantages or disadvantages in terms of performance/ease of use/learning curve to using cephfs vs using a block device through rbd vs using object storage through rgw? Here are my general thoughts:

cephfs - Until recently, you were not allowed to have multiple filesystems. Not sure about performance.

rbd - Can only be mounted on one system at a time, but I guess that filesystem could then be served using NFS.

rgw - A different usage model from the regular Linux file/directory structure. Are there advantages to forcing people to use this interface?

I'm tempted to set up 3 separate areas, try them, and compare the results, but I'm wondering if somebody has done a similar experiment in the past. Thanks for any help you can provide!

Jorge
[ceph-users] Re: Ceph Pacific mon is not starting after host reboot
Hi,

On my setup I didn't enable a stretch cluster. It's just a 3 x VM setup running on the same Proxmox node, and all the nodes are using a single network. I installed Ceph using the documented cephadm flow.

> Thanks for the confirmation, Greg! I'll try with a newer release then.
> That's why we're testing, isn't it? ;-)
> Then the OP's issue is probably not resolved yet since he didn't
> mention a stretch cluster. Sorry for hijacking the thread.
[ceph-users] Ceph Month June Schedule Now Available
Hi everyone,

The Ceph Month June schedule is now available:

https://pad.ceph.com/p/ceph-month-june-2021

We have great sessions, from component updates and performance best practices to Ceph on different architectures, BoF sessions to get more involved with working groups in the community, and more! You may also leave open discussion topics for the listed talks, which we'll get to in each Q&A portion. I will provide the video stream link on this thread and the etherpad once it's available.

You can also add the Ceph community calendar, which will have the Ceph Month sessions prefixed with "Ceph Month", to get local timezone conversions:

https://calendar.google.com/calendar/embed?src=9ts9c7lt7u1vic2ijvvqqlfpo0%40group.calendar.google.com

Thank you to our speakers for taking the time to share with us all the latest best practices and usage with Ceph!

--
Mike Perez
[ceph-users] Re: Ceph Pacific mon is not starting after host reboot
Thanks for the confirmation, Greg! I'll try with a newer release then. That's why we're testing, isn't it? ;-)

Then the OP's issue is probably not resolved yet since he didn't mention a stretch cluster. Sorry for hijacking the thread.

Quoting Gregory Farnum:

> On Tue, May 25, 2021 at 7:17 AM Eugen Block wrote:
> > [...]
> > FAILED ceph_assert(target_v >= 9)
> > [...]
[ceph-users] Re: Ceph Pacific mon is not starting after host reboot
On Tue, May 25, 2021 at 7:17 AM Eugen Block wrote:
> [...]
> /home/jenkins-build/.../ceph-16.2.4/src/osd/OSDMap.cc:
> 658: FAILED ceph_assert(target_v >= 9)
> [...]
[ceph-users] Re: Very uneven OSD utilization
Thank you Janne, I will give upmap a shot. Need to try it first in some non-prod cluster. Non-prod clusters are doing much better for me even though they have a lot fewer OSDs.

Thanks everyone!

On Tue, May 25, 2021 at 12:48 AM Janne Johansson wrote:
> I would suggest enabling the upmap balancer if you haven't done that;
> it should help even data out. Even if it would not do better than some
> manual rebalancing scheme, it will at least do it nicely in the
> background, some 8 PGs at a time, so it doesn't impact client traffic.
>
> It looks very weird to have such uneven distribution even while having
> lots of PGs (which was my first guess =)
>
> On Tue, May 25, 2021 at 03:47, Sergei Genchev wrote:
> > Hello,
> > I am running a Nautilus cluster with 5 OSD nodes/90 disks that is
> > exclusively used for S3. My disks are identical, but utilization
> > ranges from 9% to 82%, and I am starting to get backfill_toofull
> > errors even though I have only used 150 TB out of 650 TB of data.
> > - Other than manually crush reweighting OSDs, is there any other
> > option for me?
> > - What would cause this uneven distribution? Is there some
> > documentation on how to track down what's going on?
> > Output of 'ceph osd df' is at https://pastebin.com/17HWFR12
> > Thank you!
>
> --
> May the most significant bit of your life be positive.
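For reference, enabling the upmap balancer is a short sequence; a sketch, assuming all clients are recent enough to understand upmap (the first command will refuse if they are not):

  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status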
[ceph-users] Re: Ceph Pacific mon is not starting after host reboot
Hi,

I wanted to explore the stretch mode in Pacific (16.2.4) and see how it behaves with a DC failure. It seems as if I'm hitting the same, or at least a similar, issue here. To verify if it's the stretch mode, I removed the cluster and rebuilt it without stretch mode: three hosts in three DCs, and started to reboot. First I rebooted one node; the cluster came back to HEALTH_OK. Then I rebooted two of the three nodes and again everything recovered successfully.

Then I rebuilt a 5 node cluster, two DCs in stretch mode with three MONs, one being a tiebreaker in a virtual third DC. The stretch rule was applied (4 replicas across all 4 nodes). To test a DC failure I simply shut down two nodes from DC2. Although the pool's min_size was reduced to 1 by ceph, I couldn't read or write anything to a mapped rbd, although ceph was still responsive with two active MONs. When I booted the other two nodes again, the cluster was not able to recover; it ends up in a loop of restarting the MON containers (the OSDs recover eventually) until systemd shuts them down due to too many restarts. For a couple of seconds I get a ceph status, but I never get all three MONs up. When there are two MONs up and I restart the missing one, a different MON is shut down.

I also see the error message mentioned here in this thread:

heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7ff3b3aa5700' had timed out after 0.0s

I'll add some more information, a stack trace from the MON failure:

---snip---
2021-05-25T15:44:26.988562+02:00 pacific1 conmon[5132]: 5 mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable = 1 - now=2021-05-25T13:44:26.730359+ lease_expire=2021-05-25T13:44:30.270907+ has v0 lc 9839
2021-05-25T15:44:26.988638+02:00 pacific1 conmon[5132]: debug -5> 2021-05-25T13:44:26.726+ 7ff3b1aa1700 2 mon.pacific1@0(leader) e13 send_reply 0x55e37aae3860 0x55e37affa9c0 auth_reply(proto 2 0 (0) Success) v1
2021-05-25T15:44:26.988714+02:00 pacific1 conmon[5132]: debug -4> 2021-05-25T13:44:26.726+ 7ff3b1aa1700 5 mon.pacific1@0(leader).paxos(paxos updating c 9288..9839) is_readable = 1 - now=2021-05-25T13:44:26.731084+ lease_expire=2021-05-25T13:44:30.270907+ has v0 lc 9839
2021-05-25T15:44:26.988790+02:00 pacific1 conmon[5132]: debug -3> 2021-05-25T13:44:26.726+ 7ff3b1aa1700 2 mon.pacific1@0(leader) e13 send_reply 0x55e37b14def0 0x55e37ab11ba0 auth_reply(proto 2 0 (0) Success) v1
2021-05-25T15:44:26.988929+02:00 pacific1 conmon[5132]: debug -2> 2021-05-25T13:44:26.730+ 7ff3b1aa1700 5 mon.pacific1@0(leader).osd e117 send_incremental [105..117] to client.84146
2021-05-25T15:44:26.989012+02:00 pacific1 conmon[5132]: debug -1> 2021-05-25T13:44:26.734+ 7ff3b1aa1700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: In function 'void OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7ff3b1aa1700 time 2021-05-25T13:44:26.732857+
2021-05-25T15:44:26.989087+02:00 pacific1 conmon[5132]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/OSDMap.cc: 658: FAILED ceph_assert(target_v >= 9)
2021-05-25T15:44:26.989163+02:00 pacific1 conmon[5132]:
2021-05-25T15:44:26.989239+02:00 pacific1 conmon[5132]: ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
2021-05-25T15:44:26.989314+02:00 pacific1 conmon[5132]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7ff3bf61a59c]
2021-05-25T15:44:26.989388+02:00 pacific1 conmon[5132]: 2: /usr/lib64/ceph/libceph-common.so.2(+0x2767b6) [0x7ff3bf61a7b6]
2021-05-25T15:44:26.989489+02:00 pacific1 conmon[5132]: 3: (OSDMap::Incremental::encode(ceph::buffer::v15_2_0::list&, unsigned long) const+0x539) [0x7ff3bfa529f9]
2021-05-25T15:44:26.989560+02:00 pacific1 conmon[5132]: 4: (OSDMonitor::reencode_incremental_map(ceph::buffer::v15_2_0::list&, unsigned long)+0x1c9) [0x55e377b36df9]
2021-05-25T15:44:26.989627+02:00 pacific1 conmon[5132]: 5: (OSDMonitor::get_version(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x1f4) [0x55e377b37234]
2021-05-25T15:44:26.989693+02:00 pacific1 conmon[5132]: 6: (OSDMonitor::build_incremental(unsigned int, unsigned int, unsigned long)+0x301) [0x55e377b3a3c1]
2021-05-25T15:44:26.989759+02:00 pacific1 conmon[5132]: 7: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool, boost::intrusive_ptr)+0x104) [0x55e377b3b094]
2021-05-25T15:44:26.989825+02:00 pacific1 conmon[5132]: 8: (OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55e377b42792]
2021-05-25T15:44:26.989891+02:00 pacific1 conmon[5132]: 9:
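For anyone who wants to reproduce the setup above, the stretch-mode configuration roughly corresponds to the documented flow sketched below. Mon names and datacenter labels are made up, and the CRUSH rule named stretch_rule has to exist first; adjust to your cluster:

  ceph mon set_location a datacenter=dc1
  ceph mon set_location b datacenter=dc2
  ceph mon set_location c datacenter=dc3      # the tiebreaker mon
  ceph mon enable_stretch_mode c stretch_rule datacenter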
[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up
On Tue, May 25, 2021 at 09:23, Boris Behrens wrote:
> Hi,
> I am still searching for a reason why these two values differ so much.
>
> I am currently deleting a giant amount of orphan objects (43 million, most
> of them under 64 kB), but the difference gets larger instead of smaller.
>
> [...]
>
> I would love to free up the missing 80 TB :)
> Any suggestions?

As Konstantin mentioned, maybe it was the GC, but I just processed all objects (with --include-all) and the situation did not change.

--
The self-help group "UTF-8 problems" will meet this time, as an exception, in the large hall.
[ceph-users] Re: rbd cp versus deep cp?
Eugen,

Eugen Block wrote:
: Mykola explained it in this thread [1] a couple of months ago:
:
: `rbd cp` will copy only one image snapshot (or the image head) to the
: destination.
:
: `rbd deep cp` will copy all image snapshots and the image head.

Thanks for the explanation. I have created a pull request with the docs update:

https://github.com/ceph/ceph/pull/41529
https://github.com/ceph/ceph/pull/41529/commits/87bb4917de2eda847479e0bae38cade5af79cc37

Is it OK?

-Yenya

--
| Jan "Yenya" Kasprzak |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise. --Larry Wall
[ceph-users] cephadm: How to replace failed HDD where DB is on SSD
Hi,

The server runs 15.2.9 and has 15 HDDs and 3 SSDs. The OSDs were created with this YAML file, hdd.yml:

service_type: osd
service_id: hdd
placement:
  host_pattern: 'pech-hd-*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0

The result was that the 3 SSDs were added to 1 VG with 15 LVs on it.

# vgs | egrep "VG|dbs"
VG                                                   #PV #LV #SN Attr   VSize  VFree
ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b    3  15   0 wz--n- <5.24t 48.00m

One of the OSDs failed and I ran rm with replace:

# ceph orch osd rm 178 --replace

and the result is:

# ceph osd tree | grep "ID|destroyed"
ID   CLASS  WEIGHT    TYPE NAME  STATUS     REWEIGHT  PRI-AFF
178  hdd    12.82390  osd.178    destroyed         0  1.0

But I'm not able to replace the disk with the same YAML file as shown above:

# ceph orch apply osd -i hdd.yml --dry-run
OSDSPEC PREVIEWS
+-+--+--+--++-+
|SERVICE |NAME |HOST |DATA |DB |WAL |
+-+--+--+--++-+
+-+--+--+--++-+

I guess this is the wrong way to do it, but I can't find the answer in the documentation. So how can I replace this failed disk in cephadm?

--
Kai Stian Olstad
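A hedged aside, not an answer given in this thread: cephadm only deploys onto devices it reports as available in `ceph orch device ls`, and a replacement disk that still carries old LVM signatures will not match the spec. One way to clear it (host and device names made up for illustration):

  # ceph orch device zap pech-hd-1 /dev/sdq --force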
[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up
On Tue, May 25, 2021 at 09:39, Konstantin Shalygin wrote:
> Hi,
>
> On 25 May 2021, at 10:23, Boris Behrens wrote:
>
> I am still searching for a reason why these two values differ so much.
>
> I am currently deleting a giant amount of orphan objects (43 million, most
> of them under 64 kB), but the difference gets larger instead of smaller.
>
> When a user issues a delete through the API, objects are only marked as deleted;
> the radosgw garbage collector performs the actual delete later. You can see the
> queue via `radosgw-admin gc list`.
> I think you can speed up the process via the rgw_gc_ options.
>
> Cheers,
> k

Hi K,

I thought about the GC, but it doesn't look like this is the issue:

> [root@s3db1 ~]# radosgw-admin gc list --include-all | grep oid | wc -l
> 563598
> [root@s3db1 ~]# radosgw-admin gc list | grep oid | wc -l
> 43768

--
The self-help group "UTF-8 problems" will meet this time, as an exception, in the large hall.
[ceph-users] Re: Ceph osd will not start.
Not sure what I'm doing wrong; I suspect it's the way I'm running ceph-volume.

root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda --dmcrypt
Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
/usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
/usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
/usr/bin/docker: --> RuntimeError: No valid ceph configuration file was loaded.
Traceback (most recent call last):
  File "/usr/sbin/cephadm", line 8029, in <module>
    main()
  File "/usr/sbin/cephadm", line 8017, in main
    r = ctx.func(ctx)
  File "/usr/sbin/cephadm", line 1678, in _infer_fsid
    return func(ctx)
  File "/usr/sbin/cephadm", line 1738, in _infer_image
    return func(ctx)
  File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
    out, err, code = call_throws(ctx, c.run_cmd(), verbosity=verbosity)
  File "/usr/sbin/cephadm", line 1464, in call_throws
    raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=ceph/ceph@sha256:54e95ae1e11404157d7b329d0t

root@drywood12:~# cephadm shell
Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
Inferring config /var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949

root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 70054a5c-c176-463a-a0ac-b44c5db0987c
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 AuthRegistry(0x7fdef405b378) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 AuthRegistry(0x7fdef405ef20) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 AuthRegistry(0x7fdef8f0bea0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2021-05-25T07:46:18.188+ 7fdef2d9d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2021-05-25T07:46:18.188+ 7fdef259c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2021-05-25T07:46:18.188+ 7fdef1d9b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
 stderr: 2021-05-25T07:46:18.188+ 7fdef8f0d700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
--> RuntimeError: Unable to create a new OSD id

root@drywood12:/# lsblk /dev/sda
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    0  7.3T  0 disk

As far as I can see, cephadm gets a little further than this, as the disks have LVM volumes on them; just the OSD daemons are not created or started. So maybe I'm invoking ceph-volume incorrectly.

On Tue, 25 May 2021 at 06:57, Peter Childs wrote:
> On Mon, 24 May 2021, 21:08 Marc wrote:
>> > I'm attempting to use cephadm and Pacific, currently on debian buster,
>> > mostly because centos7 ain't supported any more and centos8 ain't
>> > supported by some of my hardware.
>>
>> Who says centos7 is not supported any more? Afaik centos7/el7 is being
>> supported till its EOL 2024. By then maybe a good alternative for
>> el8/stream has surfaced.
>
> Not supported by Ceph Pacific; it's our OS of choice otherwise.
>
> My testing says the versions available of podman, docker and python3 do
> not work with Pacific.
>
> Given I've needed to upgrade docker on buster, can we please have a list of
> versions that work with cephadm, and maybe even have cephadm say "no, please
> upgrade" unless you're running the right version or better.
>
>> > Anyway I have a few nodes with 59x 7.2TB disks but for some reason the
>> > osd daemons don't start; the disks get formatted and the osds are created but
>> > the daemons never come up.
>>
>> what if you try with
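A hedged aside on the error itself, not from this thread: the "no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring" lines suggest the shell container has no bootstrap-osd key. One way to provide it inside a cephadm shell, assuming admin credentials work there:

  # ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring

That said, with cephadm the usual path is to let the orchestrator create OSDs (for example `ceph orch daemon add osd drywood12:/dev/sda`) rather than invoking ceph-volume by hand.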
[ceph-users] Re: summarized radosgw size_kb_actual vs pool stored value doesn't add up
Hi,

> On 25 May 2021, at 10:23, Boris Behrens wrote:
>
> I am still searching for a reason why these two values differ so much.
>
> I am currently deleting a giant amount of orphan objects (43 million, most
> of them under 64 kB), but the difference gets larger instead of smaller.

When a user issues a delete through the API, objects are only marked as deleted; the radosgw garbage collector performs the actual delete later. You can see the queue via `radosgw-admin gc list`.
I think you can speed up the process via the rgw_gc_ options.

Cheers,
k
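For the archives, "the rgw_gc_ options" refers to rgw's garbage-collection tunables. A sketch of the kind of knobs meant; the values are illustrative, not recommendations from this thread:

  radosgw-admin gc list --include-all      # inspect the full queue
  radosgw-admin gc process --include-all   # trigger a GC pass now
  ceph config set client.rgw rgw_gc_max_concurrent_io 20
  ceph config set client.rgw rgw_gc_processor_max_time 3600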
[ceph-users] summarized radosgw size_kb_actual vs pool stored value doesn't add up
Hi,

I am still searching for a reason why these two values differ so much.

I am currently deleting a giant amount of orphan objects (43 million, most of them under 64 kB), but the difference gets larger instead of smaller.

This was the state two days ago:

> [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk '{ print $2 }' | tr -d , | paste -sd+ - | bc
> 175977343264
>
> [root@s3db1 ~]# rados df
> POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
> ...
> eu-central-1.rgw.buckets.data 766 TiB 134632397 0 403897191 0 0 0 1076480853 45 TiB 532045864 551 TiB 0 B 0 B
> ...
> total_objects 135866676
>
> [root@s3db1 ~]# ceph df
> ...
> eu-central-1.rgw.buckets.data 11 2048 253 TiB 134.63M 766 TiB 90.32 27 TiB

And this is today's state:

> [root@s3db1 ~]# radosgw-admin bucket stats | grep '"size_kb_actual"' | awk '{ print $2 }' | tr -d , | paste -sd+ - | bc
> 177144806812
>
> [root@s3db1 ~]# rados df
> ...
> eu-central-1.rgw.buckets.data 786 TiB 120025590 0 360076770
> ...
> total_objects 121261889
>
> [root@s3db1 ~]# ceph df
> ...
> eu-central-1.rgw.buckets.data 11 2048 260 TiB 120.02M 786 TiB 92.59 21 TiB

I would love to free up the missing 80 TB :)
Any suggestions?

--
The self-help group "UTF-8 problems" will meet this time, as an exception, in the large hall.
[ceph-users] Re: rbd cp versus deep cp?
Hi,

Mykola explained it in this thread [1] a couple of months ago:

`rbd cp` will copy only one image snapshot (or the image head) to the destination.

`rbd deep cp` will copy all image snapshots and the image head.

It depends on the number of snapshots that need to be copied; if there are none, you'd probably be fine with `rbd cp`.

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3CLRRBX25OGO7ZJYL34Y5WZ6U4OZBUG2/

Quoting Jan Kasprzak:

> Hello, Ceph users,
>
> what is the difference between "rbd cp" and "rbd deep cp"? What I need
> to do is to make a copy of the rbd volume one of our users inadvertently
> resized to a too big size, shrink the copied image to the expected size,
> verify that everything is OK, and then delete the original image.
> Would this work with rbd cp?
>
> Thanks,
>
> -Yenya
>
> --
> | Jan "Yenya" Kasprzak |
> | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
> We all agree on the necessity of compromise. We just can't agree on
> when it's necessary to compromise. --Larry Wall
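For the copy-then-shrink use case from the quoted message, the sequence would look roughly like the sketch below. Pool and image names are made up, and note that shrinking discards any data beyond the new size, hence the explicit --allow-shrink flag:

  rbd cp mypool/bigimage mypool/bigimage-copy        # head only, no snapshots
  # or, if snapshots must survive:
  rbd deep cp mypool/bigimage mypool/bigimage-copy
  rbd resize --size 2T --allow-shrink mypool/bigimage-copy
  # after verifying the copy:
  rbd rm mypool/bigimage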