[ceph-users] Multi-datacenter filesystem
Hi Team,

We have outgrown our current solution and plan to migrate to multiple data centers. Our setup is a mix of radosgw data and filesystem data, but many legacy systems still require a filesystem at the moment, so we will probably run it for some of our data for at least 3-5 years.

At the moment we have about 0.5 petabytes of data, so it is a small cluster. Still, we want more redundancy, so we will partner with a company that has multiple data centers within the city, with redundant fiber between the locations. Our current center has multiple 10 Gb connections, so the communication between the new locations and our existing data center will be slower. Still, I hope the network capacity will suffice for a multi-datacenter setup.

Currently, I plan to assign OSDs to different sites and racks so we can configure a good replication rule that keeps a copy of the data in each data center. My question is how to handle the monitor setup for good redundancy. For example, should I set up two new monitors in each new location and keep one in our existing data center, for five monitors in total? Or should I keep it at three monitors, one per data center? Or should I go for nine monitors, three in each data center? Should I use a stretch setup to define the location of each monitor? Can you do the same for MDSes? Do I need to configure the mounting of the filesystem differently to signal which data center the client is located in?

Does anyone know of a partner we could consult on these issues?

Best regards
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
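For reference, a per-datacenter replication rule like the one described above can be sketched with standard CRUSH commands. This is only a sketch; the bucket, host, and pool names are placeholders, not from the message:

```shell
# Create datacenter buckets and move them under the default root
ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default

# Place each host under its datacenter (repeat per host)
ceph osd crush move host-a datacenter=dc1
ceph osd crush move host-b datacenter=dc2

# Replicated rule that puts one copy in each datacenter
ceph osd crush rule create-replicated rep_per_dc default datacenter

# Apply the rule to a pool
ceph osd pool set cephfs_data crush_rule rep_per_dc
```

Note that stretch mode (`ceph mon enable_stretch_mode`) is designed for exactly two data sites plus a tiebreaker monitor, so with three sites a plain CRUSH rule plus an odd number of monitors spread across the sites is the more common approach.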
[ceph-users] Re: Changing PG size of cache pool
Hi Eugen.

I've tried. The system says it's not recommended, but I may force it. Forcing something with the risk of losing data is not something I'm going to do.

Best regards
Daniel

On Sat, Mar 26, 2022 at 8:55 PM Eugen Block wrote:
> Hi,
>
> just because the autoscaler doesn't increase the pg_num doesn't mean
> you can't increase it manually. Have you tried that?
>
> Zitat von Daniel Persson :
>
> > Hi Team.
> >
> > We are currently in the process of changing the size of our cache pool.
> > Currently it's set to 32 PGs and distributed weirdly on our OSDs. The
> > system has tried automatically to scale it up to 256 PGs without succeeding,
> > and I read that cache pools are not automatically scaled, so we are in the
> > process of scaling. Our plan is to remove the old one and create a new one
> > with more PGs.
> >
> > I've run the pool in readproxy for a week now, so most of the objects should
> > be available in cold storage, but I want to be totally sure so we don't lose
> > any data.
> >
> > I read in the documentation that you could remove the overlay and that
> > would redirect clients to cold storage.
> >
> > Is the preferred strategy to remove the overlay, run cache-flush-evict-all
> > to clear it, and then replace it? Or should I be fine just removing the
> > overlay and tiering and replacing it with a new pool?
> >
> > Currently we have configured it to have a write cache of 0.5 hours and a
> > read cache of 2 days.
> >
> > --
> > ceph osd pool set cephfs_data_cache cache_min_flush_age 1800
> > ceph osd pool set cephfs_data_cache cache_min_evict_age 172800
> >
> > The cache is still 25 TB in size, and it would be sad to lose it if we have
> > unwritten data.
> >
> > Best regards
> > Daniel
[ceph-users] Changing PG size of cache pool
Hi Team.

We are currently in the process of changing the size of our cache pool. Currently it's set to 32 PGs and distributed weirdly across our OSDs. The system has tried to scale it up automatically to 256 PGs without succeeding, and I read that cache pools are not automatically scaled, so we are scaling it ourselves. Our plan is to remove the old pool and create a new one with more PGs.

I've run the pool in readproxy mode for a week now, so most of the objects should be available in cold storage, but I want to be totally sure so we don't lose any data.

I read in the documentation that you can remove the overlay and that this would redirect clients to cold storage.

Is the preferred strategy to remove the overlay, run cache-flush-evict-all to clear the cache, and then replace it? Or am I fine just removing the overlay and tiering and replacing the pool with a new one?

Currently we have configured a write cache of 0.5 hours and a read cache of 2 days:

--
ceph osd pool set cephfs_data_cache cache_min_flush_age 1800
ceph osd pool set cephfs_data_cache cache_min_evict_age 172800

The cache is still 25 TB in size, and it would be sad to lose it if it holds unwritten data.

Best regards
Daniel
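One possible ordering for the teardown discussed above, based on the cache-tier removal steps in the Ceph documentation (pool names taken from the message; treat this as a sketch, not a tested procedure):

```shell
# Stop new writes landing in the cache: readproxy flushes dirty objects
# while still proxying reads through the cache tier
ceph osd tier cache-mode cephfs_data_cache readproxy

# Flush and evict everything still held in the cache pool
rados -p cephfs_data_cache cache-flush-evict-all

# Detach the cache from the base pool once it is empty
ceph osd tier remove-overlay cephfs_data
ceph osd tier remove cephfs_data cephfs_data_cache
```

Running the flush-evict step before removing the overlay is the conservative choice: it leaves nothing unwritten in the cache pool before clients are redirected to cold storage.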
[ceph-users] Question about cephadm, WAL and DB devices.
Hi.

I'm currently trying out cephadm, and I got into a state that was a bit unexpected for me. I created three host machines in VirtualBox to try out cephadm. All the drives I made for OSDs are 20 GB in size, for simplicity. I bootstrapped one host with one drive and then added the other two. cephadm then directly added all available drives, so I had 3x 20 GB OSDs.

Watching the GUI, it said I could add WAL and DB devices. I never figured out how to do that in the GUI, but I tried to do it manually:

ceph osd destroy osd.2 --force
ceph-volume lvm zap --destroy /dev/sdb
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-volume lvm prepare --data /dev/sdb --block.db /dev/sdc --block.wal /dev/sdd
ceph-volume lvm activate 2 d4a590eb-c0f6-47bc-a5fa-221bf8541e09

It worked, and the new OSD was registered, but the strange thing was that it showed as 40 GB and half full. Is this expected?

Best regards
Daniel
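For comparison, cephadm can also place DB and WAL on separate devices declaratively through an OSD service specification, applied with `ceph orch apply -i osd_spec.yml`. A sketch, assuming the same device paths as in the manual commands above:

```yaml
service_type: osd
service_id: osd_with_separate_db_wal
placement:
  host_pattern: '*'
data_devices:
  paths:
    - /dev/sdb
db_devices:
  paths:
    - /dev/sdc
wal_devices:
  paths:
    - /dev/sdd
```

This keeps the OSD under cephadm's management instead of mixing orchestrated and hand-built OSDs on the same host.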
[ceph-users] Cache tiers hit_set values
Hi everyone.

I added a lot more storage to our cluster, and we now have a lot of slower hard drives that could hold archival data, so I thought setting up a cache tier on the fast drives would be a good idea. We want to retain data for about a week in the cache pool, as the data could be interesting for at least a week, and after that it will probably rarely be accessed. You can see my config below. Would you change any of these values? Do you have any better suggestions?

This is my interpretation: I keep the data for 30 minutes before writing it back to the slower drives. We retain the data for seven days before we evict it from the cache. All accessed objects will be promoted to the cache.

Now we only have the hit_set values left, and I'm not sure of the right strategy here. I've read a paper from SUSE where they explain it:
https://documentation.suse.com/ses/6/html/ses-all/cha-ceph-tiered.html#ses-tiered-hitset

My understanding from that paper is that I should change hit_set_count to 42, so that I have hit sets covering the entire duration I want to keep data in my cache pool. Is this correct? We have a lot of disk overhead, and the hit sets should be pretty small in this context, so I don't think there is a reason not to keep 42 of them.

-
ceph osd pool set cephfs_data_cache cache_min_flush_age 1800
ceph osd pool set cephfs_data_cache cache_min_evict_age 604800
ceph osd pool set cephfs_data_cache hit_set_count 12
ceph osd pool set cephfs_data_cache hit_set_period 14400
ceph osd pool set cephfs_data_cache hit_set_fpp 0.01
ceph osd pool set cephfs_data_cache min_write_recency_for_promote 0
ceph osd pool set cephfs_data_cache min_read_recency_for_promote 0
-

I appreciate any help you can provide.

Best regards
Daniel
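The 42 in the question falls out of the numbers in the config above: hit_set_count multiplied by hit_set_period should span the whole retention window set by cache_min_evict_age. A small sketch of that arithmetic:

```python
# Check that hit_set_count * hit_set_period covers the cache retention
# window. Values are taken from the config in the message above.

def hit_sets_needed(evict_age_s: int, hit_set_period_s: int) -> int:
    """Number of hit sets required to span the whole eviction window."""
    # Round up so a partial final period is still tracked
    return -(-evict_age_s // hit_set_period_s)

cache_min_evict_age = 7 * 24 * 3600   # 604800 s = 7 days
hit_set_period = 14400                # 4 hours per hit set

print(hit_sets_needed(cache_min_evict_age, hit_set_period))  # prints 42
```

With the configured hit_set_count of 12, only the last 48 hours of access history is retained, which is why raising it to 42 matches the seven-day eviction age.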
[ceph-users] Re: MDS replay questions
Hi Brian,

I'm not sure if it applies to your application, and I'm not an expert. However, we have been running our solution for about a year now, and we have one of our MDSes in standby-replay. Sadly, we have hit a bug with extensive memory usage, and when we needed to replay, it took up to a minute, even with standby-replay. Our solution is an online site, so a minute might as well be ages.

I have also read that you are now allowed to run multiple active MDSes on a file system. It will, of course, add more memory usage, and it could also reduce performance, but the feature should be fully supported in the Pacific release.

I'm curious what the correct solution is here.

Best regards
Daniel

On Wed, Oct 6, 2021 at 12:07 AM Brian Kim wrote:
> Dear ceph-users,
>
> We have a ceph cluster with 3 MDS's and recently had to replay our cache
> which is taking an extremely long time to complete. Is there some way to
> speed up this process as well as apply some checkpoint so it doesn't have
> to start all the way from the beginning?
>
> --
> Best Wishes,
> Brian Kim
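The two features mentioned above are both single settings on the filesystem. A sketch, assuming a filesystem named `myfs` (the name is a placeholder):

```shell
# Allow two active MDS daemons to share the metadata workload
ceph fs set myfs max_mds 2

# Keep a standby-replay daemon warm for faster failover
ceph fs set myfs allow_standby_replay true
```

Multiple active MDSes help throughput when the workload spans many directories; they do not by themselves shorten journal replay after a failure, which is what standby-replay targets.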
[ceph-users] Re: MacOS Ceph Filesystem client
Hi Duncan,

Great, thank you for the tip. I never open the graphical user interface, as we use this machine as a server, and when logging in over SSH you sometimes miss popups on the desktop.

Current status:

Mac Mini Intel, Catalina - connected and working fine.
Mac Mini M1, Big Sur - can't compile the brew cask; no popups for extra permissions in the GUI.

I really only need one of the machines connected at the moment, as the job I want to run on Apple hardware works similarly on both. But if there is any need or want from the community to test the software on M1 hardware, I could test other things.

Best regards
Daniel

On Tue, Sep 28, 2021 at 12:00 AM Duncan Bellamy wrote:
> Hi Daniel,
> Is it a Mac firewall or security access issue for the machine that was
> able to build?
>
> Regards,
> Duncan
[ceph-users] Re: MacOS Ceph Filesystem client
Hi Duncan.

I've tried a couple of different libraries:

brew install osxfuse
brew install macfuse
brew install fuse

But none of them helped with installation or connection on the machine that was able to build the client.

Thank you for helping.

Best regards
Daniel

On Mon, Sep 27, 2021 at 11:31 PM Duncan Bellamy wrote:
> Hi Daniel,
> Have you got macFUSE installed? https://osxfuse.github.io/
>
> Or maybe there is a brew version; if you search for fuse on Homebrew, this
> is the result:
>
> https://formulae.brew.sh/cask/fuse#default
>
> Regards,
> Duncan
[ceph-users] Re: MacOS Ceph Filesystem client
Hi Duncan.

Great suggestion, thank you for the link. I've run it on the M1 Big Sur mac, and it did not compile because it didn't have a FUSE::FUSE target, whatever that means:

===
Last 15 lines from /Users/danielp/Library/Logs/Homebrew/ceph-client/03.cmake:

CMake Error at src/rbd_fuse/CMakeLists.txt:1 (add_executable):
  Target "rbd-fuse" links to target "FUSE::FUSE" but the target was not
  found. Perhaps a find_package() call is missing for an IMPORTED target, or
  an ALIAS target is missing?
===

On the Catalina Intel mac I was able to build it, and I ran ceph-fuse multiple times. It seems to connect, because the client is added to the client list in my cluster, but the drive is never mapped to the directory I try to map it to. So, sadly, it does not seem to work there either.

===
admin$ sudo ceph-fuse -r /mydirectory -m 192.168.6.6:6789 /Users/admin/cephfs &

ceph-fuse[60646]: starting ceph client
2021-09-27T23:16:20.498+0200 113a7ddc0 -1 init, newargv = 0x7f8ad6449d90 newargc=5
===

Does anyone have any ideas on how I could proceed?

Best regards
Daniel

On Mon, Sep 27, 2021 at 8:48 PM Duncan Bellamy wrote:
> Hi,
>
> It’s in brew:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SEEM7QGCPAWDPIIW3P4ZYU432N6BQJGT/
>
> Regards,
> Duncan
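When a ceph-fuse mount backgrounds itself and then silently fails like this, running it in the foreground with debugging usually surfaces the reason. A sketch, reusing the placeholder paths and monitor address from the message above:

```shell
# Run in the foreground with debug output instead of daemonizing
sudo ceph-fuse -d -r /mydirectory -m 192.168.6.6:6789 /Users/admin/cephfs

# In another terminal, confirm whether the mount actually appeared
mount | grep ceph-fuse
```

On macOS specifically, the FUSE kernel extension also needs to be approved under System Preferences > Security & Privacy before any FUSE filesystem can mount, which matches the "missed popups over SSH" observation earlier in the thread.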
[ceph-users] MacOS Ceph Filesystem client
Hi,

I'm running some tests on a couple of Mac Mini machines. One of them is an M1 with Big Sur, and the other one is a regular Intel Mac with Catalina.

I've tried to build Ceph Nautilus, Octopus, and Pacific multiple times with different parameters, and added many dependencies to the systems, but I have not been able to build the software.

Has anyone tried to connect a Mac to their Ceph filesystem before? Do I need to build the packages on the machine, or is there a more straightforward way?

Thank you for reading this.

Best regards
Daniel
[ceph-users] Cache tiering adding a storage tier
Hi Everyone.

I'm "new" to Ceph; I've only been administrating a cluster for about a year, so there is a lot more for me to learn about the subject. The latest concept I've been looking into is cache tiering.

I added it to my home cluster without a problem, didn't see any degradation in performance, and it seemed to just work. The documentation is pretty straightforward to follow: first add a hot pool, then enable the tier and overlay for that hot pool, then configure a bunch of values, and it will do its thing.

In our production environment, we are running only the hot pool: a lot of NVMe disks with low latency. But as we work with a lot of archival data, it would be nice to move it over to spinning disks for long-term archiving. How do I add cold storage to a running cluster without disrupting current operations?

Thank you for reading my email.

Best regards
Daniel
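For reference, attaching a new HDD-backed base pool behind an existing fast pool uses the standard tiering commands. A sketch with placeholder pool names; the catch relevant to the question above is that once the overlay is set, clients are expected to address the base (cold) pool, with the cache tier intercepting transparently:

```shell
# New cold pool on the spinning disks
ceph osd pool create cold-pool 128 128 replicated

# Make the existing fast pool a cache tier in front of it
ceph osd tier add cold-pool hot-pool
ceph osd tier cache-mode hot-pool writeback
ceph osd tier set-overlay cold-pool hot-pool
```

Because the production cluster currently serves clients directly from the fast pool, the disruption question hinges on whether existing clients can keep using the fast pool's name while it becomes the cache tier; that is worth verifying in a test cluster before touching production.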
[ceph-users] Fwd: Module 'devicehealth' has failed
Hi David,

It's hard to say with so little information what could be wrong, and I have not seen any response yet, so I thought I could give you something that might help.

I've made a video about setting up the Ceph, Grafana, and Prometheus triangle from scratch; these are the components responsible for hardware metrics and monitoring. Maybe that could help?
https://youtu.be/c8R64LF3JjU

I've also made a separate video about disk prediction and SMART data in a Ceph cluster that could give you some insights.
https://youtu.be/KFBuqTyxalM

I hope this helps.

Best regards
Daniel

On Sun, Sep 5, 2021 at 7:26 AM David Yang wrote:
> hi, buddy
>
> I have a ceph file system cluster, using ceph version 15.2.14.
>
> But the current status of the cluster is HEALTH_ERR.
>
> health: HEALTH_ERR
> Module 'devicehealth' has failed:
>
> The content in the mgr log is as follows:
>
> 2021-09-05T13:20:32.922+0800 7f2b8621b700 0 log_channel(audit) log [DBG]:
> from='client.2109753 -' entity='client.admin' cmd=[{"prefix": "fs status",
> "target": ["mon-mgr", ""]}]: dispatch
> 2021-09-05T13:20:32.922+0800 7f2b86a1c700 0 [status ERROR root]
> handle_command
>
> How to fix this error? Please help, thank you.
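Not from the thread itself, but a few generic first steps for a failed mgr module that may be worth trying before deeper debugging (all standard commands; restarting the active mgr is often enough to clear a one-off module failure):

```shell
ceph crash ls                       # any recorded crashes from the mgr?
ceph device ls                      # does device tracking still respond?
systemctl restart ceph-mgr.target   # on the active mgr host
ceph health detail                  # re-check whether the module recovered
```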
[ceph-users] Re: [Ceph Dashboard] Alert configuration.
Hi Lokendra,

There are a lot of ways to see the status of your cluster. The main one is to watch the dashboard alerts for the most pressing matters to handle. You can also follow the log of notifications that the manager keeps.

I usually use "ceph health detail" to get the information in the terminal; it gives you full information about the cluster and sometimes even handy tips on what to look into.

Alertmanager is good if you want alerts sent to you so you can keep track of the cluster on the go, but it is not necessary for daily use.

I don't know if I've missed something or misunderstood your question, but I hope this helps.

Best regards
Daniel

On Tue, Aug 24, 2021 at 6:17 AM Lokendra Rathour wrote:
> Hello Everyone,
> We have deployed Ceph with ceph-ansible (Pacific release).
> Query:
> Is it possible (and if yes, what is the way) to view/verify the alerts
> (health/system, both) directly, without Alertmanager?
> Or
> Can only the Ceph Dashboard help us see the alerts in the Ceph
> cluster (health/system)?
>
> Please advise.
> --
> ~ Lokendra
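The terminal equivalents mentioned above, collected for reference (standard ceph CLI; no Alertmanager or dashboard involved):

```shell
ceph health detail   # current health checks with per-check explanations
ceph -s              # one-shot cluster status summary
ceph -w              # follow the cluster log live
ceph crash ls        # daemon crashes recorded by the crash module
```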
[ceph-users] Re: Why you might want packages not containers for Ceph deployments
Hi Everyone.

I thought I'd put in my 5 cents, as I believe this is an exciting topic. I'm also a newbie, only running a cluster for about a year. I did some research before that and have also created a couple of videos on the topic, one of them about upgrading a cluster using cephadm.

- ABOUT MY SETUP

Currently, I manage a cluster with ten hosts and 29 OSDs, which is not that large but critical for our operations, as it is the backbone of our web application. We made the move in a hurry when we realized that the disc drive in the machine where the application was hosted was too slow to handle all the requests; we also needed more compute power and distribution to ensure fault tolerance. This led to us moving all data while buying hardware and migrating clusters in one marathon upgrade. After that experience, I was delighted with the stability that a Ceph solution gave us, and it has been working quite well since then.

To do more research and prepare for the future, my company bought a couple of machines for my home, so now I have a small cluster with four hosts / four OSDs at home to store backups of my YouTube video material and to try new technologies.

- ABOUT MY SETUP END

Now back to the experience of using cephadm. I installed a test cluster locally with nine hosts in a VirtualBox environment, running Debian. Setting up cephadm was pretty straightforward, and doing the upgrade was also "easy". But I did not like it at all, as I felt that I lost control. I had set up a couple of machines with different hardware profiles to run various services on each, and when I put the hosts into the cluster and deployed services, cephadm chose to put things on machines not well suited to handle that kind of work. Furthermore, when running the upgrade, you get one line of text on the current progress, so I felt I was not in control of what happened.
Currently, I run with the built packages for Debian and use the same operating system and packages on all machines, and upgrading the cluster is as easy as running apt update and apt upgrade. After a reboot, that machine is done. By doing that in the correct order, you have complete control, and if anything goes wrong along the way, you can manage it machine by machine.

I understand that this works well for a small cluster with fewer than ten hosts, as in my case, and might not be feasible if you have a server park with 1000 servers. But then again, controlling and managing your cluster is part of the work, so perhaps you don't want a fully automatic solution there either.

A minor other issue is that Docker adds complexity and takes some resources that you might want for the cluster instead. What comes to mind is a solution running hundreds of OSD hosts on Raspberry Pi switch blades in a rack over PoE+. I also saw a solution that mounts 16 Pis in 1U with M.2 ports for a large SSD, which could be a fun cluster setup.

Best regards
Daniel

On Tue, Aug 17, 2021 at 8:09 PM Andrew Walker-Brown <andrew_jbr...@hotmail.com> wrote:
> Hi,
>
> I'm coming at this from the position of a newbie to Ceph. I had some
> experience of it as part of Proxmox, but not as a standalone solution.
>
> I really don't care whether Ceph is contained or not; I don't have the
> depth of knowledge or experience to argue it either way. I can see that
> containers may well offer a more consistent deployment scenario with fewer
> dependencies on the external host OS. Upgrades/patches to the host OS may
> not impact the container deployment etc., with the two systems not held in
> any lock-step.
>
> The challenge for me hasn't been Ceph itself. Ceph has worked
> brilliantly; I have a fully resilient architecture split between two active
> datacentres, and my storage can survive up to 50% node/OSD hardware failure.
>
> No, the challenge has been documentation.
> I've run off down multiple rabbit holes trying to find solutions to
> problems or just background information. I've been tripped up by not
> spotting that the Ceph documentation was "v: latest" rather than "v:
> octopus"... so features didn't exist or commands were structured slightly
> differently.
>
> It is also often not obvious whether the bit of documentation I was
> looking at related to a native Ceph package deployment or a container one.
> Plus you get the Ceph/SUSE/Red Hat/Proxmox/IBM etc. flavour of answer
> depending on which Google link you click. Yes, I know, it's part of the
> joy of working with open source... but still, not what you need when a
> chunk of infrastructure has failed and you don't know why.
>
> I'm truly in awe of what the Ceph community has produced and is planning
> for the future, so don't think I'm any kind of hater.
>
> My biggest request is for the documentation to take on some
> restructuring. Keep the different deployment methods documented
> separately; yes, an intro covering the various options and recommendations
> is great, but
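The package-based rolling upgrade Daniel describes earlier in this thread can be sketched roughly as follows, one host at a time, monitors first, then managers, OSDs, and MDSes (a sketch for Debian hosts, not a complete runbook):

```shell
# Before starting: keep data in place while daemons restart
ceph osd set noout

# On each host, in order:
apt update && apt upgrade -y     # pull the new ceph packages
systemctl restart ceph.target    # or reboot the host
ceph -s                          # wait for HEALTH_OK before the next host

# After the whole cluster is upgraded
ceph osd unset noout
```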
[ceph-users] Re: BUG #51821 - client is using insecure global_id reclaim
Hi again.

I've now solved my issue with help from people in this group. Thank you for helping out. I thought the process was a bit complicated, so I created a short video describing it: https://youtu.be/Ds4Wvvo79-M

I hope this helps someone else, and again, thank you.

Best regards
Daniel

On Mon, Aug 9, 2021 at 5:43 PM Ilya Dryomov wrote:
> On Mon, Aug 9, 2021 at 5:14 PM Robert W. Eckert wrote:
> >
> > I have had the same issue with the Windows client. I had to issue
> >
> >     ceph config set mon auth_expose_insecure_global_id_reclaim false
> >
> > which allows the other clients to connect. I think you need to
> > restart the monitors as well, because the first few times I tried
> > this, I still couldn't connect.
>
> For archive's sake, I'd like to mention that disabling
> auth_expose_insecure_global_id_reclaim isn't right, and it wasn't
> intended for this. Enabling auth_allow_insecure_global_id_reclaim
> should be enough to allow all (however old) clients to connect.
> The fact that it wasn't enough for the available Windows build
> suggests that there is some subtle breakage in it, because all
> "expose" does is force the client to connect twice instead of just
> once. It doesn't actually refuse old unpatched clients.
>
> (The breakage isn't surprising given that the available build is
> more or less a random development snapshot with some Windows-specific
> patches that were pending at the time applied. I'll try to escalate
> the issue and get the linked MSI bundle updated.)
>
> Thanks,
>
> Ilya
>
> > -----Original Message-----
> > From: Richard Bade
> > Sent: Sunday, August 8, 2021 8:27 PM
> > To: Daniel Persson
> > Cc: Ceph Users
> > Subject: [ceph-users] Re: BUG #51821 - client is using insecure global_id reclaim
> >
> > Hi Daniel,
> > I had a similar issue last week after upgrading my test cluster from
> > 14.2.13 to 14.2.22, which included this fix for global_id reclaim in
> > .20. My issue was a rados gw that I was re-deploying on the latest
> > version. The problem seemed to be related to cephx authentication.
> > It kept displaying the error message you have, and the service
> > wouldn't start.
> > I ended up stopping and removing the old rgw service, deleting all
> > the keys in /etc/ceph/ and all data in /var/lib/ceph/radosgw/, and
> > re-deploying the radosgw. This used the new rgw bootstrap keys and a
> > new key for this radosgw.
> > So, I would suggest you double- and triple-check which keys your
> > clients are using and that cephx is enabled correctly on your
> > cluster. Check your admin key in /etc/ceph as well, as that's what's
> > being used for "ceph status".
> >
> > Regards,
> > Rich
> >
> > On Sun, 8 Aug 2021 at 05:01, Daniel Persson wrote:
> > > [...]
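Ilya's distinction between the two options above can be sketched as a few monitor commands. The option names match the CVE-2021-20288 guidance; everything here needs a live cluster and an admin keyring, so treat it as a sketch rather than a recipe:

```shell
# Transitional setting: let old, unpatched clients keep reconnecting.
# The cluster stays HEALTH_WARN so the old clients are not forgotten.
ceph config set mon auth_allow_insecure_global_id_reclaim true

# Final state, once every client is patched: enforce secure reclaim.
ceph config set mon auth_allow_insecure_global_id_reclaim false

# "expose" only makes clients reconnect twice so that unpatched ones
# show up in "ceph health detail"; it rejects nobody and should
# normally be left at its default, as Ilya notes.
ceph config get mon auth_expose_insecure_global_id_reclaim
```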
[ceph-users] Re: BUG #51821 - client is using insecure global_id reclaim
Hi Tobias and Richard.

Thank you for answering my questions. I got the link suggested by Tobias on the issue report, which led me to further investigation. It was hard to see what version the kernel client on the system was using, but looking at the output of "ceph health detail" and of "ldd librados2.so" gave me some information. It turned out that one of my Linux environments used the old buster client packages, which were 12.2.* and not compatible with the new global_id reclaim.

Another issue was that the Windows client available for download reports a strange version, 15.0.0 Pacific, which is simply not correct. After reading and searching on GitHub, I realized that the Windows executables can be built in a Linux environment from the Ceph source code. So I've now built new Windows binaries that work just fine, except for libwnbd.dll, which was never built. After adding it from the old installation, I got everything to work. ceph-dokan now reports version 16.2.5, which is the version I built.

Building this was not straightforward, and I think it could be interesting for the community, so I'm planning to create an instruction video on the subject that I will publish next week.

Again, thank you for your help.

Best regards
Daniel

On Mon, Aug 9, 2021 at 11:46 AM Tobias Urdin wrote:
> Hello,
>
> Did you follow the fix/recommendation when applying patches, as per
> the documentation in the CVE security post [1]?
>
> Best regards
>
> [1] https://docs.ceph.com/en/latest/security/CVE-2021-20288/
>
> > On 9 Aug 2021, at 02:26, Richard Bade wrote:
> >
> > Hi Daniel,
> > I had a similar issue last week after upgrading my test cluster from
> > 14.2.13 to 14.2.22, which included this fix for global_id reclaim in
> > .20. My issue was a rados gw that I was re-deploying on the latest
> > version. The problem seemed to be related to cephx authentication.
> > It kept displaying the error message you have, and the service
> > wouldn't start.
> > I ended up stopping and removing the old rgw service, deleting all
> > the keys in /etc/ceph/ and all data in /var/lib/ceph/radosgw/, and
> > re-deploying the radosgw. This used the new rgw bootstrap keys and a
> > new key for this radosgw.
> > So, I would suggest you double- and triple-check which keys your
> > clients are using and that cephx is enabled correctly on your
> > cluster. Check your admin key in /etc/ceph as well, as that's what's
> > being used for "ceph status".
> >
> > Regards,
> > Rich
> >
> > On Sun, 8 Aug 2021 at 05:01, Daniel Persson wrote:
> >> [...]

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
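Daniel's version check above can be reproduced with a handful of commands. These need a running cluster and a client host; the library path and package names below are assumptions (they vary by distribution; the ones shown are typical for Debian/Ubuntu):

```shell
# On a monitor node: list which clients still use insecure global_id
# reclaim (each offender appears with its entity name and address).
ceph health detail

# On a suspect client host: which ceph client version is installed?
ceph --version

# Which librados is actually linked in? (path is distro-dependent)
ldd /usr/bin/rados | grep librados
dpkg -l | grep -E 'ceph-common|librados'
```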
[ceph-users] BUG #51821 - client is using insecure global_id reclaim
Hi everyone.

It was suggested that I ask for help here instead of in the bug tracker, so I will try that.

https://tracker.ceph.com/issues/51821?next_issue_id=51820_issue_id=51824

I have a problem that I can't seem to figure out how to resolve:

AUTH_INSECURE_GLOBAL_ID_RECLAIM: client is using insecure global_id reclaim
AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim

Both of these have to do with reclaiming IDs and ensuring that no client can steal or reuse another client's ID. I understand the reason for this and want to resolve the issue.

Currently, I have three different clients:

* One Windows client using the latest Ceph-Dokan build. (ceph version 15.0.0-22274-g5656003758 (5656003758614f8fd2a8c49c2e7d4f5cd637b0ea) pacific (rc))
* One Linux Debian client using the built packages for that kernel. (4.19.0-17-amd64)
* One client that I've built from source for a Raspberry Pi, as there is no arm build for the Pacific release. (5.11.0-1015-raspi)

If I switch over to not allow global_id reclaim, none of these clients can connect, and running the command "ceph status" on one of my nodes also fails.

All of them give the same error message:

monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]

Has anyone encountered this problem and have any suggestions?

PS. The reason I have three different hosts is that this is a test environment where I try to resolve and look at issues before we upgrade our production environment to Pacific. DS.

Best regards
Daniel

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
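While clients like the three above are upgraded one by one, the two health warnings can be silenced without weakening authentication; `ceph health mute` is the documented mechanism for this (the one-week duration below is an arbitrary example, and the commands need a live cluster):

```shell
# See exactly which clients and daemons trigger the warnings.
ceph health detail

# Mute the warnings for a week while clients are being upgraded,
# rather than permanently allowing insecure reclaim.
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w
```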