Re: [ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging
We've also seen some problems with FileStore on newer kernels; 4.9 is the last kernel that worked reliably with FileStore in my experience. But I haven't seen kernel-related problems with BlueStore (well, except for that scrub bug, but my work-around for that is in all release versions).

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Jan 6, 2020 at 8:44 PM Jelle de Jong wrote:
> Hello everybody,
>
> I have issues with very slow requests on a simple three-node cluster here:
> four WDC enterprise disks and an Intel Optane NVMe journal on identical
> high-memory nodes, with 10GbE networking.
>
> It was all working well with Ceph Hammer on Debian Wheezy, but I wanted
> to upgrade to a supported version and test out BlueStore as well. So I
> upgraded to Luminous on Debian Stretch and used ceph-volume to create
> BlueStore OSDs; everything went downhill from there.
>
> I went back to FileStore on all nodes, but I still have slow requests and
> I cannot pinpoint a good reason. I tried to debug and gathered
> information to look at:
>
> https://paste.debian.net/hidden/acc5d204/
>
> First I thought it was the balancing that was making things slow, then I
> thought it might be the LVM layer, so I recreated the nodes without LVM
> by switching from ceph-volume to ceph-disk: no difference, still slow
> requests. Then I changed back from BlueStore to FileStore, but still a
> very slow cluster. Then I thought it was a CPU scheduling issue and
> downgraded the 5.x kernel, and CPU performance is at full speed again. I
> thought maybe there was something weird with an OSD and took them out
> one by one, but slow requests are still showing up and client performance
> from VMs is really poor.
>
> It just feels like a burst of small requests keeps blocking for a while, then
> recovers again.
>
> Many thanks for helping out looking at the URL.
>
> If there are options which I should tune for an HDD-with-NVMe-journal
> setup, please share.
>
> Jelle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Random slow requests without any load
Hi,

What are the full commands you used to set up this iptables config?

iptables --table raw --append OUTPUT --jump NOTRACK
iptables --table raw --append PREROUTING --jump NOTRACK

does not create the same output; it needs some more.

Kind regards,

Jelle de Jong

On 2019-07-17 14:59, Kees Meijs wrote:

Hi,

Experienced similar issues. Our cluster-internal network (completely separated) now has NOTRACK (no connection state tracking) iptables rules. In full:

# iptables-save
# Generated by xtables-save v1.8.2 on Wed Jul 17 14:57:38 2019
*filter
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:INPUT ACCEPT [0:0]
COMMIT
# Completed on Wed Jul 17 14:57:38 2019
# Generated by xtables-save v1.8.2 on Wed Jul 17 14:57:38 2019
*raw
:OUTPUT ACCEPT [0:0]
:PREROUTING ACCEPT [0:0]
-A OUTPUT -j NOTRACK
-A PREROUTING -j NOTRACK
COMMIT
# Completed on Wed Jul 17 14:57:38 2019

Ceph uses IPv4 in our case, but to be complete:

# ip6tables-save
# Generated by xtables-save v1.8.2 on Wed Jul 17 14:58:20 2019
*filter
:OUTPUT ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD DROP [0:0]
COMMIT
# Completed on Wed Jul 17 14:58:20 2019
# Generated by xtables-save v1.8.2 on Wed Jul 17 14:58:20 2019
*raw
:OUTPUT ACCEPT [0:0]
:PREROUTING ACCEPT [0:0]
-A OUTPUT -j NOTRACK
-A PREROUTING -j NOTRACK
COMMIT
# Completed on Wed Jul 17 14:58:20 2019

Using this configuration, the state tables can never fill up with dropped connections as a result.

Cheers,
Kees

On 17-07-2019 11:27, Maximilien Cuony wrote:

Just a quick update about this in case somebody else gets the same issue:

The problem was with the firewall. Port ranges and established connections are allowed, but for some reason it seems the tracking of connections is lost, leading to a strange state where one machine refuses data (RSTs are replied) and the sender never gets the RST packet (even with 'related' packets allowed).
There was a similar post on this list in February ("Ceph and TCP States") where loss of connections in conntrack created issues, but the fix, net.netfilter.nf_conntrack_tcp_be_liberal=1, did not improve that particular case.

As a workaround, we installed lighter rules for the firewall (allowing all packets from machines inside the cluster by default) and that "fixed" the issue :)
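To answer Jelle's question about reproducing Kees' dump: the difference between his two commands and the iptables-save output above appears to be the filter-table FORWARD policy (and the IPv6 counterpart). A hedged sketch of the full set of commands, untested here and without any persistence mechanism:

```shell
# Sketch: should reproduce the iptables-save output quoted above.
# Add your distro's rule-persistence step (e.g. netfilter-persistent) yourself.
iptables --table raw --append PREROUTING --jump NOTRACK   # skip conntrack, inbound
iptables --table raw --append OUTPUT --jump NOTRACK       # skip conntrack, outbound
iptables --policy FORWARD DROP                            # filter table: :FORWARD DROP

# Same for IPv6, matching the ip6tables-save dump:
ip6tables --table raw --append PREROUTING --jump NOTRACK
ip6tables --table raw --append OUTPUT --jump NOTRACK
ip6tables --policy FORWARD DROP
```

With NOTRACK in the raw table, packets on the cluster network never enter the conntrack state table, so it cannot fill up or silently drop established Ceph connections.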
[ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging
Hello everybody,

I have issues with very slow requests on a simple three-node cluster here: four WDC enterprise disks and an Intel Optane NVMe journal on identical high-memory nodes, with 10GbE networking.

It was all working well with Ceph Hammer on Debian Wheezy, but I wanted to upgrade to a supported version and test out BlueStore as well. So I upgraded to Luminous on Debian Stretch and used ceph-volume to create BlueStore OSDs; everything went downhill from there.

I went back to FileStore on all nodes, but I still have slow requests and I cannot pinpoint a good reason. I tried to debug and gathered information to look at:

https://paste.debian.net/hidden/acc5d204/

First I thought it was the balancing that was making things slow, then I thought it might be the LVM layer, so I recreated the nodes without LVM by switching from ceph-volume to ceph-disk: no difference, still slow requests. Then I changed back from BlueStore to FileStore, but still a very slow cluster. Then I thought it was a CPU scheduling issue and downgraded the 5.x kernel, and CPU performance is at full speed again. I thought maybe there was something weird with an OSD and took them out one by one, but slow requests are still showing up and client performance from VMs is really poor.

It just feels like a burst of small requests keeps blocking for a while, then recovers again.

Many thanks for helping out looking at the URL.

If there are options which I should tune for an HDD-with-NVMe-journal setup, please share.

Jelle
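For narrowing down where the blocked requests originate, a few standard Luminous-era commands can help; a hedged sketch (osd.0 is a placeholder, and the daemon commands must be run on the node hosting that OSD):

```shell
# Which OSDs are implicated in the slow/blocked requests?
ceph health detail

# Per-OSD commit/apply latency; a single outlier disk often shows up here.
ceph osd perf

# On the OSD's host: the slowest recent ops, with per-stage timestamps
# showing where each op spent its time (journal, sub_ops, etc.).
ceph daemon osd.0 dump_historic_ops

# Ops currently stuck in flight on that OSD.
ceph daemon osd.0 dump_ops_in_flight
```

The per-event timestamps in dump_historic_ops are usually the quickest way to tell a slow disk/journal apart from a networking problem (e.g. the conntrack issue discussed elsewhere on this list).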
Re: [ceph-users] Dashboard RBD Image listing takes forever
Hi Matt,

On 1/6/20 4:33 PM, Matt Dunavant wrote:
> I was hoping there was some update on this bug:
> https://tracker.ceph.com/issues/39140
>
> In all recent versions of the dashboard, the RBD image page takes
> forever to populate due to this bug. All our images have fast-diff
> enabled, so it can take 15-20 min to populate this page with about
> 20-30 images.

Thanks for bringing this up and the reminder. I've just updated the tracker issue by pointing it to the current pull request that intends to address this: https://github.com/ceph/ceph/pull/28387 - it looks like this approach needs further testing/review before we can merge it; it is currently still marked as "Draft".

@Ernesto - any news/thoughts about this from your POV?

Thanks,
Lenz

--
SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
GF: Felix Imendörffer, HRB 36809 (AG Nürnberg)
[ceph-users] Dashboard RBD Image listing takes forever
Hi,

I was hoping there was some update on this bug: https://tracker.ceph.com/issues/39140

In all recent versions of the dashboard, the RBD image page takes forever to populate due to this bug. All our images have fast-diff enabled, so it can take 15-20 min to populate this page with about 20-30 images.

Thanks,
-Matt
[ceph-users] Install specific version using ansible
Hello all!

I'm trying to install a specific version of Luminous (12.2.4). In group_vars/all.yml I can specify the Luminous release, but I didn't find a place where I can be more specific about the point version. Ansible installs the latest version (12.2.12 at this time). I'm using ceph-ansible stable-3.1.

Is it possible, or do I have to downgrade afterwards?

Thanks in advance,
Marcelo.
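As far as I know, ceph-ansible stable-3.1 only exposes the release name (ceph_stable_release: luminous), not an exact point version. One hedged workaround outside ceph-ansible, assuming Debian/Ubuntu nodes pulling from the community repo, is to pin the package version with apt preferences on each node before running the playbook (the file name below is a made-up example):

```
# /etc/apt/preferences.d/ceph-version  (hypothetical file name)
Package: ceph*
Pin: version 12.2.4*
Pin-Priority: 1001
```

A priority above 1000 also allows apt to downgrade already-installed newer packages; on RPM-based nodes the equivalent would be yum's versionlock plugin. This is an apt mechanism, not a ceph-ansible feature, so test it on one node first.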
Re: [ceph-users] rbd du command
On Mon, Jan 6, 2020 at 2:51 PM M Ranga Swami Reddy wrote:
>
> Thank you.
> Can you please share a simple example here?
>
> Thanks
> Swami
>
> On Mon, Jan 6, 2020 at 4:02 PM wrote:
>>
>> Hi,
>>
>> RBD images are thin-provisioned; you need to trim at the upper level,
>> either via the fstrim command or the discard mount option (on Linux).
>>
>> Unless you trim, the RBD layer does not know that data has been removed
>> and is thus no longer needed.
>>
>> On 1/6/20 10:30 AM, M Ranga Swami Reddy wrote:
>> > Hello,
>> > I ran the "rbd du /image" command. It shows the size increasing when I add
>> > data to the image. That looks good. But when I remove data from the image,
>> > it does not show the size decreasing.
>> >
>> > Is this expected with "rbd du", or is it not implemented?
>> >
>> > NOTE: expected behavior is the same as the Linux "du" command.
>> >
>> > Thanks
>> > Swami

Literally just "sudo fstrim <mountpoint>". Another alternative is to mount with "-o discard", but that can negatively affect performance.

I wrote up a detailed explanation of what is reported by "rbd du" in another thread: https://www.mail-archive.com/ceph-users@lists.ceph.com/msg57186.html

Thanks,
Ilya
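Put together end to end, a hedged sketch of the workflow (pool, image, and mount point names are placeholders, and this assumes a kernel-mapped image with a filesystem on it):

```shell
# Map the image and mount it (device name is whatever "rbd map" prints).
rbd map mypool/myimage          # e.g. /dev/rbd0
mount /dev/rbd0 /mnt/myimage

# After deleting files, tell the RBD layer which blocks are free,
# so "rbd du mypool/myimage" reflects the reduced usage.
sudo fstrim -v /mnt/myimage

# Alternative: continuous discard at mount time (can cost performance,
# as noted above).
mount -o discard /dev/rbd0 /mnt/myimage
```

Without the trim step, deleted file data still occupies allocated RADOS objects, which is why "rbd du" only ever grows.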
Re: [ceph-users] rbd du command
Thank you.
Can you please share a simple example here?

Thanks
Swami

On Mon, Jan 6, 2020 at 4:02 PM wrote:
> Hi,
>
> RBD images are thin-provisioned; you need to trim at the upper level,
> either via the fstrim command or the discard mount option (on Linux).
>
> Unless you trim, the RBD layer does not know that data has been removed
> and is thus no longer needed.
>
> On 1/6/20 10:30 AM, M Ranga Swami Reddy wrote:
> > Hello,
> > I ran the "rbd du /image" command. It shows the size increasing when I add
> > data to the image. That looks good. But when I remove data from the
> > image, it does not show the size decreasing.
> >
> > Is this expected with "rbd du", or is it not implemented?
> >
> > NOTE: expected behavior is the same as the Linux "du" command.
> >
> > Thanks
> > Swami
Re: [ceph-users] rbd du command
Hi,

RBD images are thin-provisioned; you need to trim at the upper level, either via the fstrim command or the discard mount option (on Linux).

Unless you trim, the RBD layer does not know that data has been removed and is thus no longer needed.

On 1/6/20 10:30 AM, M Ranga Swami Reddy wrote:
> Hello,
> I ran the "rbd du /image" command. It shows the size increasing when I add
> data to the image. That looks good. But when I remove data from the image,
> it does not show the size decreasing.
>
> Is this expected with "rbd du", or is it not implemented?
>
> NOTE: expected behavior is the same as the Linux "du" command.
>
> Thanks
> Swami
Re: [ceph-users] Infiniband backend OSD communication
From my understanding, the basic idea is that Ceph exchanges the RDMA connection information (QP, GID, and so on) via the IP address on the RDMA device, and then communicates over RDMA. But in my tests, there seemed to be some issues in that code.

On Fri, Jan 3, 2020 at 2:24 AM Nathan Stratton wrote:
>
> I am working on upgrading my current Ethernet-only Ceph cluster to a combined
> Ethernet frontend and InfiniBand backend. From my research I understand that
> I set:
>
> ms_cluster_type = async+rdma
> ms_async_rdma_device_name = mlx4_0
>
> What I don't understand is how Ceph knows how to reach each OSD over
> RDMA. Do I have to run IPoIB on top of InfiniBand and use that for OSD
> addresses?
>
> Is there a way to use InfiniBand on the backend without IPoIB and just use
> RDMA verbs?
>
> ><>
> nathan stratton
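Following that explanation, the OSD addresses do come from IPoIB: the addresses are regular IPs on the IPoIB interface, and only the data path is switched to RDMA verbs. A hedged ceph.conf sketch (the subnet is a placeholder for your IPoIB network):

```ini
# Sketch, not a verified configuration: RDMA on the cluster network,
# with OSD addressing provided by an IPoIB subnet.
[global]
cluster_network = 192.168.100.0/24      ; placeholder IPoIB subnet
ms_cluster_type = async+rdma
ms_async_rdma_device_name = mlx4_0
```

So a pure verbs-only backend without any IPoIB addressing does not appear to be possible with the async+rdma messenger: it still needs IPs to identify peers and bootstrap the QP exchange.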
[ceph-users] rbd du command
Hello,

I ran the "rbd du /image" command. It shows the size increasing when I add data to the image. That looks good. But when I remove data from the image, it does not show the size decreasing.

Is this expected with "rbd du", or is it not implemented?

NOTE: expected behavior is the same as the Linux "du" command.

Thanks
Swami
Re: [ceph-users] Architecture - Recommendations
Quoting Radhakrishnan2 S (radhakrishnan...@tcs.com):
> Where hypervisor would be your Ceph nodes, i.e. you can connect your
> Ceph nodes on L2 or make them part of the L3 setup (the more modern way of
> doing it). You can use ECMP to add more network capacity when you need
> it. Setting up a BGP EVPN VXLAN network is not trivial ... I advise
> getting networking expertise on your team.
>
> Radha: Thanks for the reference. We are planning to have a dedicated
> set of nodes for our Ceph cluster and not make it hyperconverged. Do
> you see that as a recommended option? Since we might also have
> bare-metal servers for workloads, we want to make the storage a
> separate dedicated one.

I would definitely recommend that, especially for larger deployments, although this might look old-fashioned as the rest of the world is moving everything into containers. It makes (performance) debugging *a lot* easier, as you can actually isolate things, which is way more difficult to achieve on servers where a complex workload is going on. I guess (no proof of that) that performance will be more consistent as well.

Gr. Stefan

--
| BIT BV  https://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6  +31 318 648 688 / i...@bit.nl