Re: [ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

2020-01-06 Thread Paul Emmerich
We've also seen some problems with FileStore on newer kernels; 4.9 is the
last kernel that worked reliably with FileStore in my experience.

But I haven't seen problems with BlueStore related to the kernel version
(well, except for that scrub bug, but my work-around for that is in all
release versions).
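
For what it's worth, the commands I usually start with when narrowing down
where slow requests come from (a minimal sketch, assuming you have admin
socket access on the OSD nodes; replace <id> with the OSD in question):

ceph health detail                      # shows which OSDs the slow requests are stuck on
ceph osd perf                           # per-OSD commit/apply latencies
ceph daemon osd.<id> dump_historic_ops  # run on the node hosting that OSD: recent slowest ops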

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Mon, Jan 6, 2020 at 8:44 PM Jelle de Jong 
wrote:

> Hello everybody,
>
> I have issues with very slow requests on a simple three-node cluster here,
> with four WDC enterprise disks and an Intel Optane NVMe journal on identical
> high-memory nodes, with 10Gb networking.
>
> It was all working well with Ceph Hammer on Debian Wheezy, but I wanted
> to upgrade to a supported version and test out BlueStore as well. So I
> upgraded to Luminous on Debian Stretch and used ceph-volume to create
> BlueStore OSDs, and everything went downhill from there.
>
> I went back to FileStore on all nodes, but I still have slow requests and
> I cannot pinpoint a good reason. I tried to debug and gathered
> information to look at:
>
> https://paste.debian.net/hidden/acc5d204/
>
> First I thought it was the balancing that was making things slow, then I
> thought it might be the LVM layer, so I recreated the nodes without LVM
> by switching from ceph-volume to ceph-disk: no difference, still slow
> requests. Then I changed back from BlueStore to FileStore, but the cluster
> is still very slow. Then I thought it was a CPU scheduling issue and
> downgraded from the 5.x kernel, and CPU performance is at full speed again.
> I thought maybe there was something weird with one OSD and took them out
> one by one, but slow requests are still showing up and client performance
> from the VMs is really poor.
>
> It just feels like a burst of small requests keeps blocking things for a
> while and then recovers again.
>
> Many thanks for helping out looking at the URL.
>
> If there are options I should tune for an HDD with NVMe journal setup,
> please share them.
>
> Jelle
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Random slow requests without any load

2020-01-06 Thread Jelle de Jong

Hi,

What are the full commands you used to set up this iptables config?

iptables --table raw --append OUTPUT --jump NOTRACK
iptables --table raw --append PREROUTING --jump NOTRACK

These two alone do not create the same output; it seems something more is needed.
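
My best guess at the missing pieces would be the default chain policies on
top of the NOTRACK rules, something like this (just a sketch, not verified
against your setup):

iptables  -P FORWARD DROP
iptables  -t raw -A OUTPUT     -j NOTRACK
iptables  -t raw -A PREROUTING -j NOTRACK
ip6tables -P FORWARD DROP
ip6tables -t raw -A OUTPUT     -j NOTRACK
ip6tables -t raw -A PREROUTING -j NOTRACK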

Kind regards,

Jelle de Jong



On 2019-07-17 14:59, Kees Meijs wrote:

Hi,

We experienced similar issues. Our cluster-internal network (completely
separated) now has NOTRACK (no connection state tracking) iptables rules.

In full:


# iptables-save
# Generated by xtables-save v1.8.2 on Wed Jul 17 14:57:38 2019
*filter
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:INPUT ACCEPT [0:0]
COMMIT
# Completed on Wed Jul 17 14:57:38 2019
# Generated by xtables-save v1.8.2 on Wed Jul 17 14:57:38 2019
*raw
:OUTPUT ACCEPT [0:0]
:PREROUTING ACCEPT [0:0]
-A OUTPUT -j NOTRACK
-A PREROUTING -j NOTRACK
COMMIT
# Completed on Wed Jul 17 14:57:38 2019


Ceph uses IPv4 in our case, but to be complete:


# ip6tables-save
# Generated by xtables-save v1.8.2 on Wed Jul 17 14:58:20 2019
*filter
:OUTPUT ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD DROP [0:0]
COMMIT
# Completed on Wed Jul 17 14:58:20 2019
# Generated by xtables-save v1.8.2 on Wed Jul 17 14:58:20 2019
*raw
:OUTPUT ACCEPT [0:0]
:PREROUTING ACCEPT [0:0]
-A OUTPUT -j NOTRACK
-A PREROUTING -j NOTRACK
COMMIT
# Completed on Wed Jul 17 14:58:20 2019


With this configuration, the connection state tables can never fill up
and cause dropped connections as a side effect.

Cheers,
Kees

On 17-07-2019 11:27, Maximilien Cuony wrote:

Just a quick update about this if somebody else get the same issue:

The problem was with the firewall. The port range and established
connections are allowed, but for some reason it seems the tracking of
connections gets lost, leading to a strange state where one machine
refuses data (RSTs are sent in reply) and the sender never gets the RST
packet (even with 'related' packets allowed).

There was a similar post on this list in February ("Ceph and TCP
States") where losing connections in conntrack created issues, but the
fix suggested there, net.netfilter.nf_conntrack_tcp_be_liberal=1, did not
improve this particular case.
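
(For reference, setting that sysctl at runtime and persistently looks like
this; the file name under /etc/sysctl.d is just an example:)

sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1
echo 'net.netfilter.nf_conntrack_tcp_be_liberal = 1' > /etc/sysctl.d/90-conntrack.conf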

As a workaround, we installed lighter rules for the firewall (allowing
all packets from machines inside the cluster by default) and that
"fixed" the issue :)



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] slow request and unresponsive kvm guests after upgrading ceph cluster and os, please help debugging

2020-01-06 Thread Jelle de Jong

Hello everybody,

I have issues with very slow requests on a simple three-node cluster here,
with four WDC enterprise disks and an Intel Optane NVMe journal on identical
high-memory nodes, with 10Gb networking.


It was all working well with Ceph Hammer on Debian Wheezy, but I wanted
to upgrade to a supported version and test out BlueStore as well. So I
upgraded to Luminous on Debian Stretch and used ceph-volume to create
BlueStore OSDs, and everything went downhill from there.


I went back to FileStore on all nodes, but I still have slow requests and
I cannot pinpoint a good reason. I tried to debug and gathered
information to look at:


https://paste.debian.net/hidden/acc5d204/

First I thought it was the balancing that was making things slow, then I
thought it might be the LVM layer, so I recreated the nodes without LVM
by switching from ceph-volume to ceph-disk: no difference, still slow
requests. Then I changed back from BlueStore to FileStore, but the cluster
is still very slow. Then I thought it was a CPU scheduling issue and
downgraded from the 5.x kernel, and CPU performance is at full speed again.
I thought maybe there was something weird with one OSD and took them out
one by one, but slow requests are still showing up and client performance
from the VMs is really poor.


It just feels like a burst of small requests keeps blocking things for a
while and then recovers again.


Many thanks for helping out looking at the URL.

If there are options I should tune for an HDD with NVMe journal setup,
please share them.


Jelle
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dashboard RBD Image listing takes forever

2020-01-06 Thread Lenz Grimmer
Hi Matt,

On 1/6/20 4:33 PM, Matt Dunavant wrote:

> I was hoping there was some update on this bug:
> https://tracker.ceph.com/issues/39140
> 
> In all recent versions of the dashboard, the RBD image page takes 
> forever to populate due to this bug. All our images have fast-diff 
> enabled, so it can take 15-20 min to populate this page with about
> 20-30 images.

Thanks for bringing this up and for the reminder. I've just updated the
tracker issue by pointing it to the current pull request that intends to
address this: https://github.com/ceph/ceph/pull/28387 - it looks like this
approach needs further testing/review before we can merge it; it is
currently still marked as "Draft".

@Ernesto - any news/thoughts about this from your POV?

Thanks,

Lenz

-- 
SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
GF: Felix Imendörffer, HRB 36809 (AG Nürnberg)



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Dashboard RBD Image listing takes forever

2020-01-06 Thread Matt Dunavant
Hi,


I was hoping there was some update on this bug: 
https://tracker.ceph.com/issues/39140


In all recent versions of the dashboard, the RBD image page takes forever to 
populate due to this bug. All our images have fast-diff enabled, so it can take 
15-20 min to populate this page with about 20-30 images.


Thanks,

-Matt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Install specific version using ansible

2020-01-06 Thread Marcelo Miziara
Hello all!
I'm trying to install a specific version of Luminous (12.2.4). In
group_vars/all.yml I can specify the Luminous release, but I didn't find a
place where I can be more specific about the exact version.

Ansible installs the latest version (12.2.12 at this time).

I'm using ceph-ansible stable-3.1.

Is it possible, or do I have to downgrade afterwards?
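
For reference, the relevant bits of my group_vars/all.yml look roughly like
this (reconstructed from memory, so take the exact variable names with a
grain of salt); it only selects the release series, not a point release:

ceph_origin: repository
ceph_repository: community
ceph_stable_release: luminous   # release series only, not e.g. 12.2.4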

Thanks in advance, Marcelo.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd du command

2020-01-06 Thread Ilya Dryomov
On Mon, Jan 6, 2020 at 2:51 PM M Ranga Swami Reddy  wrote:
>
> Thank you.
> Can you please share a simple example here?
>
> Thanks
> Swami
>
> On Mon, Jan 6, 2020 at 4:02 PM  wrote:
>>
>> Hi,
>>
>> RBD images are thin provisioned; you need to trim at the upper level, either
>> via the fstrim command or the discard mount option (on Linux).
>>
>> Unless you trim, the RBD layer does not know that data has been removed
>> and is thus no longer needed.
>>
>>
>>
>> On 1/6/20 10:30 AM, M Ranga Swami Reddy wrote:
>> > Hello,
>> > I ran the "rbd du <pool>/image" command. It shows the usage increasing
>> > when I add data to the image. That looks good. But when I remove data
>> > from the image, it does not show the size decreasing.
>> >
>> > Is this expected with "rbd du", or is it not implemented?
>> >
>> > NOTE: the expected behavior is the same as the Linux "du" command
>> >
>> > Thanks
>> > Swami

Literally just "sudo fstrim <mountpoint>".  Another alternative is to
mount with "-o discard", but that can negatively affect performance.
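
A minimal example, assuming the image is mapped as /dev/rbd0 and mounted at
/mnt/rbd (both names are just illustrative):

sudo fstrim -v /mnt/rbd            # tell the block layer which blocks are free
# or mount with continuous discard instead (can cost performance):
sudo mount -o discard /dev/rbd0 /mnt/rbd

After the trim, "rbd du" should reflect the reclaimed space.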

I wrote up a detailed explanation of what is reported by "rbd du" in
another thread:

  https://www.mail-archive.com/ceph-users@lists.ceph.com/msg57186.html

Thanks,

Ilya
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd du command

2020-01-06 Thread M Ranga Swami Reddy
Thank you.
Can you please share a simple example here?

Thanks
Swami

On Mon, Jan 6, 2020 at 4:02 PM  wrote:

> Hi,
>
> RBD images are thin provisioned; you need to trim at the upper level, either
> via the fstrim command or the discard mount option (on Linux).
>
> Unless you trim, the RBD layer does not know that data has been removed
> and is thus no longer needed.
>
>
>
> On 1/6/20 10:30 AM, M Ranga Swami Reddy wrote:
> > Hello,
> > I ran the "rbd du /image" command. Its shows increasing, when I add
> > data to the image. That looks good. But when I removed data from the
> image,
> > its not showing the decreasing the size.
> >
> > Is this expected with "rbd du" or its not implemented?
> >
> > NOTE: Expected behavior is the same as " Linux du command"
> >
> > Thanks
> > Swami
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd du command

2020-01-06 Thread ceph
Hi,

RBD images are thin provisioned; you need to trim at the upper level, either
via the fstrim command or the discard mount option (on Linux).

Unless you trim, the RBD layer does not know that data has been removed
and is thus no longer needed.



On 1/6/20 10:30 AM, M Ranga Swami Reddy wrote:
> Hello,
> I ran the "rbd du <pool>/image" command. It shows the usage increasing
> when I add data to the image. That looks good. But when I remove data
> from the image, it does not show the size decreasing.
> 
> Is this expected with "rbd du", or is it not implemented?
> 
> NOTE: the expected behavior is the same as the Linux "du" command
> 
> Thanks
> Swami
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Infiniband backend OSD communication

2020-01-06 Thread Wei Zhao
From my understanding, the basic idea is that Ceph exchanges RDMA
information (QP, GID and so on) through the IP address on the RDMA device,
and then the daemons communicate with each other through RDMA. But in my
tests, there seemed to be some issues in that code.
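
So in practice the cluster network needs IPoIB addresses that the OSDs can
bind to, while the data path then goes over RDMA. A rough sketch of the
ceph.conf settings involved (the subnets and device name are examples only):

[global]
public_network  = 192.168.1.0/24       # Ethernet frontend
cluster_network = 192.168.100.0/24     # IPoIB subnet on the InfiniBand backend
ms_cluster_type = async+rdma
ms_async_rdma_device_name = mlx4_0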

On Fri, Jan 3, 2020 at 2:24 AM Nathan Stratton  wrote:
>
> I am working on upgrading my current Ethernet-only Ceph cluster to a combined
> Ethernet frontend and InfiniBand backend. From my research I understand that
> I should set:
>
> ms_cluster_type = async+rdma
> ms_async_rdma_device_name = mlx4_0
>
> What I don't understand is: how does Ceph know how to reach each OSD over
> RDMA? Do I have to run IPoIB on top of InfiniBand and use that for the OSD
> addresses?
>
> Is there a way to use InfiniBand on the backend without IPoIB and just use
> RDMA verbs?
>
> ><>
> nathan stratton
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd du command

2020-01-06 Thread M Ranga Swami Reddy
Hello,
I ran the "rbd du <pool>/image" command. It shows the usage increasing
when I add data to the image. That looks good. But when I remove data
from the image, it does not show the size decreasing.

Is this expected with "rbd du", or is it not implemented?

NOTE: the expected behavior is the same as the Linux "du" command

Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Architecture - Recommendations

2020-01-06 Thread Stefan Kooman
Quoting Radhakrishnan2 S (radhakrishnan...@tcs.com):
> Where hypervisor would be your Ceph nodes. I.e. you can connect your
> Ceph nodes on L2 or make them part of the L3 setup (more modern way of
> doing it). You can use "ECMP" to add more network capacity when you need
> it. Setting up a BGP EVPN VXLAN network is not trivial ... I advise
> getting networking expertise in your team.
> 
> Radha: Thanks for the reference. We are planning to have a dedicated
> set of nodes for our Ceph cluster and not make it hyperconverged. Do
> you see that as a recommended option? Since we might also have
> bare-metal servers for workloads, we want to keep the storage as a
> separate, dedicated cluster.

I would definitely recommend that, especially for larger deployments.
Although this might look old-fashioned while the rest of the world is
moving everything into containers, it makes (performance) debugging *a
lot* easier because you can actually isolate things, something that is far
more difficult to achieve on servers running a complex mixed workload ...

I guess (with no proof of that) that performance will be more consistent as
well.

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/    Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com