Hi, I did a "big-ping" test to verify the network after the last major network problem. If anyone wants to take a peek, I can share it.
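For reference, a minimal sketch of what such a test could look like, assuming Linux iputils ping over IPv4 (essentially the max-MTU, don't-fragment probe Robert suggests below); the hostnames and MTU are placeholders:

#!/usr/bin/env python3
"""Minimal sketch of a "big-ping" sweep: ping every cluster host at
maximum MTU with the don't-fragment bit set, so a bad link or a
mismatched MTU shows up as loss or outlier latency. Hostnames and
the MTU are placeholders -- adjust for your cluster."""
import subprocess

HOSTS = ["osd-node1", "osd-node2", "osd-node3"]  # placeholder hostnames
MTU = 9000            # assumed jumbo-frame MTU on the cluster network
PAYLOAD = MTU - 28    # subtract 20-byte IP header + 8-byte ICMP header

for host in HOSTS:
    # Linux ping: -M do forbids fragmentation, -s sets the payload
    # size, -c 10 sends ten probes, -q keeps only the summary lines.
    result = subprocess.run(
        ["ping", "-M", "do", "-s", str(PAYLOAD), "-c", "10", "-q", host],
        capture_output=True, text=True,
    )
    status = "OK" if result.returncode == 0 else "FAIL"
    # The last two lines of "ping -q" output are the loss and rtt stats.
    summary = "\n".join(result.stdout.strip().splitlines()[-2:])
    print(f"{host}: {status}\n{summary}\n")

A host that fails here but answers at the default packet size usually points to an MTU mismatch somewhere on the path.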
Cheers
Josef

On Sat, 1 Aug 2015 at 02:19, Ben Hines <bhi...@gmail.com> wrote:
> I encountered a similar problem. Incoming firewall ports were blocked
> on one host, so the other OSDs kept marking that OSD down. But it could
> still talk out, so it kept saying "hey, I'm up, mark me up", and then
> the other OSDs started trying to send it data again, causing backed-up
> requests, and so on ad infinitum. I had to figure out the connectivity
> problem myself by looking in the OSD logs.
>
> After a while, the cluster should just say "no, you're not reachable,
> stop putting yourself back into the cluster".
>
> -Ben
>
> On Fri, Jul 31, 2015 at 11:21 AM, Jan Schermer <j...@schermer.cz> wrote:
> > I remember reading that ScaleIO (I think?) does something like this
> > by regularly sending reports to a multicast group, so any node with
> > issues (or just overload) is reweighted or avoided automatically on
> > the client. The OSD map is the Ceph equivalent, I guess. It makes
> > sense to gather metrics and prioritize better-performing OSDs over
> > those with e.g. worse latencies, but it needs to update fast. But I
> > believe that _network_ monitoring itself ought to be part of... a
> > network monitoring system you should already have :-) and not of a
> > storage system that just happens to use the network. I don't remember
> > seeing anything but a simple ping/traceroute/dns test in any SAN
> > interface. If an OSD has issues, it might be anything from a failing
> > drive to a swapping OS, and a number like "commit latency" (= average
> > response time from the clients' perspective) is maybe the ultimate
> > metric of all for this purpose, irrespective of the root cause.
> >
> > A nice option would be to read data from all replicas at once. This
> > would of course increase load and cause all sorts of issues if
> > abused, but if you have an app that
> > absolutely-always-without-fail-must-get-data-ASAP, then you could
> > enable this in the client (and I think that would be an easy option
> > to add). This is actually used in some systems. The harder part is
> > failing nicely when writing (like waiting only for the remote network
> > buffers on 2 nodes to get the data, instead of waiting for the commit
> > on all 3 replicas...)
> >
> > Jan
> >
> >> On 31 Jul 2015, at 19:45, Robert LeBlanc <rob...@leblancnet.us> wrote:
> >>
> >> Even just a ping at max MTU with nodefrag set could tell a lot about
> >> connectivity issues and latency without a lot of traffic. Using the
> >> Ceph messenger would be even better, to check firewall ports. I like
> >> the idea of incorporating simple network checks into Ceph. The
> >> monitor can correlate failures and help determine from the CRUSH map
> >> whether the problem is related to one host.
> >> ----------------
> >> Robert LeBlanc
> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
> >>
> >> On Thu, Jul 30, 2015 at 11:27 PM, Stijn De Weirdt wrote:
> >>> wouldn't it be nice if ceph did something like this in the
> >>> background (some sort of network-scrub)? debugging the network like
> >>> this is not easy (you can't expect admins to install e.g. perfsonar
> >>> on all nodes and/or clients)
> >>>
> >>> something like: every X min, each service X picks a service Y on
> >>> another host (assuming X and Y will exchange some communication at
> >>> some point, like an osd with another osd), sends 1MB of data, and
> >>> makes the timing data available so we can monitor it and detect
> >>> underperforming links over time.
> >>> ideally clients would also do this, but i'm not sure where they
> >>> should report/store the data.
> >>>
> >>> interpreting the data can be a bit tricky, but extreme outliers
> >>> will be spotted easily, and the main issue with this sort of
> >>> debugging is collecting the data.
> >>>
> >>> simply reporting / keeping track of ongoing communications is
> >>> already a big step forward, but then we need the size of the
> >>> exchanged data to allow interpretation (and the timing should cover
> >>> only the network part, not e.g. flushing data to disk in the case
> >>> of an osd). (and obviously sampling is enough, no need for details
> >>> of every bit sent.)
> >>>
> >>> stijn
> >>>
> >>> On 07/30/2015 08:04 PM, Mark Nelson wrote:
> >>>> Thanks for posting this! We see issues like this more often than
> >>>> you'd think. It's really important too, because if you don't
> >>>> figure it out, the natural inclination is to blame Ceph! :)
> >>>>
> >>>> Mark
> >>>>
> >>>> On 07/30/2015 12:50 PM, Quentin Hartman wrote:
> >>>>> Just wanted to drop a note to the group that I had my cluster go
> >>>>> sideways yesterday, and the root of the problem was networking
> >>>>> again. Using iperf I discovered that one of my nodes was only
> >>>>> moving data at 1.7 Mb/s. Moving that node to a different switch
> >>>>> port with a different cable resolved the problem. It took a while
> >>>>> to track down because none of the server-side error metrics for
> >>>>> disk or network showed anything amiss, and I didn't think to test
> >>>>> network performance (as suggested in another thread) until well
> >>>>> into the process.
> >>>>>
> >>>>> Check networking first!
> >>>>>
> >>>>> QH
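To make Stijn's "network-scrub" idea above concrete, here is a minimal sketch of the kind of probe it describes: push 1 MB over a TCP socket to a peer and record the elapsed time, so underperforming links (like the 1.7 Mb/s one Quentin found with iperf) stand out across repeated samples. The port number and the standalone-script form are assumptions; a real implementation would run inside the daemons and report to the monitors.

#!/usr/bin/env python3
"""Rough sketch of a "network-scrub" probe: send 1 MB over TCP to a
peer and time it, so slow links show up as outliers over repeated
samples. Port and peer are placeholders."""
import socket
import sys
import time

PORT = 7777            # placeholder probe port
CHUNK = 1024 * 1024    # 1 MB per sample, as in the proposal

def serve():
    # Sink side: accept connections and discard whatever arrives.
    with socket.create_server(("", PORT)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn:
                while conn.recv(65536):
                    pass

def probe(peer):
    # Send CHUNK bytes to peer and print the throughput of this sample.
    start = time.monotonic()
    with socket.create_connection((peer, PORT), timeout=10) as s:
        s.sendall(b"\0" * CHUNK)
        s.shutdown(socket.SHUT_WR)   # signal end of data to the sink
        s.recv(1)                    # returns b"" once the sink closes
    elapsed = time.monotonic() - start
    print(f"{peer}: {CHUNK / elapsed / 1e6:.1f} MB/s in {elapsed:.3f}s")

if __name__ == "__main__":
    serve() if sys.argv[1] == "serve" else probe(sys.argv[1])

Run it in serve mode on one node and in probe mode (with the peer's hostname) from another; as Stijn notes, extreme outliers jump out even from coarse samples like this.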
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com