Hi, I did a "big-ping" test to verify the network after the last major network problem. If anyone wants to take a peek, I can share it.
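For reference, a minimal sketch of what such a test could look like, assuming Linux iputils ping over IPv4 (essentially the max-MTU, don't-fragment probe Robert suggests below); the hostnames and MTU are placeholders:

#!/usr/bin/env python3
"""Minimal sketch of a "big-ping" sweep: ping every cluster host at
maximum MTU with the don't-fragment bit set, so a bad link or a
mismatched MTU shows up as loss or outlier latency. Hostnames and
the MTU are placeholders -- adjust for your cluster."""
import subprocess

HOSTS = ["osd-node1", "osd-node2", "osd-node3"]  # placeholder hostnames
MTU = 9000            # assumed jumbo-frame MTU on the cluster network
PAYLOAD = MTU - 28    # subtract 20-byte IP header + 8-byte ICMP header

for host in HOSTS:
    # Linux ping: -M do forbids fragmentation, -s sets the payload
    # size, -c 10 sends ten probes, -q keeps only the summary lines.
    result = subprocess.run(
        ["ping", "-M", "do", "-s", str(PAYLOAD), "-c", "10", "-q", host],
        capture_output=True, text=True,
    )
    status = "OK" if result.returncode == 0 else "FAIL"
    # The last two lines of "ping -q" output are the loss and rtt stats.
    summary = "\n".join(result.stdout.strip().splitlines()[-2:])
    print(f"{host}: {status}\n{summary}\n")

A host that fails here but answers at the default packet size usually points to an MTU mismatch somewhere on the path.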
Cheers
Josef

On Sat, 1 Aug 2015 at 02:19, Ben Hines <bhi...@gmail.com> wrote:
> I encountered a similar problem. Incoming firewall ports were blocked
> on one host, so the other OSDs kept marking that OSD down. But it could
> still talk out, so it kept saying "hey, I'm up, mark me up", and then
> the other OSDs started trying to send it data again, causing backed-up
> requests, and so on ad infinitum. I had to figure out the connectivity
> problem myself by looking in the OSD logs.
>
> After a while, the cluster should just say "no, you're not reachable,
> stop putting yourself back into the cluster".
>
> -Ben
>
> On Fri, Jul 31, 2015 at 11:21 AM, Jan Schermer <j...@schermer.cz> wrote:
> > I remember reading that ScaleIO (I think?) does something like this
> > by regularly sending reports to a multicast group, so any node with
> > issues (or just overload) is reweighted or avoided automatically on
> > the client. The OSD map is the Ceph equivalent, I guess. It makes
> > sense to gather metrics and prioritize better-performing OSDs over
> > those with e.g. worse latencies, but it needs to update fast. But I
> > believe that _network_ monitoring itself ought to be part of... a
> > network monitoring system you should already have :-) and not of a
> > storage system that just happens to use the network. I don't remember
> > seeing anything but a simple ping/traceroute/dns test in any SAN
> > interface. If an OSD has issues, it might be anything from a failing
> > drive to a swapping OS, and a number like "commit latency" (= average
> > response time from the clients' perspective) is maybe the ultimate
> > metric of all for this purpose, irrespective of the root cause.
> >
> > A nice option would be to read data from all replicas at once. This
> > would of course increase load and cause all sorts of issues if
> > abused, but if you have an app that
> > absolutely-always-without-fail-must-get-data-ASAP, then you could
> > enable this in the client (and I think that would be an easy option
> > to add). This is actually used in some systems. The harder part is
> > failing nicely when writing (like waiting only for the remote network
> > buffers on 2 nodes to get the data, instead of waiting for the commit
> > on all 3 replicas...)
> >
> > Jan
> >
> >> On 31 Jul 2015, at 19:45, Robert LeBlanc <rob...@leblancnet.us> wrote:
> >>
> >> Even just a ping at max MTU with nodefrag set could tell a lot about
> >> connectivity issues and latency without a lot of traffic. Using the
> >> Ceph messenger would be even better, to check firewall ports. I like
> >> the idea of incorporating simple network checks into Ceph. The
> >> monitor can correlate failures and help determine from the CRUSH map
> >> whether the problem is related to one host.
> >> ----------------
> >> Robert LeBlanc
> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
> >>
> >> On Thu, Jul 30, 2015 at 11:27 PM, Stijn De Weirdt wrote:
> >>> wouldn't it be nice if ceph did something like this in the
> >>> background (some sort of network-scrub)? debugging the network like
> >>> this is not easy (you can't expect admins to install e.g. perfsonar
> >>> on all nodes and/or clients)
> >>>
> >>> something like: every X min, each service X picks a service Y on
> >>> another host (assuming X and Y will exchange some communication at
> >>> some point, like an osd with another osd), sends 1MB of data, and
> >>> makes the timing data available so we can monitor it and detect
> >>> underperforming links over time.
> >>> ideally clients would also do this, but i'm not sure where they
> >>> should report/store the data.
> >>>
> >>> interpreting the data can be a bit tricky, but extreme outliers
> >>> will be spotted easily, and the main issue with this sort of
> >>> debugging is collecting the data.
> >>>
> >>> simply reporting / keeping track of ongoing communications is
> >>> already a big step forward, but then we need the size of the
> >>> exchanged data to allow interpretation (and the timing should cover
> >>> only the network part, not e.g. flushing data to disk in the case
> >>> of an osd). (and obviously sampling is enough, no need for details
> >>> of every bit sent.)
> >>>
> >>> stijn
> >>>
> >>> On 07/30/2015 08:04 PM, Mark Nelson wrote:
> >>>> Thanks for posting this! We see issues like this more often than
> >>>> you'd think. It's really important too, because if you don't
> >>>> figure it out, the natural inclination is to blame Ceph! :)
> >>>>
> >>>> Mark
> >>>>
> >>>> On 07/30/2015 12:50 PM, Quentin Hartman wrote:
> >>>>> Just wanted to drop a note to the group that I had my cluster go
> >>>>> sideways yesterday, and the root of the problem was networking
> >>>>> again. Using iperf I discovered that one of my nodes was only
> >>>>> moving data at 1.7 Mb/s. Moving that node to a different switch
> >>>>> port with a different cable resolved the problem. It took a while
> >>>>> to track down because none of the server-side error metrics for
> >>>>> disk or network showed anything amiss, and I didn't think to test
> >>>>> network performance (as suggested in another thread) until well
> >>>>> into the process.
> >>>>>
> >>>>> Check networking first!
> >>>>>
> >>>>> QH
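To make Stijn's "network-scrub" idea above concrete, here is a minimal sketch of the kind of probe it describes: push 1 MB over a TCP socket to a peer and record the elapsed time, so underperforming links (like the 1.7 Mb/s one Quentin found with iperf) stand out across repeated samples. The port number and the standalone-script form are assumptions; a real implementation would run inside the daemons and report to the monitors.

#!/usr/bin/env python3
"""Rough sketch of a "network-scrub" probe: send 1 MB over TCP to a
peer and time it, so slow links show up as outliers over repeated
samples. Port and peer are placeholders."""
import socket
import sys
import time

PORT = 7777            # placeholder probe port
CHUNK = 1024 * 1024    # 1 MB per sample, as in the proposal

def serve():
    # Sink side: accept connections and discard whatever arrives.
    with socket.create_server(("", PORT)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn:
                while conn.recv(65536):
                    pass

def probe(peer):
    # Send CHUNK bytes to peer and print the throughput of this sample.
    start = time.monotonic()
    with socket.create_connection((peer, PORT), timeout=10) as s:
        s.sendall(b"\0" * CHUNK)
        s.shutdown(socket.SHUT_WR)   # signal end of data to the sink
        s.recv(1)                    # returns b"" once the sink closes
    elapsed = time.monotonic() - start
    print(f"{peer}: {CHUNK / elapsed / 1e6:.1f} MB/s in {elapsed:.3f}s")

if __name__ == "__main__":
    serve() if sys.argv[1] == "serve" else probe(sys.argv[1])

Run it in serve mode on one node and in probe mode (with the peer's hostname) from another; as Stijn notes, extreme outliers jump out even from coarse samples like this.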
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com