> Should we have Icinga alarms around these types of issues? Seems like that
> would be the way to go.

Aside from this, I get daily emails about webrequest partition statuses, and I would at least notice the morning after that something is wrong.
> On Mar 7, 2015, at 21:20, Nuria Ruiz <nu...@wikimedia.org> wrote:
>
> Thanks much Christian for the writeup.
>
> Should we have Icinga alarms around these types of issues? Seems like that
> would be the way to go.
>
> Thanks,
>
> Nuria
>
> On Sat, Mar 7, 2015 at 4:00 PM, Andrew Otto <ao...@wikimedia.org> wrote:
> Thanks Christian!
>
> > On Mar 7, 2015, at 09:14, Christian Aistleitner <christ...@quelltextlich.at> wrote:
> >
> > Hi,
> >
> > around running jobs on the Analytics cluster, I've sometimes seen
> > people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.
> >
> > But more often than not, this seems to have meant:
> > “Let's just run this heavy job and wait. If QChris joins IRC, let's
> > hope he doesn't ping us about having overloaded the cluster.”
> >
> > That's not nice^Wscalable ;-)
> >
> > So just in case someone is vague on how to “keep an eye on it”, I did
> > a short write-up at:
> >
> > https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
> >
> > which details on detecting how the cluster is doing on a very high
> > level.
> > Especially, it allows you to detect if the cluster got stalled, and if
> > it did, it tells you what to do.
> >
> > Have fun,
> > Christian
> >
> > P.S.: The above URL has diagrams! Click the URL!
> >
> > --
> > ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
> > Companies' registry: 360296y in Linz
> > Christian Aistleitner
> > Kefermarkterstrasze 6a/3     Email:    christ...@quelltextlich.at
> > 4293 Gutau, Austria          Phone:    +43 7946 / 20 5 81
> >                              Fax:      +43 7946 / 20 5 81
> >                              Homepage: http://quelltextlich.at/
> > ---------------------------------------------------------------
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics