> Aside from this, I get daily emails about webrequest partition statuses, and I would at least notice the morning after that something is wrong.

Right, but in the case of Friday that would mean perhaps having to backfill a bunch of data up to Saturday morning, whereas if we have alarms we can detect the issue right away and kill jobs as needed.

On Mon, Mar 9, 2015 at 8:55 AM, Andrew Otto <ao...@wikimedia.org> wrote:

> Should have icinga alarms around these types of issues? Seems like that
> would be the way to go.
>
> Aside from this, I get daily emails about webrequest partition statuses,
> and I would at least notice the morning after that something is wrong.
>
> On Mar 7, 2015, at 21:20, Nuria Ruiz <nu...@wikimedia.org> wrote:
>
> Thanks much Christian for the writeup.
>
> Should have icinga alarms around these types of issues? Seems like that
> would be the way to go.
>
> Thanks,
>
> Nuria
>
> On Sat, Mar 7, 2015 at 4:00 PM, Andrew Otto <ao...@wikimedia.org> wrote:
>
>> Thanks Christian!
>>
>> > On Mar 7, 2015, at 09:14, Christian Aistleitner <christ...@quelltextlich.at> wrote:
>> >
>> > Hi,
>> >
>> > around running jobs on the Analytics cluster, I've sometimes seen
>> > people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.
>> >
>> > But more often than not, this seems to have meant:
>> > “Let's just run this heavy job and wait. If QChris joins IRC, let's
>> > hope he doesn't ping us about having overloaded the cluster.”
>> >
>> > That's not nice^Wscalable ;-)
>> >
>> > So just in case someone is vague on how to “keep an eye on it”, I did
>> > a short write-up at:
>> >
>> > https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
>> >
>> > which details on detecting how the cluster is doing on a very high
>> > level.
>> > Especially, it allows you to detect if the cluster got stalled, and if
>> > it did, it tells you what to do.
>> >
>> > Have fun,
>> > Christian
>> >
>> > P.S.: The above URL has diagrams! Click the URL!
>> >
>> > --
>> > ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>> > Companies' registry: 360296y in Linz
>> > Christian Aistleitner
>> > Kefermarkterstrasze 6a/3     Email:    christ...@quelltextlich.at
>> > 4293 Gutau, Austria          Phone:    +43 7946 / 20 5 81
>> >                              Fax:      +43 7946 / 20 5 81
>> >                              Homepage: http://quelltextlich.at/
>> > ---------------------------------------------------------------
>> > _______________________________________________
>> > Analytics mailing list
>> > Analytics@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/analytics