> Aside from this, I get daily emails about webrequest partition statuses,
> and I would at least notice the morning after that something is wrong.

Right, but in the case of Friday that would mean perhaps having to backfill
a bunch of data up to Saturday morning, whereas if we have alarms we can
detect the issue right away and kill jobs as needed.

On Mon, Mar 9, 2015 at 8:55 AM, Andrew Otto <ao...@wikimedia.org> wrote:

> Should we have Icinga alarms around these types of issues?  Seems like that
> would be the way to go.
>
> Aside from this, I get daily emails about webrequest partition statuses,
> and I would at least notice the morning after that something is wrong.
>
>
>
> On Mar 7, 2015, at 21:20, Nuria Ruiz <nu...@wikimedia.org> wrote:
>
> Thanks much Christian for the writeup.
>
> Should we have Icinga alarms around these types of issues?  Seems like that
> would be the way to go.
>
> Thanks,
>
> Nuria
>
> On Sat, Mar 7, 2015 at 4:00 PM, Andrew Otto <ao...@wikimedia.org> wrote:
>
>> Thanks Christian!
>>
>>
>> > On Mar 7, 2015, at 09:14, Christian Aistleitner <christ...@quelltextlich.at> wrote:
>> >
>> > Hi,
>> >
>> > around running jobs on the Analytics cluster, I've sometimes seen
>> > people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.
>> >
>> > But more often than not, this seems to have meant:
>> > “Let's just run this heavy job and wait. If QChris joins IRC, let's
>> > hope he doesn't ping us about having overloaded the cluster.”
>> >
>> > That's not nice^Wscalable ;-)
>> >
>> > So just in case someone is vague on how to “keep an eye on it”, I did
>> > a short write-up at:
>> >
>> >  https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
>> >
>> > which details how to detect how the cluster is doing at a very high
>> > level.
>> > Especially, it allows you to detect if the cluster got stalled, and if
>> > it did, it tells you what to do.
>> >
>> > Have fun,
>> > Christian
>> >
>> > P.S.: The above URL has diagrams! Click the URL!
>> >
>> > --
>> > ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
>> >                           Companies' registry: 360296y in Linz
>> > Christian Aistleitner
>> > Kefermarkterstrasze 6a/3     Email:  christ...@quelltextlich.at
>> > 4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
>> >                             Fax:            +43 7946 / 20 5 81
>> >                             Homepage: http://quelltextlich.at/
>> > ---------------------------------------------------------------
>> > _______________________________________________
>> > Analytics mailing list
>> > Analytics@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/analytics
