Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-10 Thread Andrew Otto
I just want to make sure it can be found.  I see you added it to the ToC at 
https://wikitech.wikimedia.org/wiki/Analytics/Cluster, so I think it’ll be fine.


> On Mar 9, 2015, at 18:51, Christian Aistleitner wrote:
> 
> Hi Andrew,
> 
> On Mon, Mar 09, 2015 at 11:54:56AM -0400, Andrew Otto wrote:
>>> https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
>> Christian, may I move this page into the Cluster/Hadoop/Administration page?
>  
> I think a separate page is worth it as the target audience is
> different from the Cluster/Hadoop/Administration page.
> 
> But sure. Be Bold. Move wherever you see fit. :-)
> 
> Have fun,
> Christian
> 
> 
> 
> -- 
>  quelltextlich e.U.  \\  Christian Aistleitner 
>   Companies' registry: 360296y in Linz
> Christian Aistleitner
> Kefermarkterstrasze 6a/3 Email:  christ...@quelltextlich.at
> 4293 Gutau, Austria  Phone:  +43 7946 / 20 5 81
> Fax:+43 7946 / 20 5 81
> Homepage: http://quelltextlich.at/
> ---
> ___
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics




Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-09 Thread Christian Aistleitner
Hi Andrew,

On Mon, Mar 09, 2015 at 11:54:56AM -0400, Andrew Otto wrote:
> > https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
> Christian, may I move this page into the Cluster/Hadoop/Administration page?

I think a separate page is worth it as the target audience is
different from the Cluster/Hadoop/Administration page.

But sure. Be Bold. Move wherever you see fit. :-)

Have fun,
Christian





Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-09 Thread Nuria Ruiz
> Aside from this, I get daily emails about webrequest partition statuses,
> and I would at least notice the morning after that something is wrong.

Right, but in the case of Friday that would mean perhaps having to backfill
a bunch of data up to Saturday morning, whereas if we have alarms we can
detect the issue right away and kill jobs as needed.

On Mon, Mar 9, 2015 at 8:55 AM, Andrew Otto wrote:

>> Should we have icinga alarms around these types of issues?  Seems like that
>> would be the way to go.
>
> Aside from this, I get daily emails about webrequest partition statuses,
> and I would at least notice the morning after that something is wrong.


Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-09 Thread Andrew Otto
> Should we have icinga alarms around these types of issues?  Seems like that
> would be the way to go.

Aside from this, I get daily emails about webrequest partition statuses, and I
would at least notice the morning after that something is wrong.





Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-09 Thread Andrew Otto
> https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load
Christian, may I move this page into the Cluster/Hadoop/Administration page?

> Should we have icinga alarms around these types of issues?  Seems like that
> would be the way to go.
We used to have icinga alarms based on webrequest data existence in HDFS.  They
were very flaky due to the way we had to implement them.  Hmm, I suppose we
could try to use graphite anomaly detection to alarm on the graph that QChris
mentions.







Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-09 Thread Christian Aistleitner
Hi Pine,

On Sat, Mar 07, 2015 at 08:15:18PM -0800, Pine W wrote:
> Chris, may I quote your email on BASH?

They take emails too?

Regardless ... feel free to quote or forward any of my emails wherever
you see fit.

Have fun,
Christian





Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-09 Thread Joseph Allemandou
Thanks a lot Christian :)
I had not meant by any means to overload the cluster last Friday ... but I did
it nonetheless.
Your page on how to 'keep an eye on it' will really be useful!
Cheers
Joseph




-- 
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal


Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-08 Thread Leila Zia
This is really useful, Christian. Thanks for explaining and documenting it.

Leila



Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-07 Thread Pine W
Chris, may I quote your email on BASH?

Pine


Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-07 Thread Nuria Ruiz
Thanks much Christian for the writeup.

Should we have icinga alarms around these types of issues?  Seems like that
would be the way to go.

Thanks,

Nuria



Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-07 Thread Andrew Otto
Thanks Christian!




Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-07 Thread Federico Leva (Nemo)

Christian Aistleitner, 07/03/2015 15:14:
> P.S.: The above URL has diagrams! Click the URL!

And with colours! So it's like checking heartbeats, cute. :)

Nemo



[Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-07 Thread Christian Aistleitner
Hi,

Regarding running jobs on the Analytics cluster, I've sometimes seen
people say in IRC: “Let's run this heavy job. I'll keep an eye on it”.

But more often than not, this seems to have meant:
“Let's just run this heavy job and wait. If QChris joins IRC, let's
hope he doesn't ping us about having overloaded the cluster.”

That's not nice^Wscalable ;-)

So just in case someone is vague on how to “keep an eye on it”, I did
a short write-up at:

  https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Load

which describes how to check, at a very high level, how the cluster is
doing.
In particular, it lets you detect whether the cluster has stalled, and if
it has, it tells you what to do.

Have fun,
Christian

P.S.: The above URL has diagrams! Click the URL!
