Heads up in case you query EventLogging tables.
-- Forwarded message --
From: *Marcel Ruiz Forns*
Date: Monday, November 30, 2015
Subject: [Analytics] EventLogging outage in progress?
To: "A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics."
Team, I checked and, indeed, the EventLogging database needs backfilling from
2015-11-27 01:00 until 2015-11-27 07:00. I updated the docs and started the
backfilling process. I'll let you know when it is finished.
Cheers
On Fri, Nov 27, 2015 at 8:31 PM, Oliver Keyes wrote:
> It seems like it would depend on the class of error. 48 hours for
> events not syncing, fine. 48 hours of /total data loss/ is a
> completely different class of problem.
>
> On 27 November 2015 at 11:35, Nuria Ruiz wrote:
> >> Unfortunately, the only team-members working full-time yesterday and today
> >> are we Europe folks.
> >> We weren't there when that happened and we don't get those alerts on the
> >> phone, we should though.
> > Given that this system is tier-2, I do not think we need an immediate
> > response, 24 hours should be an acceptable ETA. I would say even 48.
> >
> > On Fri, Nov 27, 2015 at 2:31 AM, Marcel Ruiz Forns wrote:
> >>
> >> Thanks, Ori, for having a look at this and restarting EL.
> >>
> >> I understand it was 01:30 UTC on Friday (today), not Thursday. It went on
> >> for 5-6 hours.
> >> Unfortunately, the only team-members working full-time yesterday and today
> >> are we Europe folks.
> >> We weren't there when that happened and we don't get those alerts on the
> >> phone, we should though.
> >>
> >> This problem happened already like a month ago. We'll backfill the missing
> >> events and will investigate.
> >> Thanks again for the heads-up.
> >>
> >> On Fri, Nov 27, 2015 at 8:01 AM, Ori Livneh wrote:
> >>>
> >>> On Thu, Nov 26, 2015 at 10:46 PM, Ori Livneh wrote:
> >>>>
> >>>> Seems that eventlog1001 has not received any events since 01:30 UTC on
> >>>> Thursday
> >>>>
> >>>> http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Miscellaneous+eqiad&h=eventlog1001.eqiad.wmnet&jr=&js=&event=hide&ts=0&v=140128.28&m=bytes_in&vl=bytes%2Fsec&ti=Bytes+Received
> >>>>
> >>>> This is pretty severe; I'd page if it wasn't a US holiday.
> >>>
> >>>
> >>> Kafka clients on eventlog1001 were in an "Autocommitting consumer offset"
> >>> death-loop and not receiving any events from the Kafka brokers. I ran
> >>> eventloggingctl stop / eventloggingctl start and they recovered. Needs to be
> >>> investigated more thoroughly. Otto, can you follow up?
> >>>
> >>>
> >>> ___
> >>> Analytics mailing list
> >>> analyt...@lists.wikimedia.org
> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>
> >>
> >>
> >>
> >> --
> >> Marcel Ruiz Forns
> >> Analytics Developer
> >> Wikimedia Foundation
> >>
> >
> >
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation
___
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l