Thanks Dario, et al.

A +1 from me -- this will make integration a lot easier. Let's see if we
can address this in the Q3 dashboarding project.

-Toby

On Thu, Dec 11, 2014 at 4:11 PM, Dario Taraborelli <
dtarabore...@wikimedia.org> wrote:
>
> I am kicking off this thread after a good conversation with Nuria and
> Kaldari on pain points and opportunities we have around *data QA for
> EventLogging*.
>
> Kaldari, Leila and I have gone through several rounds of data QA before
> and after the deployment of new features on Mobile, and we haven’t yet
> found a good solution for catching data quality issues early enough in the
> deployment cycle. Data quality issues with EventLogging typically fall
> under one of these 5 scenarios:
>
> 1) events are logged and schema-compliant but don’t capture data correctly
> (for example: a wrong value is logged; event counts that should match don’t)
> 2) events are logged but are not schema-compliant (e.g.: a required field
> is missing; see the sketch after this list)
> 3) events are missing due to issues with the instrumentation (e.g.: a UI
> element is not instrumented)
> 4) events are missing due to client issues (a specific UI element is not
> correctly rendered on a given browser/platform and as a result the event is
> not fired)
> 5) events are missing due to EventLogging outages
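>
> To make scenario 2 concrete, here is roughly what an automated check
> could look like. This is only a sketch using the jsonschema library; the
> schema and field names below are made up for illustration:
>
>     import jsonschema
>
>     # Illustrative schema, not a real EventLogging schema.
>     schema = {
>         "type": "object",
>         "properties": {
>             "action": {"type": "string"},
>             "pageId": {"type": "integer"},
>         },
>         "required": ["action", "pageId"],
>     }
>
>     # Scenario 2: the required field "pageId" is missing.
>     event = {"action": "click"}
>
>     try:
>         jsonschema.validate(event, schema)
>     except jsonschema.ValidationError as e:
>         print("invalid event: " + e.message)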
>
> In the early days, Ori and I floated the idea of unit tests for
> instrumentation to capture constraint violations that are not easily
> detected via manual testing or the existing client-side validation, but
> this never happened. When it comes to feature deployments, beta labs is a
> great starting point for running manual data QA in an environment that is
> as close as possible to prod. However, there are types of data quality
> issues that we only discover when collecting data at scale and in the wild
> (on browsers/platforms that we don’t necessarily test for internally).
>
> Having a full-fledged set of unit tests for data would be terrific, but in
> the short term I’d like to find a better way to at least *identify events
> that fail validation as early as possible*.
>
> - the SQL log database has real-time data but only for events that pass
> client-side validation
> - the JSON logfiles on stat1003 include invalid events, but the data is
> only rsync’ed from vanadium once a day (see the sketch below)
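>
> To make the second point concrete: once the raw JSON logfile lands on
> stat1003, invalid events could be picked out by re-running validation
> over it. A minimal sketch, assuming each line is a JSON-encoded event
> capsule and that the relevant schemas are available locally (the filename
> and the schema lookup are placeholders):
>
>     import json
>     import jsonschema
>
>     RAW_LOG = "all-events.log"  # placeholder for the rsync'ed logfile
>     SCHEMAS = {}                # placeholder: schema name -> JSON schema
>
>     invalid = []
>     with open(RAW_LOG) as f:
>         for line in f:
>             capsule = json.loads(line)
>             schema = SCHEMAS.get(capsule.get("schema"))
>             if schema is None:
>                 continue
>             try:
>                 # the event data is assumed to sit under the "event" key
>                 jsonschema.validate(capsule.get("event", {}), schema)
>             except jsonschema.ValidationError as e:
>                 invalid.append((capsule.get("schema"), e.message))
>
>     print(str(len(invalid)) + " invalid events")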
>
> Is there a way to inspect invalid events in near real time without having
> access to vanadium? For example, could we either create a dedicated
> database that holds only invalid events, or rsync a logfile of validation
> errors to stat1003 more frequently than once a day?
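>
> For the first option, a dedicated table for invalid events could be as
> simple as the sketch below. This is just an illustration (sqlite3 keeps
> the example self-contained; in practice it would presumably be a table in
> the existing MySQL log database, and the column names are made up):
>
>     import json
>     import sqlite3
>
>     db = sqlite3.connect("invalid_events.db")
>     db.execute("""
>         CREATE TABLE IF NOT EXISTS invalid_event (
>             ts          TEXT,
>             schema_name TEXT,
>             error       TEXT,
>             raw_event   TEXT
>         )
>     """)
>
>     def record_invalid(capsule, error):
>         # Called whenever an event fails validation, so invalid events
>         # can be inspected without waiting for the daily rsync.
>         db.execute(
>             "INSERT INTO invalid_event VALUES (?, ?, ?, ?)",
>             (capsule.get("timestamp"), capsule.get("schema"),
>              str(error), json.dumps(capsule)),
>         )
>         db.commit()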
>
> Thoughts?
>
> Dario
>
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
