Thanks Dario, et al. A +1 from me -- this will make integration a lot easier. Let's see if we can address this in the Q3 project about dashboarding.
-Toby

On Thu, Dec 11, 2014 at 4:11 PM, Dario Taraborelli <dtarabore...@wikimedia.org> wrote:
>
> I am kicking off this thread after a good conversation with Nuria and
> Kaldari on pain points and opportunities we have around *data QA for
> EventLogging*.
>
> Kaldari, Leila and I have gone through several rounds of data QA before
> and after the deployment of new features on Mobile, and we haven’t yet
> found a good solution for catching data quality issues early enough in
> the deployment cycle. Data quality issues with EventLogging typically
> fall under one of these five scenarios:
>
> 1) events are logged and schema-compliant but don’t capture data correctly
> (for example: a wrong value is logged; event counts that should match don’t)
> 2) events are logged but are not schema-compliant (e.g. a required field
> is missing)
> 3) events are missing due to issues with the instrumentation (e.g. a UI
> element is not instrumented)
> 4) events are missing due to client issues (a specific UI element is not
> correctly rendered on a given browser/platform and as a result the event
> is not fired)
> 5) events are missing due to EventLogging outages
>
> In the early days, Ori and I floated the idea of unit tests for
> instrumentation to capture constraint violations that are not easily
> detected via manual testing or the existing client-side validation, but
> this never happened. When it comes to feature deployments, beta labs is a
> great starting point for running manual data QA in an environment that is
> as close as possible to prod. However, there are types of data quality
> issues that we only discover when collecting data at scale and in the
> wild (on browsers/platforms that we don’t necessarily test for
> internally).
>
> Having a full-fledged set of unit tests for data would be terrific, but
> in the short term I’d like to find a better way to at least *identify
> events that fail validation as early as possible*.
>
> - the SQL log database has real-time data, but only for events that pass
> client-side validation
> - the JSON logfiles on stat1003 include invalid events, but the data is
> only rsync’ed from vanadium once a day
>
> Is there a way to inspect invalid events in near real time without having
> access to vanadium? For example, could we create either a dedicated
> database for invalid events only, or a logfile for validation errors
> rsync’ed to stat1003 more frequently than once a day?
>
> Thoughts?
>
> Dario
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
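P.S. To make the short-term ask concrete, here is a rough sketch of the kind of server-side split Dario describes: required-field and type checks that route events failing validation into their own log stream, which could then be rsync'ed on its own schedule. The schema, field names, and logger names below are purely illustrative, not the actual EventLogging capsule or paths:

```python
import json
import logging

# Illustrative toy schema: real EventLogging schemas are JSON Schema
# revisions; this sketch only checks required fields and expected types.
TOY_SCHEMA = {
    "required": ["event", "schema", "revision", "timestamp"],
    "types": {"event": dict, "schema": str, "revision": int, "timestamp": int},
}

valid_log = logging.getLogger("events.valid")      # hypothetical stream names
invalid_log = logging.getLogger("events.invalid")

def validate(raw_line, schema=TOY_SCHEMA):
    """Return (ok, errors) for one raw JSON event line (scenario-2 checks)."""
    errors = []
    try:
        capsule = json.loads(raw_line)
    except ValueError as exc:
        return False, ["not valid JSON: %s" % exc]
    for field in schema["required"]:
        if field not in capsule:
            errors.append("missing required field: %s" % field)
        elif not isinstance(capsule[field], schema["types"][field]):
            errors.append("wrong type for field: %s" % field)
    return not errors, errors

def route(raw_line):
    """Write the event to the valid or invalid stream, keeping the errors."""
    ok, errors = validate(raw_line)
    if ok:
        valid_log.info(raw_line)
    else:
        invalid_log.warning("%s\t%s", raw_line, "; ".join(errors))
    return ok
```

Something tailing the raw event stream through a `route()`-style filter would give a near-real-time invalid-events log without anyone needing access to vanadium.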
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics