On Tue, Aug 11, 2015 at 12:29 PM, Jon Katz <jk...@wikimedia.org> wrote:

> However, it seems that >90% of the clicks are coming from the article
> table (or adding search created bloat) and
> MobileWebUIClickTracking_10742159 is now approaching 300gb.  Mostly this is
> due to search. I would encourage further sampling, but that would mean that
> beta data would be lost.  Perhaps we can split it into separate beta/stable
> tables and then sample stable? Any other ideas?
>

Add a samplingRatio field to the schema and a PHP global to control the
sampling ratio, and set that global via operations/mediawiki-config as
appropriate for each site. Then, in the SQL queries used for the dashboards,
replace count(*) with sum(event_samplingRatio), so that each logged row
counts for the number of events it represents. We did that for MediaViewer
and it worked great.
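
To illustrate why sum(event_samplingRatio) recovers the totals, here is a
minimal sketch in Python against an in-memory SQLite table. The table name
"clicks", the helper log_sampled_events, and the deterministic "keep every
Nth event" rule are assumptions made for the example; real client-side
sampling would be random and the real table is the EventLogging one.

```python
import sqlite3


def log_sampled_events(conn, total_events, sampling_ratio):
    """Keep 1 in `sampling_ratio` events and store the ratio on each kept row.

    Hypothetical helper: sampling is deterministic (every Nth event) so the
    example is reproducible; production sampling would normally be random.
    """
    for i in range(total_events):
        if i % sampling_ratio == 0:
            conn.execute(
                "INSERT INTO clicks (event_name, event_samplingRatio) VALUES (?, ?)",
                ("search", sampling_ratio),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (event_name TEXT, event_samplingRatio INTEGER)")

log_sampled_events(conn, total_events=1000, sampling_ratio=10)  # e.g. stable, 1 in 10
log_sampled_events(conn, total_events=50, sampling_ratio=1)     # e.g. beta, unsampled

# count(*) only sees the rows that were kept.
raw_rows = conn.execute("SELECT count(*) FROM clicks").fetchone()[0]
# sum(event_samplingRatio) weights each row by the events it stands for.
estimated = conn.execute("SELECT sum(event_samplingRatio) FROM clicks").fetchone()[0]

print(raw_rows, estimated)
```

Because the ratio travels with each row, sites with different ratios (or
unsampled beta traffic) can share one table and one query.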

Also, if your main concern is table size (for us it was mainly server load),
you can just run a script periodically to replace the user agent and the
URL with an empty string. Those probably take up most of the storage space;
every other field is fairly short.
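
A rough sketch of such a cleanup script, again in Python with SQLite so it
runs standalone. The column names (userAgent, event_url, timestamp), the
table name "clicks", and the idea of only scrubbing rows older than a cutoff
are all assumptions for illustration, not the actual schema or procedure.

```python
import sqlite3


def scrub_bulky_fields(conn, cutoff):
    """Blank the user agent and URL on rows older than `cutoff`.

    Hypothetical column names; timestamps are assumed to be
    'YYYYMMDDHHMMSS' strings, which compare correctly as text.
    Returns the number of rows that were changed.
    """
    cur = conn.execute(
        "UPDATE clicks SET userAgent = '', event_url = '' "
        "WHERE timestamp < ? AND (userAgent <> '' OR event_url <> '')",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clicks (timestamp TEXT, userAgent TEXT, event_url TEXT, event_name TEXT)"
)
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?, ?)",
    [
        ("20150701000000", "Mozilla/5.0 (old row)", "https://example.org/a", "search"),
        ("20150811000000", "Mozilla/5.0 (new row)", "https://example.org/b", "search"),
    ],
)

first_run = scrub_bulky_fields(conn, cutoff="20150801000000")
second_run = scrub_bulky_fields(conn, cutoff="20150801000000")  # nothing left to scrub
remaining = conn.execute(
    "SELECT count(*) FROM clicks WHERE userAgent <> ''"
).fetchone()[0]
```

The second run touches no rows, so the script is safe to schedule
repeatedly. Depending on the storage engine, the freed space may only be
reclaimed after the table is compacted.
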
_______________________________________________
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l
