Re: [Analytics] Maybe Analytics project in Phabricator
On Fri, 2015-04-17 at 18:15 -0700, Grace Gellerman wrote:

The project is intended for Analytics customers to alert Analytics of work in their products that they think might intersect with ours. It's a way of giving Analytics an early heads-up so that Analytics can either say, "Thanks for the early warning!" or "Thanks, but this does not touch Analytics." We can remind participants at Scrum-of-Scrums that they can use this project.

Isn't that pretty much what https://phabricator.wikimedia.org/tag/blocked-on-analytics/ is for? Both projects should receive urgent triage anyway (and hence a decision whether a task is actually Analytics territory or not), but I see zero folks listed under Watchers [1] on either project page?

So for now, please do not archive it. Thanks!

I would like to archive that project soon, given my comment above. Furthermore, that project has been entirely unused (maybe because nobody has ever heard of that project...). If I imagined every project to have a corresponding maybe-project, we'd just create unneeded abstraction layers. Newly created tasks should receive triage. One triage step is defining whether the task is associated with the right project(s). No maybe needed.

Cheers,
andre

[1] https://www.mediawiki.org/wiki/Phabricator/Help#Receiving_updates_and_notifications

--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/

___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
Re: [Analytics] [Technical] WMF-Last-Access
+1 'last'
Re: [Analytics] [Technical] WMF-Last-Access
+1 for ISO dates. They're also more parsable by researchers.

On 27 April 2015 at 18:57, Dario Taraborelli dtarabore...@wikimedia.org wrote:

I also noticed the cookie stores a string with a 3-letter month (27-Apr-2015), any reason not to use a shorter ISO date instead (2015-04-27)?

On Apr 27, 2015, at 3:00 PM, Marcel Ruiz Forns mfo...@wikimedia.org wrote:

+1 'last'

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
Re: [Analytics] [Technical] WMF-Last-Access
Gonna stop this ISO date fancy bandwagon right here :)

We could do it with a bunch of VCL code but that affects performance of the site and we'd rather take the hit in analytics. We could look into making a UDF that deals with this and other common date code we'd want to DRY.

On Mon, Apr 27, 2015 at 4:02 PM, Oliver Keyes oke...@wikimedia.org wrote:

+1 for ISO dates. They're also more parsable by researchers.

On 27 April 2015 at 18:57, Dario Taraborelli dtarabore...@wikimedia.org wrote:

I also noticed the cookie stores a string with a 3-letter month (27-Apr-2015), any reason not to use a shorter ISO date instead (2015-04-27)?

On Apr 27, 2015, at 3:00 PM, Marcel Ruiz Forns mfo...@wikimedia.org wrote:

+1 'last'
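The conversion discussed in this message could be sketched on the analytics side roughly as follows. This is a minimal Python illustration only, not the UDF proposed above (which would presumably be written for Hive); the function name `last_access_to_iso` is hypothetical, and the sketch assumes the cookie value always has the `DD-Mon-YYYY` shape with an English three-letter month, as in the `27-Apr-2015` example from the thread.

```python
from datetime import date

# English month abbreviations, mapped explicitly so the result does not
# depend on the process locale (strptime's %b directive is locale-sensitive).
_MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def last_access_to_iso(value: str) -> str:
    """Convert a 'DD-Mon-YYYY' string such as '27-Apr-2015' to 'YYYY-MM-DD'.

    Raises ValueError if the input does not have the assumed shape.
    """
    parts = value.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"expected DD-Mon-YYYY, got {value!r}")
    day_text, month_text, year_text = parts
    month = _MONTHS.get(month_text.lower())
    if month is None:
        raise ValueError(f"unknown month abbreviation in {value!r}")
    # date() validates the combination (for example, it rejects 31-Feb).
    return date(int(year_text), month, int(day_text)).isoformat()


if __name__ == "__main__":
    print(last_access_to_iso("27-Apr-2015"))
```

Keeping the parsing in one shared helper is the point of the "DRY" remark: every consumer of the cookie value would otherwise repeat the same month-name handling.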
Re: [Analytics] [Ops] udp2log shutdown (for analytics instances) next week
Ok thanks for the heads up!

On Mon, 27 Apr 2015, Andrew Otto wrote:

Hi again!

Today I turned off most udp2log webrequest filters. For now, I have left the Fundraising filters, as well as the 5xx and sampled-1000 filters, running. All of these filters are now running on erbium. oxygen's udp2log instance has been shut off.

Instead of constantly updating this thread, I will track this here: https://phabricator.wikimedia.org/T97294

Thanks!

On Tue, Apr 21, 2015 at 3:49 PM, Andrew Otto ao...@wikimedia.org wrote:

Hi all!

Now that all data that is generated by udp2log is also being generated by the Analytics Cluster, we are finally ready to turn off analytics udp2log instances. I will start with the ones that are used to generate the logs on stat1002 at /a/squid/archive. The (identical) cluster generated logs can be found on stat1002 at /a/log/webrequest/archive. I will paste the contents of the README file in /a/squid/archive describing the differences at the bottom of this email.

If you use any of the logs in /a/squid/archive for regular statistics, you will need to switch your code to use files in /a/log/webrequest/archive instead.

I plan to start turning off udp2log instances on Monday April 27th (that’s next week!).

From the README:

[@stat1002:/a/squid/archive] $ cat README.migrate-to-hive.2015-02-17

    This directory will run stale once udp2log gets turned off.
    Please use the corresponding TSVs from /a/log/webrequest/archive/
    instead.

The TSV files in this directory underneath /a/squid/archive get generated by udp2log and suffer from

* Sub-par data quality (e.g. udp2log had an inherent loss).
* Lack of a way to backfill/fix data.
* Some files consuming https requests twice, which made filtering necessary.
* Confusing naming scheme, where each file covered 24 hours, not from midnight to midnight but from ~06:30 of the previous day to ~06:30 of the current day.
The new TSVs at /a/log/webrequest/archive/ contain the same information but get generated by Hive, and address the above four issues:

* By using Hive's webrequest table as input, the inherent loss is gone. Also, statistics on the hour's data quality are available.
* Hive data can be backfilled/fixed.
* Only data from the varnishes gets picked up, so https traffic no longer gets duplicated.
* The files now cover 24 hours from midnight to midnight. No more stitching/cutting is needed to get the logs for a given day.

Please migrate to using the Hive-generated TSVs from /a/log/webrequest/archive/

Thanks! I’ll keep you updated as this happens.

-Andrew Otto
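For scripts that hard-code the old directory, the switch asked for above might look like the following. This is only a sketch under an assumption the README does not spell out: that each file under /a/squid/archive has its "corresponding" TSV at the same relative path under /a/log/webrequest/archive. The helper name `migrate_archive_path` and the example file name are hypothetical, so check the actual layout on stat1002 before relying on it.

```python
from pathlib import PurePosixPath

# Directories named in the announcement.
OLD_ROOT = PurePosixPath("/a/squid/archive")
NEW_ROOT = PurePosixPath("/a/log/webrequest/archive")


def migrate_archive_path(path: str) -> str:
    """Rewrite a path under the udp2log archive to the Hive-generated archive.

    Assumption: the relative layout below both roots is the same.
    Paths outside OLD_ROOT are returned unchanged.
    """
    try:
        relative = PurePosixPath(path).relative_to(OLD_ROOT)
    except ValueError:
        # Not under the old archive directory; leave it alone.
        return path
    return str(NEW_ROOT / relative)


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(migrate_archive_path("/a/squid/archive/example/example.tsv.gz"))
```

Because the new files run midnight to midnight, code that previously stitched two of the old ~06:30-to-~06:30 files together to cover one calendar day would also need that stitching step removed, not just the path changed.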
Re: [Analytics] Maybe Analytics project in Phabricator
+1 to Dan

On Monday, April 27, 2015, Dan Andreescu dandree...@wikimedia.org wrote:

Sounds to me like the nuance we were trying to go for is causing confusion. This is unintended and my opinion is that we should remove maybe-analytics and just tell everyone to use blocked-on-analytics as liberally as they wish.

On Mon, Apr 27, 2015 at 1:45 AM, Andre Klapper aklap...@wikimedia.org wrote:

On Fri, 2015-04-17 at 18:15 -0700, Grace Gellerman wrote:

The project is intended for Analytics customers to alert Analytics of work in their products that they think might intersect with ours. It's a way of giving Analytics an early heads-up so that Analytics can either say, "Thanks for the early warning!" or "Thanks, but this does not touch Analytics." We can remind participants at Scrum-of-Scrums that they can use this project.

Isn't that pretty much what https://phabricator.wikimedia.org/tag/blocked-on-analytics/ is for? Both projects should receive urgent triage anyway (and hence a decision whether a task is actually Analytics territory or not), but I see zero folks listed under Watchers [1] on either project page?

So for now, please do not archive it. Thanks!

I would like to archive that project soon, given my comment above. Furthermore, that project has been entirely unused (maybe because nobody has ever heard of that project...). If I imagined every project to have a corresponding maybe-project, we'd just create unneeded abstraction layers. Newly created tasks should receive triage. One triage step is defining whether the task is associated with the right project(s). No maybe needed.

Cheers,
andre

[1] https://www.mediawiki.org/wiki/Phabricator/Help#Receiving_updates_and_notifications

--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
Re: [Analytics] [Technical] WMF-Last-Access
I also noticed the cookie stores a string with a 3-letter month (27-Apr-2015), any reason not to use a shorter ISO date instead (2015-04-27)?

On Apr 27, 2015, at 3:00 PM, Marcel Ruiz Forns mfo...@wikimedia.org wrote:

+1 'last'