After talking with Dario and Leila, we decided to sample the page-impression event at 1:1000. We would, however, like to keep the widget-impression event unsampled if possible. That event happens approximately 50% as often as page-impression, so we're probably talking about somewhere around 60 events per second in that case. Would that be acceptable, or should we sample the widget-impression event as well?
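As a rough cross-check of the rates above, the projection can be sketched in Python. The figures come from the quoted thread (about 270 events per sec of current load, a working limit of 350 events per sec, about 170 page impressions per sec on the WikiGrok pages). The function name and its parameters are made up for illustration and are not part of EventLogging.

```python
def projected_total(baseline, page_rate, page_sampling, widget_rate, widget_sampling=1):
    """Return the expected events/sec after adding both WikiGrok events.

    Sampling arguments are denominators: 1000 means "log 1 in 1000".
    """
    sampled_page = page_rate / page_sampling
    sampled_widget = widget_rate / widget_sampling
    return baseline + sampled_page + sampled_widget


BASELINE = 270   # current EL load in events/sec (figure from the thread)
LIMIT = 350      # working ceiling assumed in the thread, events/sec
PAGE_RATE = 170  # page impressions/sec on the WikiGrok pages (from the thread)

# Widget impressions left unsampled, at the ~60/sec estimated in this message.
total = projected_total(BASELINE, PAGE_RATE, 1000, 60)
print(f"projected: {total:.2f} events/sec, headroom: {LIMIT - total:.2f}")
```

With these inputs the total comes to roughly 330 events per sec, which leaves about 20 events per sec under the 350 figure. If widget impressions were instead half of the 170 page impressions (85 per sec), the same calculation gives about 355, so the answer depends on which estimate is closer.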
Kaldari

On Wed, Jan 7, 2015 at 5:33 PM, Leila Zia <le...@wikimedia.org> wrote:

> Thanks, Nuria!
>
> On Wed, Jan 7, 2015 at 5:30 PM, Ryan Kaldari <rkald...@wikimedia.org>
> wrote:
>
>> Thanks everyone for the research on this! I'll go ahead and create a card
>> for implementing sampling on the high-throughput WikiGrok events.
>>
>> Kaldari
>>
>> On Wed, Jan 7, 2015 at 5:20 PM, Nuria Ruiz <nu...@wikimedia.org> wrote:
>>
>>> Sorry, I sent it too soon, trying again:
>>>
>>> >We're talking about a total of ~170 events per sec for these pages.
>>> This is too high to log at a 1:1 rate; we would need to do 1:10. At this
>>> time most events in EL log at a much lower rate. Events over 1 per
>>> sec are the following; as you can see, mobile & media viewer are the
>>> majority of the throughput.
>>>
>>> My preference would be to stay below 400 events per sec until we have
>>> done some perf testing to make sure we can handle it (we might be able to,
>>> as we have made many improvements since we set these thresholds).
>>>
>>> MobileWebClickTracking              41.35%  (114.15/sec)
>>> MediaViewer                         21.66%  (59.78/sec)
>>> MobileWikiAppToCInteraction         12.44%  (34.35/sec)
>>> PageContentSaveComplete              3.39%  (9.35/sec)
>>> EchoInteraction                      2.69%  (7.42/sec)
>>> NavigationTiming                     2.51%  (6.93/sec)
>>> MultimediaViewerNetworkPerformance   1.84%  (5.07/sec)
>>> SaveTiming                           1.58%  (4.37/sec)
>>> Edit                                 1.39%  (3.83/sec)
>>> PersonalBar                          1.24%  (3.43/sec)
>>> TimingData                           0.83%  (2.28/sec)
>>> MobileWebUIClickTracking             0.73%  (2.02/sec)
>>> Popups                               0.68%  (1.87/sec)
>>> MobileWikiAppOnboarding              0.62%  (1.70/sec)
>>> MultimediaViewerDimensions           0.61%  (1.68/sec)
>>> UniversalLanguageSelector            0.50%  (1.37/sec)
>>> PageCreation                         0.50%  (1.37/sec)
>>> MultimediaViewerDuration             0.47%  (1.30/sec)
>>> MobileWebEditing                     0.45%  (1.25/sec)
>>> MobileWikiAppSearch                  0.41%  (1.13/sec)
>>> CentralAuth                          0.40%  (1.12/sec)
>>>
>>> On Wed, Jan 7, 2015 at 5:12 PM, Nuria Ruiz <nu...@wikimedia.org> wrote:
>>>
>>>> >We're talking about a total of ~170 events per sec for these pages.
>>>> This is too high to log at a 1:1 rate; we would need to do 1:10.
>>>>
>>>> On Wed, Jan 7, 2015 at 4:10 PM, Leila Zia <le...@wikimedia.org> wrote:
>>>>
>>>>> Thanks everyone for chiming in. Your comments were very helpful. :-)
>>>>>
>>>>> Nuria, I checked the per-second pageview count for the pages WikiGrok
>>>>> will be live on, for 3 hours on 2015-01-07 (as a sample). We're talking
>>>>> about a total of ~170 events per sec for these pages. Of course major
>>>>> events can affect this number. This number added to the current 270
>>>>> events per sec you mentioned will send us over the 350 events per sec
>>>>> limit (if it's a hard limit). What do you think?
>>>>>
>>>>> Leila
>>>>>
>>>>> On Wed, Jan 7, 2015 at 10:13 AM, Nuria Ruiz <nu...@wikimedia.org>
>>>>> wrote:
>>>>>
>>>>>> >Given that information, do you have any idea if we are in danger of
>>>>>> overloading EventLogging?
>>>>>> Logging broad events (such as a page load) 1 to 1 might run into
>>>>>> problems, as our traffic is high enough that events logged 1/1000
>>>>>> still happen in very large amounts.
>>>>>>
>>>>>> Some numbers (oversimplifying and rounding):
>>>>>>
>>>>>> We have about 200 million visits per day for the enwiki mobile site.
>>>>>> This means about 2300 pageviews per sec; if we are sending 1 load
>>>>>> event per pageview, EL will (sadly) die, most likely.
>>>>>>
>>>>>> If we assume EL handles up to 350 events per second (and now we are
>>>>>> at 270 events per sec), I would think that sending 10 events per sec
>>>>>> in your case would be pretty safe. That would be sampling about 1/200
>>>>>> for a load event per pageview. This seems like a good upper bound.
>>>>>>
>>>>>> Now, since there are no constraints as to how long you keep your
>>>>>> experiment running, you can try a lower sampling ratio, say, 1/1000,
>>>>>> and keep the experiment running for longer.
>>>>>>
>>>>>> On Tue, Jan 6, 2015 at 5:50 PM, Ryan Kaldari <rkald...@wikimedia.org>
>>>>>> wrote:
>>>>>>
>>>>>>> The highest volume events we are going to log will be:
>>>>>>> 1. For each of the 166,000 articles, one event when the page loads
>>>>>>> 2. For each of the 166,000 articles, one event when the WikiGrok
>>>>>>> widget enters the viewport (about half as often as #1)
>>>>>>>
>>>>>>> These will be active for all mobile users, logged in and logged out,
>>>>>>> including many high pageview articles.
>>>>>>>
>>>>>>> Given that information, do you have any idea if we are in danger of
>>>>>>> overloading EventLogging? If so, do you have recommendations on
>>>>>>> sampling? So far, everyone has said not to worry about it, but it
>>>>>>> would be good to get a sanity check for this test specifically.
>>>>>>>
>>>>>>> Kaldari
>>>>>>>
>>>>>>> On Tue, Jan 6, 2015 at 4:57 PM, Nuria Ruiz <nu...@wikimedia.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> (cc-ing mobile-tech)
>>>>>>>>
>>>>>>>> Since we do not know the details of how WikiGrok is used and its
>>>>>>>> throughput of requests, we cannot "estimate" sampling ourselves. I
>>>>>>>> imagine WikiGrok is being deployed to a number of users, and it is
>>>>>>>> with that usage that the mobile team could estimate the total
>>>>>>>> throughput expected; with this throughput we can recommend sampling
>>>>>>>> ratios.
>>>>>>>>
>>>>>>>> Thanks for asking about this before deploying!
>>>>>>>>
>>>>>>>> On Tue, Jan 6, 2015 at 4:55 PM, Ryan Kaldari <
>>>>>>>> rkald...@wikimedia.org> wrote:
>>>>>>>>
>>>>>>>>> I can elaborate on this after I finish the SWAT deployment....
>>>>>>>>> Gimme 30 minutes or so.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 6, 2015 at 4:51 PM, Leila Zia <le...@wikimedia.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> The mobile team is planning to switch WikiGrok on for
>>>>>>>>>> non-logged-in users next week (2015-01-12). The widget will be on
>>>>>>>>>> 166,029 article pages in enwiki. There are two EventLogging
>>>>>>>>>> schemas that may collect data heavily, and we want to make sure
>>>>>>>>>> EL can handle the influx of data.
>>>>>>>>>>
>>>>>>>>>> The two schemas collecting data are:
>>>>>>>>>> https://meta.wikimedia.org/wiki/Schema:MobileWebWikiGrok
>>>>>>>>>> https://meta.wikimedia.org/wiki/Schema:MobileWebWikiGrokError
>>>>>>>>>> and the list of pages affected is in:
>>>>>>>>>> wgq_page in enwiki.wikigrok_questions.
>>>>>>>>>>
>>>>>>>>>> It would be great if someone from the dev side could let us know
>>>>>>>>>> whether we will need sampling.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Leila
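Nuria's back-of-envelope sampling arithmetic quoted above can be reproduced with a short Python sketch. The helper names are hypothetical, and the random 1-in-N check is only one simple way to sample; it is not a description of how EventLogging or the WikiGrok code actually does it.

```python
import math
import random

SECONDS_PER_DAY = 24 * 60 * 60


def per_second(events_per_day):
    """Convert a daily count to an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


def sampling_denominator(source_rate, target_rate):
    """Return N such that logging 1 in N keeps the rate at or below target_rate."""
    return max(1, math.ceil(source_rate / target_rate))


def should_log(denominator, rng=random):
    """Randomly select roughly 1 in `denominator` events (illustrative only)."""
    return rng.randrange(denominator) == 0


# Figures from the thread: ~200 million pageviews/day, target of 10 events/sec.
pageviews = per_second(200_000_000)
n = sampling_denominator(pageviews, 10)
print(f"{pageviews:.0f} pageviews/sec, sample 1 in {n}")
```

This gives about 2315 pageviews per sec and a ratio of 1 in 232, which the thread rounds to "about 1/200". At 1/1000 the same traffic would yield a little over 2 events per sec, which is why a longer-running experiment at a lower ratio was suggested.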
_______________________________________________ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics