We are also happy to add cached entry points for high-traffic endpoints in the REST API. I commented to that effect at https://phabricator.wikimedia.org/T124216#1984206. Let us know if you think this would be useful for this use case.
On Sat, Jan 30, 2016 at 8:11 AM, Adam Baso <ab...@wikimedia.org> wrote:
> Okay. As per https://phabricator.wikimedia.org/T124225#1984080, I think if we're doing near-term experimentation with a controlled A/B test, the Android app is the only logical place to start. Dmitry, can that work for you? It's not required, but I think it would be neat to see if we can move the needle even more. Of course your quarterly goals take top priority... but what do you think?
>
> On Sat, Jan 23, 2016 at 5:58 AM, Adam Baso <ab...@wikimedia.org> wrote:
>>
>> Hey all, I'm planning to look at the Phabricator tasks and provide a reply during the upcoming weekdays. Just wanted to acknowledge I saw your replies!
>>
>>
>> On Friday, January 22, 2016, Erik Bernhardson <ebernhard...@wikimedia.org> wrote:
>>>
>>> On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez <jhernan...@wikimedia.org> wrote:
>>>>
>>>> Regarding the caching, apps and web would need to agree on the URL and the smaxage parameter, as Adam noted, so that the URLs are exactly the same. That way we don't bloat Varnish, and the same cached objects are reused across platforms.
>>>>
>>>> It is an extremely ad hoc and brittle solution, but it seems like it would be the greatest win.
>>>>
>>>> That this accounts for 20% of the search traffic while being only in Android and web beta seems like a lot to me, and we should work on reducing it; otherwise, when it hits web stable we're going to crush the servers. So caching seems the highest priority.
>>>>
>>> To clarify, it's 20% of the load, as opposed to 20% of the traffic. But same difference :)
>>>
>>>>
>>>> Let's chime in on https://phabricator.wikimedia.org/T124216 and continue the cache discussion there.
>>>>
>>>> Regarding the validity of results with opening text only, how should we proceed? Adam?
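Since the Varnish cache key is the full request URL, web, Android, and iOS only share cached objects if they emit byte-identical URLs. A minimal sketch of one way to guarantee that, assuming all clients can share a single URL-building rule: the helper name `build_morelike_url`, the title normalization, and the alphabetical parameter order are hypothetical choices, not something any client is known to do. The parameter values are copied from the mobile web beta URL quoted later in this thread; `smaxage` is the MediaWiki API parameter that sets the shared-cache lifetime, and 86400 seconds (24 hours) is only Adam's example value, not an agreed one.

```python
from urllib.parse import urlencode

# Hypothetical shared helper: every client builds the morelike URL here,
# so the cache key (the full URL) is identical across platforms.
API_ENDPOINT = "https://en.m.wikipedia.org/w/api.php"

# Values copied from the mobile web beta request quoted in this thread;
# smaxage=86400 (24 hours) is an assumed value, not an agreed one.
BASE_PARAMS = {
    "action": "query",
    "format": "json",
    "formatversion": "2",
    "prop": "pageimages|pageterms",
    "piprop": "thumbnail",
    "pithumbsize": "80",
    "wbptterms": "description",
    "pilimit": "3",
    "generator": "search",
    "gsrnamespace": "0",
    "gsrlimit": "3",
    "smaxage": "86400",
}


def build_morelike_url(title: str) -> str:
    """Return a canonical morelike URL for the given article title."""
    params = dict(BASE_PARAMS)
    # Normalize the title the same way on every client (spaces -> underscores).
    params["gsrsearch"] = "morelike:" + title.strip().replace(" ", "_")
    # Sorting the keys gives one fixed parameter order regardless of how
    # each client happened to assemble its parameters.
    return API_ENDPOINT + "?" + urlencode(sorted(params.items()))
```

With this rule, `build_morelike_url("Come Share My Love")` and `build_morelike_url("Come_Share_My_Love")` produce the same string, so both requests hit the same cached object.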
>>>>
>>>>
>>> I've put together https://phabricator.wikimedia.org/T124258 to track putting together an A/B test that measures the difference in click-through rates for the two approaches.
>>>
>>>
>>>>
>>>> On Wed, Jan 20, 2016 at 9:34 PM, David Causse <dcau...@wikimedia.org> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Yes, we can combine many factors: templates (quality, but also disambiguation/stubs), size, and others.
>>>>> Today Cirrus mostly uses the number of incoming links, which (IMHO) is not very good for morelike.
>>>>> On enwiki, results will also be scored according to the weights defined in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
>>>>>
>>>>> I wrote a small bash script to compare results:
>>>>> https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad
>>>>> Here are some random results from the list (sometimes better, sometimes worse):
>>>>>
>>>>> $ sh morelike.sh Revolution_Muslim
>>>>> Defaults
>>>>> "title": "Chess",
>>>>> "title": "Suicide attack",
>>>>> "title": "Zachary Adam Chesser",
>>>>> =======
>>>>> Opening text no boost links
>>>>> "title": "Hungarian Revolution of 1956",
>>>>> "title": "Muslims for America",
>>>>> "title": "Salafist Front",
>>>>>
>>>>> $ sh morelike.sh Chesser
>>>>> Defaults
>>>>> "title": "Chess",
>>>>> "title": "Edinburgh",
>>>>> "title": "Edinburgh Corn Exchange",
>>>>> =======
>>>>> Opening text no boost links
>>>>> "title": "Dreghorn Barracks",
>>>>> "title": "Edinburgh Chess Club",
>>>>> "title": "Threipmuir Reservoir",
>>>>>
>>>>> $ sh morelike.sh Time_%28disambiguation%29
>>>>> Defaults
>>>>> "title": "Atlantis: The Lost Empire",
>>>>> "title": "Stargate",
>>>>> "title": "Stargate SG-1",
>>>>> =======
>>>>> Opening text no boost links
>>>>> "title": "Father Time (disambiguation)",
>>>>> "title": "The Last Time",
>>>>> "title": "Time After Time",
>>>>>
>>>>>
>>>>> On 20/01/2016 19:34, Jon Robson wrote:
>>>>>>
>>>>>> I'm actually interested to see whether this yields better results in certain examples where the algorithm is lacking [1]. If it's done as an A/B test, we could even measure things such as click-throughs in the related articles feature (whether they go up or not).
>>>>>>
>>>>>> Out of interest, is it also possible to take article size and type into account and not return any morelike results for things like disambiguation pages and stubs?
>>>>>>
>>>>>> [1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso <ab...@wikimedia.org> wrote:
>>>>>>>
>>>>>>> One thing we could do regarding the quality of the output is check results against a random sample of popular articles (example approach to find some articles) on mdot Wikipedia. Presuming that improves the quality of the recommendations, or at least does not degrade them, we should consider adding the enhancement task to a future sprint, with further instrumentation and A/B testing / a timeboxed beta test, etc.
>>>>>>>
>>>>>>> Joaquin, smaxage (e.g., 24-hour cached responses) does seem a good fix for now to further reduce client-perceived wait, at least for requests that don't hit a cold cache, even if we stop beating up the backend. Does anyone know of a compelling reason not to do that for the time being? The main thing that comes to mind, as always, is growing the Varnish cache object pool: probably not a huge deal while the thing is only in beta, but maybe noteworthy on the stable channel, because it would probably run on most pages (but that's what edge caches are for, after all).
>>>>>>>
>>>>>>> Erik, from your perspective, does use of smaxage relieve the backend sufficiently?
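David's comparison script and Adam's suggestion of checking a random sample of popular articles both come down to comparing two short title lists per article: the default morelike results and the opening-text results. A minimal sketch of one way to summarize that comparison, assuming the title lists have already been fetched: the function names are hypothetical, and Jaccard overlap (shared titles divided by all distinct titles, ignoring order) is an illustrative metric, not what David's gist computes. The sample data is the three examples from David's message.

```python
from typing import Dict, List, Tuple


def jaccard(a: List[str], b: List[str]) -> float:
    """Overlap of two result lists, ignoring order (1.0 = same titles)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def compare(
    results: Dict[str, Tuple[List[str], List[str]]]
) -> List[Tuple[str, float]]:
    """Return (article, overlap) pairs, least similar first, for human review."""
    scored = [
        (article, jaccard(default, opening))
        for article, (default, opening) in results.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# (default results, opening-text results) for the examples in David's message.
sample = {
    "Revolution_Muslim": (
        ["Chess", "Suicide attack", "Zachary Adam Chesser"],
        ["Hungarian Revolution of 1956", "Muslims for America", "Salafist Front"],
    ),
    "Chesser": (
        ["Chess", "Edinburgh", "Edinburgh Corn Exchange"],
        ["Dreghorn Barracks", "Edinburgh Chess Club", "Threipmuir Reservoir"],
    ),
    "Time_(disambiguation)": (
        ["Atlantis: The Lost Empire", "Stargate", "Stargate SG-1"],
        ["Father Time (disambiguation)", "The Last Time", "Time After Time"],
    ),
}

for article, overlap in compare(sample):
    print(f"{article}: {overlap:.2f}")
```

For these three examples every overlap is 0.0, which matches David's observation that the two configurations return entirely different titles; overlap alone says nothing about which list is better, so the low-overlap articles are the ones to hand to a human reviewer.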
>>>>>>>
>>>>>>> If we do smaxage, then Web, Android, and iOS should standardize their URLs so we get more cache hits at the edge across all clients. Here's the URL I see being used on the web today from mobile web beta:
>>>>>>>
>>>>>>> https://en.m.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages%7Cpageterms&piprop=thumbnail&pithumbsize=80&wbptterms=description&pilimit=3&generator=search&gsrsearch=morelike%3ACome_Share_My_Love&gsrnamespace=0&gsrlimit=3
>>>>>>>
>>>>>>>
>>>>>>> -Adam
>>>>>>>
>>>>>>> On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez <jhernan...@wikimedia.org> wrote:
>>>>>>>>
>>>>>>>> I'd be up for it if we manage to cram it into a following sprint and it is worth it.
>>>>>>>>
>>>>>>>> We could run a controlled test against production with a long batch of articles, check median/percentile response times over repeated runs, and highlight the differing results for human inspection of quality.
>>>>>>>>
>>>>>>>> It's been noted previously that the results are far from ideal (which they are, because it is just morelike), and I think it would be a great idea to change the endpoint to a dedicated one that is smarter and has some caching (we could do much more to get relevant results besides text similarity: take links into account, or "See also" links if there are any, etc.).
>>>>>>>>
>>>>>>>> As a note, in mobile web the related articles extension allows editors to specify which articles to show in the section, which would avoid queries to CirrusSearch if it were used more (once rolled into stable, I guess).
>>>>>>>>
>>>>>>>> I remember that the performance-related task was closed as resolved (https://phabricator.wikimedia.org/T121254#1907192). Should we reopen it or create a new one?
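Joaquin's controlled test (a long batch of articles, repeated runs, median and percentile response times) can be sketched as follows, assuming some `fetch(title)` callable that performs one morelike request for a given configuration. `fetch`, `time_runs`, `percentile`, and `summarize` are hypothetical names, and the nearest-rank percentile is one of several common definitions; this is an illustration, not the test anyone ran.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fetch: Callable[[str], object], articles: List[str], repeats: int) -> List[float]:
    """Call fetch(title) for every article, `repeats` times; return latencies in ms."""
    samples: List[float] = []
    for _ in range(repeats):
        for title in articles:
            start = time.perf_counter()
            fetch(title)  # hypothetical: one API request for this configuration
            samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so the rank is computed exactly for integer p.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Median, p95 and p99 latency for one configuration."""
    return {
        "median": statistics.median(samples_ms),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }
```

Running `summarize(time_runs(fetch_default, articles, 5))` and the same for an opening-text fetcher (both hypothetical) gives two comparable summaries. Repeated runs matter because the first pass over an article may be served from a cold cache.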
>>>>>>>>
>>>>>>>> I'm not sure if we ended up adding the smaxage parameter (I think we didn't). Should we? To me it seems a no-brainer that we should be caching these results in Varnish, since they don't need to be completely up to date for this use case.
>>>>>>>>
>>>>>>>> On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson <ebernhard...@wikimedia.org> wrote:
>>>>>>>>>
>>>>>>>>> Both the mobile apps and web are using CirrusSearch's morelike: feature, which is showing some performance issues on our end. We would like to make a performance optimization to it, but before doing so we would prefer to run an A/B test to see if the results are still "about as good" as they are currently.
>>>>>>>>>
>>>>>>>>> The optimization is basically this: currently, the more like this query takes the entire article into account; we would like to change it to take only the opening text of an article into account. This should reduce the amount of work we have to do on the backend, saving both server load and the latency the user sees when running the query.
>>>>>>>>>
>>>>>>>>> This can be triggered by adding these two query parameters to the search API request being performed:
>>>>>>>>>
>>>>>>>>> cirrusMltUseFields=yes&cirrusMltFields=opening_text
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The API will give a warning that these parameters do not exist, but the warning is safe to ignore. Would any of you be willing to run this test? We would basically want to look at user-perceived latency along with click-through rates for the current default setup and for the restricted setup using only opening_text.
>>>>>>>>>
>>>>>>>>> Erik B.
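The test Erik describes needs three pieces: a stable split of users into two groups, the two extra query parameters for the treatment group, and a click-through comparison. A minimal sketch, assuming each client has some stable identifier (the `app_install_id` here is hypothetical), a 50/50 split, and a pooled two-proportion z statistic as the comparison. All function names are illustrative; none of this is the design tracked in T124258.

```python
import hashlib
import math
from typing import Dict, Tuple

# The two parameters from Erik's message; the API warns that they are
# unrecognized, but per the message the warning is safe to ignore.
OPENING_TEXT_PARAMS = {"cirrusMltUseFields": "yes", "cirrusMltFields": "opening_text"}


def bucket_for(app_install_id: str) -> str:
    """Deterministically assign an install to 'control' or 'opening_text' (50/50)."""
    digest = hashlib.sha256(app_install_id.encode("utf-8")).digest()
    return "opening_text" if digest[0] % 2 else "control"


def request_params(base: Dict[str, str], bucket: str) -> Dict[str, str]:
    """Add the opening-text parameters only for the treatment bucket."""
    params = dict(base)
    if bucket == "opening_text":
        params.update(OPENING_TEXT_PARAMS)
    return params


def click_through_z(
    control: Tuple[int, int], treatment: Tuple[int, int]
) -> Tuple[float, float, float]:
    """Given (clicks, impressions) per bucket, return (ctr_control, ctr_treatment, z).

    z is the pooled two-proportion z statistic; positive means the
    treatment bucket has the higher click-through rate.
    """
    c_clicks, c_n = control
    t_clicks, t_n = treatment
    p_c = c_clicks / c_n
    p_t = t_clicks / t_n
    pooled = (c_clicks + t_clicks) / (c_n + t_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / c_n + 1 / t_n))
    z = (p_t - p_c) / se if se > 0 else 0.0
    return p_c, p_t, z
```

Hashing a stable ID keeps each install in the same bucket across sessions, so one user never sees both configurations. With the opening-text parameters added, the treatment bucket's URLs differ from the control's, so the two buckets also form separate cache objects if smaxage is in use.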
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Mobile-l mailing list
>>>>>>>>> Mobile-l@lists.wikimedia.org
>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>>>>>>>>>

-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation