On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez <jhernan...@wikimedia.org> wrote:
> Regarding the caching, we would need to agree between apps and web about
> the URL and smaxage parameter, as Adam noted, so that the URLs are
> *exactly* the same, to avoid bloating varnish and to reuse the same cached
> objects across platforms.
>
> It is an extremely ad hoc and brittle solution but seems like it would be
> the greatest win.
>
> 20% of the traffic from searches by being only in android and web beta
> seems a lot to me, and we should work on reducing it, otherwise when it
> hits web stable we're going to crush the servers, so caching seems the
> highest priority.

To clarify, it's 20% of the load, as opposed to 20% of the traffic. But same
difference :)

> Let's chime in on https://phabricator.wikimedia.org/T124216 and continue
> the cache discussion there.
>
> Regarding the validity of results with opening text only, how should we
> proceed? Adam?

I've filed https://phabricator.wikimedia.org/T124258 to track putting
together an A/B test that measures the difference in click-through rates
for the two approaches.

> On Wed, Jan 20, 2016 at 9:34 PM, David Causse <dcau...@wikimedia.org>
> wrote:
>
>> Hi,
>>
>> Yes we can combine many factors, from templates (quality but also
>> disambiguation/stubs), size and others.
>> Today cirrus uses mostly the number of incoming links which (imho) is not
>> very good for morelike.
>> On enwiki results will also be scored according to the weights defined in
>> https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
>>
>> I wrote a small bash script to compare results:
>> https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad
>> Here are some random results from the list (sometimes better, sometimes
>> worse):
>>
>> $ sh morelike.sh Revolution_Muslim
>> Defaults
>> "title": "Chess",
>> "title": "Suicide attack",
>> "title": "Zachary Adam Chesser",
>> =======
>> Opening text no boost links
>> "title": "Hungarian Revolution of 1956",
>> "title": "Muslims for America",
>> "title": "Salafist Front",
>>
>> $ sh morelike.sh Chesser
>> Defaults
>> "title": "Chess",
>> "title": "Edinburgh",
>> "title": "Edinburgh Corn Exchange",
>> =======
>> Opening text no boost links
>> "title": "Dreghorn Barracks",
>> "title": "Edinburgh Chess Club",
>> "title": "Threipmuir Reservoir",
>>
>> $ sh morelike.sh Time_%28disambiguation%29
>> Defaults
>> "title": "Atlantis: The Lost Empire",
>> "title": "Stargate",
>> "title": "Stargate SG-1",
>> =======
>> Opening text no boost links
>> "title": "Father Time (disambiguation)",
>> "title": "The Last Time",
>> "title": "Time After Time",
>>
>>
>> On 20/01/2016 19:34, Jon Robson wrote:
>>
>>> I'm actually interested to see whether this yields better results in
>>> certain examples where the algorithm is lacking [1]. If it's done as
>>> an A/B test we could even measure things such as click throughs in the
>>> related article feature (whether they go up or not)
>>>
>>> Out of interest is it also possible to take article size and type into
>>> account and not return any morelike results for things like
>>> disambiguation pages and stubs?
>>>
>>> [1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
>>>
>>>
>>> On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso <ab...@wikimedia.org> wrote:
>>>
>>>> One thing we could do regarding the quality of the output is check
>>>> results against a random sample of popular articles (example approach
>>>> to find some articles) on mdot Wikipedia.
>>>> Presuming that improves the quality of the recommendations or at least
>>>> does not degrade them, we should consider adding the enhancement task
>>>> to a future sprint, with further instrumentation and A/B testing /
>>>> timeboxed beta test, etc.
>>>>
>>>> Joaquin, smaxage (e.g., 24 hour cached responses) does seem a good fix
>>>> for now for further reduction of client perceived wait, at least for
>>>> non-cold cache requests, even if we stop beating up the backend. Does
>>>> anyone know of a compelling reason to not do that for the time being?
>>>> The main thing that comes to mind as always is growing the Varnish
>>>> cache object pool - probably not a huge deal while the thing is only in
>>>> beta, but on the stable channel maybe noteworthy because it would run
>>>> on probably most pages (but that's what edge caches are for, after
>>>> all).
>>>>
>>>> Erik, from your perspective does use of smaxage relieve the backend
>>>> sufficiently?
>>>>
>>>> If we do smaxage, then Web, Android, iOS should standardize their URLs
>>>> so we get more cache hits at the edge across all clients. Here's the
>>>> URL I see being used on the web today from mobile web beta:
>>>>
>>>> https://en.m.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages%7Cpageterms&piprop=thumbnail&pithumbsize=80&wbptterms=description&pilimit=3&generator=search&gsrsearch=morelike%3ACome_Share_My_Love&gsrnamespace=0&gsrlimit=3
>>>>
>>>> -Adam
>>>>
>>>> On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez
>>>> <jhernan...@wikimedia.org> wrote:
>>>>
>>>>> I'd be up to it if we manage to cram it up in a following sprint and
>>>>> it is worth it.
>>>>>
>>>>> We could run a controlled test against production with a long batch of
>>>>> articles and check median/percentiles response time with repeated runs
>>>>> and highlight the different results for human inspection regarding
>>>>> quality.
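
For what it's worth, the batch comparison described just above could be summarized with something like the Python sketch below. This is only an illustration under assumptions: the function names (summarize, changed_results) are made up here, nothing in it is existing tooling, and it assumes per-request latencies in milliseconds and result titles have already been collected for each variant.

```python
from statistics import median, quantiles


def summarize(latencies_ms):
    """Return the median and 95th percentile of a list of latencies (ms).

    Hypothetical helper: needs at least two samples.
    """
    # quantiles(n=20) returns 19 cut points; the last one is the
    # 95th percentile estimate.
    p95 = quantiles(latencies_ms, n=20)[-1]
    return {"median": median(latencies_ms), "p95": p95}


def changed_results(default_titles, opening_text_titles):
    """Titles that appear in only one of the two result lists,
    so a human can inspect just the differences."""
    a, b = set(default_titles), set(opening_text_titles)
    return {
        "only_default": sorted(a - b),
        "only_opening_text": sorted(b - a),
    }
```

Calling summarize() once per variant on repeated runs gives the median/percentile figures to compare, and changed_results() on the two title lists for the same article (for example the "Defaults" and "Opening text no boost links" lists from David's script) highlights what actually differs.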
>>>>>
>>>>> It's been noted previously that the results are far from ideal (which
>>>>> they are because it is just morelike), and I think it would be a great
>>>>> idea to change the endpoint to a specific one that is smarter and has
>>>>> some cache (we could do much more to get relevant results besides text
>>>>> similarity, take into account links, or "see also" links if there are
>>>>> any, etc.).
>>>>>
>>>>> As a note, in mobile web the related articles extension allows editors
>>>>> to specify articles to show in the section, which would avoid queries
>>>>> to cirrussearch if it was more used (once rolled into stable I guess).
>>>>>
>>>>> I remember that the performance related task was closed as resolved
>>>>> (https://phabricator.wikimedia.org/T121254#1907192), should we reopen
>>>>> it or create a new one?
>>>>>
>>>>> I'm not sure if we ended up adding the smaxage parameter (I think we
>>>>> didn't), should we? To me it seems a no-brainer that we should be
>>>>> caching these results in varnish since they don't need to be
>>>>> completely up to date for this use case.
>>>>>
>>>>> On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson
>>>>> <ebernhard...@wikimedia.org> wrote:
>>>>>
>>>>>> Both mobile apps and web are using CirrusSearch's morelike: feature,
>>>>>> which is showing some performance issues on our end. We would like to
>>>>>> make a performance optimization to it, but before we do, we would
>>>>>> prefer to run an A/B test to see if the results are still "about as
>>>>>> good" as they are currently.
>>>>>>
>>>>>> The optimization is basically: currently more like this takes the
>>>>>> entire article into account; we would like to change this to take
>>>>>> only the opening text of an article into account. This should reduce
>>>>>> the amount of work we have to do on the backend, saving both server
>>>>>> load and the latency the user sees running the query.
>>>>>>
>>>>>> This can be triggered by adding these two query parameters to the
>>>>>> search api request that is being performed:
>>>>>>
>>>>>> cirrusMltUseFields=yes&cirrusMltFields=opening_text
>>>>>>
>>>>>> The API will give a warning that these parameters do not exist, but
>>>>>> they are safe to ignore. Would any of you be willing to run this
>>>>>> test? We would basically want to look at user perceived latency along
>>>>>> with click through rates for the current default setup along with the
>>>>>> restricted setup using only opening_text.
>>>>>>
>>>>>> Erik B.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mobile-l mailing list
>>>>>> Mobile-l@lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
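
On standardizing the URLs across Web, Android and iOS: one way to make sure every client emits a byte-identical URL (so the edge cache sees a single object per article) is to build the query string from a fixed parameter set in sorted order. The Python sketch below is only an illustration under assumptions: morelike_url is a made-up helper name, the parameter set is copied from the mobile web beta URL Adam quoted, the 86400 second smaxage corresponds to the 24 hour example mentioned in the thread, and the opening_text flag simply appends the two cirrusMlt parameters quoted above. Whatever parameter order the clients agree on would work equally well, as long as it is the same everywhere.

```python
from urllib.parse import urlencode

# Endpoint taken from the mobile web beta URL quoted in the thread.
BASE = "https://en.m.wikipedia.org/w/api.php"


def morelike_url(title, limit=3, smaxage=86400, opening_text_only=False):
    """Build a morelike: request URL with a deterministic parameter order.

    Hypothetical helper for illustration; not existing client code.
    """
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "pageimages|pageterms",
        "piprop": "thumbnail",
        "pithumbsize": "80",
        "wbptterms": "description",
        "pilimit": str(limit),
        "generator": "search",
        "gsrsearch": "morelike:" + title,
        "gsrnamespace": "0",
        "gsrlimit": str(limit),
        # Ask the edge caches to keep the response for `smaxage` seconds.
        "smaxage": str(smaxage),
    }
    if opening_text_only:
        # The two parameters from Erik's message that restrict
        # morelike to the opening text.
        params["cirrusMltUseFields"] = "yes"
        params["cirrusMltFields"] = "opening_text"
    # Sorting by parameter name means every client that uses the same
    # parameter set produces exactly the same URL string.
    return BASE + "?" + urlencode(sorted(params.items()))
```

For example, morelike_url("Come_Share_My_Love") and morelike_url("Come_Share_My_Love", opening_text_only=True) would give the two URLs to compare in an A/B test, each of which is cacheable as its own object.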