Hi, Can someone on this list point me to where the more-like code sits? Or, better yet, could someone document the rules that govern prioritization of suggestions?
I would like to document the logic for our communities so that we can have an open discussion about what variables and weighting we should use to suggest articles.

-J

On Mon, Feb 15, 2016 at 11:26 AM, Dmitry Brant <dbr...@wikimedia.org> wrote:

> Just a quick note that our latest production release (just published)
> contains this A/B test, in addition to the other updates.
> Looking forward to seeing the numbers from this!
>
> -Dmitry
>
>
> On Sun, Jan 31, 2016 at 9:35 PM, Dmitry Brant <dbr...@wikimedia.org>
> wrote:
>
>> Roger that! I think we could squeeze it in -- the change would be pretty
>> straightforward. We'll be able to release a Beta with this A/B test in
>> short order, but it will probably be a couple weeks until our next
>> production release. I hope that's all right.
>>
>>
>> On Sat, Jan 30, 2016 at 1:02 PM, Gabriel Wicke <gwi...@wikimedia.org>
>> wrote:
>>
>>> We are also happy to add cached entry points for high-traffic end
>>> points in the REST API. I commented to that effect at
>>> https://phabricator.wikimedia.org/T124216#1984206. Let us know if you
>>> think this would be useful for this use case.
>>>
>>> On Sat, Jan 30, 2016 at 8:11 AM, Adam Baso <ab...@wikimedia.org> wrote:
>>> > Okay. As per https://phabricator.wikimedia.org/T124225#1984080 I think if
>>> > we're doing near term experimentation with a controlled A/B test the Android
>>> > app is the only logical place to start. Dmitry, can that work for you? It's
>>> > not required, but I think it would be neat to see if we can move the needle
>>> > even more. Of course your quarterly goals take top priority...but what do
>>> > you think?
>>> >
>>> > On Sat, Jan 23, 2016 at 5:58 AM, Adam Baso <ab...@wikimedia.org> wrote:
>>> >>
>>> >> Hey all, am planning to look at Phabricator tasks and provide a reply
>>> >> during the upcoming weekdays. Just wanted to acknowledge I saw your replies!
>>> >> >>> >> >>> >> On Friday, January 22, 2016, Erik Bernhardson < >>> ebernhard...@wikimedia.org> >>> >> wrote: >>> >>> >>> >>> On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez >>> >>> <jhernan...@wikimedia.org> wrote: >>> >>>> >>> >>>> Regarding the caching, we would need to agree between apps and web >>> about >>> >>>> the url and smaxage parameter as Adam noted so that the urls are >>> exactly the >>> >>>> same to not bloat varnish and reuse the same cached objects across >>> >>>> platforms. >>> >>>> >>> >>>> It is an extremely adhoc and brittle solution but seems like it >>> would be >>> >>>> the greatest win. >>> >>>> >>> >>>> 20% of the traffic from searches by being only in android and web >>> beta >>> >>>> seems a lot to me, and we should work on reducing it, otherwise >>> when it hits >>> >>>> web stable we're going to crush the servers, so caching seems the >>> highest >>> >>>> priority. >>> >>>> >>> >>> To clarify its 20% of the load, as opposed to 20% of the traffic. But >>> >>> same difference :) >>> >>> >>> >>>> >>> >>>> Let's chime in https://phabricator.wikimedia.org/T124216 and >>> continue >>> >>>> the cache discussion there. >>> >>>> >>> >>>> Regarding the validity of results with opening text only, how >>> should we >>> >>>> proceed? Adam? >>> >>>> >>> >>> I've put together https://phabricator.wikimedia.org/T124258 to track >>> >>> putting together an AB test that measures the difference in click >>> through >>> >>> rates for the two approaches. >>> >>> >>> >>> >>> >>>> >>> >>>> On Wed, Jan 20, 2016 at 9:34 PM, David Causse < >>> dcau...@wikimedia.org> >>> >>>> wrote: >>> >>>>> >>> >>>>> Hi, >>> >>>>> >>> >>>>> Yes we can combine many factors, from templates (quality but also >>> >>>>> disambiguation/stubs), size and others. >>> >>>>> Today cirrus uses mostly the number of incoming links which (imho) >>> is >>> >>>>> not very good for morelike. 
>>> >>>>> On enwiki results will also be scored according the weights >>> defined in >>> >>>>> >>> https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates. >>> >>>>> >>> >>>>> I wrote a small bash to compare results : >>> >>>>> https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad >>> >>>>> Here is some random results from the list (Semetimes better, >>> sometimes >>> >>>>> worse) : >>> >>>>> >>> >>>>> $ sh morelike.sh Revolution_Muslim >>> >>>>> Defaults >>> >>>>> "title": "Chess", >>> >>>>> "title": "Suicide attack", >>> >>>>> "title": "Zachary Adam Chesser", >>> >>>>> ======= >>> >>>>> Opening text no boost links >>> >>>>> "title": "Hungarian Revolution of 1956", >>> >>>>> "title": "Muslims for America", >>> >>>>> "title": "Salafist Front", >>> >>>>> >>> >>>>> $ sh morelike.sh Chesser >>> >>>>> Defaults >>> >>>>> "title": "Chess", >>> >>>>> "title": "Edinburgh", >>> >>>>> "title": "Edinburgh Corn Exchange", >>> >>>>> ======= >>> >>>>> Opening text no boost links >>> >>>>> "title": "Dreghorn Barracks", >>> >>>>> "title": "Edinburgh Chess Club", >>> >>>>> "title": "Threipmuir Reservoir", >>> >>>>> >>> >>>>> $ sh morelike.sh Time_%28disambiguation%29 >>> >>>>> Defaults >>> >>>>> "title": "Atlantis: The Lost Empire", >>> >>>>> "title": "Stargate", >>> >>>>> "title": "Stargate SG-1", >>> >>>>> ======= >>> >>>>> Opening text no boost links >>> >>>>> "title": "Father Time (disambiguation)", >>> >>>>> "title": "The Last Time", >>> >>>>> "title": "Time After Time", >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> Le 20/01/2016 19:34, Jon Robson a écrit : >>> >>>>>> >>> >>>>>> I'm actually interested to see whether this yields better >>> results in >>> >>>>>> certain examples where the algorithm is lacking [1]. 
If it's done >>> as >>> >>>>>> an A/B test we could even measure things such as click throughs >>> in the >>> >>>>>> related article feature (whether they go up or not) >>> >>>>>> >>> >>>>>> Out of interest is it also possible to take article size and type >>> into >>> >>>>>> account and not returning any morelike results for things like >>> >>>>>> disambiguation pages and stubs? >>> >>>>>> >>> >>>>>> [1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya >>> >>>>>> >>> >>>>>> >>> >>>>>> On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso <ab...@wikimedia.org> >>> >>>>>> wrote: >>> >>>>>>> >>> >>>>>>> One thing we could do regarding the quality of the output is >>> check >>> >>>>>>> results >>> >>>>>>> against a random sample of popular articles (example approach to >>> find >>> >>>>>>> some >>> >>>>>>> articles) on mdot Wikipedia. Presuming that improves the quality >>> of >>> >>>>>>> the >>> >>>>>>> recommendations or at least does not degrade them, we should >>> consider >>> >>>>>>> adding >>> >>>>>>> the enhancement task to a future sprint, with further >>> instrumentation >>> >>>>>>> and >>> >>>>>>> A/B testing / timeboxed beta test, etc. >>> >>>>>>> >>> >>>>>>> Joaquin, smaxage (e.g., 24 hour cached responses) does seem a >>> good >>> >>>>>>> fix for >>> >>>>>>> now for further reduction of client perceived wait, at least for >>> >>>>>>> non-cold >>> >>>>>>> cache requests, even if we stop beating up the backend. Does >>> anyone >>> >>>>>>> know of >>> >>>>>>> a compelling reason to not do that for the time being? The main >>> thing >>> >>>>>>> that >>> >>>>>>> comes to mind as always is growing the Varnish cache object pool >>> - >>> >>>>>>> probably >>> >>>>>>> not a huge deal while the thing is only in beta, but on the >>> stable >>> >>>>>>> channel >>> >>>>>>> maybe noteworthy because it would run on probably most pages (but >>> >>>>>>> that's >>> >>>>>>> what edge caches are for, after all). 
>>> >>>>>>> >>> >>>>>>> Erik, from your perspective does use of smaxage relieve the >>> backend >>> >>>>>>> sufficiently? >>> >>>>>>> >>> >>>>>>> If we do smaxage, then Web, Android, iOS should standardize their >>> >>>>>>> URLs so we >>> >>>>>>> get more cache hits at the edge across all clients. Here's the >>> URL I >>> >>>>>>> see >>> >>>>>>> being used on the web today from mobile web beta: >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> >>> https://en.m.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=pageimages%7Cpageterms&piprop=thumbnail&pithumbsize=80&wbptterms=description&pilimit=3&generator=search&gsrsearch=morelike%3ACome_Share_My_Love&gsrnamespace=0&gsrlimit=3 >>> >>>>>>> >>> >>>>>>> >>> >>>>>>> -Adam >>> >>>>>>> >>> >>>>>>> On Wed, Jan 20, 2016 at 7:45 AM, Joaquin Oltra Hernandez >>> >>>>>>> <jhernan...@wikimedia.org> wrote: >>> >>>>>>>> >>> >>>>>>>> I'd be up to it if we manage to cram it up in a following >>> sprint and >>> >>>>>>>> it is >>> >>>>>>>> worth it. >>> >>>>>>>> >>> >>>>>>>> We could run a controlled test against production with a long >>> batch >>> >>>>>>>> of >>> >>>>>>>> articles and check median/percentiles response time with >>> repeated >>> >>>>>>>> runs and >>> >>>>>>>> highlight the different results for human inspection regarding >>> >>>>>>>> quality. >>> >>>>>>>> >>> >>>>>>>> It's been noted previously that the results are far from ideal >>> >>>>>>>> (which they >>> >>>>>>>> are because it is just morelike), and I think it would be a >>> great >>> >>>>>>>> idea to >>> >>>>>>>> change the endpoint to a specific one that is smarter and has >>> some >>> >>>>>>>> cache (we >>> >>>>>>>> could do much more to get relevant results besides text >>> similarity, >>> >>>>>>>> take >>> >>>>>>>> into account links, or see also links if there are, etc...). 
>>> >>>>>>>> >>> >>>>>>>> As a note, in mobile web the related articles extension allows >>> >>>>>>>> editors to >>> >>>>>>>> specify articles to show in the section, which would avoid >>> queries >>> >>>>>>>> to >>> >>>>>>>> cirrussearch if it was more used (once rolled into stable I >>> guess). >>> >>>>>>>> >>> >>>>>>>> I remember that the performance related task was closed as >>> resolved >>> >>>>>>>> (https://phabricator.wikimedia.org/T121254#1907192), should we >>> >>>>>>>> reopen it or >>> >>>>>>>> create a new one? >>> >>>>>>>> >>> >>>>>>>> I'm not sure if we ended up adding the smaxage parameter (I >>> think we >>> >>>>>>>> didn't), should we? To me it seems a no-brainer that we should >>> be >>> >>>>>>>> caching >>> >>>>>>>> this results in varnish since they don't need to be completely >>> up to >>> >>>>>>>> date >>> >>>>>>>> for this use case. >>> >>>>>>>> >>> >>>>>>>> On Tue, Jan 19, 2016 at 11:54 PM, Erik Bernhardson >>> >>>>>>>> <ebernhard...@wikimedia.org> wrote: >>> >>>>>>>>> >>> >>>>>>>>> Both mobile apps and web are using CirrusSearch's morelike: >>> feature >>> >>>>>>>>> which >>> >>>>>>>>> is showing some performance issues on our end. We would like to >>> >>>>>>>>> make a >>> >>>>>>>>> performance optimization to it, but before we would prefer to >>> run >>> >>>>>>>>> an A/B >>> >>>>>>>>> test to see if the results are still "about as good" as they >>> are >>> >>>>>>>>> currently. >>> >>>>>>>>> >>> >>>>>>>>> The optimization is basically: Currently more like this takes >>> the >>> >>>>>>>>> entire >>> >>>>>>>>> article into account, we would like to change this to take >>> only the >>> >>>>>>>>> opening >>> >>>>>>>>> text of an article into account. This should reduce the amount >>> of >>> >>>>>>>>> work we >>> >>>>>>>>> have to do on the backend saving both server load and latency >>> the >>> >>>>>>>>> user sees >>> >>>>>>>>> running the query. 
>>> >>>>>>>>> >>> >>>>>>>>> This can be triggered by adding these two query parameters to >>> the >>> >>>>>>>>> search >>> >>>>>>>>> api request that is being performed: >>> >>>>>>>>> >>> >>>>>>>>> cirrusMltUseFields=yes&cirrusMltFields=opening_text >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> The API will give a warning that these parameters do not >>> exist, but >>> >>>>>>>>> they >>> >>>>>>>>> are safe to ignore. Would any of you be willing to run this >>> test? >>> >>>>>>>>> We would >>> >>>>>>>>> basically want to look at user perceived latency along with >>> click >>> >>>>>>>>> through >>> >>>>>>>>> rates for the current default setup along with the restricted >>> setup >>> >>>>>>>>> using >>> >>>>>>>>> only opening_text. >>> >>>>>>>>> >>> >>>>>>>>> Erik B. >>> >>>>>>>>> >>> >>>>>>>>> _______________________________________________ >>> >>>>>>>>> Mobile-l mailing list >>> >>>>>>>>> Mobile-l@lists.wikimedia.org >>> >>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>> >>>>>>>>> >>> >>>>>>> >>> >>>>>>> _______________________________________________ >>> >>>>>>> Mobile-l mailing list >>> >>>>>>> Mobile-l@lists.wikimedia.org >>> >>>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>> >>>>>>> >>> >>>>>> _______________________________________________ >>> >>>>>> Mobile-l mailing list >>> >>>>>> Mobile-l@lists.wikimedia.org >>> >>>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> _______________________________________________ >>> >>>>> Mobile-l mailing list >>> >>>>> Mobile-l@lists.wikimedia.org >>> >>>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>> >>>> >>> >>>> >>> >>>> >>> >>>> _______________________________________________ >>> >>>> Mobile-l mailing list >>> >>>> Mobile-l@lists.wikimedia.org >>> >>>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>> >>>> >>> >>> >>> > >>> > >>> > _______________________________________________ >>> > Mobile-l mailing list >>> > 
Mobile-l@lists.wikimedia.org >>> > https://lists.wikimedia.org/mailman/listinfo/mobile-l >>> > >>> >>> >>> >>> -- >>> Gabriel Wicke >>> Principal Engineer, Wikimedia Foundation >>> >>> _______________________________________________ >>> Mobile-l mailing list >>> Mobile-l@lists.wikimedia.org >>> https://lists.wikimedia.org/mailman/listinfo/mobile-l >>> >> >> >> >> -- >> Dmitry Brant >> Mobile Apps Team (Android) >> Wikimedia Foundation >> https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering >> >> > > > -- > Dmitry Brant > Mobile Apps Team (Android) > Wikimedia Foundation > https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering > > > _______________________________________________ > Mobile-l mailing list > Mobile-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/mobile-l > >
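For anyone documenting this, the request discussed in the quoted thread can be sketched roughly as below. This is only an illustration under stated assumptions, not the code any client actually ships: the base parameters are copied from the mobile web beta URL Adam quoted, cirrusMltUseFields/cirrusMltFields are the two parameters Erik named, and smaxage=86400 corresponds to the 24 hour caching that was suggested. The function name and the choice to sort parameters (so every client would emit a byte-identical URL and share one cached object) are my own assumptions.

```python
from urllib.parse import urlencode

# Host taken from the mobile web beta URL quoted in the thread.
API_ENDPOINT = "https://en.m.wikipedia.org/w/api.php"


def build_morelike_url(title, limit=3, opening_text_only=False, smaxage=None):
    """Build a morelike: search URL (hypothetical helper, for illustration only)."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "pageimages|pageterms",
        "piprop": "thumbnail",
        "pithumbsize": "80",
        "wbptterms": "description",
        "pilimit": str(limit),
        "generator": "search",
        "gsrsearch": "morelike:" + title,
        "gsrnamespace": "0",
        "gsrlimit": str(limit),
    }
    if opening_text_only:
        # The two parameters from Erik's mail that restrict morelike
        # to the opening text of the article.
        params["cirrusMltUseFields"] = "yes"
        params["cirrusMltFields"] = "opening_text"
    if smaxage is not None:
        # Shared (edge) cache lifetime in seconds, e.g. 86400 for 24 hours.
        params["smaxage"] = str(smaxage)
    # Assumption: sorting gives a canonical parameter order so that web,
    # Android and iOS would all produce exactly the same URL.
    return API_ENDPOINT + "?" + urlencode(sorted(params.items()))


if __name__ == "__main__":
    print(build_morelike_url("Come_Share_My_Love",
                             opening_text_only=True, smaxage=86400))
```

Whether sorted order is the right convention is up to the teams involved; the point is only that the clients agree on one form.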