By the way, a meeting has been scheduled for 19:20 UTC on Thursday,
4 February 2016 to go into technical specifics on PageImages.
Email me off-list if you'd like to be added to the meeting.
-Adam
-- Forwarded message --
From: Adam Baso
Date: Sat, Jan 30, 2016 at 8:11 AM
Subject: Re: [WikimediaMobile] Similar articles feature performance in
CirrusSearch for apps and mobile web
To: Erik Bernhardson
Cc: Joaquin Oltra Hernandez, mobile-l <mobile-l@lists.wikimedia.org>
Okay. As per https://phabricator.wikimedia.org/T124225#1984080 I think that if
we're doing near-term experimentation with a controlled A/B test, the
Android app is the only logical place to start. Dmitry, can that work for
you? It's not required, but I think it would be neat to see if we can move
the needle even more. Of course your quarterly goals take top
priority... but what do you think?
On Sat, Jan 23, 2016 at 5:58 AM, Adam Baso wrote:
> Hey all, am planning to look at Phabricator tasks and provide a reply
> during the upcoming weekdays. Just wanted to acknowledge I saw your replies!
>
>
> On Friday, January 22, 2016, Erik Bernhardson
> wrote:
>
>> On Thu, Jan 21, 2016 at 1:29 AM, Joaquin Oltra Hernandez <
>> jhernan...@wikimedia.org> wrote:
>>
>>> Regarding the caching, we would need to agree between apps and web about
>>> the URL and smaxage parameter, as Adam noted, so that the URLs are
>>> *exactly* the same, to avoid bloating Varnish and to reuse the same cached
>>> objects across platforms.
>>>
>>> It is an extremely ad hoc and brittle solution, but it seems like it would
>>> be the greatest win.
>>>
>>> 20% of the traffic from searches, with the feature only in Android and web
>>> beta, seems like a lot to me, and we should work on reducing it. Otherwise,
>>> when it hits web stable we're going to crush the servers, so caching seems
>>> the highest priority.
>>>
>> To clarify, it's 20% of the load, as opposed to 20% of the traffic. But
>> same difference :)
>>
>>
>>> Let's chime in https://phabricator.wikimedia.org/T124216 and continue
>>> the cache discussion there.
>>>
>>> Regarding the validity of results with opening text only, how should we
>>> proceed? Adam?
>>>
>> I've put together https://phabricator.wikimedia.org/T124258 to track
>> putting together an A/B test that measures the difference in click-through
>> rates for the two approaches.
>>
>>
>>
>>> On Wed, Jan 20, 2016 at 9:34 PM, David Causse
>>> wrote:
>>>
Hi,
Yes we can combine many factors, from templates (quality but also
disambiguation/stubs), size and others.
Today cirrus uses mostly the number of incoming links which (imho) is
not very good for morelike.
On enwiki results will also be scored according to the weights defined in
https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates.
I wrote a small bash script to compare results:
https://gist.github.com/nomoa/93c5097e3c3cb3b6ebad
Here are some random results from the list (sometimes better, sometimes
worse):
$ sh morelike.sh Revolution_Muslim
Defaults
"title": "Chess",
"title": "Suicide attack",
"title": "Zachary Adam Chesser",
===
Opening text no boost links
"title": "Hungarian Revolution of 1956",
"title": "Muslims for America",
"title": "Salafist Front",
$ sh morelike.sh Chesser
Defaults
"title": "Chess",
"title": "Edinburgh",
"title": "Edinburgh Corn Exchange",
===
Opening text no boost links
"title": "Dreghorn Barracks",
"title": "Edinburgh Chess Club",
"title": "Threipmuir Reservoir",
$ sh morelike.sh Time_%28disambiguation%29
Defaults
"title": "Atlantis: The Lost Empire",
"title": "Stargate",
"title": "Stargate SG-1",
===
Opening text no boost links
"title": "Father Time (disambiguation)",
"title": "The Last Time",
"title": "Time After Time",
On 20/01/2016 19:34, Jon Robson wrote:
> I'm actually interested to see whether this yields better results in
> certain examples where the algorithm is lacking [1]. If it's done as
> an A/B test we could even measure things such as click throughs in the
> related article feature (whether they go up or not)
>
> Out of interest, is it also possible to take article size and type into
> account and not return any morelike results for things like
> disambiguation pages and stubs?
>
> [1] https://www.mediawiki.org/wiki/Topic:Swsjajvdll3pf8ya
>
>
> On Wed, Jan 20, 2016 at 9:47 AM, Adam Baso
> wrote:
>
>> One thing we could do regarding