On 20/02/2016 02:13, Jon Katz wrote:
Also, even without boost links, there seems to be a bias towards popular (long) pages. It seems that a focus on the number of words in common rather than the percentage is one of the things leading to long articles seeing so much more traction. Would this be an easy thing to test as well?

Hi,

You're right, but I think it's because of the boost-templates feature, which is enabled even when boostlinks is not. On enwiki a few templates are configured in https://en.wikipedia.org/wiki/MediaWiki:Cirrussearch-boost-templates, which means that a featured article will be overboosted.

We could fine-tune the core "more like" algorithm with various parameters, but today I think the rescore features (boostlinks, boost-templates) are what have the most impact.

To sum up, 2 types of score are combined when ranking articles:
- A score that computes the similarity between documents; this can be fine-tuned [1].
- A score (we call it "rescore") that uses article metadata: boostlinks, templates.

The way these scores are combined can be configured with a rescore profile, but today it is simply the product of all the scores. For example, with the query:

morelike:A_Summer_Bird-Cage

The score for "I Know Why the Caged Bird Sings" with boost links is:
- similarity: 0.3457441 (terms chosen: "from", "cage", "bird")
- boostlinks: 2.807535
- boost-templates: 2
- total: 0.3457441 * 2.807535 * 2 => 1.9413773
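
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the "product of all the scores" combination described here, not the actual CirrusSearch or Elasticsearch rescore code; the function name combined_score and the boosts dictionary are made up for illustration.

```python
from math import prod


def combined_score(similarity, boosts):
    """Multiply the similarity score by every rescore boost factor.

    Hypothetical helper: it only mirrors the "product of all the scores"
    behaviour described in this message, not the real rescore profile code.
    """
    return similarity * prod(boosts.values())


# Values taken from the "I Know Why the Caged Bird Sings" example.
boosts = {"boostlinks": 2.807535, "boost-templates": 2}
score = combined_score(0.3457441, boosts)
print(f"{score:.7f}")

# Same article without the template boost, to show how much it contributes.
score_no_template = combined_score(0.3457441, {"boostlinks": 2.807535})
print(f"{score_no_template:.7f}")
```

Because the combination is a plain product, a template boost of 2 doubles the final score regardless of how similar the documents are, which is why featured articles tend to float to the top.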


[1]: https://www.mediawiki.org/wiki/Help:CirrusSearch#morelike:

_______________________________________________
Mobile-l mailing list
Mobile-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mobile-l
