There is another similar article where they tested a different search engine: http://www.searchtechnologies.com/querying-indexing-cloudsearch
Some takeaways:
* Considers longer articles more important
* Considers shorter titles more important (aka Germany vs List of German Corps in World War II)
* Some hand tweaking ended up with the formula: text_relevance + 40.0*log10(content_size) - 15.0*log10(title_size)
* Defined a per-document boost from 0 to 10 based on which namespace something belongs to.
* Tweaked the formula into: text_relevance + (log10(content_size)*(doc_boost == 1 ? 25.0 : 40.0)) - (log10(title_size)*15)

On Thu, Jul 7, 2016 at 10:29 AM, Erik Bernhardson <[email protected]> wrote:
> Semi interesting post from Search Technologies (aka Paul Score) about
> indexing wikipedia data:
> http://www.searchtechnologies.com/wikipedia-azure-search
>
> Takeaways:
> * Automated entity detection, categorizing into person/place/organization
> * Offers search facets by wikipedia category and by entity detection
> * Multiple scoring profiles offered which change the weight between title
> and description (content? not clear)
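For anyone curious, the tweaked formula above can be sketched in Python. The function wrapper and the example sizes are illustrative assumptions; only the variable names (text_relevance, content_size, title_size, doc_boost) and the constants come from the post:

```python
import math

def tweaked_score(text_relevance, content_size, title_size, doc_boost):
    """Hand-tweaked score from the CloudSearch write-up (sketch).

    text_relevance: the engine's base text-match score
    content_size:   article body length, > 0
    title_size:     title length, > 0
    doc_boost:      per-document namespace boost (0-10); the formula
                    uses a smaller content weight when doc_boost == 1
    """
    content_weight = 25.0 if doc_boost == 1 else 40.0
    return (text_relevance
            + math.log10(content_size) * content_weight
            - math.log10(title_size) * 15.0)

# Illustrates the title-length penalty: with equal base relevance,
# "Germany" outranks the longer title even at a smaller content size.
germany = tweaked_score(10.0, 50000, len("Germany"), doc_boost=1)
corps_list = tweaked_score(10.0, 5000,
                           len("List of German Corps in World War II"),
                           doc_boost=1)
assert germany > corps_list
```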
_______________________________________________
discovery mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/discovery
