1) To ignore document language you have to search unstemmed; a stemmed search is constrained to the language set in the query (or the default). We handle this by running a stemmed query in the user's language OR-ed with the same query unstemmed.
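
A minimal sketch of that stemmed-OR-unstemmed combination, in case it helps. The search text and the user's language code are made-up values, and it assumes the database has both stemmed searches and word searches enabled so that each branch can be resolved from the indexes:

```xquery
xquery version "1.0-ml";

(: Hypothetical inputs: the search text and the user's language code. :)
let $text := "running"
let $user-lang := "en"

let $query :=
  cts:or-query((
    (: Stemmed branch: constrained to the user's language. :)
    cts:word-query($text, ("stemmed", fn:concat("lang=", $user-lang))),
    (: Unstemmed branch: matches the word form exactly as typed. :)
    cts:word-query($text, "unstemmed")
  ))

return cts:search(fn:doc(), $query)
```

The stemmed branch picks up inflected forms in the user's own language, and the unstemmed branch is what catches the literal term in documents tagged with other languages.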
3) We don't bother with any stop word filtering because stop words will have low relevance anyway.

Rob

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Geert Josten
Sent: 05 November 2012 10:21
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Searching using language features..

Hi,

Several language support related questions this time. Most have been asked before, but I had trouble putting all the answers together. So, I'm just going to ask them once more:

1) Others have asked before, but is there a trick to ignore language in queries, and get results for all languages, without doing an or-query for all languages you are interested in?

2) MarkLogic has stemming support, but there is also a library to use thesauri. What is the best way to integrate that into the search library if I would like to use thesauri to expand search terms before doing the actual search? Or other similar code that would be able to expand a term into a list of all kinds of synonyms (or related terms)..

3) Stopwords: to my knowledge there are no built-in language-specific lists of stop words like 'the'. I know I can find stop words by searching for the top number of values (or words) and taking the most common ones up to some threshold (and perhaps synthesizing static lists from that). But what is the most efficient way to eliminate those from a search string? I have some code of my own in which I tokenize and eliminate with xqy dynamically, on each call, but perhaps someone knows a smarter trick?

Cheers,
Geert

M.Sc. G.P.H. (Geert) Josten
Senior Developer
Dayon B.V.
Delftechpark 37b
2628 XJ Delft
The Netherlands
T +31 (0)88 26 82 570
[email protected]
www.dayon.nl
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
