Geert,

Regarding 2), there is thsr:expand(), which integrates well into the search 
libraries, but has its limitations. I gave a presentation at the last MarkLogic 
World that included an example of thesaurus expansion beyond what's provided in 
the thsr library, specifically multi-word expansion. The code is available in 
my github repo: https://github.com/wthoolihan/MLUC-2012-Examples. If you have 
any questions, let me know.
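(Editorial sketch, not the code from the repository above.) The core idea of multi-word expansion fits in a few lines of Python. It assumes a hypothetical in-memory thesaurus keyed by lower-cased word tuples. The function scans the query from left to right and always tries the longest matching phrase first, so "new york" is expanded as a unit rather than word by word. In MarkLogic, each resulting group could then become a cts:or-query of cts:word-query alternatives, but that mapping is an assumption, not something stated in this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical thesaurus: keys are lower-cased phrases stored as word tuples.
THESAURUS: Dict[Tuple[str, ...], List[str]] = {
    ("new", "york"): ["nyc", "big apple"],
    ("car",): ["automobile", "motorcar"],
}


def expand_multiword(query: str,
                     thesaurus: Dict[Tuple[str, ...], List[str]],
                     max_len: int = 3) -> List[List[str]]:
    """Greedy longest-match thesaurus expansion.

    Returns one group per matched phrase or unmatched word. Each group
    lists the original text first, followed by any thesaurus alternatives.
    """
    words = query.lower().split()
    groups: List[List[str]] = []
    i = 0
    while i < len(words):
        match = None
        # Try the longest candidate phrase first so "new york" beats "new".
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in thesaurus:
                match = phrase
                break
        if match is None:
            groups.append([words[i]])  # no entry: keep the word as-is
            i += 1
        else:
            groups.append([" ".join(match)] + thesaurus[match])
            i += len(match)
    return groups
```

For example, expand_multiword("Red car in New York", THESAURUS) returns [["red"], ["car", "automobile", "motorcar"], ["in"], ["new york", "nyc", "big apple"]].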

-Will


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Geert Josten
Sent: Monday, November 05, 2012 2:21 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Searching using language features..

Hi,

Several language-support-related questions this time. Most have been asked 
before, but I had trouble putting all the answers together, so I'm just going 
to ask them once more:

1) Others have asked this before, but is there a trick to ignore language in 
queries and get results for all languages, without doing an or-query over every 
language you are interested in?

2) MarkLogic has stemming support, but there is also a library for using 
thesauri. What is the best way to integrate that into the search library if I 
want to use thesauri to expand search terms before running the actual search? 
Alternatively, is there other similar code that can expand a term into a list 
of synonyms (or related terms)?
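(Editorial sketch.) Expanding terms before searching usually produces an AND of OR-groups: one group per search term, holding the term and its synonyms. The Python below builds that tree using a hypothetical tuple representation ("and" / "or" / "word"). In MarkLogic the same shape would correspond to a cts:and-query over cts:or-query of cts:word-query items. The synonym dictionary is an assumption standing in for a real thesaurus lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical query tree node: (operator, payload), where the operator is
# "and", "or" or "word", and the payload is a list of nodes or a single word.
Node = Tuple[str, object]


def expand_terms(terms: List[str], synonyms: Dict[str, List[str]]) -> Node:
    """AND together one OR-group per search term.

    Terms without synonyms stay as plain word nodes, so the query is not
    cluttered with single-member OR-groups.
    """
    groups: List[Node] = []
    for term in terms:
        alternatives = [term] + synonyms.get(term.lower(), [])
        if len(alternatives) == 1:
            groups.append(("word", term))
        else:
            groups.append(("or", [("word", alt) for alt in alternatives]))
    return ("and", groups)
```

For example, expand_terms(["fast", "car"], {"car": ["automobile"]}) yields ("and", [("word", "fast"), ("or", [("word", "car"), ("word", "automobile")])]).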

3) Stop words: to my knowledge there are no built-in, language-specific lists of 
stop words such as 'the'. I know I can find stop words by retrieving the most 
frequent values (or words) and taking the most common ones up to some threshold 
(and perhaps building static lists from that). But what is the most efficient 
way to eliminate them from a search string? I have some code of my own that 
tokenizes the string and eliminates them dynamically in XQuery on each call, but 
perhaps someone knows a smarter trick?
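(Editorial sketch.) The frequency-threshold approach described above splits naturally into two steps. First, derive a static stop-word set once from word frequencies. Second, filter each query with constant-time set lookups instead of recomputing the list on every call. The Python below assumes whitespace tokenization and a plain word list as the frequency source. In MarkLogic, the frequencies would more likely come from a word lexicon, and tokenization from the language-aware cts:tokenize, but that is not shown here.

```python
from collections import Counter
from typing import Iterable, List, Set


def derive_stopwords(corpus_words: Iterable[str], top_n: int) -> Set[str]:
    """Treat the top_n most frequent (lower-cased) words as stop words.

    Run this once, offline, and keep the result as a static set.
    """
    counts = Counter(w.lower() for w in corpus_words)
    return {word for word, _ in counts.most_common(top_n)}


def strip_stopwords(query: str, stopwords: Set[str]) -> List[str]:
    """Tokenize on whitespace and drop stop words via O(1) set lookups."""
    return [t for t in query.split() if t.lower() not in stopwords]
```

For example, derive_stopwords("the cat and the dog and the bird".split(), 2) gives {"the", "and"}, and strip_stopwords("The cat and the hat", {"the", "and"}) gives ["cat", "hat"].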

Cheers,
Geert


M.Sc. G.P.H. (Geert) Josten
Senior Developer


Dayon B.V.
Delftechpark 37b
2628 XJ Delft
The Netherlands

T +31 (0)88 26 82 570

[email protected]
www.dayon.nl

The information sent in or with this e-mail message originates from Dayon BV 
and is intended solely for the addressee. If you have received this message 
unintentionally, please delete it. No rights can be derived from this message.
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general