Hi,

Several language-support questions this time. Most have been asked before,
but I had trouble putting all the answers together, so I'm just going to
ask them once more:

1) Others have asked this before, but is there a trick to ignore language
in queries and get results for all languages, without doing an or-query
over every language you are interested in?

2) MarkLogic has stemming support, but there is also a library for using
thesauri. What is the best way to integrate that into the search library
if I would like to use thesauri to expand search terms before doing the
actual search? Or is there other, similar code that can expand a term into
a list of synonyms (or related terms)?

3) Stopwords: to my knowledge there are no built-in language-specific
lists of stop words like 'the'. I know I can find stop words by retrieving
the most frequent values (or words) and taking the most common ones up to
some threshold (and perhaps synthesizing static lists from that). But what
is the most efficient way to eliminate those from a search string? I have
some code of my own in which I tokenize and eliminate them with xqy
dynamically, on each call, but perhaps someone knows a smarter trick?

Cheers,
Geert


M.Sc. G.P.H. (Geert) Josten
Senior Developer


Dayon B.V.
Delftechpark 37b
2628 XJ Delft
The Netherlands

T +31 (0)88 26 82 570

[email protected]
www.dayon.nl

The information sent in or with this e-mail message originates from Dayon
BV and is intended solely for the addressee. If you have received this
message unintentionally, we ask you to delete it. No rights can be derived
from this message.
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
