Hi Will,

Looks interesting. Could you give a few brief examples of how to call/use this code? I couldn't find a main module.

Kind regards,
Geert

> -----Original Message-----
> From: [email protected] [mailto:general-
> [email protected]] On Behalf Of Will Thompson
> Sent: Monday, November 5, 2012 22:41
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Searching using language features..
>
> Geert,
>
> Regarding 2), there is thsr:expand(), which integrates well into the search
> libraries, but has its limitations. I gave a presentation at the last MarkLogic
> World that included an example of thesaurus expansion beyond what's provided
> in the thsr library, specifically multi-word expansion. The code is available in my
> GitHub repo: https://github.com/wthoolihan/MLUC-2012-Examples. If you have
> any questions, let me know.
>
> -Will
>
>
> -----Original Message-----
> From: [email protected] [mailto:general-
> [email protected]] On Behalf Of Geert Josten
> Sent: Monday, November 05, 2012 2:21 AM
> To: MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] Searching using language features..
>
> Hi,
>
> I have several language-support questions this time. Most have been asked
> before, but I had trouble putting all the answers together, so I'm just going to
> ask them once more:
>
> 1) Others have asked this before, but is there a trick to ignore language in queries
> and get results for all languages, without doing an or-query over every language
> you are interested in?
>
> 2) MarkLogic has stemming support, but there is also a library for using thesauri.
> What is the best way to integrate that into the search library if I want to
> use thesauri to expand search terms before doing the actual search? Or is there
> other, similar code that can expand a term into a list of synonyms (or related terms)?
>
> 3) Stopwords: to my knowledge there are no built-in language-specific lists of
> stop words like 'the'.
> I know I can find stop words by searching for the top
> values (or words) and taking the most common ones up to some
> threshold (and perhaps synthesizing static lists from that). But what is the most
> efficient way to eliminate those from a search string? I have some code of my
> own that tokenizes the string and eliminates them dynamically in XQuery on each
> call, but perhaps someone knows a smarter trick?
>
> Cheers,
> Geert
>
>
> M.Sc. G.P.H. (Geert) Josten
> Senior Developer
>
>
> Dayon B.V.
> Delftechpark 37b
> 2628 XJ Delft
> The Netherlands
>
> T +31 (0)88 26 82 570
>
> [email protected]
> www.dayon.nl
>
> The information sent in or with this e-mail message originates from Dayon
> BV and is intended solely for the addressee. If you have received this message
> unintentionally, please delete it. No rights can be derived from this message.
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
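The tokenize-and-eliminate approach described in question 3 can be sketched independently of MarkLogic. This is a minimal, language-neutral Python sketch, not Geert's code and not a MarkLogic API. It assumes the word frequencies have already been fetched (the counts below are made up), derives a static stop-word set once using a hypothetical `min_share` threshold (a word is a stop word if it accounts for at least that fraction of all counted occurrences), and then filters each query with set lookups. The names `derive_stopwords`, `strip_stopwords` and `min_share` are hypothetical.

```python
import re

# Runs of Unicode word characters (letters, digits, underscore) count as tokens.
TOKEN = re.compile(r"\w+")


def derive_stopwords(frequencies: dict[str, int], min_share: float) -> frozenset[str]:
    """Return every word whose share of all counted occurrences is >= min_share.

    `frequencies` is assumed to hold the "top values" already pulled from the
    database; how they are fetched is outside this sketch.
    """
    total = sum(frequencies.values())
    if total == 0:
        return frozenset()
    return frozenset(
        word.casefold()
        for word, count in frequencies.items()
        if count / total >= min_share
    )


def strip_stopwords(query: str, stopwords: frozenset[str]) -> list[str]:
    """Tokenize `query` and drop stop words (case-insensitively).

    If every token is a stop word, the original tokens are returned unchanged
    so the query never collapses to nothing.
    """
    tokens = TOKEN.findall(query)
    kept = [t for t in tokens if t.casefold() not in stopwords]
    return kept if kept else tokens


# Made-up frequencies, for illustration only.
freqs = {"the": 500, "of": 300, "and": 250, "search": 40, "thesaurus": 10}

# Build the static set once (e.g. at deploy time), then reuse it per query.
STOPWORDS = derive_stopwords(freqs, min_share=0.1)

print(sorted(STOPWORDS))                                        # ['and', 'of', 'the']
print(strip_stopwords("The history of the Thesaurus", STOPWORDS))  # ['history', 'Thesaurus']
print(strip_stopwords("the and of", STOPWORDS))                 # ['the', 'and', 'of']
```

Building the set once and reusing it keeps the per-query cost to a single tokenizing pass plus constant-time lookups. Falling back to the original tokens when every word is a stop word is a design choice that avoids turning a query made up entirely of common words into an empty search.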
