Thnx so far!

> -----Original Message-----
> From: [email protected] [mailto:general-[email protected]] On Behalf Of Whitby, Rob, Springer Healthcare UK
> Sent: Monday 5 November 2012 11:50
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Searching using language features..
>
> 1) To ignore the document language you have to search unstemmed; a stemmed
> search is constrained to the language set in the query (or the default).
> The way we handle this is to run a stemmed query in the user's language
> OR-ed with the same query unstemmed.
>
> 3) We don't bother with any stop word filtering because they'll have low
> relevance anyway.
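
A minimal sketch of the approach in point 1, assuming a Python client that submits XQuery text to MarkLogic by whatever means it already uses. It only builds the query string, and the helper names are hypothetical. The options used ("stemmed", "unstemmed", "lang=...") are cts:word-query options; running an unstemmed query may require the database's word searches index to be enabled.

```python
def xquery_string_literal(value: str) -> str:
    """Quote a Python string as a double-quoted XQuery string literal.

    Inside such a literal, '&' must be written as an entity reference
    and a literal '"' is escaped by doubling it.
    """
    escaped = value.replace("&", "&amp;").replace('"', '""')
    return f'"{escaped}"'


def stemmed_or_unstemmed_query(text: str, user_lang: str) -> str:
    """Build cts:or-query text: stemmed in the user's language OR unstemmed.

    The stemmed branch is bound to user_lang; the unstemmed branch matches
    the literal words, so documents in other languages can still match.
    (Hypothetical helper; it returns XQuery source text, it runs nothing.)
    """
    phrase = xquery_string_literal(text)
    lang_option = xquery_string_literal(f"lang={user_lang}")
    return (
        "cts:or-query(("
        f'cts:word-query({phrase}, ("stemmed", {lang_option})), '
        f'cts:word-query({phrase}, "unstemmed")'
        "))"
    )


if __name__ == "__main__":
    print(stemmed_or_unstemmed_query("running shoes", "en"))
```

For "running shoes" in English this prints a single cts:or-query wrapping the stemmed English word query and the unstemmed one, which is the "stemmed OR unstemmed" combination described above rather than one branch per language.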
>
>
> Rob
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Geert Josten
> Sent: 05 November 2012 10:21
> To: MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] Searching using language features..
>
> Hi,
>
> Several language-support questions this time. Most have been asked
> before, but I had trouble putting all the answers together, so I'm just
> going to ask them once more:
>
> 1) Others have asked this before, but is there a trick to ignore language
> in queries and get results for all languages, without doing an or-query
> for all the languages you are interested in?
>
> 2) MarkLogic has stemming support, but there is also a library for using
> thesauri. What is the best way to integrate that into the search library
> if I would like to use thesauri to expand search terms before running the
> actual search? Or other similar code that can expand a term into a list
> of all kinds of synonyms (or related terms)?
>
> 3) Stop words: to my knowledge there are no built-in language-specific
> lists of stop words like 'the'. I know I can find stop words by fetching
> the top values (or words) and taking the most common ones up to some
> threshold (and perhaps synthesizing static lists from that). But what is
> the most efficient way to eliminate those from a search string? I have
> some code of my own that tokenizes and eliminates them with xqy
> dynamically on each call, but perhaps someone knows a smarter trick?
>
> Cheers,
> Geert
>
>
> M.Sc. G.P.H. (Geert) Josten
> Senior Developer
>
>
> Dayon B.V.
> Delftechpark 37b
> 2628 XJ Delft
> The Netherlands
>
> T +31 (0)88 26 82 570
>
> [email protected]
> www.dayon.nl
>
> The information sent in or with this e-mail message originates from
> Dayon BV and is intended solely for the addressee. If you have received
> this message in error, please delete it. No rights can be derived from
> this message.
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general