Using fields won't be an option for our use case, but arranging things to use value queries may be.
Is it possible to re-classify these characters as symbols or words, without using field tokenizer overrides? For example, by modifying the tokenizer.xml file?

Wissam

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Mary Holstege
Sent: 29 June 2016 17:42
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] word-query including punctuation characters

On Wed, 29 Jun 2016 08:06:35 -0700, Wissam Asfahani (TSO GB) <wissam.asfah...@tso.co.uk> wrote:

> Good afternoon,
>
> We are having some issues estimating the number of documents when
> performing word queries containing punctuation characters.
>
> I have attached 4 sample documents. When using the below query, the
> estimate returns 3 and the count 1.
>
> Are there any db configuration settings we can use to ensure a more
> accurate estimate result?
>
> let $query := cts:word-query("4µ", ("exact"), 2)
> return
> (
>   xdmp:estimate(cts:search(fn:doc(), $query)),
>   fn:count(cts:search(fn:doc(), $query))
> )
>
> Wissam Asfahani
> XML Developer

Punctuation is not indexed in the word query indexes. An exact unwildcarded *value* query will consider punctuation, so if you can arrange things so that you can use a value query, that could be a solution.

If it is just this character and searching for it in this way is confined to identifiable parts of the document, you could use field tokenizer overrides to redefine µ as a word or symbol character for that field.

But it looks like it is being classified as a punctuation mark in error: it should be classified as a letter character anyway since it is listed as Ll in the Unicode tables.
//Mary

_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
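
For reference, the value-query approach discussed in this thread might look roughly like the sketch below. It assumes the searched text is the complete content of a single element; the element name `unit` is hypothetical and does not come from the thread or the attached sample documents. It reuses the estimate/count comparison from the original question so the two numbers can be checked side by side.

```xquery
xquery version "1.0-ml";

(: Hypothetical element name: assumes documents contain something like
   <unit>4µ</unit>, where "4µ" is the element's entire text content. :)
let $query := cts:element-value-query(xs:QName("unit"), "4µ", "exact")
return
(
  (: Index-only answer, no filtering of candidate documents :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: Filtered answer, each candidate document is checked :)
  fn:count(cts:search(fn:doc(), $query))
)
```

Unlike the word query, this only matches when the whole element value is "4µ", so it fits only if the documents can be arranged that way. Whether the estimate then agrees with the count still depends on the database's index settings, so it is worth checking both numbers against the sample documents.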