On Wed, 29 Jun 2016 08:06:35 -0700, Wissam Asfahani (TSO GB) <wissam.asfah...@tso.co.uk> wrote:
> Good afternoon,
>
> We are having some issues estimating the number of documents when
> performing word queries containing punctuation characters.
>
> I have attached 4 sample documents. When using the below query, the
> estimate returns 3 and the count 1.
>
> Are there any db configuration settings we can use to ensure a more
> accurate estimate result?
>
>
> let $query := cts:word-query("4µ", ("exact"), 2)
>
> return
> (
>   xdmp:estimate(cts:search(fn:doc(), $query)),
>   fn:count(cts:search(fn:doc(), $query))
> )
>
>
> Wissam Asfahani
> XML Developer
>

Punctuation is not indexed in the word query indexes. An exact
unwildcarded *value* query will consider punctuation, so if you can
arrange things so that you can use a value query, that could be a
solution.

If it is just this character and searching for it in this way is
confined to identifiable parts of the document, you could use field
tokenizer overrides to redefine µ as a word or symbol character for
that field.

But it looks like it is being classified as a punctuation mark in
error: it should be classified as a letter character anyway since it
is listed as Ll in the Unicode tables.

//Mary
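
For illustration, here is a rough sketch of the value-query approach, reusing the estimate/count comparison from the original question. The element name "size" is purely hypothetical; it stands in for whichever element holds "4µ" as its entire text content, since a value query matches the whole value rather than a word inside it. Whether the estimate then agrees with the count still depends on the database's index settings, so treat this as a starting point rather than a guaranteed fix.

```xquery
xquery version "1.0-ml";

(: Assumption: the documents contain an element <size> whose complete
   text content is "4µ". "size" is a placeholder name, not taken from
   the sample documents. :)
let $query :=
  cts:element-value-query(xs:QName("size"), "4µ", ("exact"))
return
(
  (: estimate is resolved from the indexes alone :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  (: count runs the default filtered search, which checks each candidate :)
  fn:count(cts:search(fn:doc(), $query))
)
```

If the two numbers still differ, the remaining gap is between what the indexes can resolve and what filtering confirms, which is the same kind of discrepancy seen with the word query.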