On Wed, 29 Jun 2016 08:06:35 -0700, Wissam Asfahani (TSO GB)  
<wissam.asfah...@tso.co.uk> wrote:

> Good afternoon,
>
> We are having some issues estimating the number of documents when  
> performing word queries containing punctuation characters.
>
> I have attached 4 sample documents. When using the below query, the  
> estimate returns 3 and the count 1.
>
> Are there any db configuration settings we can use to ensure a more  
> accurate estimate result?
>
>
> let $query := cts:word-query("4µ", ("exact"), 2)
>
> return
>   (
>     xdmp:estimate(cts:search(fn:doc(), $query)),
>     fn:count(cts:search(fn:doc(), $query))
>   )
>
>
> Wissam Asfahani
> XML Developer
>

Punctuation is not indexed in the word query indexes, and xdmp:estimate
is resolved from the indexes alone, so it can overcount; fn:count runs the
filtered search, which checks each candidate against the actual text. An
exact unwildcarded *value* query will consider punctuation, so if you can
arrange things so that you can use a value query, that could be a
solution. If it is just this character, and searching for it in this way
is confined to identifiable parts of the document, you could use field
tokenizer overrides to redefine µ as a word or symbol character for that
field. That said, it looks like µ is being classified as a punctuation
mark in error: it is listed as Ll in the Unicode tables, so it should be
classified as a letter character anyway.
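
To make the two suggestions concrete, here is a rough sketch rather than a
tested recipe. The element name "measurement" is made up for illustration,
and a value query only matches when "4µ" is the element's entire text
content. The unfiltered count should normally agree with the estimate,
which is a quick way to confirm that the difference comes from filtering.

```xquery
xquery version "1.0-ml";

(: "measurement" is a hypothetical element name; substitute your own. :)
let $word-query  := cts:word-query("4µ", "exact")
let $value-query :=
  cts:element-value-query(xs:QName("measurement"), "4µ", "exact")
return (
  (: resolved from the indexes only, no filtering :)
  xdmp:estimate(cts:search(fn:doc(), $word-query)),
  (: unfiltered search: also index-only, one result per matching fragment :)
  fn:count(cts:search(fn:doc(), $word-query, "unfiltered")),
  (: filtered search (the default): candidates checked against the text :)
  fn:count(cts:search(fn:doc(), $word-query)),
  (: the same comparison for the value query :)
  xdmp:estimate(cts:search(fn:doc(), $value-query)),
  fn:count(cts:search(fn:doc(), $value-query))
)
```

For the tokenizer override, something along these lines through the Admin
API, assuming a database called "Documents" and a field called "units"
that already exists and covers the relevant elements (both names are
placeholders). Run it as a separate request; the content will need to
reindex before queries see the change.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" and "units" are placeholder names for this sketch. :)
let $config := admin:get-configuration()
let $dbid   := xdmp:database("Documents")
(: treat µ as a word character when tokenizing this field :)
let $config := admin:database-add-field-tokenizer-override(
  $config, $dbid, "units",
  admin:database-tokenizer-override("µ", "word"))
return admin:save-configuration($config)
```

The override only applies to queries against that field, so the search
would then be cts:field-word-query("units", "4µ", "exact") rather than a
plain cts:word-query.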

//Mary