Using fields won't be an option for our use case, but arranging things so that
we can use value queries may be.
Is it possible to reclassify these characters as symbol or word characters
without using field tokenizer overrides, for example by modifying the
tokenizer.xml file?
Wissam
-----Original Message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Mary Holstege
Sent: 29 June 2016 17:42
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] word-query including punctuation characters
On Wed, 29 Jun 2016 08:06:35 -0700, Wissam Asfahani (TSO GB) wrote:
> Good afternoon,
>
> We are having some issues estimating the number of documents when
> performing word queries containing punctuation characters.
>
> I have attached 4 sample documents. When using the below query, the
> estimate returns 3 and the count 1.
>
> Are there any db configuration settings we can use to ensure a more
> accurate estimate result?
>
>
> let $query := cts:word-query("4µ", ("exact"), 2)
>
> return
> (
> xdmp:estimate(cts:search(fn:doc(), $query)),
> fn:count(cts:search(fn:doc(), $query))
> )
>
>
> Wissam Asfahani
> XML Developer
>
Punctuation is not indexed in the word query indexes. An exact unwildcarded
*value* query will consider punctuation, so if you can arrange things so that
you can use a value query, that could be a solution. If it is just this
character and searching for it in this way is confined to identifiable parts of
the document, you could use field tokenizer overrides to redefine µ as a word
or symbol character for that field. But it looks like it is being classified
as a punctuation mark in error: it should be classified as a letter character
anyway, since it is listed as Ll in the Unicode tables.
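
For reference, a minimal sketch of the value-query approach, assuming the
string sits as the complete text of an element. The element name "measure" is
hypothetical; substitute whatever element actually holds the value. The
cts:tokenize call is only a diagnostic to show how each token of the search
string is currently classified. Note that xdmp:estimate resolves from the
indexes alone (unfiltered), while fn:count over cts:search filters each
candidate document by default, which is why the two numbers can differ.

```xquery
xquery version "1.0-ml";

(: Hypothetical element name, not taken from the sample documents. :)
let $qname := xs:QName("measure")

(: Exact, unwildcarded value query: matches elements whose entire text is "4µ". :)
let $query := cts:element-value-query($qname, "4µ", ("exact"))

return (
  (: Diagnostic: report the token class assigned to each piece of "4µ". :)
  for $token in cts:tokenize("4µ")
  return
    typeswitch ($token)
      case cts:punctuation return fn:concat("punctuation: ", fn:string($token))
      case cts:word return fn:concat("word: ", fn:string($token))
      case cts:space return fn:concat("space: ", fn:string($token))
      default return fn:concat("other: ", fn:string($token)),

  (: Index-only estimate versus filtered count for the value query. :)
  xdmp:estimate(cts:search(fn:doc(), $query)),
  fn:count(cts:search(fn:doc(), $query))
)
```

If the estimate and the count agree for the value query, the indexes are
resolving the punctuation-sensitive match without needing the filter.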
//Mary
___
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general