MarkLogic doesn't index punctuation characters (Unicode class P) except for "exact" value queries.
Therefore a word query, or a value query that does not have the "exact" option, cannot be resolved precisely by the index, only by the filter. The index returns false positives, so if you want precise answers you need to use filtered search. "punctuation-sensitive" by itself doesn't change this dynamic.

© is a punctuation character, which is why you see precise answers only in filtered searches (and at great cost). √, on the other hand, is not a punctuation character but a symbol (Unicode class S), and symbols are indexed as words in their own right, which is why you see precise answers even in unfiltered searches.

If you are doing searches within the scope of particular elements, you could set up a field with a tokenizer override to reclassify certain punctuation characters (such as ©) as symbols instead. This only applies in the context of the field, but you can then do a field-word-query or field-value-query that would be accurate out of the index.

By the way, the Uniview application (http://r12a.github.io/uniview) is a handy place to look up particular characters and see what their Unicode classification is.

//Mary
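
To see the filtered/unfiltered difference described above on your own data, something along these lines may help. It is only a rough sketch: the search term is a placeholder, the query runs against all documents via fn:doc(), and it assumes the database's default index settings. It reports how the tokenizer classifies the term and then counts matches both ways.

```xquery
xquery version "1.0-ml";

(: Placeholder term: substitute the character you are investigating. :)
let $term := "©"
let $query := cts:word-query($term, "punctuation-sensitive")

(: Ask the tokenizer how it classifies each token of the term. :)
let $classes :=
  for $token in cts:tokenize($term)
  return
    typeswitch ($token)
      case cts:punctuation return "punctuation"
      case cts:space return "space"
      case cts:word return "word"
      default return "other"

(: Index-only resolution: may include false positives. :)
let $unfiltered := fn:count(cts:search(fn:doc(), $query, "unfiltered"))

(: Filtered resolution: each candidate document is checked by the filter. :)
let $filtered := fn:count(cts:search(fn:doc(), $query, "filtered"))

return (
  fn:concat("token classes: ", fn:string-join($classes, ", ")),
  fn:concat("unfiltered matches: ", $unfiltered),
  fn:concat("filtered matches: ", $filtered)
)
```

If the two counts differ, the index alone is not resolving the query precisely. Once a field with a tokenizer override is in place, the same comparison can be repeated with cts:field-word-query and that field's name in place of cts:word-query.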
