Hi Folks,
I am building a list of words to use when searching text strings, so that I can
distinguish between two types of strings. If a string contains any word in my
list, it is type A; otherwise, it defaults to type B. The matching needs to be
case-, diacritic-, and punctuation-insensitive.
I want this search to perform as well as possible. My thought is to simply add
the words to an XML record that looks something like this:
<keywords xmlns="/app/keyword">
<keyword>car</keyword>
<keyword>bus</keyword>
<keyword>train</keyword>
</keywords>
I would then create an element range index on [ns]:keyword using a collation
that is case-, diacritic-, and punctuation-insensitive. Given a phrase, I would
tokenize it on spaces and run a lexicon search for each token against that
index.
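As a rough illustration of the matching rule described above, here is a minimal Python sketch. It does not use MarkLogic collations or lexicon APIs; it assumes a simple in-memory set of keywords and approximates "case, diacritic, and punctuation insensitive" by case-folding, Unicode NFKD decomposition, and dropping every character that is not a letter or digit. The names normalize, classify, and KEYWORDS are hypothetical.

```python
import unicodedata

# Hypothetical keyword list, mirroring the <keyword> elements in the XML record.
KEYWORDS = ["car", "bus", "train"]


def normalize(token: str) -> str:
    """Approximate a case-, diacritic-, and punctuation-insensitive key."""
    # casefold() handles case; NFKD splits "á" into "a" + a combining accent.
    decomposed = unicodedata.normalize("NFKD", token.casefold())
    # Keep only letters (L*) and numbers (N*); this drops combining marks (M*),
    # punctuation (P*), and symbols (S*) in a single pass.
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch)[0] in ("L", "N")
    )


# Normalize the keywords once, so each lookup is a single set membership test.
KEYWORD_SET = {normalize(k) for k in KEYWORDS}


def classify(text: str) -> str:
    """Return 'A' if any whitespace-separated token matches a keyword, else 'B'."""
    for raw in text.split():
        if normalize(raw) in KEYWORD_SET:
            return "A"
    return "B"


print(classify("Took the BUS, then walked"))  # A
print(classify("Walked the whole way"))       # B
print(classify("Cár!"))                       # A
```

The key design choice is to normalize both sides the same way and store the keywords as normalized keys, which is also one way to make an ordinary dictionary or set behave case-insensitively. MarkLogic's collations follow their own Unicode collation rules, so edge cases (for example letters such as "ø" that do not decompose under NFKD) may be treated differently there.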
Any better ideas? I also thought about creating a dictionary, but I don't see
any option for case-insensitive search.
Thanks for any ideas!
Tim
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general