Hi Folks,

 

I am building a list of words that I want to use to search text strings in 
order to distinguish between different types of text strings.  If a text 
string contains any word in my list, then it is type A; otherwise, it defaults 
to type B.  The searches need to be case-, diacritic-, and 
punctuation-insensitive, and I’m looking for optimal performance.  My thought 
is to simply add the words to an XML record that looks something like this:

<keywords xmlns="/app/keyword">
    <keyword>car</keyword>
    <keyword>bus</keyword>
    <keyword>train</keyword>
</keywords>



and to create a range element index on [ns]:keyword using a collation that is 
case-, diacritic-, and punctuation-insensitive.  Then, given a phrase, I would 
tokenize it on spaces and perform a lexicon search for each token against that 
index.
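
As a rough illustration of the matching semantics only (this is plain Python, 
not MarkLogic code, and it does not use a range index or collation), the 
classification could be sketched as below.  The names normalize, classify, and 
KEYWORDS are hypothetical, and the sketch assumes "punctuation-insensitive" 
means stripping Unicode punctuation characters from each token before 
comparing.

```python
import unicodedata

# Example keyword list taken from the <keywords> record above.
KEYWORDS = ["car", "bus", "train"]


def normalize(token: str) -> str:
    """Fold case and drop diacritics and punctuation (assumed semantics)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    kept = [
        ch
        for ch in decomposed
        if not unicodedata.combining(ch)                    # drop diacritics
        and not unicodedata.category(ch).startswith("P")    # drop punctuation
    ]
    return "".join(kept).casefold()


# Normalize the keywords once so each lookup is a constant-time set test.
KEYWORD_SET = frozenset(normalize(k) for k in KEYWORDS)


def classify(text: str) -> str:
    """Return "A" if any space-separated token matches a keyword, else "B"."""
    for token in text.split():
        if normalize(token) in KEYWORD_SET:
            return "A"
    return "B"
```

For example, classify("I took the TRAIN.") and classify("Le bús, merci") would 
both yield "A", while classify("We walked home") would yield "B".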

 

Any better ideas?  I thought about creating a dictionary instead, but I don’t 
see any options for a case-insensitive search against one.

 

Thanks for any ideas!

 

Tim

 
