Hello,

I need to run a search that can also match on substrings, for example:

*oo bar the qu*

should find documents that contain 'foo bar the quux' or 'foo bar the qux'. 
Should I also index the text as UN_TOKENIZED and run a WildcardQuery on 
that field? Each text blob is then added to Lucene as a single term, which 
does not scale: searching becomes very slow. 
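
To make the scaling concern concrete, here is a rough sketch in plain Java (no Lucene involved) of what I understand a leading-wildcard match over single-term values to amount to. The class and method names are made up for illustration, and the real WildcardQuery enumerates index terms rather than running a regex, so this only models the cost, not the implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardScanSketch {
    // Translate a wildcard pattern ('*' = any run of characters,
    // '?' = exactly one character) into an equivalent regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // Because the pattern starts with '*', no prefix narrows the candidates,
    // so every stored value has to be tested against the pattern.
    static int countMatches(List<String> untokenizedValues, String wildcard) {
        Pattern p = toRegex(wildcard);
        int hits = 0;
        for (String value : untokenizedValues) {
            if (p.matcher(value).matches()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "foo bar the quux", "foo bar the qux", "foo baz the qux");
        System.out.println(countMatches(docs, "*oo bar the qu*")); // prints 2
    }
}
```

The work grows linearly with the number of distinct values, which is every document when each blob is its own term.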

Does anybody know a more efficient way? A PhraseQuery might get me somewhere, 
wouldn't it? Does PhraseQuery allow wildcards in the phrase? Also, since a phrase 
is analyzed by some analyzer, it might strip 'the' as a stopword, 
implying that *oo bar qu* would also match, right?
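
The stopword worry can be illustrated with a simplified model in plain Java. The stop list and the `analyze` helper are hypothetical, and the sketch ignores token positions entirely; whether a real Lucene analyzer leaves a position gap where the stopword was depends on the version and configuration, so this shows the concern rather than confirmed behaviour:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopwordPhraseSketch {
    // Hypothetical stop list; a real analyzer ships its own.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an"));

    // Lowercase, split on whitespace, and drop stop words.
    // Positions are not recorded, so a removed word leaves no gap.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Both inputs reduce to the same token sequence, so a phrase match
        // built on these tokens cannot tell them apart.
        System.out.println(analyze("foo bar the quux")); // [foo, bar, quux]
        System.out.println(
                analyze("foo bar quux").equals(analyze("foo bar the quux"))); // true
    }
}
```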

I know the requirement is a little strange, but it is part of the JSR-170 
specification (SQL 'like' or XPath 'jcr:like', which mimics the SQL LIKE 
operator in databases).

Thanks for any pointers 

Ard

-- 

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
[EMAIL PROTECTED] / [EMAIL PROTECTED] / http://www.hippo.nl
-------------------------------------------------------------- 
