> i mean after all: you still want the index to be useful for searching
> right? ... if you are really paranoid don't just strip the positions,
> strip all duplicate terms as well to prevent any attempt at statistical
> sampling ... but now all you really have is a lookup table of word to

That's right: once position (offset) info is discarded, phrase/span search is no longer possible. With the token obfuscation approach (obfuscate only after stemming/normalizing, at both indexing and search time), phrase/span queries still work, but wildcard queries do not. To me, phrase/span queries seem more important than wildcard queries, but this really depends on the application in question. Security-wise, I don't think either solution would be considered safe by a security expert.

> docid with no tf/idf or position info to improve scoring, so why bother
> with Lucene, just use a Berkeley DB file to do your lookups.

With tf info in place, Lucene's search quality would be far beyond that of a DB lookup. In fact, search quality is preserved, right? (Except that phrase/span queries don't work.)
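
To make the token obfuscation idea concrete, here is a minimal sketch in plain Python (not Lucene code). It assumes a keyed HMAC as the obfuscation function, a toy normalizer standing in for real stemming, and a hand-rolled positional index. The key and all function names are hypothetical and chosen only for illustration; nothing in this thread specifies how the obfuscation would actually be done.

```python
import hashlib
import hmac
from collections import defaultdict

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only


def normalize(token):
    # Toy stand-in for stemming/normalizing: lowercase, strip a trailing "s".
    t = token.lower()
    return t[:-1] if t.endswith("s") and len(t) > 3 else t


def obfuscate(token):
    # Obfuscate AFTER normalizing, so indexing and searching agree on the term.
    return hmac.new(SECRET_KEY, normalize(token).encode("utf-8"),
                    hashlib.sha256).hexdigest()


def build_index(docs):
    # obfuscated term -> doc id -> list of token positions
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(text.split()):
            index[obfuscate(tok)][doc_id].append(pos)
    return index


def term_search(index, word):
    # Exact-term lookup: only the obfuscated form is ever compared.
    return set(index.get(obfuscate(word), {}))


def phrase_search(index, phrase):
    # Positions are kept in the index, so consecutive terms can still be matched.
    terms = [obfuscate(t) for t in phrase.split()]
    if not terms:
        return set()
    hits = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits
```

With this setup, `phrase_search(index, "secret documents")` can match because the positions survive obfuscation, while a prefix such as "sec" hashes to an unrelated value and matches nothing, which is why wildcard queries stop working. Note that the obfuscation is deterministic, so term frequencies are still visible in the index; this is the statistical-sampling weakness raised in the quoted message.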