> i mean after all: you still want the index to be useful for searching
> right? ... if you are really paranoid don't just strip the positions,
> strip all duplicate terms as well to prevent any attempt at statistical
> sampling ... but now all you really have is a lookup table of word to

That's right, once position info is discarded, phrase/span search is not
possible. With the token obfuscation approach (obfuscate only after
stemming/normalizing, at both indexing and search time) phrase/span queries
work, but wildcard queries do not. To me, phrase/span queries seem more
important than wildcard queries, but this really depends on the application
in question. Security-wise, I think neither solution would be considered
safe by any security expert.
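
To make the trade-off concrete, here is a rough sketch in plain Python (not
Lucene code). It assumes a keyed hash (HMAC-SHA256) as the obfuscation step
and a toy positional index; SECRET_KEY, normalize, obfuscate, build_index,
phrase_search and prefix_search are all made-up names for illustration, and
the whitespace tokenizer only stands in for a real analyzer.

```python
import hashlib
import hmac
from collections import defaultdict

SECRET_KEY = b"example-key"  # hypothetical key, for illustration only


def normalize(text):
    # Stand-in for a real analyzer: lowercase and split on whitespace.
    return text.lower().split()


def obfuscate(token):
    # Assumed obfuscation step: keyed hash of the already-normalized token.
    return hmac.new(SECRET_KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(docs):
    # obfuscated term -> doc id -> list of token positions
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(normalize(text)):
            index[obfuscate(tok)][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    # The query goes through the same normalize + obfuscate steps as the docs.
    terms = [obfuscate(t) for t in normalize(phrase)]
    if not terms or any(t not in index for t in terms):
        return set()
    hits = set()
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Term i of the phrase must sit at position start + i.
            if all(start + i in index[t].get(doc_id, ()) for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


def prefix_search(index, prefix):
    # Prefix expansion compares the prefix with the stored term text, which
    # is now a hash and no longer shares a prefix with the original word.
    return {d for term, postings in index.items()
            if term.startswith(prefix) for d in postings}


docs = {1: "Lucene in Action", 2: "Action in Lucene"}
index = build_index(docs)
print(phrase_search(index, "lucene in"))   # phrase matching still works
print(prefix_search(index, "luc"))         # prefix matching finds nothing
```

Positions survive because obfuscation replaces each term one-for-one, so
the phrase check is unchanged. The prefix lookup fails because the stored
terms are hashes.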

> docid with no tf/idf or position info to improve scoring, so why bother
> with Lucene, just use a Berkeley DB file to do your lookups.

With tf info in place, Lucene search quality would be far beyond that of a
DB lookup. In fact, search quality is preserved, right? (Except that
phrase/span queries don't work.)
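
A small sketch of why I'd expect scoring to be preserved, again in plain
Python rather than Lucene. The formula (sqrt(tf) times 1 + ln(N / (df + 1)))
is a simplified tf/idf chosen for illustration, not Lucene's exact scoring,
and SECRET_KEY, obfuscate, tokenize, score and ranking are made-up names.
The point is only that tf and df depend on which tokens are equal, not on
what the term text looks like.

```python
import hashlib
import hmac
import math
from collections import Counter

SECRET_KEY = b"example-key"  # hypothetical key, for illustration only


def obfuscate(token):
    # Assumed obfuscation step: keyed hash of the normalized token.
    return hmac.new(SECRET_KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(text):
    # Stand-in for a real analyzer.
    return text.lower().split()


def score(docs, query, term_fn=lambda t: t):
    # Simplified tf/idf; term_fn is applied to document and query tokens alike.
    n = len(docs)
    tfs = {d: Counter(term_fn(t) for t in tokenize(text)) for d, text in docs.items()}
    scores = {}
    for term in {term_fn(t) for t in tokenize(query)}:
        df = sum(1 for tf in tfs.values() if term in tf)
        if df == 0:
            continue
        idf = 1.0 + math.log(n / (df + 1))
        for d, tf in tfs.items():
            if term in tf:
                scores[d] = scores.get(d, 0.0) + math.sqrt(tf[term]) * idf
    return scores


def ranking(scores):
    # Doc ids, best score first.
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    1: "lucene lucene search",
    2: "lucene database",
    3: "database lookup table",
}
plain = score(docs, "lucene search")
hidden = score(docs, "lucene search", term_fn=obfuscate)
print(ranking(plain), ranking(hidden))
```

Both calls rank the documents the same way, since the hash maps equal
tokens to equal terms and so leaves every tf and df count unchanged.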




