OK, here is what I've come up with after reading your suggestions:

- The bit set from the DB stays untouched.
- Only one field, "interest", shall be used to store the interest bits in the document. This saves disk space.
- The bits shall not be converted to readable strings but added as values separated by a space " ".

====Code Below====
-----------------
public Document getDocument(int db_interest_bits) {
    String interest_string = "";

    // sport
    if ((db_interest_bits & 1) != 0) {
        interest_string += "1" + " "; // empty space as delimiter
    }

    // music
    if ((db_interest_bits & 2) != 0) {
        interest_string += "2" + " "; // empty space as delimiter
    }
    Document doc = new Document();
    doc.add("interest", interest_string); // how do I tell Lucene to separate tokens on search?
    return doc;
}
---------------

FURTHERMORE

I realized that almost all potential values are often set, i.e.

sport music film
sport music
sport music film
sport music film
sport music
music

So I was thinking: how about doing the reverse when building the index? I would only store the fields that are not set. The search would then be a negation.

Example values of interest:

1. "no_film" => only film is not set
2. "no_sport no_film" => film and sport are not set
3. "" => all values are set, since this is a negation

It follows that searching for people interested in music means searching for NOT no_music.

QUESTION

How does the performance of a negative (NOT) search compare to a normal one, i.e. "NOT no_music" vs. "music", under the premise that most interest flags are set?

---------

Daniel Noll-3 wrote:
>
> Erick Erickson wrote:
>> Well, you really have the code already <G>. From the top...
>>
>> 1> there's no good way to support searching bitfields If you wanted, you
>> could probably store it as a small integer and then search on it, but
>> that's waaay too complicated than you want.
>>
>> 2> Add the fields like you have the snippet from, something like
>> Document doc = new Document.
>> if (bitsfromdb & 1) {
>> doc.add("sport", "y");
>> }
>> if (bitsfromdb & 2) {
>> doc.add("music", "y");
>> }
>
> Beware that if there are a large number of bits, this is going to impact
> memory usage due to there being more fields.
>
> Perhaps a better way would be to use a single "bits" field and store the
> words "sport", "music", ... in that field.
>
> Daniel
>
> --
> Daniel Noll
>
> Nuix Pty Ltd
> Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
> Web: http://nuix.com/                               Fax: +61 2 9212 6902

--
View this message in context: http://www.nabble.com/Searching-by-bit-masks-tf2603918.html#a7576286
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
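
For reference, the two encodings discussed in this message (one token per set bit, and the reversed "no_..." token per unset bit) could be sketched in plain Java roughly as below. This is only an illustration under assumptions: the class InterestTokens and its method names are made up, and the bit value 4 for film is a guess, since the post only gives sport = 1 and music = 2. It builds the field value only and does not touch Lucene. On the Lucene side, a whitespace-splitting analyzer such as WhitespaceAnalyzer should break such a value into separate tokens.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn an interest bit set into whitespace-separated index tokens. */
final class InterestTokens {
    // Bit values for sport and music come from the post; FILM = 4 is an assumption.
    static final int SPORT = 1;
    static final int MUSIC = 2;
    static final int FILM = 4;

    private static final int[] BITS = {SPORT, MUSIC, FILM};
    private static final String[] NAMES = {"sport", "music", "film"};

    /** Positive encoding: one numeric token per bit that is set, e.g. 3 -> "1 2". */
    static String toTokens(int bits) {
        List<String> tokens = new ArrayList<>();
        for (int bit : BITS) {
            if ((bits & bit) != 0) {
                tokens.add(Integer.toString(bit));
            }
        }
        return String.join(" ", tokens);
    }

    /** Negated encoding: one "no_<name>" token per bit that is NOT set. */
    static String toNegatedTokens(int bits) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < BITS.length; i++) {
            if ((bits & BITS[i]) == 0) {
                tokens.add("no_" + NAMES[i]);
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(toTokens(SPORT | MUSIC));        // 1 2
        System.out.println(toNegatedTokens(SPORT | MUSIC)); // no_film
        System.out.println(toNegatedTokens(MUSIC));         // no_sport no_film
        // All flags set: the negated field value is the empty string.
        System.out.println(toNegatedTokens(SPORT | MUSIC | FILM).isEmpty()); // true
    }
}
```

One thing to keep in mind with the negated form: in Lucene, a query made up of nothing but a prohibited (NOT) clause matches no documents by itself, so "NOT no_music" would have to be combined with some positive clause, for example a MatchAllDocsQuery.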