OK, here's what I've come up with after reading your suggestions:
- The bit set from the DB stays untouched.
- Only one field shall be used to store the interest bits in the document:
"interest". That saves disk space.
- The bits shall not be converted into a readable string but added as values
separated by a space " ".
====Code Below====
-----------------
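import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public Document getDocument(int db_interest_bits)
{
   String interest_string = "";
   // sport (Java needs an explicit boolean test on the masked value)
   if ((db_interest_bits & 1) != 0) {
       interest_string += "1" + " "; // empty space as delimiter
   }
   // music
   if ((db_interest_bits & 2) != 0) {
       interest_string += "2" + " "; // empty space as delimiter
   }

   Document doc = new Document();
   // Lucene 2.x-style Field: indexed and tokenized by the analyzer;
   // not stored (assumed, to save disk space), so it is search-only
   doc.add(new Field("interest", interest_string.trim(),
                     Field.Store.NO, Field.Index.TOKENIZED));
   // how do I tell Lucene to separate the tokens on search?

   return doc;
}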
---------------
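
On the question in the code comment: the splitting is done by the analyzer, not by the field itself. If the "interest" field is indexed with a whitespace-splitting analyzer (Lucene's WhitespaceAnalyzer does this), the value "1 2" becomes the two terms "1" and "2", and the query side should use the same analyzer or build a TermQuery on the raw term. The plain-Java sketch below only imitates that round trip without Lucene; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class InterestTokens {

    // Hypothetical helper: turn the DB bitmask into a space-separated
    // list of bit values, e.g. 3 (sport + music) -> "1 2".
    static String encode(int dbInterestBits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Integer.SIZE; i++) {
            int bit = 1 << i;
            if ((dbInterestBits & bit) != 0) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(bit);
            }
        }
        return sb.toString();
    }

    // Imitates a whitespace tokenizer at index time: split the field
    // value on runs of whitespace into separate terms.
    static List<String> tokenize(String fieldValue) {
        List<String> terms = new ArrayList<>();
        for (String t : fieldValue.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String value = encode(3);                          // sport + music
        System.out.println("\"" + value + "\"");           // "1 2"
        System.out.println(tokenize(value));               // [1, 2]
        System.out.println(tokenize(value).contains("2")); // true: music matches
    }
}
```

Because each bit value ends up as its own term, a search for music is a lookup of the single term "2" in the "interest" field.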

FURTHERMORE, I realized that almost all of the possible values are usually set,
e.g.:
sport music film
sport music
sport music film
sport music film
sport music
music

So I was thinking: how about doing the reverse when it comes to building
the index?
I would only store the flags that are not set.
The search would then be a negation.

Example values of "interest":
1. "no_film" => only film is not set
2. "no_sport no_film" => film and sport are not set
3. "" => all values are set, since this is a negation
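
The reversed encoding can be sketched in plain Java as well. The flag table is an assumption: the post only gives sport = 1 and music = 2, so film = 4 is hypothetical, as are the class and method names. A "no_<name>" token is written for every bit that is not set, so a fully set mask gives the empty string, as in example 3.

```java
public class NegatedInterests {

    // Hypothetical flag table: names and bit values are assumptions.
    static final String[] NAMES = {"sport", "music", "film"};
    static final int[] BITS = {1, 2, 4};

    // Emit "no_<name>" for every flag whose bit is NOT set.
    static String encodeNegated(int dbInterestBits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < NAMES.length; i++) {
            if ((dbInterestBits & BITS[i]) == 0) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append("no_").append(NAMES[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("\"" + encodeNegated(3) + "\""); // "no_film"
        System.out.println("\"" + encodeNegated(2) + "\""); // "no_sport no_film"
        System.out.println("\"" + encodeNegated(7) + "\""); // ""
    }
}
```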


It follows that, to find people interested in music:
=> search for NOT no_music

QUESTION
How does the performance of a negative (NOT) search compare to a normal one,
i.e. "NOT no_music" vs. "music", under the premise that most interest flags
are set?
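
A rough way to think about the question is to compare which posting lists each query has to read. The toy model below (plain Java with BitSet, not Lucene; all names hypothetical) indexes the six sample rows above both ways. The positive query reads the postings of "film"; "NOT no_film" reads the shorter no_film list but has to subtract it from the set of all documents. In Lucene, a BooleanQuery whose only clause is prohibited (MUST_NOT) matches no documents, so a pure negation needs a positive clause such as a MatchAllDocsQuery to subtract from.

```java
import java.util.Arrays;
import java.util.BitSet;

public class NegationCost {

    // The six sample rows from the post, one document per row.
    static final String[][] DOCS = {
        {"sport", "music", "film"},
        {"sport", "music"},
        {"sport", "music", "film"},
        {"sport", "music", "film"},
        {"sport", "music"},
        {"music"},
    };
    static final String[] FLAGS = {"sport", "music", "film"};

    // Postings of the positive term "flag": docs that have the flag.
    static BitSet postings(String flag) {
        BitSet bs = new BitSet(DOCS.length);
        for (int doc = 0; doc < DOCS.length; doc++) {
            if (Arrays.asList(DOCS[doc]).contains(flag)) {
                bs.set(doc);
            }
        }
        return bs;
    }

    // Postings of the negated term "no_flag": docs that lack the flag.
    // (A real index would store these as their own term.)
    static BitSet negatedPostings(String flag) {
        BitSet bs = postings(flag);
        bs.flip(0, DOCS.length);
        return bs;
    }

    // "NOT no_flag": start from every document, remove the no_flag postings.
    static BitSet notQuery(String flag) {
        BitSet hits = new BitSet(DOCS.length);
        hits.set(0, DOCS.length);
        hits.andNot(negatedPostings(flag));
        return hits;
    }

    public static void main(String[] args) {
        for (String flag : FLAGS) {
            System.out.printf(
                "%s: positive postings=%d, negated postings=%d + %d docs scanned, same hits=%b%n",
                flag, postings(flag).cardinality(), negatedPostings(flag).cardinality(),
                DOCS.length, postings(flag).equals(notQuery(flag)));
        }
    }
}
```

The counts are postings entries in this toy model, not Lucene timings. Since the positive term's list already holds exactly the matching documents, the negated scheme mainly makes the stored lists shorter; whether that pays off in a real index would have to be measured.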



---------

Daniel Noll-3 wrote:
> 
> Erick Erickson wrote:
>> Well, you really have the code already <G>. From the top...
>> 
>> 1> there's no good way to support searching bitfields. If you wanted, you
>> could probably store it as a small integer and then search on it, but
>> that's waaay more complicated than you want.
>> 
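>> 2> Add the fields as in the snippet you have, something like
>> Document doc = new Document();
>> if ((bitsfromdb & 1) != 0) {
>>    doc.add(new Field("sport", "y", Field.Store.NO, Field.Index.UN_TOKENIZED));
>> }
>> if ((bitsfromdb & 2) != 0) {
>>    doc.add(new Field("music", "y", Field.Store.NO, Field.Index.UN_TOKENIZED));
>> }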
> 
> Beware that if there are a large number of bits, this is going to impact 
> memory usage due to there being more fields.
> 
> Perhaps a better way would be to use a single "bits" field and store the 
> words "sport", "music", ... in that field.
> 
> Daniel
> 
> 
> -- 
> Daniel Noll
> 
> Nuix Pty Ltd
> Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
> Web: http://nuix.com/                               Fax: +61 2 9212 6902
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 
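
Erick's first point, that a bitfield stored as a small integer is awkward to search, can be made concrete. An index answers exact-value (or range) lookups, and the integers that have a given bit set do not form one contiguous range, so a query for "music is set" has to OR together every such value. The sketch below enumerates them, again assuming the hypothetical flags sport = 1, music = 2, film = 4; with n flags the list has 2^(n-1) entries.

```java
import java.util.ArrayList;
import java.util.List;

public class MaskValues {

    // All stored integer values (0 .. 2^flagCount - 1) that have the given bit set.
    static List<Integer> valuesWithBit(int bit, int flagCount) {
        List<Integer> values = new ArrayList<>();
        for (int v = 0; v < (1 << flagCount); v++) {
            if ((v & bit) != 0) {
                values.add(v);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // music = bit value 2 with three flags: each of these values would
        // have to be OR-ed together in the query.
        System.out.println(valuesWithBit(2, 3));         // [2, 3, 6, 7]
        System.out.println(valuesWithBit(2, 10).size()); // 512
    }
}
```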

-- 
View this message in context: 
http://www.nabble.com/Searching-by-bit-masks-tf2603918.html#a7576286
Sent from the Lucene - Java Users mailing list archive at Nabble.com.