Hi John,

I want to separate your suggestion in two.
1. Realize Set-Value data type.
2. Realize HAS function.

Unfortunately, I have no idea about the research papers about searching in 
Set-Valued attribute. It's too difficult for me ;-)

But you really give me a big hint. I think we should realize "IN" function 
firstly rather than "HAS".

IN function is useful for some multi-value searching scenario, which can 
significant reduce the recursion.
For example, Imaging a database which is an automobile db. (This is a different 
example, not user's preference about automobile, since user can love both red 
and green. While a automobile model can only in one certain color.)
Then if I want to search my favorite automobile, at least I can replace 
"(color=='red' OR color=='green' OR color=='blue') AND ..." 
With
"color IN('red', 'green', 'blue'......) AND ......"

This approach is helpful for make the condition clause support more conditions.

But to achieve this goal, I still have some more questions:
1. IN function has variable amount of terms. I have study the code of 
ibis::qExpr. I think it's easy to add a two terms function or three terms 
function. But how to add a variable terms function?
2. All operators are hard code in qExpr. Even to add a two or three terms 
function, the change will be everywhere of qExpr class. Is there any plug-in 
framework to simplify the job? I think you should provide some plug-in API for 
this.
3.IN function can only use for the attributes that not support composite 
values. For example "red" and "green" are mutually exclusive. While "Sport" and 
"Sunroof" is combinable. So your original suggestion still need to 
considered.But to solve the problem, I think Fastbit need not realize a new 
set-value searching arithmetic, since bitmap is already a flag of grouping. 
Explain in the following.

In order to store a group of checkbox value in arbitrary combine, each checkbox 
need a bit to store it's state. That means a set-value attribute for all these 
checkbox is equivalent to a set of one bit value attributes.
My idea is either realize a "Bit Stream" data type, which we can store 
arbitrary bits, each stands for a checkbox, or realize a "Single bit"(Boolean) 
data type, which stands for only one checkbox.

For "Bit stream" approach we also need provide operator to manipulate each bit 
like "& | !" in C express.
Then If I want to search find out my favorite automobile in fastbit database, 
the condition clause can be written like this: "ap&5>0".
Before generate this condition clause, Some preparation need to be make:
Define binary value "0100" stands for "sports". Define binary value "0001" 
stands for "sunroof".
While Binary value "0101" can be conclude to stands for "sports and sunroof", 
which is 5 in decimal.
So "ap&5>0"means I care about Sports and sunroof, but no matter in other 
characters.

"Bit stream" solution can reduce the usage of logic connector in express since 
bit flag composite has already been calculated before searching. So it can 
reduce recursion compare with "Single bit" solution.
But the shortcoming of "Bit stream" solution is not intuitionist.

In my opinion:
* Support "Single bit" data value and "eliminate recursion" is the best choice.
* Support "Bit Stream" data value and "bit operator & | !" is good.
* Support "Bit operator" only but not support "Bit Stream" data value is also 
acceptable. That means we can only use LONG as a 64 bit stream at most. but 
much easy to realize.

Cheers,
Davey

-----Original Message-----
From: K. John Wu [mailto:[email protected]] 
Sent: Friday, July 06, 2012 11:52 PM
To: FastBit Users
Cc: Lidawei (Davey)
Subject: Re: [FastBit-users] Is it possible to eliminate the rescursion of 
ibis::qExpr::simplify

Hi, Davey,

Thanks for the suggestions.  It might be hard to change the structure
of ibis::qExpr::simply.  A systematic way of addressing the problem
might be to develop a new expression to capture the some list of
preferences.
What you called booleans might actually be a representation of a set
of check boxes, right?  If this is the case, then there is another
term to describe the data, set-valued attribute.  A set-value
attribute can be recorded as a single column in a data table, where
the value at each row is a set taken from a list of predefined
choices.  I have found a number of different research papers on
indexing and querying set-valued attributes.  You probably can build
an extension to FastBit to deal with set-valued attributes without too
much trouble.

After implement an index for set-valued attributes, you can extend the
query syntax to support querying over a long list of choices.  For
example, if you have a group of checkboxes named "automobile
preference" (ap for short), where the choices might including
something like "sports", "race", "sedan", "wagon", "convertible",
"sunroof", "green", "red", "white", "4-door", "2-door"..  You might
invent a new keyword to express "finding all person who prefer red
sports convertible that is not white" as follows

ap has ("sports", "red", "convertible") and NOT ap has ("white")

In between the parentheses, you can put in a long list, however,
ibis::qExpre::simply will only see on single expression.  The keyword
"HAS" functions somewhat like the keyword "IN" except that the
properties given in the list must all be true (or present in the set
of values).

Hope this helps.

John




On 7/6/12 2:45 AM, Lidawei (Davey) wrote:
> Hi authors of fastbit,
> 
>  
> 
> This is my first post. Thanks for your wonderful job!
> 
>  
> 
> Fastbit do help me solved a performance problem which need select
> users from a huge user database with so many characters quickly.
> 
> After that, I still have some idea to make Fastbit better. My topic is
> about the complex condition support.
> 
>  
> 
> In our use case, we need select from billions of users which have
> thousands of characters (chocolate lover for example), and we use a
> very long condition clause (more than 1000 and/or items). All
> character is a boolean value with yes/no
> 
>  
> 
> In my first attempt of fastbit, I found that unfortunately the
> condition can't support more than 200 and/or items.
> 
> After further study, I found that the reason is Stack overflow at
> ibis::qExpr::simplify.
> 
> This function is a recursion function. A very long condition make too
> deep recursion which cause the stack exhaust.
> 
>  
> 
> Then I expand the stack size to 20M bytes. That make my program
> support 10 thousands or/and items.
> 
> The approach solved my problem but not elegant. The best solution is
> eliminate the recursion of ibis::qExpr::simplify.
> 
> I think It's a realizable. The refactoring can significant expand the
> limit of Fastbit which make Fastbit more useful.
> 
>  
> 
> Can you consider this idea?
> 
>  
> 
> Another pity is no BOOLEAN data type support. Would you consider to
> support it in future version?
> 
>  
> 
> cheers,
> 
> Davey
> 
>  
> 
> 
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> 

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to