Hi, Davey,
Thanks for the suggestions. It might be hard to change the structure
of ibis::qExpr::simplify. A more systematic way of addressing the
problem might be to develop a new kind of expression that captures a
list of preferences.
What you called booleans might actually be a representation of a set
of check boxes, right? If this is the case, then there is another
term to describe the data: a set-valued attribute. A set-valued
attribute can be recorded as a single column in a data table, where
the value at each row is a set taken from a list of predefined
choices. I have found a number of different research papers on
indexing and querying set-valued attributes. You probably can build
an extension to FastBit to deal with set-valued attributes without too
much trouble.
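To make the idea of a set-valued column concrete, here is a rough
sketch of one possible layout: one bitmap per predefined choice, with
bit i set when row i contains that choice. This is not FastBit code;
the names CHOICES and build_bitmaps are made up for illustration, and
Python integers stand in for compressed bitmaps.

```python
# Hypothetical sketch, not FastBit code: a set-valued column stored as
# one bitmap per choice. A Python int stands in for each bitmap.

CHOICES = ["sports", "sedan", "red", "white"]  # predefined list of choices


def build_bitmaps(rows, choices):
    """Return {choice: int}; bit i of the int is set when row i contains the choice."""
    bitmaps = {c: 0 for c in choices}
    for i, values in enumerate(rows):
        for v in values:
            if v not in bitmaps:
                raise ValueError(f"unknown choice: {v!r}")
            bitmaps[v] |= 1 << i
    return bitmaps


# Each row holds the set of boxes that were checked for that record.
rows = [
    {"sports", "red"},
    {"sedan", "white"},
    {"sports", "white"},
    set(),  # nothing checked
]
bitmaps = build_bitmaps(rows, CHOICES)
```

With this layout, asking which rows contain several choices at once
becomes a bitwise AND of the corresponding bitmaps.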
After implementing an index for set-valued attributes, you can extend the
query syntax to support querying over a long list of choices. For
example, suppose you have a group of checkboxes named "automobile
preference" (ap for short), where the choices might include
something like "sports", "race", "sedan", "wagon", "convertible",
"sunroof", "green", "red", "white", "4-door", "2-door". You might
invent a new keyword to express "find all persons who prefer a red
sports convertible that is not white" as follows:
ap has ("sports", "red", "convertible") and NOT ap has ("white")
Between the parentheses you can put a long list; however,
ibis::qExpr::simplify will see only a single expression. The keyword
"HAS" functions somewhat like the keyword "IN", except that the
properties given in the list must all be true (or present in the set
of values).
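The intended semantics of the proposed keyword can be sketched in a
few lines. This is only an illustration of the meaning, not FastBit
code: "HAS" does not exist in FastBit today, and the functions has,
has_any, and matches, as well as the sample data, are hypothetical.

```python
# Hypothetical sketch of the proposed HAS semantics; not FastBit code.

def has(values, required):
    """True when every item in `required` is present in the row's set."""
    return set(required) <= set(values)


def has_any(values, candidates):
    """IN-like contrast: true when at least one candidate is present."""
    return bool(set(candidates) & set(values))


def matches(ap):
    # ap has ("sports", "red", "convertible") and NOT ap has ("white")
    return has(ap, ("sports", "red", "convertible")) and not has(ap, ("white",))


# Made-up sample rows: name -> checked "automobile preference" boxes.
people = {
    "alice": {"sports", "red", "convertible", "2-door"},
    "bob": {"sports", "red", "convertible", "white"},
    "carol": {"sedan", "red", "4-door"},
}

selected = sorted(name for name, ap in people.items() if matches(ap))
```

Here only "alice" is selected: "bob" is excluded by the NOT clause,
and "carol" has just one of the three required choices, which would
satisfy an any-of test but not HAS.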
Hope this helps.
John
On 7/6/12 2:45 AM, Lidawei (Davey) wrote:
> Hi authors of fastbit,
>
>
>
> This is my first post. Thanks for your wonderful job!
>
>
>
> FastBit did help me solve a performance problem that requires quickly
> selecting users from a huge user database with many characteristics.
>
> Beyond that, I have some ideas to make FastBit better. My topic is
> support for complex conditions.
>
>
>
> In our use case, we need to select from billions of users who have
> thousands of characteristics (chocolate lover, for example), and we use a
> very long condition clause (more than 1000 and/or items). Each
> characteristic is a boolean value (yes/no).
>
>
>
> In my first attempt with FastBit, I found that unfortunately the
> condition can’t support more than 200 and/or items.
>
> After further study, I found that the reason is a stack overflow in
> ibis::qExpr::simplify.
>
> This function is recursive. A very long condition causes recursion
> deep enough to exhaust the stack.
>
>
>
> Then I expanded the stack size to 20 MB. That lets my program
> support 10 thousand and/or items.
>
> This approach solved my problem but is not elegant. The best solution is
> to eliminate the recursion in ibis::qExpr::simplify.
>
> I think it’s feasible. The refactoring could significantly raise the
> limit of FastBit, which would make FastBit more useful.
>
>
>
> Can you consider this idea?
>
>
>
> Another pity is that there is no BOOLEAN data type support. Would you
> consider supporting it in a future version?
>
>
>
> cheers,
>
> Davey
>
>
>
>
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>