Hi John,

That's good news for me. Please allow me to complain about the poor documentation :P Maybe you can provide a wiki to establish a FAQ list. I'd be glad to contribute some experience to it.

For the set-valued type and the HAS function, your timeline is too long :( Could you help to implement a "Fixed-Length Bit Stream" data type first, so that we can simulate the HAS function with the bitwise logical operators described in my previous email? I think it is much easier, and it is a very useful data type that can also be used as a fixed-length string without the workload of locating records through the *.sp file.

Have you planned a plug-in framework for extending the functions available in condition expressions? That would also be useful.

Cheers,
Davey

-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Tuesday, July 10, 2012 12:53 AM
To: Lidawei (Davey)
Cc: FastBit Users
Subject: Re: [FastBit-users] Is it possible to eliminate the rescursion of ibis::qExpr::simplify

Hi, Davey,

The "IN" functionality is already there. It is for a field represented by an (exclusive) radio box. The values can be either numerical or text. You can give it a try now.

Operations on bit fields through the bitwise logical operators '&' and '|' are currently implemented. For integer values such as IP addresses, it is possible to write something like the following in an arithmetic expression:

"ip & 0x1001 == 0x1001"

Please let us know if this does not work as expected.

As for set-valued attributes, since a number of different users have mentioned them, we should be developing support some time soon. However, this might take 6 months to a year.

John

On 7/9/12 12:52 AM, Lidawei (Davey) wrote:
> Hi John,
>
> I want to separate your suggestion into two parts.
> 1. Implement a Set-Value data type.
> 2. Implement the HAS function.
>
> Unfortunately, I am not familiar with the research papers on searching
> set-valued attributes. It's too difficult for me ;-)
>
> But you really gave me a big hint. I think we should implement the "IN"
> function first rather than "HAS".
>
> The IN function is useful for some multi-value search scenarios, where it
> can significantly reduce the recursion.
> For example, imagine a database of automobiles.
> (This is a different example, not a user's preference about automobiles,
> since a user can like both red and green, while an automobile model can
> only be in one color.)
> Then if I want to search for my favorite automobile, at least I can replace
> "(color=='red' OR color=='green' OR color=='blue') AND ..."
> with
> "color IN('red', 'green', 'blue'......) AND ......"
>
> This approach helps the condition clause support more conditions.
>
> But to achieve this goal, I still have some more questions:
> 1. The IN function has a variable number of terms. I have studied the code
> of ibis::qExpr. I think it's easy to add a two-term or three-term function.
> But how do I add a function with a variable number of terms?
> 2. All operators are hard-coded in qExpr. Even adding a two-term or
> three-term function requires changes throughout the qExpr class. Is there
> any plug-in framework to simplify the job? I think you should provide a
> plug-in API for this.
> 3. The IN function can only be used for attributes that do not support
> composite values. For example, "red" and "green" are mutually exclusive,
> while "Sport" and "Sunroof" can be combined. So your original suggestion
> still needs to be considered. But to solve the problem, I think FastBit
> need not implement a new set-valued search algorithm, since a bitmap is
> already a grouping flag. I explain below.
>
> In order to store a group of checkbox values in arbitrary combinations,
> each checkbox needs a bit to store its state. That means a set-valued
> attribute for all these checkboxes is equivalent to a set of one-bit
> attributes.
> My idea is to either implement a "Bit Stream" data type, in which we can
> store arbitrary bits, each standing for a checkbox, or implement a "Single
> bit" (Boolean) data type, which stands for only one checkbox.
>
> For the "Bit Stream" approach we also need to provide operators to
> manipulate each bit, like "& | !" in C expressions.
> Then if I want to find my favorite automobile in a FastBit database,
> the condition clause can be written like this: "ap&5>0".
> Before generating this condition clause, some preparation needs to be done:
> Define binary value "0100" to stand for "sports". Define binary value "0001"
> to stand for "sunroof".
> Binary value "0101" then stands for "sports and sunroof",
> which is 5 in decimal.
> So "ap&5>0" means I care about sports and sunroof, regardless of the other
> characteristics.
>
> The "Bit Stream" solution can reduce the use of logical connectors in the
> expression, since the bit flag composite has already been calculated before
> searching. So it can reduce recursion compared with the "Single bit"
> solution.
> But the shortcoming of the "Bit Stream" solution is that it is not
> intuitive.
>
> In my opinion:
> * Supporting the "Single bit" data type and eliminating the recursion is
> the best choice.
> * Supporting the "Bit Stream" data type and the bit operators "& | !" is
> good.
> * Supporting the bit operators only, without a "Bit Stream" data type, is
> also acceptable. That means we can only use LONG as a bit stream of at most
> 64 bits, but it is much easier to implement.
>
> Cheers,
> Davey
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Friday, July 06, 2012 11:52 PM
> To: FastBit Users
> Cc: Lidawei (Davey)
> Subject: Re: [FastBit-users] Is it possible to eliminate the rescursion of
> ibis::qExpr::simplify
>
> Hi, Davey,
>
> Thanks for the suggestions. It might be hard to change the structure
> of ibis::qExpr::simplify. A systematic way of addressing the problem
> might be to develop a new expression to capture a list of
> preferences.
> What you called booleans might actually be a representation of a set
> of check boxes, right? If this is the case, then there is another
> term to describe the data: set-valued attribute.
> A set-valued
> attribute can be recorded as a single column in a data table, where
> the value at each row is a set taken from a list of predefined
> choices. I have found a number of different research papers on
> indexing and querying set-valued attributes. You probably can build
> an extension to FastBit to deal with set-valued attributes without too
> much trouble.
>
> After implementing an index for set-valued attributes, you can extend the
> query syntax to support querying over a long list of choices. For
> example, if you have a group of checkboxes named "automobile
> preference" (ap for short), where the choices might include
> something like "sports", "race", "sedan", "wagon", "convertible",
> "sunroof", "green", "red", "white", "4-door", "2-door", you might
> invent a new keyword to express "finding all persons who prefer a red
> sports convertible that is not white" as follows
>
> ap has ("sports", "red", "convertible") and NOT ap has ("white")
>
> In between the parentheses, you can put in a long list; however,
> ibis::qExpr::simplify will only see one single expression. The keyword
> "HAS" functions somewhat like the keyword "IN" except that the
> properties given in the list must all be true (or present in the set
> of values).
>
> Hope this helps.
>
> John
>
> On 7/6/12 2:45 AM, Lidawei (Davey) wrote:
>> Hi authors of FastBit,
>>
>> This is my first post. Thanks for your wonderful work!
>>
>> FastBit helped me solve a performance problem that requires quickly
>> selecting users from a huge user database with many characteristics.
>>
>> After that, I still have some ideas to make FastBit better. My topic is
>> complex condition support.
>>
>> In our use case, we need to select from billions of users who have
>> thousands of characteristics (chocolate lover, for example), and we use a
>> very long condition clause (more than 1000 and/or items).
>> All characteristics are boolean values (yes/no).
>>
>> In my first attempt with FastBit, I found that unfortunately the
>> condition can't support more than 200 and/or items.
>>
>> After further study, I found that the reason is a stack overflow in
>> ibis::qExpr::simplify.
>>
>> This function is recursive. A very long condition makes the
>> recursion too deep, which exhausts the stack.
>>
>> Then I expanded the stack size to 20M bytes. That lets my program
>> support 10 thousand and/or items.
>>
>> This approach solved my problem but is not elegant. The best solution is
>> to eliminate the recursion in ibis::qExpr::simplify.
>>
>> I think it is feasible. The refactoring can significantly raise the
>> limits of FastBit, which would make FastBit more useful.
>>
>> Can you consider this idea?
>>
>> Another pity is that there is no BOOLEAN data type support. Would you
>> consider supporting it in a future version?
>>
>> cheers,
>>
>> Davey
>>
>> _______________________________________________
>> FastBit-users mailing list
>> [email protected]
>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
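
Note on the bit-flag encoding discussed in this thread: the following is a minimal sketch, in Python rather than FastBit's query syntax, of the two tests the messages touch on. `(ap & mask) > 0` matches rows that have at least one of the flags, while `(ap & mask) == mask` requires all of them, which corresponds to the HAS semantics described for set-valued attributes. Only the values 0100 (sports) and 0001 (sunroof) come from the thread; the helper names and sample rows are hypothetical. The parentheses are written out explicitly because operator precedence differs between languages (in C, `==` binds tighter than `&`), and how FastBit's parser groups such an expression is not stated in the thread.

```python
# Flag values from the thread: binary 0100 = sports, 0001 = sunroof.
SPORTS = 0b0100
SUNROOF = 0b0001


def has_any(value, mask):
    """True if at least one bit of mask is set in value: (value & mask) > 0."""
    return (value & mask) != 0


def has_all(value, mask):
    """True if every bit of mask is set in value: (value & mask) == mask."""
    return (value & mask) == mask


# Hypothetical sample rows: name -> "ap" bit field.
cars = {"a": 0b0101, "b": 0b0100, "c": 0b0010}

wanted = SPORTS | SUNROOF  # binary 0101, decimal 5

any_match = sorted(name for name, ap in cars.items() if has_any(ap, wanted))
all_match = sorted(name for name, ap in cars.items() if has_all(ap, wanted))

print(any_match)  # rows with sports OR sunroof
print(all_match)  # rows with sports AND sunroof
```

With these sample rows, car "b" (sports only) passes the "> 0" test but not the "== mask" test, so the choice between the two forms decides whether the list behaves like "any of" or "all of".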

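
Note on eliminating the recursion: a general way to avoid deep call stacks when walking a long and/or chain is to keep an explicit stack on the heap instead of relying on function calls. The sketch below is not FastBit's ibis::qExpr::simplify. It assumes a hypothetical tuple-based expression tree, (operator, left, right), and only collects the leaf conditions, to show the traversal pattern on a chain deeper than Python's default recursion limit of 1000.

```python
def collect_leaves(expr):
    """Return the leaf conditions of a binary and/or tree, left to right.

    Uses an explicit stack, so the depth of the tree is limited by
    available memory rather than by the call stack.
    """
    leaves = []
    stack = [expr]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            _op, left, right = node
            # Push right first so that left is processed first.
            stack.append(right)
            stack.append(left)
        else:
            leaves.append(node)
    return leaves


# Build a left-deep chain of 5000 OR terms (hypothetical column names).
n_terms = 5000
expr = "c0 == 1"
for i in range(1, n_terms):
    expr = ("or", expr, f"c{i} == 1")

leaves = collect_leaves(expr)
print(len(leaves), leaves[0], leaves[-1])
```

A naive recursive walk over the same chain would need about 5000 nested calls; the loop above needs none, which is the property asked for in the original post.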