Hi, Dominique, Thanks for the suggestions and and test cases. Just checked in a set of changes as SVN Revision 506. Here is a bit more explanation.
On item 1, I have taken the option you've suggested in the first message, i.e., use the macro FASTBIT_EMPTY_STRING_AS_NULL On item 2, I have restored the escaping of using backslash as you requested. Will go through you tests cases next and see what else I need to change. John On 3/28/12 8:02 AM, Dominique Prunier wrote: > Hi John, > > Here is two simple test cases for my two issues with r505. > > 1. empty-strings.zip: there is no way to select empty string anymore > > **** FAILED (bad result count) with where clause << a='' >> > > **** FAILED (bad result count) with where clause << a=0 >> > > **** FAILED (hard) with where clause << a LIKE '' >> > > 2. de-escaping.zip: there is no way to select a string with a reserved char > in it > > **** FAILED (bad result count) with where clause << a='it\'s good' >> > > Warning -- ibis::whereParser encountered syntax error, unexpected name string > at location a='it's good':1.7 > Warning -- query[96w0-3xn8Tg----1]::setWhereClause -- failed to parse the > WHERE clause "a='it's good'" > Warning -- fastbit_build_query failed to assign conditions (a='it's good') to > a query > **** FAILED (hard) with where clause << a='it's good' >> > > Warning -- ibis::whereParser encountered syntax error, unexpected $undefined > at location a=it's good:1.5 > Warning -- query[96w0-3xn8Tg----2]::setWhereClause -- failed to parse the > WHERE clause "a=it's good" > Warning -- fastbit_build_query failed to assign conditions (a=it's good) to a > query > **** FAILED (hard) with where clause << a=it's good >> > > With the attached patch, it fixes my test cases (except a LIKE '' that didn't > work before and a='it's good' and a=it's good whieh are invalid where > clauses). > > Thanks, > > -----Original Message----- > From: [email protected] > [mailto:[email protected]] On Behalf Of Dominique Prunier > Sent: Wednesday, March 28, 2012 10:04 AM > To: K. John Wu > Cc: FastBit Users > Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 > > Hey John, > > I'm trying r505 right know, i have two questions/remarks: > > 1. I saw a change about the dictionary now able to accept empty strings and > treat them as normal strings instead of NULLs. This breaks quite a lot of my > test cases, specifically those which used to test category=0 or !=0 to > simulate the IS NULL/IS NOT NULL because testing for an empty string doesn't > work (='' or LIKE '' used to fail or return invalid results). What would be > your recommendation for that use case ? Could we add a define for those > relying on this behavior ? Something like: > > - //if (*str == 0) return 0; > +#ifdef FASTBIT_EMPTY_STRING_AS_NULL > + if (*str == 0) return 0; > +#endif > > 2. I can't find anywhere the de-escaping patch in r505, am i missing > something ? > > Thanks, > > -----Original Message----- > From: K. John Wu [mailto:[email protected]] > Sent: Wednesday, March 28, 2012 1:55 AM > To: Dominique Prunier > Cc: FastBit Users > Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 > > Hi, Dominique, > > There are some updates to involving merge of dictionaries and to > exercise the operations involving unmatched quotes. The new code is > SVN R505. > > Please let me know if you have any additional questions. > > John > > > On 3/27/12 1:03 PM, Dominique Prunier wrote: >> Hey John, >> >> There is definitely a need for FastBit escaping. The escaping problem is not >> at the shell level (though we could have one there) since in pure C/C++ >> code, there's no shell involved when i'm building a where clause from a >> string. The problem is at the where clause parsing level (in the lexer to be >> more precise) to be able to express string literals among other things (and >> not only metas, it is also white spaces, ...). >> Typically, my test that fails is as simple as calling >> fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected to create >> a qString << a = it's good >> but now, it creates a qString << a = it\'s >> good >> which is wrong. The attached patch restores the descaping, but _not_ >> the double quote stripping (because it is already handled in the lexer). All >> my test cases works after applying it on r503. > > The constructors for ibis::qString and ibis::qLike really should not > strip away anything. In your case, you should be able to do the following > > .../tcapi data-dir "a=\"it's good\"" > > if you are using fastbit_build_query, you can use the same string > fastbit_build_query(..., ..., "a=\"it's good\""); > > Since FastBit regular expression only support four meta characters ? * > _ %. There is no need to escape anything. It is probably cleaner to > not introduce stripping of anything special (except the outer most > quotes, which should be only done once). > >> >> About the decompression, thanks for the link, this is very interesting stuff >> ! But my point here is not about questing the fact the decompression can be >> better is some case, i was just under the impression that the hit vector >> given to category::patternSearch was _always_ already decompressed since it >> is ultimately a bitvector that has been created from scratch for query >> evaluation (it would need verification though). My guess is that the few >> percent of performance i'm loosing here are attributable to the check >> (hits.isCompressed() && hits.bytes()*mult + bv->bytes() > hits.size()), >> since it gets executed _A LOT_ of times. I'll try to investigate it a little >> bit further. > > I have rearranged the tests for decompression in layers which > hopefully will eliminate the need to perform more expensive tests in > your case that presumably involve a fairly small number of values. > _______________________________________________ > FastBit-users mailing list > [email protected] > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
