Hi, Dominique,

There are some updates involving the merging of dictionaries, plus new
tests that exercise the operations involving unmatched quotes.  The new
code is SVN R505.

Please let me know if you have any additional questions.

John


On 3/27/12 1:03 PM, Dominique Prunier wrote:
> Hey John,
> 
> There is definitely a need for FastBit escaping. The escaping problem is not 
> at the shell level (though we could have one there) since in pure C/C++ code, 
> there's no shell involved when I'm building a where clause from a string. The 
> problem is at the where clause parsing level (in the lexer, to be more 
> precise): we need to be able to express string literals containing, among 
> other things, metacharacters and white space.
> Typically, my test that fails is as simple as calling 
> fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected to create 
> a qString << a = it's good >> but now it creates a qString 
> << a = it\'s good >>, which is wrong. The attached patch restores the 
> unescaping, but _not_ the double quote stripping (because it is already 
> handled in the lexer). All my test cases work after applying it on r503.

The constructors for ibis::qString and ibis::qLike really should not
strip away anything.  In your case, you should be able to do the following:

.../tcapi data-dir "a=\"it's good\""

If you are using fastbit_build_query, you can use the same string:
fastbit_build_query(..., ..., "a=\"it's good\"");
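
To make the difference between the two spellings concrete, here is a small
C++ sketch showing which characters each source literal actually hands to
the where clause parser.  The variable names are made up for illustration
and nothing here calls into FastBit.

```cpp
#include <string>

// Illustrative names only; these are plain C++ string literals.

// The compiler consumes the \" escapes, so the parser sees:  a="it's good"
const std::string double_quoted = "a=\"it's good\"";

// The compiler turns \\ into one backslash, so the parser sees:  a='it\'s good'
const std::string backslash_escaped = "a='it\\'s good'";
```

The first form carries no backslash at all, so nothing needs to be
unescaped later; the second form still contains a literal backslash when
it reaches the lexer.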

Since FastBit regular expressions support only four meta characters
(? * _ %), there is no need to escape anything.  It is probably cleaner
not to introduce stripping of anything special (except the outermost
quotes, which should be done only once).
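
As a rough illustration of "strip the outermost quotes, and only once",
a helper along the following lines would do.  The function name
strip_outer_quotes is hypothetical and this is not the code in FastBit;
it is only a minimal sketch of the idea.

```cpp
#include <string>

// Hypothetical helper, not part of FastBit: remove one matching pair of
// outermost quotes (single or double) and leave everything else untouched.
std::string strip_outer_quotes(const std::string& s) {
    if (s.size() >= 2) {
        const char q = s.front();
        if ((q == '"' || q == '\'') && s.back() == q) {
            return s.substr(1, s.size() - 2);
        }
    }
    return s;  // no matching outer pair: return the input unchanged
}
```

Because the helper removes at most one pair, an apostrophe inside the
literal survives, and an unmatched leading quote is left alone rather
than being silently dropped.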

> 
> About the decompression, thanks for the link, this is very interesting 
> stuff! But my point here is not about questioning the fact that 
> decompression can be better in some cases. I was just under the impression 
> that the hit vector given to category::patternSearch was _always_ already 
> decompressed, since it is ultimately a bitvector that has been created from 
> scratch for query evaluation (it would need verification though). My guess 
> is that the few percent of performance I'm losing here are attributable to 
> the check 
> (hits.isCompressed() && hits.bytes()*mult + bv->bytes() > hits.size()), since 
> it gets executed _A LOT_ of times. I'll try to investigate it a little bit 
> further.

I have rearranged the tests for decompression in layers, which
hopefully will eliminate the need to perform the more expensive tests in
your case, which presumably involves a fairly small number of values.
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
