Thanks John for your help. It is always very appreciated.

With the macro FASTBIT_EMPTY_STRING_AS_NULL enabled, all my test cases now 
works. I'll test performance next but it could be a good candidate for my 
stable version.

About the FASTBIT_EMPTY_STRING_AS_NULL, what do you think would be the best 
default ? For my use case, it is obviously to enable it, but for everybody else 
i don't know.  My concern is the backward compatibility here, especially the 
fact that it influences the index creation, and not usage. This means that 
somebody who upgrades won't notice this change before it regenerates indexes. 
What's your opinion on this ?

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]] 
Sent: Wednesday, March 28, 2012 11:20 AM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: perf boost on top of r501

Hi, Dominique,

Thanks for the suggestions and and test cases.  Just checked in a set
of changes as SVN Revision 506.  Here is a bit more explanation.

On item 1, I have taken the option you've suggested in the first
message, i.e., use the macro FASTBIT_EMPTY_STRING_AS_NULL

On item 2, I have restored the escaping of using backslash as you
requested.

Will go through you tests cases next and see what else I need to change.

John


On 3/28/12 8:02 AM, Dominique Prunier wrote:
> Hi John,
> 
> Here is two simple test cases for my two issues with r505.
> 
> 1. empty-strings.zip: there is no way to select empty string anymore
> 
> **** FAILED (bad result count) with where clause << a='' >>
> 
> **** FAILED (bad result count) with where clause << a=0 >>
> 
> **** FAILED (hard) with where clause << a LIKE '' >>
> 
> 2. de-escaping.zip: there is no way to select a string with a reserved char 
> in it
> 
> **** FAILED (bad result count) with where clause << a='it\'s good' >>
> 
> Warning -- ibis::whereParser encountered syntax error, unexpected name string 
> at location a='it's good':1.7
> Warning -- query[96w0-3xn8Tg----1]::setWhereClause -- failed to parse the 
> WHERE clause "a='it's good'"
> Warning -- fastbit_build_query failed to assign conditions (a='it's good') to 
> a query
> **** FAILED (hard) with where clause << a='it's good' >>
> 
> Warning -- ibis::whereParser encountered syntax error, unexpected $undefined 
> at location a=it's good:1.5
> Warning -- query[96w0-3xn8Tg----2]::setWhereClause -- failed to parse the 
> WHERE clause "a=it's good"
> Warning -- fastbit_build_query failed to assign conditions (a=it's good) to a 
> query
> **** FAILED (hard) with where clause << a=it's good >>
> 
> With the attached patch, it fixes my test cases (except a LIKE '' that didn't 
> work before and a='it's good' and a=it's good whieh are invalid where 
> clauses).
> 
> Thanks,
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Dominique Prunier
> Sent: Wednesday, March 28, 2012 10:04 AM
> To: K. John Wu
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
> 
> Hey John,
> 
> I'm trying r505 right know, i have two questions/remarks:
> 
>  1. I saw a change about the dictionary now able to accept empty strings and 
> treat them as normal strings instead of NULLs. This breaks quite a lot of my 
> test cases, specifically those which used to test category=0 or !=0 to 
> simulate the IS NULL/IS NOT NULL because testing for an empty string doesn't 
> work (='' or LIKE '' used to fail or return invalid results). What would be 
> your recommendation for that use case ? Could we add a define for those 
> relying on this behavior ? Something like:
> 
> -    //if (*str == 0) return 0;
> +#ifdef FASTBIT_EMPTY_STRING_AS_NULL
> +    if (*str == 0) return 0;
> +#endif
> 
>  2. I can't find anywhere the de-escaping patch in r505, am i missing 
> something ?
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]] 
> Sent: Wednesday, March 28, 2012 1:55 AM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
> 
> Hi, Dominique,
> 
> There are some updates to involving merge of dictionaries and to
> exercise the operations involving unmatched quotes.  The new code is
> SVN R505.
> 
> Please let me know if you have any additional questions.
> 
> John
> 
> 
> On 3/27/12 1:03 PM, Dominique Prunier wrote:
>> Hey John,
>>
>> There is definitely a need for FastBit escaping. The escaping problem is not 
>> at the shell level (though we could have one there) since in pure C/C++ 
>> code, there's no shell involved when i'm building a where clause from a 
>> string. The problem is at the where clause parsing level (in the lexer to be 
>> more precise) to be able to express string literals among other things (and 
>> not only metas, it is also white spaces, ...).
>> Typically, my test that fails is as simple as calling 
>> fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected to create 
>> a qString << a = it's good >> but now, it creates a qString << a = it\'s 
>> good >> which is wrong. The attached patch restores the descaping, but _not_ 
>> the double quote stripping (because it is already handled in the lexer). All 
>> my test cases works after applying it on r503.
> 
> The constructors for ibis::qString and ibis::qLike really should not
> strip away anything.  In your case, you should be able to do the following
> 
> .../tcapi data-dir "a=\"it's good\""
> 
> if you are using fastbit_build_query, you can use the same string
> fastbit_build_query(..., ..., "a=\"it's good\"");
> 
> Since FastBit regular expression only support four meta characters ? *
> _ %.  There is no need to escape anything.  It is probably cleaner to
> not introduce stripping of anything special (except the outer most
> quotes, which should be only done once).
> 
>>
>> About the decompression, thanks for the link, this is very interesting stuff 
>> ! But my point here is not about questing the fact the decompression can be 
>> better is some case, i was just under the impression that the hit vector 
>> given to category::patternSearch was _always_ already decompressed since it 
>> is ultimately a bitvector that has been created from scratch for query 
>> evaluation (it would need verification though). My guess is that the few 
>> percent of performance i'm loosing here are attributable to the check 
>> (hits.isCompressed() && hits.bytes()*mult + bv->bytes() > hits.size()), 
>> since it gets executed _A LOT_ of times. I'll try to investigate it a little 
>> bit further.
> 
> I have rearranged the tests for decompression in layers which
> hopefully will eliminate the need to perform more expensive tests in
> your case that presumably involve a fairly small number of values.
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to