Hey John,

There is definitely a need for FastBit escaping. The escaping problem is not at 
the shell level (though we could have one there) since in pure C/C++ code, 
there's no shell involved when i'm building a where clause from a string. The 
problem is at the where clause parsing level (in the lexer to be more precise) 
to be able to express string literals among other things (and not only metas, 
it is also white spaces, ...).
Typically, my test that fails is as simple as calling fastbit_build_query(..., 
..., "a='it\\'s good'"). This is expected to create a qString << a = it's good 
>> but now, it creates a qString << a = it\'s good >> which is wrong. The 
attached patch restores the descaping, but _not_ the double quote stripping 
(because it is already handled in the lexer). All my test cases works after 
applying it on r503.

About the decompression, thanks for the link, this is very interesting stuff ! 
But my point here is not about questing the fact the decompression can be 
better is some case, i was just under the impression that the hit vector given 
to category::patternSearch was _always_ already decompressed since it is 
ultimately a bitvector that has been created from scratch for query evaluation 
(it would need verification though). My guess is that the few percent of 
performance i'm loosing here are attributable to the check (hits.isCompressed() 
&& hits.bytes()*mult + bv->bytes() > hits.size()), since it gets executed _A 
LOT_ of times. I'll try to investigate it a little bit further.

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]] 
Sent: Tuesday, March 27, 2012 3:33 PM
To: FastBit Users
Cc: Dominique Prunier
Subject: Re: [FastBit-users] PATCH: perf boost on top of r501



On 3/27/12 9:41 AM, Dominique Prunier wrote:
> Hey John,
> 
> I'm testing r503 right away (my usual performance and regression testing). 
> 
> Multiple remarks:
>  * in qExpr.cpp, i think you removed too much code. The superfluous part was 
> the stripping of the quotes (which is already handled in the lexer), but you 
> still have to keep the de-escaping logic here.

Presumably, the characters are special to a shell will be stripped
away by the shell already.  What passed through argv to a program
should be exactly what user intended.  None of the meta characters
"?*_%" are special or require escaping, therefore, there is no need
for FastBit to have an escape character.  Let me know if you have a
use case there requires an escape.

>  * about the decompression, shouldn't the hit vector already be uncompressed 
> (it is a vector created from scratch) ? it seems that r503 is a few percent 
> slower than my original patch

There is a need in some cases to decompress the bitvector to avoid
memory allocations.  Exactly when to do this is obviously a subject of
further discussion.  There was an extensive study on this some time
ago <http://lbl.gov/%7Ekwu/ps/LBNL-54673.html>.  Maybe this case
deserves a detailed review..

> 
> Except that, it seems all good. Hopefully, i'll give a closer look this 
> afternoon and get my stable version.
> 
> Thanks,
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of K. John Wu
> Sent: Tuesday, March 27, 2012 1:38 AM
> To: FastBit Users
> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
> 
> Hi, Dominique,
> 
> All you changes should be in SVN Revision 503.  Please take a look.
> 
> John
> 
> 
> On 3/26/12 6:07 PM, Dominique Prunier wrote:
>> Hey John,
>>
>>
>> About the compression/decompression, i'm not sure what they were useful for, 
>> i simply removed them and it simply gave me a nice performance boost :)
>> Do you think these 2 performance patches would be worth including into the 
>> subversion (along with the few other bug fixes i sent today) ?
>>
>> Thanks,
>>
>> ________________________________________
>> From: K. John Wu [[email protected]]
>> Sent: March-26-12 7:18 PM
>> To: FastBit Users
>> Cc: Dominique Prunier
>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>
>> Hi, Dominique,
>>
>> Thanks for the performance patches.
>>
>> The real problem with the decompress call in patternSearch is that as
>> is the call to decompress seems to be repeated with without any limit.
>>  It should only be called once.  I will examine all other calls to
>> decompress to make sure they are only called once..
>>
>> John
>>
>>
>> On 3/26/12 9:43 AM, Dominique Prunier wrote:
>>> To go even further, it is also possible to add the attached patch __on
>>> top__ of the previous one:
>>>
>>>
>>>
>>> ·         don't use bitvector::sloppyCount at all in
>>> category::patternSearch
>>>
>>> ·         need to AND with the mast when there's no match in
>>> query::doEvaluate
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> *From:*Dominique Prunier
>>> *Sent:* Monday, March 26, 2012 11:46 AM
>>> *To:* FastBit Users
>>> *Subject:* PATCH: perf boost on top of r501
>>>
>>>
>>>
>>> Hey John,
>>>
>>>
>>>
>>> I tried the performance of the sloppyCount and it seems quite
>>> convincing. It is only 1% slower than returning constants (1 or 0) and
>>> could potentially speedup trivial cases from time to time i guess.
>>>
>>>
>>>
>>> So i ported my performance patch on top of r501 and replaced my
>>> “return 1 or 0” by some calls to sloppyCount. Here is what the patch
>>> includes:
>>>
>>> ·         in category::patternSearch, skip the
>>> decompression/compression phase (this one has a __HUGE__ impact on perf)
>>>
>>> ·         in category::patternSearch, use sloppyCount for estimation
>>>
>>> ·         in query::doEvaluate, fix returned value when term == null
>>> (was 0, now is row count)
>>>
>>> ·         in query::doEvaluate, even more usage of
>>> bitvector::sloppyCount instead of bitvector::cnt
>>>
>>> ·         in query::doEvaluate, removed complex OR evaluation
>>> depending on bitmap sizes (maybe this can be restored in a less costly
>>> way)
>>>
>>>
>>>
>>> So far, it passes all my tests (this is not a very intrusive change).
>>>
>>> Please tell me if this would be candidate for the next stable release.
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> */Dominique Prunier/**//*
>>>
>>>  APG Lead Developper
>>>
>>> Logo-W4N-100dpi
>>>
>>>  4388, rue Saint-Denis
>>>
>>>  Bureau 309
>>>
>>>  Montreal (Quebec)  H2J 2L1
>>>
>>>  Tel. +1 514-842-6767  x310
>>>
>>>  Fax +1 514-842-3989
>>>
>>>  [email protected] <mailto:[email protected]>
>>>
>>>  www.watch4net.com <http://www.watch4net.com/>
>>>
>>> / /
>>>
>>> /This message is for the designated recipient only and may contain
>>> privileged, proprietary, or otherwise private information. If you have
>>> received it in error, please notify the sender immediately and delete
>>> the original. Any other use of this electronic mail by you is prohibited.
>>>
>>> //Ce message est pour le récipiendaire désigné seulement et peut
>>> contenir des informations privilégiées, propriétaires ou autrement
>>> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur
>>> immédiatement et effacez l'original. Toute autre utilisation de ce
>>> courrier électronique par vous est prohibée.///
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>> _______________________________________________
>> FastBit-users mailing list
>> [email protected]
>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Attachment: restore_deescaping.patch
Description: restore_deescaping.patch

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to