Hey John, There is definitely a need for FastBit escaping. The escaping problem is not at the shell level (though we could have one there) since in pure C/C++ code, there's no shell involved when i'm building a where clause from a string. The problem is at the where clause parsing level (in the lexer to be more precise) to be able to express string literals among other things (and not only metas, it is also white spaces, ...). Typically, my test that fails is as simple as calling fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected to create a qString << a = it's good >> but now, it creates a qString << a = it\'s good >> which is wrong. The attached patch restores the descaping, but _not_ the double quote stripping (because it is already handled in the lexer). All my test cases works after applying it on r503.
About the decompression, thanks for the link, this is very interesting stuff ! But my point here is not about questing the fact the decompression can be better is some case, i was just under the impression that the hit vector given to category::patternSearch was _always_ already decompressed since it is ultimately a bitvector that has been created from scratch for query evaluation (it would need verification though). My guess is that the few percent of performance i'm loosing here are attributable to the check (hits.isCompressed() && hits.bytes()*mult + bv->bytes() > hits.size()), since it gets executed _A LOT_ of times. I'll try to investigate it a little bit further. Thanks, -----Original Message----- From: K. John Wu [mailto:[email protected]] Sent: Tuesday, March 27, 2012 3:33 PM To: FastBit Users Cc: Dominique Prunier Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 On 3/27/12 9:41 AM, Dominique Prunier wrote: > Hey John, > > I'm testing r503 right away (my usual performance and regression testing). > > Multiple remarks: > * in qExpr.cpp, i think you removed too much code. The superfluous part was > the stripping of the quotes (which is already handled in the lexer), but you > still have to keep the de-escaping logic here. Presumably, the characters are special to a shell will be stripped away by the shell already. What passed through argv to a program should be exactly what user intended. None of the meta characters "?*_%" are special or require escaping, therefore, there is no need for FastBit to have an escape character. Let me know if you have a use case there requires an escape. > * about the decompression, shouldn't the hit vector already be uncompressed > (it is a vector created from scratch) ? it seems that r503 is a few percent > slower than my original patch There is a need in some cases to decompress the bitvector to avoid memory allocations. Exactly when to do this is obviously a subject of further discussion. There was an extensive study on this some time ago <http://lbl.gov/%7Ekwu/ps/LBNL-54673.html>. Maybe this case deserves a detailed review.. > > Except that, it seems all good. Hopefully, i'll give a closer look this > afternoon and get my stable version. > > Thanks, > > -----Original Message----- > From: [email protected] > [mailto:[email protected]] On Behalf Of K. John Wu > Sent: Tuesday, March 27, 2012 1:38 AM > To: FastBit Users > Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 > > Hi, Dominique, > > All you changes should be in SVN Revision 503. Please take a look. > > John > > > On 3/26/12 6:07 PM, Dominique Prunier wrote: >> Hey John, >> >> >> About the compression/decompression, i'm not sure what they were useful for, >> i simply removed them and it simply gave me a nice performance boost :) >> Do you think these 2 performance patches would be worth including into the >> subversion (along with the few other bug fixes i sent today) ? >> >> Thanks, >> >> ________________________________________ >> From: K. John Wu [[email protected]] >> Sent: March-26-12 7:18 PM >> To: FastBit Users >> Cc: Dominique Prunier >> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >> >> Hi, Dominique, >> >> Thanks for the performance patches. >> >> The real problem with the decompress call in patternSearch is that as >> is the call to decompress seems to be repeated with without any limit. >> It should only be called once. I will examine all other calls to >> decompress to make sure they are only called once.. >> >> John >> >> >> On 3/26/12 9:43 AM, Dominique Prunier wrote: >>> To go even further, it is also possible to add the attached patch __on >>> top__ of the previous one: >>> >>> >>> >>> · don't use bitvector::sloppyCount at all in >>> category::patternSearch >>> >>> · need to AND with the mast when there's no match in >>> query::doEvaluate >>> >>> >>> >>> Thanks, >>> >>> >>> >>> *From:*Dominique Prunier >>> *Sent:* Monday, March 26, 2012 11:46 AM >>> *To:* FastBit Users >>> *Subject:* PATCH: perf boost on top of r501 >>> >>> >>> >>> Hey John, >>> >>> >>> >>> I tried the performance of the sloppyCount and it seems quite >>> convincing. It is only 1% slower than returning constants (1 or 0) and >>> could potentially speedup trivial cases from time to time i guess. >>> >>> >>> >>> So i ported my performance patch on top of r501 and replaced my >>> “return 1 or 0” by some calls to sloppyCount. Here is what the patch >>> includes: >>> >>> · in category::patternSearch, skip the >>> decompression/compression phase (this one has a __HUGE__ impact on perf) >>> >>> · in category::patternSearch, use sloppyCount for estimation >>> >>> · in query::doEvaluate, fix returned value when term == null >>> (was 0, now is row count) >>> >>> · in query::doEvaluate, even more usage of >>> bitvector::sloppyCount instead of bitvector::cnt >>> >>> · in query::doEvaluate, removed complex OR evaluation >>> depending on bitmap sizes (maybe this can be restored in a less costly >>> way) >>> >>> >>> >>> So far, it passes all my tests (this is not a very intrusive change). >>> >>> Please tell me if this would be candidate for the next stable release. >>> >>> >>> >>> Thanks, >>> >>> >>> >>> */Dominique Prunier/**//* >>> >>> APG Lead Developper >>> >>> Logo-W4N-100dpi >>> >>> 4388, rue Saint-Denis >>> >>> Bureau 309 >>> >>> Montreal (Quebec) H2J 2L1 >>> >>> Tel. +1 514-842-6767 x310 >>> >>> Fax +1 514-842-3989 >>> >>> [email protected] <mailto:[email protected]> >>> >>> www.watch4net.com <http://www.watch4net.com/> >>> >>> / / >>> >>> /This message is for the designated recipient only and may contain >>> privileged, proprietary, or otherwise private information. If you have >>> received it in error, please notify the sender immediately and delete >>> the original. Any other use of this electronic mail by you is prohibited. >>> >>> //Ce message est pour le récipiendaire désigné seulement et peut >>> contenir des informations privilégiées, propriétaires ou autrement >>> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur >>> immédiatement et effacez l'original. Toute autre utilisation de ce >>> courrier électronique par vous est prohibée./// >>> >>> >>> >>> >>> >>> _______________________________________________ >>> FastBit-users mailing list >>> [email protected] >>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users >> _______________________________________________ >> FastBit-users mailing list >> [email protected] >> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users > _______________________________________________ > FastBit-users mailing list > [email protected] > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users > _______________________________________________ > FastBit-users mailing list > [email protected] > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
restore_deescaping.patch
Description: restore_deescaping.patch
_______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
