Hi John, I'm happy to hear that FastBit reached 1.3.0. It's been a lot of improvements over 1.2.8 ! I'm testing right now r511 (i guess it is the same as the stable release). So far, it behaves like r510 did for me: stable and fast !
I noticed something strange in the packaging however, it seems that there are some differences between the repository at r511 are and the zip: --- r511 +++ ibis1.3.0 -contrib/fbmerge/.deps/fbmerge.Po -doc/contact.html -doc/header.html -doc/rara.html +src/._colValues.cpp +src/._colValues.h -src/fastbit-0.7.pc.in +src/fastbit-config.h -tests/scripts/star2002.sh +win/._MinGW.mak -win/static_pthread_init.c -win/static_pthread.patch -win/xMinGW.mak While we are at it, i have a small fix to push to xMinGW.mak (since we rename it, we have to change the following line): fastbit.dll: $(FRC) - $(MAKE) -f MinGW.mak DEF="$(DEF) -DCXX_USE_DLL -DDLL_EXPORT" $(OBJ) + $(MAKE) -f xMinGW.mak DEF="$(DEF) -DCXX_USE_DLL -DDLL_EXPORT" $(OBJ) These are only minor things, i don't think it is really worth changing for 1.3.0. Thanks, -----Original Message----- From: K. John Wu [mailto:[email protected]] Sent: Monday, April 02, 2012 11:32 PM To: Dominique Prunier Cc: FastBit Users Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 Hi, Dominique, Here is the link to the latest stable release <https://codeforge.lbl.gov/frs/download.php/389/fastbit-ibis1.3.0.tar.gz> Let me know spot anything that needs attention. Thanks for all the help. John On 4/2/12 6:51 AM, Dominique Prunier wrote: > Hey John, > > I just tested r510 and i'm please to say that not only it passes all my test > cases (with -DFASTBIT_EMPTY_STRING_AS_NULL), but it is also unbelievably > faster (more than 25% in my test) ! I'm not sure exactly why but this is > greatly appreciated. Could it be the move from the simple loop in > category::patternSearch to the much more sophisticated index::sumBins ? > > Do you have a lot of thing to change before the stable release ? I admit that > i'd prefer use a stable release too rather than r506. > > Thanks, > > -----Original Message----- > From: K. John Wu [mailto:[email protected]] > Sent: Saturday, March 31, 2012 1:33 AM > To: Dominique Prunier > Cc: FastBit Users > Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 > > Hi, Dominique, > > I have added a couple of more test cases involving special characters > in the test suite used by 'make check'. The latest SVN revision is 510. > > Things seem to work OK. I would like to wrap things for a stable > research soon. If you find anything that needs attention please let > me know. > > John > > > On 3/29/12 1:23 PM, Dominique Prunier wrote: >> Hey John, >> >> I found the problem. When the exact value doesn't exists in the dictionary, >> operator[] is supposed to return size()+1 but here it returns 3 instead of 2 >> (it returns raw_.size()+1 instead of key_.size()+1) which make the following >> code fail in dictionary::patternSearch: >> >> if (!meta) { >> uint32_t code = operator[](prefix.c_str()); >> if (code != size() + 1) { >> matches.push_back(code); >> } >> return; >> } >> >> We probably never saw it before for at least 3 reasons: >> * it only affects linear search from dictionary::operator[] since in the >> other case it returns raw_.size() so it can't happen is the dic is larger >> than 16 >> * index::getBitvector that was previously used in category::patternSearch >> validate the given index and return 0 if it is out of bounds >> * category::patternSearch was validating that index::getBitvector didn't >> return NULL >> >> Thanks, >> >> -----Original Message----- >> From: K. John Wu [mailto:[email protected]] >> Sent: Thursday, March 29, 2012 3:48 PM >> To: Dominique Prunier >> Cc: FastBit Users >> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >> >> Hi, Dominique, >> >> The query seg faulted in r507 because ibis::dictionary::patternSearch >> placed the number 3 into the output array, however, the dictionary has >> only one value "\"val%\"". This creates an opportunity for >> ibis::index::sumBins attempt to access bits[3], but there are only two >> values in bits. Any idea why is ibis::dictionary::patternSearch >> producing 3? >> >> John >> >> >> On 3/29/12 10:05 AM, Dominique Prunier wrote: >>> Hey John, >>> >>> I'm sorry, my test case was not really "minimal" and too complex for what i >>> wanted to show you. >>> Please don't change the escaping, the query evaluation works just fine. >>> >>> Just for your information, this test case was here to validate one thing: >>> consistent de-escaping in all layers: >>> - C source : "'\"val\\\\%'" >>> - Compiled : '"val\\%' >>> - After lexer : "val\\% >>> - In qLike : "val\% >>> Ultimately, this tests try to validate that % escaping works in >>> patternMatch by not treating % as a wildcard but as a regular character, >>> thus this is not supposed to return any match (it ends up being an >>> equivalent of = "val%). And it works just as expected. There is nothing to >>> change about it. >>> >>> However, it just happened to be a query that used to segfault in r507, but >>> my guess is that it was related to one of these specificities: >>> * this column only has a single value >>> * there is no nulls >>> * the query returns no results >>> >>> To give you a simpler example, the query: >>> >>> SELECT single_value_category FROM <partition> WHERE single_value_category >>> LIKE 'missing' >>> >>> segfaulted as well in r507 for the same reasons. And this exemple doesn't >>> involve any fancy de-escaping but share the same specificities (single >>> value, no nulls, no results). >>> >>> Thanks, >>> >>> -----Original Message----- >>> From: K. John Wu [mailto:[email protected]] >>> Sent: Thursday, March 29, 2012 12:41 PM >>> To: Dominique Prunier >>> Cc: FastBit Users >>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>> >>> Hi, Dominique, >>> >>> I see what you are trying to do. The weird_category only has value >>> "\"val%\"", therefore, the where clause "weird_category LIKE >>> '\"val\\%'" should match every row, but "weird_category LIKE 'val\\%'" >>> should match no row. >>> >>> If you can step into category::patternSearch, you will see that >>> "val\\%" has been stripped to "val%", which will have the same outcome >>> as you intended, but it stripped away too many back slashes. You >>> intend category::patternSearch to see "val\%" to match the literal >>> percent sign (%), however, because the back slash was stripped twice, >>> you only got a bare percent sign left, which means it is a wild card >>> character, not a literal character as you intended. >>> >>> My theory is this. The string "val\\%" becomes "val\%" when it gets >>> to inside the C code. The runtime system has stripped away the first >>> back slash. The constructor of ibis::qLike take away the second one. >>> >>> Since we have gone back and forth many times on this, I will wait for >>> you confirmation before doing anything about it. >>> >>> John >>> >>> >>> On 3/29/12 8:09 AM, Dominique Prunier wrote: >>>> Hey John, >>>> >>>> When i'll have some time, i'll make my test suite dump the queries and >>>> expected results so that you try it yourself. It only tests category (and >>>> a small bit of long) data types for the things i'm doing but it can be >>>> useful. >>>> >>>> In the meantime, here is my test partition (see attached) and the query >>>> that generated the segfault with r507 was: >>>> >>>> SELECT weird_category FROM <partition> WHERE weird_category LIKE '"val\\%' >>>> (which is not supposed to return any result) >>>> >>>> It might help you understand what could have happened. >>>> >>>> Thanks, >>>> >>>> -----Original Message----- >>>> From: K. John Wu [mailto:[email protected]] >>>> Sent: Thursday, March 29, 2012 10:55 AM >>>> To: Dominique Prunier >>>> Cc: FastBit Users >>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>> >>>> Good to know. The problem was then I did not check the array bounds. >>>> Odd though, I did not think the values could be out of bounds.. >>>> >>>> On 3/29/12 7:24 AM, Dominique Prunier wrote: >>>>> Hi John, >>>>> >>>>> I ran the query with r507, and apparently, the problem was there: >>>>> >>>>> ==14864== Invalid read of size 8 >>>>> ==14864== at 0x550067A: ibis::index::sumBins(ibis::array_t<unsigned >>>>> int> const&, ibis::bitvector&) const (index.cpp:6371) >>>>> ==14864== by 0x576E76D: ibis::category::patternSearch(char const*, >>>>> ibis::bitvector&) const (category.cpp:871) >>>>> ==14864== by 0x509933C: ibis::part::patternSearch(ibis::qLike const&, >>>>> ibis::bitvector&) const (part.cpp:3260) >>>>> ==14864== by 0x545FF03: ibis::query::doEvaluate(ibis::qExpr const*, >>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3962) >>>>> ==14864== by 0x545B63D: ibis::query::computeHits() (query.cpp:2771) >>>>> ==14864== by 0x5452413: ibis::query::evaluate(bool) (query.cpp:847) >>>>> ==14864== by 0x587328C: fastbit_build_query (capi.cpp:477) >>>>> ==14864== by 0x4030F8: main (main.cpp:38) >>>>> >>>>> In r508, the problem is gone ! >>>>> >>>>> Thanks, >>>>> >>>>> -----Original Message----- >>>>> From: K. John Wu [mailto:[email protected]] >>>>> Sent: Thursday, March 29, 2012 12:33 AM >>>>> To: Dominique Prunier >>>>> Cc: FastBit Users >>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>>> >>>>> Hi, Dominique, >>>>> >>>>> The stack trace shows that it is invoking a copy constructor of the >>>>> ibis::bitvector class when it encountered the seg fault. Not sure >>>>> what is the problem here. I have tried to reproduce the problem by >>>>> modifying an existing test suite check-maurel. However, the code >>>>> seems to work. >>>>> >>>>> There is a minor change ibis::index::sumBins to check that the >>>>> incoming array contains only values less than bits.size() (the number >>>>> of bitvectors stored in an index object - ibis::direkte is an index >>>>> object). This might prevent attempting to out-of-bound accesses. The >>>>> change is in SVN Revision 508. >>>>> >>>>> If you are able to find more information. Please let me know. >>>>> >>>>> John >>>>> >>>>> >>>>> On 3/28/12 2:07 PM, Dominique Prunier wrote: >>>>>> Woops, r507 segfaults right away in: >>>>>> >>>>>> C [libfastbit.so.0.0.9+0x800fbb] >>>>>> ibis::bitvector::bitvector(ibis::bitvector const&)+0x23 >>>>>> C [libfastbit.so.0.0.9+0x6d16c1] >>>>>> ibis::index::sumBins(ibis::array_t<unsigned> const&, ibis::bitvector&) >>>>>> const+0x103 >>>>>> C [libfastbit.so.0.0.9+0x93f76e] ibis::category::patternSearch(char >>>>>> const*, ibis::bitvector&) const+0x3c8 >>>>>> C [libfastbit.so.0.0.9+0x26a33d] ibis::part::patternSearch(ibis::qLike >>>>>> const&, ibis::bitvector&) const+0xcf >>>>>> C [libfastbit.so.0.0.9+0x630f04] ibis::query::doEvaluate(ibis::qExpr >>>>>> const*, ibis::bitvector const&, ibis::bitvector&) const+0xe1a >>>>>> C [libfastbit.so.0.0.9+0x62c63e] ibis::query::computeHits()+0x356 >>>>>> C [libfastbit.so.0.0.9+0x623414] ibis::query::evaluate(bool)+0x4a6 >>>>>> C [libfastbit.so.0.0.9+0xa4428d] float+0x3c0 >>>>>> C [jna1347076628273574295.tmp+0x11b20] float+0x4c >>>>>> >>>>>> I'll try to investigate this latter but the last query of my tests it >>>>>> executed was >>>>>> >>>>>> SELECT weird_category FROM /tmp/junit8378310225719591578 WHERE >>>>>> weird_category LIKE '"val\\%' >>>>>> >>>>>> I'll try to investigate this, but it might be related to the fact that >>>>>> this query is supposed to return no result. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -----Original Message----- >>>>>> From: [email protected] >>>>>> [mailto:[email protected]] On Behalf Of Dominique >>>>>> Prunier >>>>>> Sent: Wednesday, March 28, 2012 4:58 PM >>>>>> To: K. John Wu >>>>>> Cc: FastBit Users >>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>>>> >>>>>> Hey John, >>>>>> >>>>>> I'll have a look to r507, probably tomorrow. To limit the risk, i've >>>>>> chosen the well tested r506 as my stable version using the >>>>>> FASTBIT_EMPTY_STRING_AS_NULL define. >>>>>> >>>>>> My problem with null mask is that i write my partitions in pure Java >>>>>> code. It is quite straightforward to write data files and -part.txt, but >>>>>> writing a NULL mask is something else. Besides, I don't really need a >>>>>> difference between NULL and empty anyway, since treating empty strings >>>>>> as NULLs is no different than some RDBMS engines that we support (Oracle >>>>>> does that for example). >>>>>> >>>>>> I have one quick question about category columns. Do they really have a >>>>>> separate NULL mask or do they use the bitmap stored at key 0 as their >>>>>> NULL mask ? If so, that means that keeping the >>>>>> FASTBIT_EMPTY_STRING_AS_NULL would allow me to indirectly build some >>>>>> real NULL values which would work with the NOT NULL syntax (which, by >>>>>> the way, in SQL is IS NOT NULL/IS NULL, maybe it would be worth changing >>>>>> it). >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -----Original Message----- >>>>>> From: K. John Wu [mailto:[email protected]] >>>>>> Sent: Wednesday, March 28, 2012 3:26 PM >>>>>> To: Dominique Prunier >>>>>> Cc: FastBit Users >>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>>>> >>>>>> Hi, Dominique, >>>>>> >>>>>> I have added code to accept "colname NOT NULL" in the where clauses. >>>>>> The new code is in SVN Revision 507. >>>>>> >>>>>> The new revision should also consolidate the handling of many >>>>>> bitvectors in category::patternSearch and address the issue of >>>>>> possibly missing calls to index::activate (which leads to incorrect >>>>>> answers). >>>>>> >>>>>> You can input null values through tablex::readCSV. There is an >>>>>> example in tests/Makefile.am in the way it generates data partition >>>>>> w7. The test case 15 of really-small also makes use of the new >>>>>> expression "NOT NULL". >>>>>> >>>>>> Let me know if you spot any problems. >>>>>> >>>>>> John >>>>>> >>>>>> >>>>>> On 3/28/12 10:45 AM, Dominique Prunier wrote: >>>>>>> Hey John, >>>>>>> >>>>>>> My guess is that empty strings are used more often that we'd think for >>>>>>> the same use case: use them as NULL marker because it is the easiest >>>>>>> way to both insert and query (which is exactly what i use them for). >>>>>>> Even if it is not an exact synonym of the SQL NULL (especially for >>>>>>> propagation), for most use cases, it is close enough. The fact that its >>>>>>> key is fixed and well known provides a valid workaround to simulate IS >>>>>>> NULL/IS NOT NULL predicates and since it is excluded from most string >>>>>>> related predicates (most importantly, a LIKE '%' wont select it), it is >>>>>>> a quite easy and intuitive to use. >>>>>>> >>>>>>> Besides, the column data file for strings can't represent NULL values >>>>>>> so you would have to deal with null masks, which make things >>>>>>> significantly more complex (especially, to create a partition from >>>>>>> scratch without using the FastBit library) and potentially more >>>>>>> resource hungry (have to read/write null mask, ...). >>>>>>> >>>>>>> That being said, my only problem with making the distinction between >>>>>>> NULL and empty in the dictionary is the current revision's inability to >>>>>>> query empty strings and/or NULLs (which is currently worked around by >>>>>>> the uint alternative, =0) and the fact that it matches a pattern like >>>>>>> '%'. Since there is no guarantee on the empty string key, i don't have >>>>>>> any workaround for now. >>>>>>> >>>>>>> For the time being, i'll stick with the FASTBIT_EMPTY_STRING_AS_NULL >>>>>>> define waiting for the ambiguity to vanish. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: K. John Wu [mailto:[email protected]] >>>>>>> Sent: Wednesday, March 28, 2012 1:22 PM >>>>>>> To: Dominique Prunier >>>>>>> Cc: FastBit Users >>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>>>>> >>>>>>> The ability to distinguish between empty string and null string can be >>>>>>> useful. The ambiguity in the current code is related to some earlier >>>>>>> design oversight, which we intend to correct. My hope is that not too >>>>>>> many people have empty strings that want to do something with, so >>>>>>> breaking this compatibility is not a big deal. >>>>>>> >>>>>>> There are also a problem with the implementation of >>>>>>> category::patternSearch that neglects to read the bitmaps from index >>>>>>> files (neglecting to call index::activate). I am consolidating some >>>>>>> code to make use of the existing strategies involving summation of a >>>>>>> large number of bitmaps in ibis::index::sumBits and >>>>>>> ibis::index::sumBins. These functions should encapsulate the idea of >>>>>>> when to decompress bitmaps a lot better than the simple version >>>>>>> currently in category::patternSearch. >>>>>>> >>>>>>> I am doing tests now and will let everyone know when I am ready to >>>>>>> check the code in. >>>>>>> >>>>>>> John >>>>>>> >>>>>>> >>>>>>> On 3/28/12 8:44 AM, Dominique Prunier wrote: >>>>>>>> Thanks John for your help. It is always very appreciated. >>>>>>>> >>>>>>>> With the macro FASTBIT_EMPTY_STRING_AS_NULL enabled, all my test cases >>>>>>>> now works. I'll test performance next but it could be a good candidate >>>>>>>> for my stable version. >>>>>>>> >>>>>>>> About the FASTBIT_EMPTY_STRING_AS_NULL, what do you think would be the >>>>>>>> best default ? For my use case, it is obviously to enable it, but for >>>>>>>> everybody else i don't know. My concern is the backward compatibility >>>>>>>> here, especially the fact that it influences the index creation, and >>>>>>>> not usage. This means that somebody who upgrades won't notice this >>>>>>>> change before it regenerates indexes. What's your opinion on this ? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: K. John Wu [mailto:[email protected]] >>>>>>>> Sent: Wednesday, March 28, 2012 11:20 AM >>>>>>>> To: Dominique Prunier >>>>>>>> Cc: FastBit Users >>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>>>>>> >>>>>>>> Hi, Dominique, >>>>>>>> >>>>>>>> Thanks for the suggestions and and test cases. Just checked in a set >>>>>>>> of changes as SVN Revision 506. Here is a bit more explanation. >>>>>>>> >>>>>>>> On item 1, I have taken the option you've suggested in the first >>>>>>>> message, i.e., use the macro FASTBIT_EMPTY_STRING_AS_NULL >>>>>>>> >>>>>>>> On item 2, I have restored the escaping of using backslash as you >>>>>>>> requested. >>>>>>>> >>>>>>>> Will go through you tests cases next and see what else I need to >>>>>>>> change. >>>>>>>> >>>>>>>> John >>>>>>>> >>>>>>>> >>>>>>>> On 3/28/12 8:02 AM, Dominique Prunier wrote: >>>>>>>>> Hi John, >>>>>>>>> >>>>>>>>> Here is two simple test cases for my two issues with r505. >>>>>>>>> >>>>>>>>> 1. empty-strings.zip: there is no way to select empty string anymore >>>>>>>>> >>>>>>>>> **** FAILED (bad result count) with where clause << a='' >> >>>>>>>>> >>>>>>>>> **** FAILED (bad result count) with where clause << a=0 >> >>>>>>>>> >>>>>>>>> **** FAILED (hard) with where clause << a LIKE '' >> >>>>>>>>> >>>>>>>>> 2. de-escaping.zip: there is no way to select a string with a >>>>>>>>> reserved char in it >>>>>>>>> >>>>>>>>> **** FAILED (bad result count) with where clause << a='it\'s good' >> >>>>>>>>> >>>>>>>>> Warning -- ibis::whereParser encountered syntax error, unexpected >>>>>>>>> name string at location a='it's good':1.7 >>>>>>>>> Warning -- query[96w0-3xn8Tg----1]::setWhereClause -- failed to parse >>>>>>>>> the WHERE clause "a='it's good'" >>>>>>>>> Warning -- fastbit_build_query failed to assign conditions (a='it's >>>>>>>>> good') to a query >>>>>>>>> **** FAILED (hard) with where clause << a='it's good' >> >>>>>>>>> >>>>>>>>> Warning -- ibis::whereParser encountered syntax error, unexpected >>>>>>>>> $undefined at location a=it's good:1.5 >>>>>>>>> Warning -- query[96w0-3xn8Tg----2]::setWhereClause -- failed to parse >>>>>>>>> the WHERE clause "a=it's good" >>>>>>>>> Warning -- fastbit_build_query failed to assign conditions (a=it's >>>>>>>>> good) to a query >>>>>>>>> **** FAILED (hard) with where clause << a=it's good >> >>>>>>>>> >>>>>>>>> With the attached patch, it fixes my test cases (except a LIKE '' >>>>>>>>> that didn't work before and a='it's good' and a=it's good whieh are >>>>>>>>> invalid where clauses). >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: [email protected] >>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique >>>>>>>>> Prunier >>>>>>>>> Sent: Wednesday, March 28, 2012 10:04 AM >>>>>>>>> To: K. John Wu >>>>>>>>> Cc: FastBit Users >>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>>>>>>> >>>>>>>>> Hey John, >>>>>>>>> >>>>>>>>> I'm trying r505 right know, i have two questions/remarks: >>>>>>>>> >>>>>>>>> 1. I saw a change about the dictionary now able to accept empty >>>>>>>>> strings and treat them as normal strings instead of NULLs. This >>>>>>>>> breaks quite a lot of my test cases, specifically those which used to >>>>>>>>> test category=0 or !=0 to simulate the IS NULL/IS NOT NULL because >>>>>>>>> testing for an empty string doesn't work (='' or LIKE '' used to fail >>>>>>>>> or return invalid results). What would be your recommendation for >>>>>>>>> that use case ? Could we add a define for those relying on this >>>>>>>>> behavior ? Something like: >>>>>>>>> >>>>>>>>> - //if (*str == 0) return 0; >>>>>>>>> +#ifdef FASTBIT_EMPTY_STRING_AS_NULL >>>>>>>>> + if (*str == 0) return 0; >>>>>>>>> +#endif >>>>>>>>> >>>>>>>>> 2. I can't find anywhere the de-escaping patch in r505, am i missing >>>>>>>>> something ? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: K. John Wu [mailto:[email protected]] >>>>>>>>> Sent: Wednesday, March 28, 2012 1:55 AM >>>>>>>>> To: Dominique Prunier >>>>>>>>> Cc: FastBit Users >>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>>>>>>> >>>>>>>>> Hi, Dominique, >>>>>>>>> >>>>>>>>> There are some updates to involving merge of dictionaries and to >>>>>>>>> exercise the operations involving unmatched quotes. The new code is >>>>>>>>> SVN R505. >>>>>>>>> >>>>>>>>> Please let me know if you have any additional questions. >>>>>>>>> >>>>>>>>> John >>>>>>>>> >>>>>>>>> >>>>>>>>> On 3/27/12 1:03 PM, Dominique Prunier wrote: >>>>>>>>>> Hey John, >>>>>>>>>> >>>>>>>>>> There is definitely a need for FastBit escaping. The escaping >>>>>>>>>> problem is not at the shell level (though we could have one there) >>>>>>>>>> since in pure C/C++ code, there's no shell involved when i'm >>>>>>>>>> building a where clause from a string. The problem is at the where >>>>>>>>>> clause parsing level (in the lexer to be more precise) to be able to >>>>>>>>>> express string literals among other things (and not only metas, it >>>>>>>>>> is also white spaces, ...). >>>>>>>>>> Typically, my test that fails is as simple as calling >>>>>>>>>> fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected >>>>>>>>>> to create a qString << a = it's good >> but now, it creates a >>>>>>>>>> qString << a = it\'s good >> which is wrong. The attached patch >>>>>>>>>> restores the descaping, but _not_ the double quote stripping >>>>>>>>>> (because it is already handled in the lexer). All my test cases >>>>>>>>>> works after applying it on r503. >>>>>>>>> >>>>>>>>> The constructors for ibis::qString and ibis::qLike really should not >>>>>>>>> strip away anything. In your case, you should be able to do the >>>>>>>>> following >>>>>>>>> >>>>>>>>> .../tcapi data-dir "a=\"it's good\"" >>>>>>>>> >>>>>>>>> if you are using fastbit_build_query, you can use the same string >>>>>>>>> fastbit_build_query(..., ..., "a=\"it's good\""); >>>>>>>>> >>>>>>>>> Since FastBit regular expression only support four meta characters ? * >>>>>>>>> _ %. There is no need to escape anything. It is probably cleaner to >>>>>>>>> not introduce stripping of anything special (except the outer most >>>>>>>>> quotes, which should be only done once). >>>>>>>>> >>>>>>>>>> >>>>>>>>>> About the decompression, thanks for the link, this is very >>>>>>>>>> interesting stuff ! But my point here is not about questing the fact >>>>>>>>>> the decompression can be better is some case, i was just under the >>>>>>>>>> impression that the hit vector given to category::patternSearch was >>>>>>>>>> _always_ already decompressed since it is ultimately a bitvector >>>>>>>>>> that has been created from scratch for query evaluation (it would >>>>>>>>>> need verification though). My guess is that the few percent of >>>>>>>>>> performance i'm loosing here are attributable to the check >>>>>>>>>> (hits.isCompressed() && hits.bytes()*mult + bv->bytes() > >>>>>>>>>> hits.size()), since it gets executed _A LOT_ of times. I'll try to >>>>>>>>>> investigate it a little bit further. >>>>>>>>> >>>>>>>>> I have rearranged the tests for decompression in layers which >>>>>>>>> hopefully will eliminate the need to perform more expensive tests in >>>>>>>>> your case that presumably involve a fairly small number of values. >>>>>>>>> _______________________________________________ >>>>>>>>> FastBit-users mailing list >>>>>>>>> [email protected] >>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users >>>>>> _______________________________________________ >>>>>> FastBit-users mailing list >>>>>> [email protected] >>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
