Hi, Dominique,

Thanks for the updates. Except for the issue with win/xMinGW.mak, it looks like none of the others will affect whether the source tarball produces the correct library or executable. Therefore, I will only include them in the next stable release. In the meantime, your changes have been incorporated into SVN Revision 512.
Thanks again.

John

On 4/3/12 6:30 AM, Dominique Prunier wrote:
> Hi John,
>
> I'm happy to hear that FastBit reached 1.3.0. There have been a lot of
> improvements over 1.2.8!
> I'm testing r511 right now (I guess it is the same as the stable release).
> So far, it behaves like r510 did for me: stable and fast!
>
> I noticed something strange in the packaging, however: it seems that there are
> some differences between the repository at r511 and the zip:
> --- r511
> +++ ibis1.3.0
> -contrib/fbmerge/.deps/fbmerge.Po
> -doc/contact.html
> -doc/header.html
> -doc/rara.html
> +src/._colValues.cpp
> +src/._colValues.h
> -src/fastbit-0.7.pc.in
> +src/fastbit-config.h
> -tests/scripts/star2002.sh
> +win/._MinGW.mak
> -win/static_pthread_init.c
> -win/static_pthread.patch
> -win/xMinGW.mak
>
> While we are at it, I have a small fix to push to xMinGW.mak (since we renamed
> it, we have to change the following line):
>
> fastbit.dll: $(FRC)
> - $(MAKE) -f MinGW.mak DEF="$(DEF) -DCXX_USE_DLL -DDLL_EXPORT" $(OBJ)
> + $(MAKE) -f xMinGW.mak DEF="$(DEF) -DCXX_USE_DLL -DDLL_EXPORT" $(OBJ)
>
> These are only minor things; I don't think it is really worth changing for
> 1.3.0.
>
> Thanks,
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Monday, April 02, 2012 11:32 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>
> Hi, Dominique,
>
> Here is the link to the latest stable release
> <https://codeforge.lbl.gov/frs/download.php/389/fastbit-ibis1.3.0.tar.gz>
>
> Let me know if you spot anything that needs attention.
>
> Thanks for all the help.
>
> John
>
>
>
> On 4/2/12 6:51 AM, Dominique Prunier wrote:
>> Hey John,
>>
>> I just tested r510 and I'm pleased to say that not only does it pass all my test
>> cases (with -DFASTBIT_EMPTY_STRING_AS_NULL), but it is also unbelievably
>> faster (more than 25% in my test)! I'm not sure exactly why but this is
>> greatly appreciated.
>> Could it be the move from the simple loop in
>> category::patternSearch to the much more sophisticated index::sumBins?
>>
>> Do you have a lot of things to change before the stable release? I admit
>> that I'd prefer to use a stable release too rather than r506.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Saturday, March 31, 2012 1:33 AM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>
>> Hi, Dominique,
>>
>> I have added a couple more test cases involving special characters
>> to the test suite used by 'make check'. The latest SVN revision is 510.
>>
>> Things seem to work OK. I would like to wrap things up for a stable
>> release soon. If you find anything that needs attention please let
>> me know.
>>
>> John
>>
>>
>> On 3/29/12 1:23 PM, Dominique Prunier wrote:
>>> Hey John,
>>>
>>> I found the problem. When the exact value doesn't exist in the dictionary,
>>> operator[] is supposed to return size()+1 but here it returns 3 instead of
>>> 2 (it returns raw_.size()+1 instead of key_.size()+1), which makes the
>>> following code fail in dictionary::patternSearch:
>>>
>>> if (!meta) {
>>>     uint32_t code = operator[](prefix.c_str());
>>>     if (code != size() + 1) {
>>>         matches.push_back(code);
>>>     }
>>>     return;
>>> }
>>>
>>> We probably never saw it before for at least 3 reasons:
>>> * it only affects the linear search in dictionary::operator[], since in the
>>> other case it returns raw_.size(), so it can't happen if the dictionary is
>>> larger than 16
>>> * index::getBitvector, which was previously used in category::patternSearch,
>>> validates the given index and returns 0 if it is out of bounds
>>> * category::patternSearch was validating that index::getBitvector didn't
>>> return NULL
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Thursday, March 29, 2012 3:48 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>
>>> Hi, Dominique,
>>>
>>> The query seg faulted in r507 because ibis::dictionary::patternSearch
>>> placed the number 3 into the output array; however, the dictionary has
>>> only one value "\"val%\"". This creates an opportunity for
>>> ibis::index::sumBins to attempt to access bits[3], but there are only two
>>> values in bits. Any idea why ibis::dictionary::patternSearch is
>>> producing 3?
>>>
>>> John
>>>
>>>
>>> On 3/29/12 10:05 AM, Dominique Prunier wrote:
>>>> Hey John,
>>>>
>>>> I'm sorry, my test case was not really "minimal" and too complex for what
>>>> I wanted to show you.
>>>> Please don't change the escaping, the query evaluation works just fine.
>>>>
>>>> Just for your information, this test case was here to validate one thing:
>>>> consistent de-escaping in all layers:
>>>> - C source : "'\"val\\\\%'"
>>>> - Compiled : '"val\\%'
>>>> - After lexer : "val\\%
>>>> - In qLike : "val\%
>>>> Ultimately, this test tries to validate that % escaping works in
>>>> patternMatch by not treating % as a wildcard but as a regular character,
>>>> thus this is not supposed to return any match (it ends up being an
>>>> equivalent of = "val%). And it works just as expected. There is nothing to
>>>> change about it.
>>>>
>>>> However, it just happened to be a query that used to segfault in r507, but
>>>> my guess is that it was related to one of these specificities:
>>>> * this column only has a single value
>>>> * there are no nulls
>>>> * the query returns no results
>>>>
>>>> To give you a simpler example, the query:
>>>>
>>>> SELECT single_value_category FROM <partition> WHERE single_value_category
>>>> LIKE 'missing'
>>>>
>>>> segfaulted as well in r507 for the same reasons.
>>>> And this example doesn't
>>>> involve any fancy de-escaping but shares the same specificities (single
>>>> value, no nulls, no results).
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Thursday, March 29, 2012 12:41 PM
>>>> To: Dominique Prunier
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>
>>>> Hi, Dominique,
>>>>
>>>> I see what you are trying to do. The weird_category only has value
>>>> "\"val%\"", therefore, the where clause "weird_category LIKE
>>>> '\"val\\%'" should match every row, but "weird_category LIKE 'val\\%'"
>>>> should match no row.
>>>>
>>>> If you can step into category::patternSearch, you will see that
>>>> "val\\%" has been stripped to "val%", which will have the same outcome
>>>> as you intended, but it stripped away too many backslashes. You
>>>> intend category::patternSearch to see "val\%" to match the literal
>>>> percent sign (%); however, because the backslash was stripped twice,
>>>> you only got a bare percent sign left, which means it is a wildcard
>>>> character, not a literal character as you intended.
>>>>
>>>> My theory is this. The string "val\\%" becomes "val\%" when it gets
>>>> inside the C code. The runtime system has stripped away the first
>>>> backslash. The constructor of ibis::qLike takes away the second one.
>>>>
>>>> Since we have gone back and forth many times on this, I will wait for
>>>> your confirmation before doing anything about it.
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/29/12 8:09 AM, Dominique Prunier wrote:
>>>>> Hey John,
>>>>>
>>>>> When I have some time, I'll make my test suite dump the queries and
>>>>> expected results so that you can try it yourself. It only tests category (and
>>>>> a small bit of long) data types for the things I'm doing but it can be
>>>>> useful.
>>>>>
>>>>> In the meantime, here is my test partition (see attached) and the query
>>>>> that generated the segfault with r507 was:
>>>>>
>>>>> SELECT weird_category FROM <partition> WHERE weird_category LIKE
>>>>> '"val\\%' (which is not supposed to return any result)
>>>>>
>>>>> It might help you understand what could have happened.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Thursday, March 29, 2012 10:55 AM
>>>>> To: Dominique Prunier
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>
>>>>> Good to know. The problem, then, was that I did not check the array bounds.
>>>>> Odd though, I did not think the values could be out of bounds.
>>>>>
>>>>> On 3/29/12 7:24 AM, Dominique Prunier wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> I ran the query with r507, and apparently, the problem was there:
>>>>>>
>>>>>> ==14864== Invalid read of size 8
>>>>>> ==14864==    at 0x550067A: ibis::index::sumBins(ibis::array_t<unsigned int> const&, ibis::bitvector&) const (index.cpp:6371)
>>>>>> ==14864==    by 0x576E76D: ibis::category::patternSearch(char const*, ibis::bitvector&) const (category.cpp:871)
>>>>>> ==14864==    by 0x509933C: ibis::part::patternSearch(ibis::qLike const&, ibis::bitvector&) const (part.cpp:3260)
>>>>>> ==14864==    by 0x545FF03: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3962)
>>>>>> ==14864==    by 0x545B63D: ibis::query::computeHits() (query.cpp:2771)
>>>>>> ==14864==    by 0x5452413: ibis::query::evaluate(bool) (query.cpp:847)
>>>>>> ==14864==    by 0x587328C: fastbit_build_query (capi.cpp:477)
>>>>>> ==14864==    by 0x4030F8: main (main.cpp:38)
>>>>>>
>>>>>> In r508, the problem is gone!
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>> Sent: Thursday, March 29, 2012 12:33 AM
>>>>>> To: Dominique Prunier
>>>>>> Cc: FastBit Users
>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>
>>>>>> Hi, Dominique,
>>>>>>
>>>>>> The stack trace shows that it is invoking a copy constructor of the
>>>>>> ibis::bitvector class when it encountered the seg fault. Not sure
>>>>>> what the problem is here. I have tried to reproduce the problem by
>>>>>> modifying an existing test suite check-maurel. However, the code
>>>>>> seems to work.
>>>>>>
>>>>>> There is a minor change to ibis::index::sumBins to check that the
>>>>>> incoming array contains only values less than bits.size() (the number
>>>>>> of bitvectors stored in an index object - ibis::direkte is an index
>>>>>> object). This might prevent out-of-bounds accesses. The
>>>>>> change is in SVN Revision 508.
>>>>>>
>>>>>> If you are able to find more information, please let me know.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/28/12 2:07 PM, Dominique Prunier wrote:
>>>>>>> Woops, r507 segfaults right away in:
>>>>>>>
>>>>>>> C [libfastbit.so.0.0.9+0x800fbb] ibis::bitvector::bitvector(ibis::bitvector const&)+0x23
>>>>>>> C [libfastbit.so.0.0.9+0x6d16c1] ibis::index::sumBins(ibis::array_t<unsigned> const&, ibis::bitvector&) const+0x103
>>>>>>> C [libfastbit.so.0.0.9+0x93f76e] ibis::category::patternSearch(char const*, ibis::bitvector&) const+0x3c8
>>>>>>> C [libfastbit.so.0.0.9+0x26a33d] ibis::part::patternSearch(ibis::qLike const&, ibis::bitvector&) const+0xcf
>>>>>>> C [libfastbit.so.0.0.9+0x630f04] ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const+0xe1a
>>>>>>> C [libfastbit.so.0.0.9+0x62c63e] ibis::query::computeHits()+0x356
>>>>>>> C [libfastbit.so.0.0.9+0x623414] ibis::query::evaluate(bool)+0x4a6
>>>>>>> C [libfastbit.so.0.0.9+0xa4428d] float+0x3c0
>>>>>>> C [jna1347076628273574295.tmp+0x11b20] float+0x4c
>>>>>>>
>>>>>>> I'll try to investigate this later, but the last query my tests
>>>>>>> executed was
>>>>>>>
>>>>>>> SELECT weird_category FROM /tmp/junit8378310225719591578 WHERE
>>>>>>> weird_category LIKE '"val\\%'
>>>>>>>
>>>>>>> I'll try to investigate this, but it might be related to the fact that
>>>>>>> this query is supposed to return no result.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected]
>>>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>>>> Prunier
>>>>>>> Sent: Wednesday, March 28, 2012 4:58 PM
>>>>>>> To: K. John Wu
>>>>>>> Cc: FastBit Users
>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>
>>>>>>> Hey John,
>>>>>>>
>>>>>>> I'll have a look at r507, probably tomorrow. To limit the risk, I've
>>>>>>> chosen the well tested r506 as my stable version using the
>>>>>>> FASTBIT_EMPTY_STRING_AS_NULL define.
>>>>>>>
>>>>>>> My problem with the null mask is that I write my partitions in pure Java
>>>>>>> code. It is quite straightforward to write data files and -part.txt,
>>>>>>> but writing a NULL mask is something else. Besides, I don't really need
>>>>>>> a difference between NULL and empty anyway, since treating empty
>>>>>>> strings as NULLs is no different than some RDBMS engines that we
>>>>>>> support (Oracle does that for example).
>>>>>>>
>>>>>>> I have one quick question about category columns. Do they really have a
>>>>>>> separate NULL mask or do they use the bitmap stored at key 0 as their
>>>>>>> NULL mask? If so, that means that keeping the
>>>>>>> FASTBIT_EMPTY_STRING_AS_NULL would allow me to indirectly build some
>>>>>>> real NULL values which would work with the NOT NULL syntax (which, by
>>>>>>> the way, in SQL is IS NOT NULL/IS NULL, maybe it would be worth
>>>>>>> changing it).
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>> Sent: Wednesday, March 28, 2012 3:26 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Cc: FastBit Users
>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>
>>>>>>> Hi, Dominique,
>>>>>>>
>>>>>>> I have added code to accept "colname NOT NULL" in the where clauses.
>>>>>>> The new code is in SVN Revision 507.
>>>>>>>
>>>>>>> The new revision should also consolidate the handling of many
>>>>>>> bitvectors in category::patternSearch and address the issue of
>>>>>>> possibly missing calls to index::activate (which leads to incorrect
>>>>>>> answers).
>>>>>>>
>>>>>>> You can input null values through tablex::readCSV. There is an
>>>>>>> example in tests/Makefile.am in the way it generates data partition
>>>>>>> w7. The test case 15 of really-small also makes use of the new
>>>>>>> expression "NOT NULL".
>>>>>>>
>>>>>>> Let me know if you spot any problems.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/28/12 10:45 AM, Dominique Prunier wrote:
>>>>>>>> Hey John,
>>>>>>>>
>>>>>>>> My guess is that empty strings are used more often than we'd think for
>>>>>>>> the same use case: use them as NULL marker because it is the easiest
>>>>>>>> way to both insert and query (which is exactly what I use them for).
>>>>>>>> Even if it is not an exact synonym of the SQL NULL (especially for
>>>>>>>> propagation), for most use cases, it is close enough. The fact that
>>>>>>>> its key is fixed and well known provides a valid workaround to
>>>>>>>> simulate IS NULL/IS NOT NULL predicates and since it is excluded from
>>>>>>>> most string related predicates (most importantly, a LIKE '%' won't
>>>>>>>> select it), it is quite easy and intuitive to use.
>>>>>>>>
>>>>>>>> Besides, the column data file for strings can't represent NULL values
>>>>>>>> so you would have to deal with null masks, which make things
>>>>>>>> significantly more complex (especially, to create a partition from
>>>>>>>> scratch without using the FastBit library) and potentially more
>>>>>>>> resource hungry (have to read/write null mask, ...).
>>>>>>>>
>>>>>>>> That being said, my only problem with making the distinction between
>>>>>>>> NULL and empty in the dictionary is the current revision's inability
>>>>>>>> to query empty strings and/or NULLs (which is currently worked around
>>>>>>>> by the uint alternative, =0) and the fact that it matches a pattern
>>>>>>>> like '%'. Since there is no guarantee on the empty string key, I don't
>>>>>>>> have any workaround for now.
>>>>>>>>
>>>>>>>> For the time being, I'll stick with the FASTBIT_EMPTY_STRING_AS_NULL
>>>>>>>> define waiting for the ambiguity to vanish.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>> Sent: Wednesday, March 28, 2012 1:22 PM
>>>>>>>> To: Dominique Prunier
>>>>>>>> Cc: FastBit Users
>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>>
>>>>>>>> The ability to distinguish between empty string and null string can be
>>>>>>>> useful. The ambiguity in the current code is related to some earlier
>>>>>>>> design oversight, which we intend to correct. My hope is that not too
>>>>>>>> many people have empty strings that they want to do something with, so
>>>>>>>> breaking this compatibility is not a big deal.
>>>>>>>>
>>>>>>>> There is also a problem with the implementation of
>>>>>>>> category::patternSearch that neglects to read the bitmaps from index
>>>>>>>> files (neglecting to call index::activate).
>>>>>>>> I am consolidating some
>>>>>>>> code to make use of the existing strategies involving summation of a
>>>>>>>> large number of bitmaps in ibis::index::sumBits and
>>>>>>>> ibis::index::sumBins. These functions should encapsulate the idea of
>>>>>>>> when to decompress bitmaps a lot better than the simple version
>>>>>>>> currently in category::patternSearch.
>>>>>>>>
>>>>>>>> I am doing tests now and will let everyone know when I am ready to
>>>>>>>> check the code in.
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/28/12 8:44 AM, Dominique Prunier wrote:
>>>>>>>>> Thanks John for your help. It is always very appreciated.
>>>>>>>>>
>>>>>>>>> With the macro FASTBIT_EMPTY_STRING_AS_NULL enabled, all my test
>>>>>>>>> cases now work. I'll test performance next but it could be a good
>>>>>>>>> candidate for my stable version.
>>>>>>>>>
>>>>>>>>> About the FASTBIT_EMPTY_STRING_AS_NULL, what do you think would be
>>>>>>>>> the best default? For my use case, it is obviously to enable it, but
>>>>>>>>> for everybody else I don't know. My concern is the backward
>>>>>>>>> compatibility here, especially the fact that it influences the index
>>>>>>>>> creation, and not usage. This means that somebody who upgrades won't
>>>>>>>>> notice this change before they regenerate indexes. What's your opinion
>>>>>>>>> on this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>> Sent: Wednesday, March 28, 2012 11:20 AM
>>>>>>>>> To: Dominique Prunier
>>>>>>>>> Cc: FastBit Users
>>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>>>
>>>>>>>>> Hi, Dominique,
>>>>>>>>>
>>>>>>>>> Thanks for the suggestions and test cases. Just checked in a set
>>>>>>>>> of changes as SVN Revision 506. Here is a bit more explanation.
>>>>>>>>>
>>>>>>>>> On item 1, I have taken the option you've suggested in the first
>>>>>>>>> message, i.e., use the macro FASTBIT_EMPTY_STRING_AS_NULL.
>>>>>>>>>
>>>>>>>>> On item 2, I have restored the escaping using backslash as you
>>>>>>>>> requested.
>>>>>>>>>
>>>>>>>>> Will go through your test cases next and see what else I need to
>>>>>>>>> change.
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/28/12 8:02 AM, Dominique Prunier wrote:
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> Here are two simple test cases for my two issues with r505.
>>>>>>>>>>
>>>>>>>>>> 1. empty-strings.zip: there is no way to select empty strings anymore
>>>>>>>>>>
>>>>>>>>>> **** FAILED (bad result count) with where clause << a='' >>
>>>>>>>>>>
>>>>>>>>>> **** FAILED (bad result count) with where clause << a=0 >>
>>>>>>>>>>
>>>>>>>>>> **** FAILED (hard) with where clause << a LIKE '' >>
>>>>>>>>>>
>>>>>>>>>> 2. de-escaping.zip: there is no way to select a string with a
>>>>>>>>>> reserved char in it
>>>>>>>>>>
>>>>>>>>>> **** FAILED (bad result count) with where clause << a='it\'s good' >>
>>>>>>>>>>
>>>>>>>>>> Warning -- ibis::whereParser encountered syntax error, unexpected
>>>>>>>>>> name string at location a='it's good':1.7
>>>>>>>>>> Warning -- query[96w0-3xn8Tg----1]::setWhereClause -- failed to
>>>>>>>>>> parse the WHERE clause "a='it's good'"
>>>>>>>>>> Warning -- fastbit_build_query failed to assign conditions (a='it's
>>>>>>>>>> good') to a query
>>>>>>>>>> **** FAILED (hard) with where clause << a='it's good' >>
>>>>>>>>>>
>>>>>>>>>> Warning -- ibis::whereParser encountered syntax error, unexpected
>>>>>>>>>> $undefined at location a=it's good:1.5
>>>>>>>>>> Warning -- query[96w0-3xn8Tg----2]::setWhereClause -- failed to
>>>>>>>>>> parse the WHERE clause "a=it's good"
>>>>>>>>>> Warning -- fastbit_build_query failed to assign conditions (a=it's
>>>>>>>>>> good) to a query
>>>>>>>>>> **** FAILED (hard) with where clause << a=it's good >>
>>>>>>>>>>
>>>>>>>>>> The attached patch fixes my test cases (except a LIKE ''
>>>>>>>>>> that didn't work before and a='it's good' and a=it's good which are
>>>>>>>>>> invalid where clauses).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: [email protected]
>>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>>>>>>> Prunier
>>>>>>>>>> Sent: Wednesday, March 28, 2012 10:04 AM
>>>>>>>>>> To: K. John Wu
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>>>>
>>>>>>>>>> Hey John,
>>>>>>>>>>
>>>>>>>>>> I'm trying r505 right now, and I have two questions/remarks:
>>>>>>>>>>
>>>>>>>>>> 1. I saw a change about the dictionary now being able to accept empty
>>>>>>>>>> strings and treat them as normal strings instead of NULLs. This
>>>>>>>>>> breaks quite a lot of my test cases, specifically those which used
>>>>>>>>>> to test category=0 or !=0 to simulate the IS NULL/IS NOT NULL
>>>>>>>>>> because testing for an empty string doesn't work (='' or LIKE ''
>>>>>>>>>> used to fail or return invalid results). What would be your
>>>>>>>>>> recommendation for that use case? Could we add a define for those
>>>>>>>>>> relying on this behavior? Something like:
>>>>>>>>>>
>>>>>>>>>> - //if (*str == 0) return 0;
>>>>>>>>>> +#ifdef FASTBIT_EMPTY_STRING_AS_NULL
>>>>>>>>>> + if (*str == 0) return 0;
>>>>>>>>>> +#endif
>>>>>>>>>>
>>>>>>>>>> 2. I can't find the de-escaping patch anywhere in r505, am I
>>>>>>>>>> missing something?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>> Sent: Wednesday, March 28, 2012 1:55 AM
>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>>>>
>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>
>>>>>>>>>> There are some updates involving the merging of dictionaries and
>>>>>>>>>> exercising the operations involving unmatched quotes. The new code is
>>>>>>>>>> SVN R505.
>>>>>>>>>>
>>>>>>>>>> Please let me know if you have any additional questions.
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/27/12 1:03 PM, Dominique Prunier wrote:
>>>>>>>>>>> Hey John,
>>>>>>>>>>>
>>>>>>>>>>> There is definitely a need for FastBit escaping. The escaping
>>>>>>>>>>> problem is not at the shell level (though we could have one there)
>>>>>>>>>>> since in pure C/C++ code, there's no shell involved when I'm
>>>>>>>>>>> building a where clause from a string. The problem is at the where
>>>>>>>>>>> clause parsing level (in the lexer to be more precise) to be able
>>>>>>>>>>> to express string literals among other things (and not only metas,
>>>>>>>>>>> it is also white spaces, ...).
>>>>>>>>>>> Typically, my test that fails is as simple as calling
>>>>>>>>>>> fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected
>>>>>>>>>>> to create a qString << a = it's good >> but now, it creates a
>>>>>>>>>>> qString << a = it\'s good >> which is wrong. The attached patch
>>>>>>>>>>> restores the de-escaping, but _not_ the double quote stripping
>>>>>>>>>>> (because it is already handled in the lexer). All my test cases
>>>>>>>>>>> work after applying it on r503.
>>>>>>>>>>
>>>>>>>>>> The constructors for ibis::qString and ibis::qLike really should not
>>>>>>>>>> strip away anything.
>>>>>>>>>> In your case, you should be able to do the
>>>>>>>>>> following
>>>>>>>>>>
>>>>>>>>>> .../tcapi data-dir "a=\"it's good\""
>>>>>>>>>>
>>>>>>>>>> if you are using fastbit_build_query, you can use the same string
>>>>>>>>>> fastbit_build_query(..., ..., "a=\"it's good\"");
>>>>>>>>>>
>>>>>>>>>> Since FastBit regular expressions only support four meta characters
>>>>>>>>>> (? * _ %), there is no need to escape anything. It is probably cleaner to
>>>>>>>>>> not introduce stripping of anything special (except the outermost
>>>>>>>>>> quotes, which should be only done once).
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> About the decompression, thanks for the link, this is very
>>>>>>>>>>> interesting stuff! But my point here is not about questioning the
>>>>>>>>>>> fact that decompression can be better in some cases, I was just under
>>>>>>>>>>> the impression that the hit vector given to category::patternSearch
>>>>>>>>>>> was _always_ already decompressed since it is ultimately a
>>>>>>>>>>> bitvector that has been created from scratch for query evaluation
>>>>>>>>>>> (it would need verification though). My guess is that the few
>>>>>>>>>>> percent of performance I'm losing here are attributable to the
>>>>>>>>>>> check (hits.isCompressed() && hits.bytes()*mult + bv->bytes() >
>>>>>>>>>>> hits.size()), since it gets executed _A LOT_ of times. I'll try to
>>>>>>>>>>> investigate it a little bit further.
>>>>>>>>>>
>>>>>>>>>> I have rearranged the tests for decompression in layers which
>>>>>>>>>> hopefully will eliminate the need to perform more expensive tests in
>>>>>>>>>> your case that presumably involve a fairly small number of values.
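
For readers following the escaping discussion in this thread, here is a minimal sketch of the "\%" behavior being tested. It is not FastBit's code: unescapeOnce, likeMatch and demoEscapedPercent are hypothetical names made up for illustration, the matcher handles only % and _ (not * or ?), and the assumption is simply that one layer removes one level of backslashes and the pattern matcher then treats a backslash-escaped % as a literal character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a FastBit function): remove one level of
// backslash escaping. A trailing lone backslash is kept as-is.
std::string unescapeOnce(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) ++i;  // drop the backslash, keep the next char
        out += in[i];
    }
    return out;
}

// Hypothetical LIKE matcher: '%' matches any run of characters, '_' matches
// exactly one, and a backslash makes the following character literal.
bool likeMatch(const char* s, const char* p) {
    if (*p == '\0') return *s == '\0';
    if (*p == '%') {
        // Try every possible length for the run matched by '%'.
        for (const char* t = s;; ++t) {
            if (likeMatch(t, p + 1)) return true;
            if (*t == '\0') return false;
        }
    }
    if (*s == '\0') return false;
    if (*p == '_') return likeMatch(s + 1, p + 1);
    if (*p == '\\' && p[1] != '\0') ++p;  // escaped: compare the next char literally
    return *s == *p && likeMatch(s + 1, p + 1);
}

// Walks the weird_category test case from the thread through the last two layers.
void demoEscapedPercent() {
    const std::string afterLexer = "\"val\\\\%";           // the text  "val\\%
    const std::string inQLike = unescapeOnce(afterLexer);  // the text  "val\%
    assert(inQLike == "\"val\\%");
    // The column's only value is "val%" (with both quotes). An escaped %
    // is a literal, so the pattern ends before the closing quote: no match.
    assert(!likeMatch("\"val%\"", inQLike.c_str()));
    // With the backslash stripped once too often, % becomes a wildcard.
    assert(likeMatch("\"val%\"", "\"val%"));
}
```

Calling demoEscapedPercent() from a main() exercises both cases: the correctly de-escaped pattern selects nothing, while the over-stripped one selects every row.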
>>>>>>>>>> _______________________________________________
>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
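
The r507 segfault discussed in this thread comes down to a "not found" sentinel that the caller and the lookup disagree on, plus an unchecked array index. The following is a rough sketch of that interaction, not the FastBit implementation: ToyDictionary, exactMatch, sumBins and demoSingleValueColumn are hypothetical stand-ins, each bitmap is squeezed into a single 64-bit word, and code 0 is assumed to be reserved for NULL as the thread suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a string dictionary (not ibis::dictionary).
// Assumption: code 0 is reserved for NULL, real entries get codes 1..size().
class ToyDictionary {
public:
    std::uint32_t size() const {
        return static_cast<std::uint32_t>(keys_.size());
    }
    // Returns the code of s, or size()+1 when s is absent. Returning any
    // other "not found" value is the kind of mismatch described in the thread.
    std::uint32_t lookup(const std::string& s) const {
        for (std::uint32_t i = 0; i < keys_.size(); ++i)
            if (keys_[i] == s) return i + 1;
        return size() + 1;
    }
    std::uint32_t insert(const std::string& s) {
        const std::uint32_t code = lookup(s);
        if (code <= size()) return code;  // already present
        keys_.push_back(s);
        return size();
    }
private:
    std::vector<std::string> keys_;
};

// Exact (no wildcard) search: emit a code only if the value exists.
std::vector<std::uint32_t> exactMatch(const ToyDictionary& dict,
                                      const std::string& value) {
    std::vector<std::uint32_t> matches;
    const std::uint32_t code = dict.lookup(value);
    if (code != dict.size() + 1) matches.push_back(code);
    return matches;
}

// OR together the bitmaps of the given codes, skipping any code that is
// out of range (the kind of bounds check the thread says r508 added).
std::uint64_t sumBins(const std::vector<std::uint64_t>& bits,
                      const std::vector<std::uint32_t>& codes) {
    std::uint64_t hits = 0;
    for (std::uint32_t c : codes)
        if (c < bits.size()) hits |= bits[c];
    return hits;
}

// Single-value column, no nulls, query with no results: the failing setup.
void demoSingleValueColumn() {
    ToyDictionary dict;
    dict.insert("\"val%\"");
    const std::vector<std::uint64_t> bits = {0, 7};  // bits[0]: NULL rows; bits[1]: rows 0..2
    assert(dict.lookup("missing") == 2);             // size()+1, so the caller drops it
    assert(exactMatch(dict, "missing").empty());
    assert(sumBins(bits, {3}) == 0);                 // a stray code 3 is ignored, not dereferenced
    assert(sumBins(bits, exactMatch(dict, "\"val%\"")) == 7);
}
```

Either fix alone avoids the out-of-bounds read in this toy setup; having both (a consistent sentinel and a bounds check) is the defensive combination the thread converges on.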
