Good to know. The problem was then I did not check the array bounds. Odd though, I did not think the values could be out of bounds..
On 3/29/12 7:24 AM, Dominique Prunier wrote: > Hi John, > > I ran the query with r507, and apparently, the problem was there: > > ==14864== Invalid read of size 8 > ==14864== at 0x550067A: ibis::index::sumBins(ibis::array_t<unsigned int> > const&, ibis::bitvector&) const (index.cpp:6371) > ==14864== by 0x576E76D: ibis::category::patternSearch(char const*, > ibis::bitvector&) const (category.cpp:871) > ==14864== by 0x509933C: ibis::part::patternSearch(ibis::qLike const&, > ibis::bitvector&) const (part.cpp:3260) > ==14864== by 0x545FF03: ibis::query::doEvaluate(ibis::qExpr const*, > ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3962) > ==14864== by 0x545B63D: ibis::query::computeHits() (query.cpp:2771) > ==14864== by 0x5452413: ibis::query::evaluate(bool) (query.cpp:847) > ==14864== by 0x587328C: fastbit_build_query (capi.cpp:477) > ==14864== by 0x4030F8: main (main.cpp:38) > > In r508, the problem is gone ! > > Thanks, > > -----Original Message----- > From: K. John Wu [mailto:[email protected]] > Sent: Thursday, March 29, 2012 12:33 AM > To: Dominique Prunier > Cc: FastBit Users > Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 > > Hi, Dominique, > > The stack trace shows that it is invoking a copy constructor of the > ibis::bitvector class when it encountered the seg fault. Not sure > what is the problem here. I have tried to reproduce the problem by > modifying an existing test suite check-maurel. However, the code > seems to work. > > There is a minor change ibis::index::sumBins to check that the > incoming array contains only values less than bits.size() (the number > of bitvectors stored in an index object - ibis::direkte is an index > object). This might prevent attempting to out-of-bound accesses. The > change is in SVN Revision 508. > > If you are able to find more information. Please let me know. > > John > > > On 3/28/12 2:07 PM, Dominique Prunier wrote: >> Woops, r507 segfaults right away in: >> >> C [libfastbit.so.0.0.9+0x800fbb] >> ibis::bitvector::bitvector(ibis::bitvector const&)+0x23 >> C [libfastbit.so.0.0.9+0x6d16c1] >> ibis::index::sumBins(ibis::array_t<unsigned> const&, ibis::bitvector&) >> const+0x103 >> C [libfastbit.so.0.0.9+0x93f76e] ibis::category::patternSearch(char >> const*, ibis::bitvector&) const+0x3c8 >> C [libfastbit.so.0.0.9+0x26a33d] ibis::part::patternSearch(ibis::qLike >> const&, ibis::bitvector&) const+0xcf >> C [libfastbit.so.0.0.9+0x630f04] ibis::query::doEvaluate(ibis::qExpr >> const*, ibis::bitvector const&, ibis::bitvector&) const+0xe1a >> C [libfastbit.so.0.0.9+0x62c63e] ibis::query::computeHits()+0x356 >> C [libfastbit.so.0.0.9+0x623414] ibis::query::evaluate(bool)+0x4a6 >> C [libfastbit.so.0.0.9+0xa4428d] float+0x3c0 >> C [jna1347076628273574295.tmp+0x11b20] float+0x4c >> >> I'll try to investigate this latter but the last query of my tests it >> executed was >> >> SELECT weird_category FROM /tmp/junit8378310225719591578 WHERE >> weird_category LIKE '"val\\%' >> >> I'll try to investigate this, but it might be related to the fact that this >> query is supposed to return no result. >> >> Thanks, >> >> -----Original Message----- >> From: [email protected] >> [mailto:[email protected]] On Behalf Of Dominique Prunier >> Sent: Wednesday, March 28, 2012 4:58 PM >> To: K. John Wu >> Cc: FastBit Users >> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >> >> Hey John, >> >> I'll have a look to r507, probably tomorrow. To limit the risk, i've chosen >> the well tested r506 as my stable version using the >> FASTBIT_EMPTY_STRING_AS_NULL define. >> >> My problem with null mask is that i write my partitions in pure Java code. >> It is quite straightforward to write data files and -part.txt, but writing a >> NULL mask is something else. Besides, I don't really need a difference >> between NULL and empty anyway, since treating empty strings as NULLs is no >> different than some RDBMS engines that we support (Oracle does that for >> example). >> >> I have one quick question about category columns. Do they really have a >> separate NULL mask or do they use the bitmap stored at key 0 as their NULL >> mask ? If so, that means that keeping the FASTBIT_EMPTY_STRING_AS_NULL would >> allow me to indirectly build some real NULL values which would work with the >> NOT NULL syntax (which, by the way, in SQL is IS NOT NULL/IS NULL, maybe it >> would be worth changing it). >> >> Thanks, >> >> -----Original Message----- >> From: K. John Wu [mailto:[email protected]] >> Sent: Wednesday, March 28, 2012 3:26 PM >> To: Dominique Prunier >> Cc: FastBit Users >> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >> >> Hi, Dominique, >> >> I have added code to accept "colname NOT NULL" in the where clauses. >> The new code is in SVN Revision 507. >> >> The new revision should also consolidate the handling of many >> bitvectors in category::patternSearch and address the issue of >> possibly missing calls to index::activate (which leads to incorrect >> answers). >> >> You can input null values through tablex::readCSV. There is an >> example in tests/Makefile.am in the way it generates data partition >> w7. The test case 15 of really-small also makes use of the new >> expression "NOT NULL". >> >> Let me know if you spot any problems. >> >> John >> >> >> On 3/28/12 10:45 AM, Dominique Prunier wrote: >>> Hey John, >>> >>> My guess is that empty strings are used more often that we'd think for the >>> same use case: use them as NULL marker because it is the easiest way to >>> both insert and query (which is exactly what i use them for). Even if it is >>> not an exact synonym of the SQL NULL (especially for propagation), for most >>> use cases, it is close enough. The fact that its key is fixed and well >>> known provides a valid workaround to simulate IS NULL/IS NOT NULL >>> predicates and since it is excluded from most string related predicates >>> (most importantly, a LIKE '%' wont select it), it is a quite easy and >>> intuitive to use. >>> >>> Besides, the column data file for strings can't represent NULL values so >>> you would have to deal with null masks, which make things significantly >>> more complex (especially, to create a partition from scratch without using >>> the FastBit library) and potentially more resource hungry (have to >>> read/write null mask, ...). >>> >>> That being said, my only problem with making the distinction between NULL >>> and empty in the dictionary is the current revision's inability to query >>> empty strings and/or NULLs (which is currently worked around by the uint >>> alternative, =0) and the fact that it matches a pattern like '%'. Since >>> there is no guarantee on the empty string key, i don't have any workaround >>> for now. >>> >>> For the time being, i'll stick with the FASTBIT_EMPTY_STRING_AS_NULL define >>> waiting for the ambiguity to vanish. >>> >>> Thanks, >>> >>> -----Original Message----- >>> From: K. John Wu [mailto:[email protected]] >>> Sent: Wednesday, March 28, 2012 1:22 PM >>> To: Dominique Prunier >>> Cc: FastBit Users >>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>> >>> The ability to distinguish between empty string and null string can be >>> useful. The ambiguity in the current code is related to some earlier >>> design oversight, which we intend to correct. My hope is that not too >>> many people have empty strings that want to do something with, so >>> breaking this compatibility is not a big deal. >>> >>> There are also a problem with the implementation of >>> category::patternSearch that neglects to read the bitmaps from index >>> files (neglecting to call index::activate). I am consolidating some >>> code to make use of the existing strategies involving summation of a >>> large number of bitmaps in ibis::index::sumBits and >>> ibis::index::sumBins. These functions should encapsulate the idea of >>> when to decompress bitmaps a lot better than the simple version >>> currently in category::patternSearch. >>> >>> I am doing tests now and will let everyone know when I am ready to >>> check the code in. >>> >>> John >>> >>> >>> On 3/28/12 8:44 AM, Dominique Prunier wrote: >>>> Thanks John for your help. It is always very appreciated. >>>> >>>> With the macro FASTBIT_EMPTY_STRING_AS_NULL enabled, all my test cases now >>>> works. I'll test performance next but it could be a good candidate for my >>>> stable version. >>>> >>>> About the FASTBIT_EMPTY_STRING_AS_NULL, what do you think would be the >>>> best default ? For my use case, it is obviously to enable it, but for >>>> everybody else i don't know. My concern is the backward compatibility >>>> here, especially the fact that it influences the index creation, and not >>>> usage. This means that somebody who upgrades won't notice this change >>>> before it regenerates indexes. What's your opinion on this ? >>>> >>>> Thanks, >>>> >>>> -----Original Message----- >>>> From: K. John Wu [mailto:[email protected]] >>>> Sent: Wednesday, March 28, 2012 11:20 AM >>>> To: Dominique Prunier >>>> Cc: FastBit Users >>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>> >>>> Hi, Dominique, >>>> >>>> Thanks for the suggestions and and test cases. Just checked in a set >>>> of changes as SVN Revision 506. Here is a bit more explanation. >>>> >>>> On item 1, I have taken the option you've suggested in the first >>>> message, i.e., use the macro FASTBIT_EMPTY_STRING_AS_NULL >>>> >>>> On item 2, I have restored the escaping of using backslash as you >>>> requested. >>>> >>>> Will go through you tests cases next and see what else I need to change. >>>> >>>> John >>>> >>>> >>>> On 3/28/12 8:02 AM, Dominique Prunier wrote: >>>>> Hi John, >>>>> >>>>> Here is two simple test cases for my two issues with r505. >>>>> >>>>> 1. empty-strings.zip: there is no way to select empty string anymore >>>>> >>>>> **** FAILED (bad result count) with where clause << a='' >> >>>>> >>>>> **** FAILED (bad result count) with where clause << a=0 >> >>>>> >>>>> **** FAILED (hard) with where clause << a LIKE '' >> >>>>> >>>>> 2. de-escaping.zip: there is no way to select a string with a reserved >>>>> char in it >>>>> >>>>> **** FAILED (bad result count) with where clause << a='it\'s good' >> >>>>> >>>>> Warning -- ibis::whereParser encountered syntax error, unexpected name >>>>> string at location a='it's good':1.7 >>>>> Warning -- query[96w0-3xn8Tg----1]::setWhereClause -- failed to parse the >>>>> WHERE clause "a='it's good'" >>>>> Warning -- fastbit_build_query failed to assign conditions (a='it's >>>>> good') to a query >>>>> **** FAILED (hard) with where clause << a='it's good' >> >>>>> >>>>> Warning -- ibis::whereParser encountered syntax error, unexpected >>>>> $undefined at location a=it's good:1.5 >>>>> Warning -- query[96w0-3xn8Tg----2]::setWhereClause -- failed to parse the >>>>> WHERE clause "a=it's good" >>>>> Warning -- fastbit_build_query failed to assign conditions (a=it's good) >>>>> to a query >>>>> **** FAILED (hard) with where clause << a=it's good >> >>>>> >>>>> With the attached patch, it fixes my test cases (except a LIKE '' that >>>>> didn't work before and a='it's good' and a=it's good whieh are invalid >>>>> where clauses). >>>>> >>>>> Thanks, >>>>> >>>>> -----Original Message----- >>>>> From: [email protected] >>>>> [mailto:[email protected]] On Behalf Of Dominique >>>>> Prunier >>>>> Sent: Wednesday, March 28, 2012 10:04 AM >>>>> To: K. John Wu >>>>> Cc: FastBit Users >>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>>> >>>>> Hey John, >>>>> >>>>> I'm trying r505 right know, i have two questions/remarks: >>>>> >>>>> 1. I saw a change about the dictionary now able to accept empty strings >>>>> and treat them as normal strings instead of NULLs. This breaks quite a >>>>> lot of my test cases, specifically those which used to test category=0 or >>>>> !=0 to simulate the IS NULL/IS NOT NULL because testing for an empty >>>>> string doesn't work (='' or LIKE '' used to fail or return invalid >>>>> results). What would be your recommendation for that use case ? Could we >>>>> add a define for those relying on this behavior ? Something like: >>>>> >>>>> - //if (*str == 0) return 0; >>>>> +#ifdef FASTBIT_EMPTY_STRING_AS_NULL >>>>> + if (*str == 0) return 0; >>>>> +#endif >>>>> >>>>> 2. I can't find anywhere the de-escaping patch in r505, am i missing >>>>> something ? >>>>> >>>>> Thanks, >>>>> >>>>> -----Original Message----- >>>>> From: K. John Wu [mailto:[email protected]] >>>>> Sent: Wednesday, March 28, 2012 1:55 AM >>>>> To: Dominique Prunier >>>>> Cc: FastBit Users >>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501 >>>>> >>>>> Hi, Dominique, >>>>> >>>>> There are some updates to involving merge of dictionaries and to >>>>> exercise the operations involving unmatched quotes. The new code is >>>>> SVN R505. >>>>> >>>>> Please let me know if you have any additional questions. >>>>> >>>>> John >>>>> >>>>> >>>>> On 3/27/12 1:03 PM, Dominique Prunier wrote: >>>>>> Hey John, >>>>>> >>>>>> There is definitely a need for FastBit escaping. The escaping problem is >>>>>> not at the shell level (though we could have one there) since in pure >>>>>> C/C++ code, there's no shell involved when i'm building a where clause >>>>>> from a string. The problem is at the where clause parsing level (in the >>>>>> lexer to be more precise) to be able to express string literals among >>>>>> other things (and not only metas, it is also white spaces, ...). >>>>>> Typically, my test that fails is as simple as calling >>>>>> fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected to >>>>>> create a qString << a = it's good >> but now, it creates a qString << a >>>>>> = it\'s good >> which is wrong. The attached patch restores the >>>>>> descaping, but _not_ the double quote stripping (because it is already >>>>>> handled in the lexer). All my test cases works after applying it on r503. >>>>> >>>>> The constructors for ibis::qString and ibis::qLike really should not >>>>> strip away anything. In your case, you should be able to do the following >>>>> >>>>> .../tcapi data-dir "a=\"it's good\"" >>>>> >>>>> if you are using fastbit_build_query, you can use the same string >>>>> fastbit_build_query(..., ..., "a=\"it's good\""); >>>>> >>>>> Since FastBit regular expression only support four meta characters ? * >>>>> _ %. There is no need to escape anything. It is probably cleaner to >>>>> not introduce stripping of anything special (except the outer most >>>>> quotes, which should be only done once). >>>>> >>>>>> >>>>>> About the decompression, thanks for the link, this is very interesting >>>>>> stuff ! But my point here is not about questing the fact the >>>>>> decompression can be better is some case, i was just under the >>>>>> impression that the hit vector given to category::patternSearch was >>>>>> _always_ already decompressed since it is ultimately a bitvector that >>>>>> has been created from scratch for query evaluation (it would need >>>>>> verification though). My guess is that the few percent of performance >>>>>> i'm loosing here are attributable to the check (hits.isCompressed() && >>>>>> hits.bytes()*mult + bv->bytes() > hits.size()), since it gets executed >>>>>> _A LOT_ of times. I'll try to investigate it a little bit further. >>>>> >>>>> I have rearranged the tests for decompression in layers which >>>>> hopefully will eliminate the need to perform more expensive tests in >>>>> your case that presumably involve a fairly small number of values. >>>>> _______________________________________________ >>>>> FastBit-users mailing list >>>>> [email protected] >>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users >> _______________________________________________ >> FastBit-users mailing list >> [email protected] >> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
