Hi, Dominique,

The stack trace shows that it is invoking a copy constructor of the
ibis::bitvector class when it encountered the seg fault.  Not sure
what is the problem here.  I have tried to reproduce the problem by
modifying an existing test suite check-maurel.  However, the code
seems to work.

There is a minor change ibis::index::sumBins to check that the
incoming array contains only values less than bits.size() (the number
of bitvectors stored in an index object - ibis::direkte is an index
object).  This might prevent attempting to out-of-bound accesses.  The
change is in SVN Revision 508.

If you are able to find more information.  Please let me know.

John


On 3/28/12 2:07 PM, Dominique Prunier wrote:
> Woops, r507 segfaults right away in:
> 
> C  [libfastbit.so.0.0.9+0x800fbb]  ibis::bitvector::bitvector(ibis::bitvector 
> const&)+0x23
> C  [libfastbit.so.0.0.9+0x6d16c1]  
> ibis::index::sumBins(ibis::array_t<unsigned> const&, ibis::bitvector&) 
> const+0x103
> C  [libfastbit.so.0.0.9+0x93f76e]  ibis::category::patternSearch(char const*, 
> ibis::bitvector&) const+0x3c8
> C  [libfastbit.so.0.0.9+0x26a33d]  ibis::part::patternSearch(ibis::qLike 
> const&, ibis::bitvector&) const+0xcf
> C  [libfastbit.so.0.0.9+0x630f04]  ibis::query::doEvaluate(ibis::qExpr 
> const*, ibis::bitvector const&, ibis::bitvector&) const+0xe1a
> C  [libfastbit.so.0.0.9+0x62c63e]  ibis::query::computeHits()+0x356
> C  [libfastbit.so.0.0.9+0x623414]  ibis::query::evaluate(bool)+0x4a6
> C  [libfastbit.so.0.0.9+0xa4428d]  float+0x3c0
> C  [jna1347076628273574295.tmp+0x11b20]  float+0x4c
> 
> I'll try to investigate this latter but the last query of my tests it 
> executed was
> 
> SELECT weird_category FROM /tmp/junit8378310225719591578 WHERE weird_category 
> LIKE '"val\\%'
> 
> I'll try to investigate this, but it might be related to the fact that this 
> query is supposed to return no result.
> 
> Thanks,
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Dominique Prunier
> Sent: Wednesday, March 28, 2012 4:58 PM
> To: K. John Wu
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
> 
> Hey John,
> 
> I'll have a look to r507, probably tomorrow. To limit the risk, i've chosen 
> the well tested r506 as my stable version using the 
> FASTBIT_EMPTY_STRING_AS_NULL define.
> 
> My problem with null mask is that i write my partitions in pure Java code. It 
> is quite straightforward to write data files and -part.txt, but writing a 
> NULL mask is something else. Besides, I don't really need a difference 
> between NULL and empty anyway, since treating empty strings as NULLs is no 
> different than some RDBMS engines that we support (Oracle does that for 
> example).
> 
> I have one quick question about category columns. Do they really have a 
> separate NULL mask or do they use the bitmap stored at key 0 as their NULL 
> mask ? If so, that means that keeping the FASTBIT_EMPTY_STRING_AS_NULL would 
> allow me to indirectly build some real NULL values which would work with the 
> NOT NULL syntax (which, by the way, in SQL is IS NOT NULL/IS NULL, maybe it 
> would be worth changing it).
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]] 
> Sent: Wednesday, March 28, 2012 3:26 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
> 
> Hi, Dominique,
> 
> I have added code to accept "colname NOT NULL" in the where clauses.
> The new code is in SVN Revision 507.
> 
> The new revision should also consolidate the handling of many
> bitvectors in category::patternSearch and address the issue of
> possibly missing calls to index::activate (which leads to incorrect
> answers).
> 
> You can input null values through tablex::readCSV.  There is an
> example in tests/Makefile.am in the way it generates data partition
> w7.  The test case 15 of really-small also makes use of the new
> expression "NOT NULL".
> 
> Let me know if you spot any problems.
> 
> John
> 
> 
> On 3/28/12 10:45 AM, Dominique Prunier wrote:
>> Hey John,
>>
>> My guess is that empty strings are used more often that we'd think for the 
>> same use case: use them as NULL marker because it is the easiest way to both 
>> insert and query (which is exactly what i use them for). Even if it is not 
>> an exact synonym of the SQL NULL (especially for propagation), for most use 
>> cases, it is close enough. The fact that its key is fixed and well known 
>> provides a valid workaround to simulate IS NULL/IS NOT NULL predicates and 
>> since it is excluded from most string related predicates (most importantly, 
>> a LIKE '%' wont select it), it is a quite easy and intuitive to use. 
>>
>> Besides, the column data file for strings can't represent NULL values so you 
>> would have to deal with null masks, which make things significantly more 
>> complex (especially, to create a partition from scratch without using the 
>> FastBit library) and potentially more resource hungry (have to read/write 
>> null mask, ...).
>>
>> That being said, my only problem with making the distinction between NULL 
>> and empty in the dictionary is the current revision's inability to query 
>> empty strings and/or NULLs (which is currently worked around by the uint 
>> alternative, =0) and the fact that it matches a pattern like '%'. Since 
>> there is no guarantee on the empty string key, i don't have any workaround 
>> for now.
>>
>> For the time being, i'll stick with the FASTBIT_EMPTY_STRING_AS_NULL define 
>> waiting for the ambiguity to vanish.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]] 
>> Sent: Wednesday, March 28, 2012 1:22 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>
>> The ability to distinguish between empty string and null string can be
>> useful.  The ambiguity in the current code is related to some earlier
>> design oversight, which we intend to correct.  My hope is that not too
>> many people have empty strings that want to do something with, so
>> breaking this compatibility is not a big deal.
>>
>> There are also a problem with the implementation of
>> category::patternSearch that neglects to read the bitmaps from index
>> files (neglecting to call index::activate).  I am consolidating some
>> code to make use of the existing strategies involving summation of a
>> large number of bitmaps in ibis::index::sumBits and
>> ibis::index::sumBins.  These functions should encapsulate the idea of
>> when to decompress bitmaps a lot better than the simple version
>> currently in category::patternSearch.
>>
>> I am doing tests now and will let everyone know when I am ready to
>> check the code in.
>>
>> John
>>
>>
>> On 3/28/12 8:44 AM, Dominique Prunier wrote:
>>> Thanks John for your help. It is always very appreciated.
>>>
>>> With the macro FASTBIT_EMPTY_STRING_AS_NULL enabled, all my test cases now 
>>> works. I'll test performance next but it could be a good candidate for my 
>>> stable version.
>>>
>>> About the FASTBIT_EMPTY_STRING_AS_NULL, what do you think would be the best 
>>> default ? For my use case, it is obviously to enable it, but for everybody 
>>> else i don't know.  My concern is the backward compatibility here, 
>>> especially the fact that it influences the index creation, and not usage. 
>>> This means that somebody who upgrades won't notice this change before it 
>>> regenerates indexes. What's your opinion on this ?
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]] 
>>> Sent: Wednesday, March 28, 2012 11:20 AM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>
>>> Hi, Dominique,
>>>
>>> Thanks for the suggestions and and test cases.  Just checked in a set
>>> of changes as SVN Revision 506.  Here is a bit more explanation.
>>>
>>> On item 1, I have taken the option you've suggested in the first
>>> message, i.e., use the macro FASTBIT_EMPTY_STRING_AS_NULL
>>>
>>> On item 2, I have restored the escaping of using backslash as you
>>> requested.
>>>
>>> Will go through you tests cases next and see what else I need to change.
>>>
>>> John
>>>
>>>
>>> On 3/28/12 8:02 AM, Dominique Prunier wrote:
>>>> Hi John,
>>>>
>>>> Here is two simple test cases for my two issues with r505.
>>>>
>>>> 1. empty-strings.zip: there is no way to select empty string anymore
>>>>
>>>> **** FAILED (bad result count) with where clause << a='' >>
>>>>
>>>> **** FAILED (bad result count) with where clause << a=0 >>
>>>>
>>>> **** FAILED (hard) with where clause << a LIKE '' >>
>>>>
>>>> 2. de-escaping.zip: there is no way to select a string with a reserved 
>>>> char in it
>>>>
>>>> **** FAILED (bad result count) with where clause << a='it\'s good' >>
>>>>
>>>> Warning -- ibis::whereParser encountered syntax error, unexpected name 
>>>> string at location a='it's good':1.7
>>>> Warning -- query[96w0-3xn8Tg----1]::setWhereClause -- failed to parse the 
>>>> WHERE clause "a='it's good'"
>>>> Warning -- fastbit_build_query failed to assign conditions (a='it's good') 
>>>> to a query
>>>> **** FAILED (hard) with where clause << a='it's good' >>
>>>>
>>>> Warning -- ibis::whereParser encountered syntax error, unexpected 
>>>> $undefined at location a=it's good:1.5
>>>> Warning -- query[96w0-3xn8Tg----2]::setWhereClause -- failed to parse the 
>>>> WHERE clause "a=it's good"
>>>> Warning -- fastbit_build_query failed to assign conditions (a=it's good) 
>>>> to a query
>>>> **** FAILED (hard) with where clause << a=it's good >>
>>>>
>>>> With the attached patch, it fixes my test cases (except a LIKE '' that 
>>>> didn't work before and a='it's good' and a=it's good whieh are invalid 
>>>> where clauses).
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: [email protected] 
>>>> [mailto:[email protected]] On Behalf Of Dominique 
>>>> Prunier
>>>> Sent: Wednesday, March 28, 2012 10:04 AM
>>>> To: K. John Wu
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>
>>>> Hey John,
>>>>
>>>> I'm trying r505 right know, i have two questions/remarks:
>>>>
>>>>  1. I saw a change about the dictionary now able to accept empty strings 
>>>> and treat them as normal strings instead of NULLs. This breaks quite a lot 
>>>> of my test cases, specifically those which used to test category=0 or !=0 
>>>> to simulate the IS NULL/IS NOT NULL because testing for an empty string 
>>>> doesn't work (='' or LIKE '' used to fail or return invalid results). What 
>>>> would be your recommendation for that use case ? Could we add a define for 
>>>> those relying on this behavior ? Something like:
>>>>
>>>> -    //if (*str == 0) return 0;
>>>> +#ifdef FASTBIT_EMPTY_STRING_AS_NULL
>>>> +    if (*str == 0) return 0;
>>>> +#endif
>>>>
>>>>  2. I can't find anywhere the de-escaping patch in r505, am i missing 
>>>> something ?
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]] 
>>>> Sent: Wednesday, March 28, 2012 1:55 AM
>>>> To: Dominique Prunier
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>
>>>> Hi, Dominique,
>>>>
>>>> There are some updates to involving merge of dictionaries and to
>>>> exercise the operations involving unmatched quotes.  The new code is
>>>> SVN R505.
>>>>
>>>> Please let me know if you have any additional questions.
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/27/12 1:03 PM, Dominique Prunier wrote:
>>>>> Hey John,
>>>>>
>>>>> There is definitely a need for FastBit escaping. The escaping problem is 
>>>>> not at the shell level (though we could have one there) since in pure 
>>>>> C/C++ code, there's no shell involved when i'm building a where clause 
>>>>> from a string. The problem is at the where clause parsing level (in the 
>>>>> lexer to be more precise) to be able to express string literals among 
>>>>> other things (and not only metas, it is also white spaces, ...).
>>>>> Typically, my test that fails is as simple as calling 
>>>>> fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected to 
>>>>> create a qString << a = it's good >> but now, it creates a qString << a = 
>>>>> it\'s good >> which is wrong. The attached patch restores the descaping, 
>>>>> but _not_ the double quote stripping (because it is already handled in 
>>>>> the lexer). All my test cases works after applying it on r503.
>>>>
>>>> The constructors for ibis::qString and ibis::qLike really should not
>>>> strip away anything.  In your case, you should be able to do the following
>>>>
>>>> .../tcapi data-dir "a=\"it's good\""
>>>>
>>>> if you are using fastbit_build_query, you can use the same string
>>>> fastbit_build_query(..., ..., "a=\"it's good\"");
>>>>
>>>> Since FastBit regular expression only support four meta characters ? *
>>>> _ %.  There is no need to escape anything.  It is probably cleaner to
>>>> not introduce stripping of anything special (except the outer most
>>>> quotes, which should be only done once).
>>>>
>>>>>
>>>>> About the decompression, thanks for the link, this is very interesting 
>>>>> stuff ! But my point here is not about questing the fact the 
>>>>> decompression can be better is some case, i was just under the impression 
>>>>> that the hit vector given to category::patternSearch was _always_ already 
>>>>> decompressed since it is ultimately a bitvector that has been created 
>>>>> from scratch for query evaluation (it would need verification though). My 
>>>>> guess is that the few percent of performance i'm loosing here are 
>>>>> attributable to the check (hits.isCompressed() && hits.bytes()*mult + 
>>>>> bv->bytes() > hits.size()), since it gets executed _A LOT_ of times. I'll 
>>>>> try to investigate it a little bit further.
>>>>
>>>> I have rearranged the tests for decompression in layers which
>>>> hopefully will eliminate the need to perform more expensive tests in
>>>> your case that presumably involve a fairly small number of values.
>>>> _______________________________________________
>>>> FastBit-users mailing list
>>>> [email protected]
>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to