Hey John,

When i'll have some time, i'll make my test suite dump the queries and expected 
results so that you try it yourself. It only tests category (and a small bit of 
long) data types for the things i'm doing but it can be useful.

In the meantime, here is my test partition (see attached) and the query that 
generated the segfault with r507 was:

SELECT weird_category FROM <partition> WHERE weird_category LIKE '"val\\%' 
(which is not supposed to return any result)

It might help you understand what could have happened.

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Thursday, March 29, 2012 10:55 AM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: perf boost on top of r501

Good to know.  The problem was then I did not check the array bounds.
 Odd though, I did not think the values could be out of bounds..

On 3/29/12 7:24 AM, Dominique Prunier wrote:
> Hi John,
>
> I ran the query with r507, and apparently, the problem was there:
>
> ==14864== Invalid read of size 8
> ==14864==    at 0x550067A: ibis::index::sumBins(ibis::array_t<unsigned int> 
> const&, ibis::bitvector&) const (index.cpp:6371)
> ==14864==    by 0x576E76D: ibis::category::patternSearch(char const*, 
> ibis::bitvector&) const (category.cpp:871)
> ==14864==    by 0x509933C: ibis::part::patternSearch(ibis::qLike const&, 
> ibis::bitvector&) const (part.cpp:3260)
> ==14864==    by 0x545FF03: ibis::query::doEvaluate(ibis::qExpr const*, 
> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3962)
> ==14864==    by 0x545B63D: ibis::query::computeHits() (query.cpp:2771)
> ==14864==    by 0x5452413: ibis::query::evaluate(bool) (query.cpp:847)
> ==14864==    by 0x587328C: fastbit_build_query (capi.cpp:477)
> ==14864==    by 0x4030F8: main (main.cpp:38)
>
> In r508, the problem is gone !
>
> Thanks,
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Thursday, March 29, 2012 12:33 AM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>
> Hi, Dominique,
>
> The stack trace shows that it is invoking a copy constructor of the
> ibis::bitvector class when it encountered the seg fault.  Not sure
> what is the problem here.  I have tried to reproduce the problem by
> modifying an existing test suite check-maurel.  However, the code
> seems to work.
>
> There is a minor change ibis::index::sumBins to check that the
> incoming array contains only values less than bits.size() (the number
> of bitvectors stored in an index object - ibis::direkte is an index
> object).  This might prevent attempting to out-of-bound accesses.  The
> change is in SVN Revision 508.
>
> If you are able to find more information.  Please let me know.
>
> John
>
>
> On 3/28/12 2:07 PM, Dominique Prunier wrote:
>> Woops, r507 segfaults right away in:
>>
>> C  [libfastbit.so.0.0.9+0x800fbb]  
>> ibis::bitvector::bitvector(ibis::bitvector const&)+0x23
>> C  [libfastbit.so.0.0.9+0x6d16c1]  
>> ibis::index::sumBins(ibis::array_t<unsigned> const&, ibis::bitvector&) 
>> const+0x103
>> C  [libfastbit.so.0.0.9+0x93f76e]  ibis::category::patternSearch(char 
>> const*, ibis::bitvector&) const+0x3c8
>> C  [libfastbit.so.0.0.9+0x26a33d]  ibis::part::patternSearch(ibis::qLike 
>> const&, ibis::bitvector&) const+0xcf
>> C  [libfastbit.so.0.0.9+0x630f04]  ibis::query::doEvaluate(ibis::qExpr 
>> const*, ibis::bitvector const&, ibis::bitvector&) const+0xe1a
>> C  [libfastbit.so.0.0.9+0x62c63e]  ibis::query::computeHits()+0x356
>> C  [libfastbit.so.0.0.9+0x623414]  ibis::query::evaluate(bool)+0x4a6
>> C  [libfastbit.so.0.0.9+0xa4428d]  float+0x3c0
>> C  [jna1347076628273574295.tmp+0x11b20]  float+0x4c
>>
>> I'll try to investigate this latter but the last query of my tests it 
>> executed was
>>
>> SELECT weird_category FROM /tmp/junit8378310225719591578 WHERE 
>> weird_category LIKE '"val\\%'
>>
>> I'll try to investigate this, but it might be related to the fact that this 
>> query is supposed to return no result.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: [email protected] 
>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>> Sent: Wednesday, March 28, 2012 4:58 PM
>> To: K. John Wu
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>
>> Hey John,
>>
>> I'll have a look to r507, probably tomorrow. To limit the risk, i've chosen 
>> the well tested r506 as my stable version using the 
>> FASTBIT_EMPTY_STRING_AS_NULL define.
>>
>> My problem with null mask is that i write my partitions in pure Java code. 
>> It is quite straightforward to write data files and -part.txt, but writing a 
>> NULL mask is something else. Besides, I don't really need a difference 
>> between NULL and empty anyway, since treating empty strings as NULLs is no 
>> different than some RDBMS engines that we support (Oracle does that for 
>> example).
>>
>> I have one quick question about category columns. Do they really have a 
>> separate NULL mask or do they use the bitmap stored at key 0 as their NULL 
>> mask ? If so, that means that keeping the FASTBIT_EMPTY_STRING_AS_NULL would 
>> allow me to indirectly build some real NULL values which would work with the 
>> NOT NULL syntax (which, by the way, in SQL is IS NOT NULL/IS NULL, maybe it 
>> would be worth changing it).
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Wednesday, March 28, 2012 3:26 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>
>> Hi, Dominique,
>>
>> I have added code to accept "colname NOT NULL" in the where clauses.
>> The new code is in SVN Revision 507.
>>
>> The new revision should also consolidate the handling of many
>> bitvectors in category::patternSearch and address the issue of
>> possibly missing calls to index::activate (which leads to incorrect
>> answers).
>>
>> You can input null values through tablex::readCSV.  There is an
>> example in tests/Makefile.am in the way it generates data partition
>> w7.  The test case 15 of really-small also makes use of the new
>> expression "NOT NULL".
>>
>> Let me know if you spot any problems.
>>
>> John
>>
>>
>> On 3/28/12 10:45 AM, Dominique Prunier wrote:
>>> Hey John,
>>>
>>> My guess is that empty strings are used more often that we'd think for the 
>>> same use case: use them as NULL marker because it is the easiest way to 
>>> both insert and query (which is exactly what i use them for). Even if it is 
>>> not an exact synonym of the SQL NULL (especially for propagation), for most 
>>> use cases, it is close enough. The fact that its key is fixed and well 
>>> known provides a valid workaround to simulate IS NULL/IS NOT NULL 
>>> predicates and since it is excluded from most string related predicates 
>>> (most importantly, a LIKE '%' wont select it), it is a quite easy and 
>>> intuitive to use.
>>>
>>> Besides, the column data file for strings can't represent NULL values so 
>>> you would have to deal with null masks, which make things significantly 
>>> more complex (especially, to create a partition from scratch without using 
>>> the FastBit library) and potentially more resource hungry (have to 
>>> read/write null mask, ...).
>>>
>>> That being said, my only problem with making the distinction between NULL 
>>> and empty in the dictionary is the current revision's inability to query 
>>> empty strings and/or NULLs (which is currently worked around by the uint 
>>> alternative, =0) and the fact that it matches a pattern like '%'. Since 
>>> there is no guarantee on the empty string key, i don't have any workaround 
>>> for now.
>>>
>>> For the time being, i'll stick with the FASTBIT_EMPTY_STRING_AS_NULL define 
>>> waiting for the ambiguity to vanish.
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Wednesday, March 28, 2012 1:22 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>
>>> The ability to distinguish between empty string and null string can be
>>> useful.  The ambiguity in the current code is related to some earlier
>>> design oversight, which we intend to correct.  My hope is that not too
>>> many people have empty strings that want to do something with, so
>>> breaking this compatibility is not a big deal.
>>>
>>> There are also a problem with the implementation of
>>> category::patternSearch that neglects to read the bitmaps from index
>>> files (neglecting to call index::activate).  I am consolidating some
>>> code to make use of the existing strategies involving summation of a
>>> large number of bitmaps in ibis::index::sumBits and
>>> ibis::index::sumBins.  These functions should encapsulate the idea of
>>> when to decompress bitmaps a lot better than the simple version
>>> currently in category::patternSearch.
>>>
>>> I am doing tests now and will let everyone know when I am ready to
>>> check the code in.
>>>
>>> John
>>>
>>>
>>> On 3/28/12 8:44 AM, Dominique Prunier wrote:
>>>> Thanks John for your help. It is always very appreciated.
>>>>
>>>> With the macro FASTBIT_EMPTY_STRING_AS_NULL enabled, all my test cases now 
>>>> works. I'll test performance next but it could be a good candidate for my 
>>>> stable version.
>>>>
>>>> About the FASTBIT_EMPTY_STRING_AS_NULL, what do you think would be the 
>>>> best default ? For my use case, it is obviously to enable it, but for 
>>>> everybody else i don't know.  My concern is the backward compatibility 
>>>> here, especially the fact that it influences the index creation, and not 
>>>> usage. This means that somebody who upgrades won't notice this change 
>>>> before it regenerates indexes. What's your opinion on this ?
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Wednesday, March 28, 2012 11:20 AM
>>>> To: Dominique Prunier
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>
>>>> Hi, Dominique,
>>>>
>>>> Thanks for the suggestions and and test cases.  Just checked in a set
>>>> of changes as SVN Revision 506.  Here is a bit more explanation.
>>>>
>>>> On item 1, I have taken the option you've suggested in the first
>>>> message, i.e., use the macro FASTBIT_EMPTY_STRING_AS_NULL
>>>>
>>>> On item 2, I have restored the escaping of using backslash as you
>>>> requested.
>>>>
>>>> Will go through you tests cases next and see what else I need to change.
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/28/12 8:02 AM, Dominique Prunier wrote:
>>>>> Hi John,
>>>>>
>>>>> Here is two simple test cases for my two issues with r505.
>>>>>
>>>>> 1. empty-strings.zip: there is no way to select empty string anymore
>>>>>
>>>>> **** FAILED (bad result count) with where clause << a='' >>
>>>>>
>>>>> **** FAILED (bad result count) with where clause << a=0 >>
>>>>>
>>>>> **** FAILED (hard) with where clause << a LIKE '' >>
>>>>>
>>>>> 2. de-escaping.zip: there is no way to select a string with a reserved 
>>>>> char in it
>>>>>
>>>>> **** FAILED (bad result count) with where clause << a='it\'s good' >>
>>>>>
>>>>> Warning -- ibis::whereParser encountered syntax error, unexpected name 
>>>>> string at location a='it's good':1.7
>>>>> Warning -- query[96w0-3xn8Tg----1]::setWhereClause -- failed to parse the 
>>>>> WHERE clause "a='it's good'"
>>>>> Warning -- fastbit_build_query failed to assign conditions (a='it's 
>>>>> good') to a query
>>>>> **** FAILED (hard) with where clause << a='it's good' >>
>>>>>
>>>>> Warning -- ibis::whereParser encountered syntax error, unexpected 
>>>>> $undefined at location a=it's good:1.5
>>>>> Warning -- query[96w0-3xn8Tg----2]::setWhereClause -- failed to parse the 
>>>>> WHERE clause "a=it's good"
>>>>> Warning -- fastbit_build_query failed to assign conditions (a=it's good) 
>>>>> to a query
>>>>> **** FAILED (hard) with where clause << a=it's good >>
>>>>>
>>>>> With the attached patch, it fixes my test cases (except a LIKE '' that 
>>>>> didn't work before and a='it's good' and a=it's good whieh are invalid 
>>>>> where clauses).
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] 
>>>>> [mailto:[email protected]] On Behalf Of Dominique 
>>>>> Prunier
>>>>> Sent: Wednesday, March 28, 2012 10:04 AM
>>>>> To: K. John Wu
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>
>>>>> Hey John,
>>>>>
>>>>> I'm trying r505 right know, i have two questions/remarks:
>>>>>
>>>>>  1. I saw a change about the dictionary now able to accept empty strings 
>>>>> and treat them as normal strings instead of NULLs. This breaks quite a 
>>>>> lot of my test cases, specifically those which used to test category=0 or 
>>>>> !=0 to simulate the IS NULL/IS NOT NULL because testing for an empty 
>>>>> string doesn't work (='' or LIKE '' used to fail or return invalid 
>>>>> results). What would be your recommendation for that use case ? Could we 
>>>>> add a define for those relying on this behavior ? Something like:
>>>>>
>>>>> -    //if (*str == 0) return 0;
>>>>> +#ifdef FASTBIT_EMPTY_STRING_AS_NULL
>>>>> +    if (*str == 0) return 0;
>>>>> +#endif
>>>>>
>>>>>  2. I can't find anywhere the de-escaping patch in r505, am i missing 
>>>>> something ?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Wednesday, March 28, 2012 1:55 AM
>>>>> To: Dominique Prunier
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>
>>>>> Hi, Dominique,
>>>>>
>>>>> There are some updates to involving merge of dictionaries and to
>>>>> exercise the operations involving unmatched quotes.  The new code is
>>>>> SVN R505.
>>>>>
>>>>> Please let me know if you have any additional questions.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 3/27/12 1:03 PM, Dominique Prunier wrote:
>>>>>> Hey John,
>>>>>>
>>>>>> There is definitely a need for FastBit escaping. The escaping problem is 
>>>>>> not at the shell level (though we could have one there) since in pure 
>>>>>> C/C++ code, there's no shell involved when i'm building a where clause 
>>>>>> from a string. The problem is at the where clause parsing level (in the 
>>>>>> lexer to be more precise) to be able to express string literals among 
>>>>>> other things (and not only metas, it is also white spaces, ...).
>>>>>> Typically, my test that fails is as simple as calling 
>>>>>> fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected to 
>>>>>> create a qString << a = it's good >> but now, it creates a qString << a 
>>>>>> = it\'s good >> which is wrong. The attached patch restores the 
>>>>>> descaping, but _not_ the double quote stripping (because it is already 
>>>>>> handled in the lexer). All my test cases works after applying it on r503.
>>>>>
>>>>> The constructors for ibis::qString and ibis::qLike really should not
>>>>> strip away anything.  In your case, you should be able to do the following
>>>>>
>>>>> .../tcapi data-dir "a=\"it's good\""
>>>>>
>>>>> if you are using fastbit_build_query, you can use the same string
>>>>> fastbit_build_query(..., ..., "a=\"it's good\"");
>>>>>
>>>>> Since FastBit regular expression only support four meta characters ? *
>>>>> _ %.  There is no need to escape anything.  It is probably cleaner to
>>>>> not introduce stripping of anything special (except the outer most
>>>>> quotes, which should be only done once).
>>>>>
>>>>>>
>>>>>> About the decompression, thanks for the link, this is very interesting 
>>>>>> stuff ! But my point here is not about questing the fact the 
>>>>>> decompression can be better is some case, i was just under the 
>>>>>> impression that the hit vector given to category::patternSearch was 
>>>>>> _always_ already decompressed since it is ultimately a bitvector that 
>>>>>> has been created from scratch for query evaluation (it would need 
>>>>>> verification though). My guess is that the few percent of performance 
>>>>>> i'm loosing here are attributable to the check (hits.isCompressed() && 
>>>>>> hits.bytes()*mult + bv->bytes() > hits.size()), since it gets executed 
>>>>>> _A LOT_ of times. I'll try to investigate it a little bit further.
>>>>>
>>>>> I have rearranged the tests for decompression in layers which
>>>>> hopefully will eliminate the need to perform more expensive tests in
>>>>> your case that presumably involve a fairly small number of values.
>>>>> _______________________________________________
>>>>> FastBit-users mailing list
>>>>> [email protected]
>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>> _______________________________________________
>> FastBit-users mailing list
>> [email protected]
>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Attachment: partition.tar.bz2
Description: partition.tar.bz2

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to