Hi, Dominique,

Thanks for the updates.  Except for the issue with win/xMinGW.mak, it looks
like none of the others will affect whether the source tarball will
produce the correct library or executable.  Therefore, I will only
include them in the next stable release.  In the meantime, your
changes have been incorporated into SVN Revision 512.

Thanks again.

John


On 4/3/12 6:30 AM, Dominique Prunier wrote:
> Hi John,
> 
> I'm happy to hear that FastBit reached 1.3.0. There have been a lot of
> improvements over 1.2.8!
> I'm testing r511 right now (I guess it is the same as the stable release).
> So far, it behaves like r510 did for me: stable and fast!
> 
> I noticed something strange in the packaging, however. There seem to be
> some differences between the repository at r511 and the zip:
> --- r511
> +++ ibis1.3.0
> -contrib/fbmerge/.deps/fbmerge.Po
> -doc/contact.html
> -doc/header.html
> -doc/rara.html
> +src/._colValues.cpp
> +src/._colValues.h
> -src/fastbit-0.7.pc.in
> +src/fastbit-config.h
> -tests/scripts/star2002.sh
> +win/._MinGW.mak
> -win/static_pthread_init.c
> -win/static_pthread.patch
> -win/xMinGW.mak
> 
> While we are at it, I have a small fix to push to xMinGW.mak (since we renamed
> it, we have to change the following line):
> 
> fastbit.dll: $(FRC)
> -       $(MAKE) -f MinGW.mak DEF="$(DEF) -DCXX_USE_DLL -DDLL_EXPORT" $(OBJ)
> +       $(MAKE) -f xMinGW.mak DEF="$(DEF) -DCXX_USE_DLL -DDLL_EXPORT" $(OBJ)
> 
> These are only minor things; I don't think they are really worth changing for
> 1.3.0.
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu
> Sent: Monday, April 02, 2012 11:32 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
> 
> Hi, Dominique,
> 
> Here is the link to the latest stable release
> <https://codeforge.lbl.gov/frs/download.php/389/fastbit-ibis1.3.0.tar.gz>
> 
> Let me know if you spot anything that needs attention.
> 
> Thanks for all the help.
> 
> John
> 
> 
> 
> On 4/2/12 6:51 AM, Dominique Prunier wrote:
>> Hey John,
>>
>> I just tested r510 and I'm pleased to say that not only does it pass all my
>> test cases (with -DFASTBIT_EMPTY_STRING_AS_NULL), but it is also a lot
>> faster (more than 25% in my test)! I'm not sure exactly why, but this is
>> greatly appreciated. Could it be the move from the simple loop in
>> category::patternSearch to the much more sophisticated index::sumBins?
>>
>> Do you have a lot of things to change before the stable release? I admit
>> that I'd prefer to use a stable release rather than r506.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu
>> Sent: Saturday, March 31, 2012 1:33 AM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>
>> Hi, Dominique,
>>
>> I have added a couple more test cases involving special characters
>> to the test suite used by 'make check'.  The latest SVN revision is 510.
>>
>> Things seem to work OK.  I would like to wrap things up for a stable
>> release soon.  If you find anything that needs attention, please let
>> me know.
>>
>> John
>>
>>
>> On 3/29/12 1:23 PM, Dominique Prunier wrote:
>>> Hey John,
>>>
>>> I found the problem. When the exact value doesn't exist in the dictionary,
>>> operator[] is supposed to return size()+1, but here it returns 3 instead of
>>> 2 (it returns raw_.size()+1 instead of key_.size()+1), which makes the
>>> following code fail in dictionary::patternSearch:
>>>
>>>     if (!meta) {
>>>         uint32_t code = operator[](prefix.c_str());
>>>         if (code != size() + 1) {
>>>             matches.push_back(code);
>>>         }
>>>         return;
>>>     }
>>>
>>> We probably never saw it before for at least 3 reasons:
>>>  * it only affects the linear search in dictionary::operator[]; the other
>>>    path returns raw_.size(), so it can't happen if the dictionary is larger
>>>    than 16
>>>  * index::getBitvector, which was previously used in category::patternSearch,
>>>    validates the given index and returns 0 if it is out of bounds
>>>  * category::patternSearch was checking that index::getBitvector didn't
>>>    return NULL
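A minimal sketch of the "not found" convention described above, assuming codes 1..n map to the n stored strings, slot 0 of the code-to-string array is reserved for NULL (so raw_.size() == size() + 1), small dictionaries are searched linearly and larger ones with a binary search. ToyDictionary and kLinearLimit are hypothetical names, not FastBit's ibis::dictionary; the point is only that both lookup paths must return the same sentinel, size() + 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical toy dictionary (not ibis::dictionary). Code 0 is reserved
// for NULL, so raw_[0] is a placeholder and raw_.size() == size() + 1.
class ToyDictionary {
public:
    ToyDictionary() : raw_(1, std::string()) {}

    // Appends a string and returns its new code (duplicates not checked).
    uint32_t insert(const std::string& s) {
        raw_.push_back(s);
        sorted_.push_back(static_cast<uint32_t>(raw_.size() - 1));
        // Re-sort after each insert: simple, not efficient.
        std::sort(sorted_.begin(), sorted_.end(),
                  [this](uint32_t a, uint32_t b) { return raw_[a] < raw_[b]; });
        return static_cast<uint32_t>(raw_.size() - 1);
    }

    // Number of real entries, excluding the NULL slot.
    uint32_t size() const { return static_cast<uint32_t>(raw_.size() - 1); }

    // Returns the code of str, or size() + 1 when str is absent.
    uint32_t operator[](const char* str) const {
        const uint32_t notFound = size() + 1; // same sentinel on both paths
        if (size() <= kLinearLimit) {
            for (uint32_t i = 1; i < raw_.size(); ++i)
                if (raw_[i] == str) return i;
            return notFound; // returning raw_.size() + 1 here was the bug
        }
        auto it = std::lower_bound(sorted_.begin(), sorted_.end(), str,
            [this](uint32_t code, const char* key) {
                return std::strcmp(raw_[code].c_str(), key) < 0; });
        if (it != sorted_.end() && raw_[*it] == str) return *it;
        return notFound;
    }

private:
    static constexpr uint32_t kLinearLimit = 16; // threshold assumed from the thread
    std::vector<std::string> raw_;   // code -> string; raw_[0] unused (NULL)
    std::vector<uint32_t> sorted_;   // codes ordered by string, for binary search
};
```

With a single value "\"val%\"", a lookup of a missing string returns 2, so the `code != size() + 1` test in patternSearch correctly skips it instead of pushing an out-of-range code.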
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu
>>> Sent: Thursday, March 29, 2012 3:48 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>
>>> Hi, Dominique,
>>>
>>> The query seg faulted in r507 because ibis::dictionary::patternSearch
>>> placed the number 3 into the output array; however, the dictionary has
>>> only one value "\"val%\"".  This causes ibis::index::sumBins to
>>> attempt to access bits[3], but there are only two entries in bits.
>>> Any idea why ibis::dictionary::patternSearch is producing 3?
>>>
>>> John
>>>
>>>
>>> On 3/29/12 10:05 AM, Dominique Prunier wrote:
>>>> Hey John,
>>>>
>>>> I'm sorry, my test case was not really "minimal"; it was too complex for
>>>> what I wanted to show you.
>>>> Please don't change the escaping; the query evaluation works just fine.
>>>>
>>>> Just for your information, this test case was here to validate one thing: 
>>>> consistent de-escaping in all layers:
>>>>  - C source : "'\"val\\\\%'"
>>>>  - Compiled : '"val\\%'
>>>>  - After lexer : "val\\%
>>>>  - In qLike : "val\%
>>>> Ultimately, this test validates that % escaping works in patternMatch:
>>>> an escaped % is treated as a regular character rather than a wildcard,
>>>> so the query is not supposed to return any match (it is equivalent to
>>>> = "val%). And it works just as expected. There is nothing to change
>>>> about it.
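The four layers listed above can be traced with a small sketch. It assumes that (a) the lexer strips only the outer single quotes, (b) the qLike step removes one level of backslash escaping, and (c) the matcher treats % and _ as wildcards unless the preceding pattern character is a backslash. stripQuotes, unescapeOnce and likeMatch are hypothetical illustrations, not FastBit's actual lexer, ibis::qLike or patternMatch code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Layer "after lexer" (assumed): drop the surrounding single quotes only.
std::string stripQuotes(const std::string& s) {
    if (s.size() >= 2 && s.front() == '\'' && s.back() == '\'')
        return s.substr(1, s.size() - 2);
    return s;
}

// Layer "in qLike" (assumed): remove exactly one level of backslash escaping,
// so "\\" becomes "\" and "\x" becomes "x".
std::string unescapeOnce(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\\' && i + 1 < s.size()) ++i; // skip the escaping backslash
        out += s[i];
    }
    return out;
}

// Matcher (assumed): % = any sequence, _ = any single character,
// a backslash makes the next pattern character literal.
bool likeMatch(const char* p, const char* t) {
    if (*p == '\0') return *t == '\0';
    if (*p == '\\' && p[1] != '\0')                  // escaped literal
        return *t == p[1] && likeMatch(p + 2, t + 1);
    if (*p == '%') {                                 // let % absorb 0, 1, 2, ... chars
        for (const char* s = t; ; ++s) {
            if (likeMatch(p + 1, s)) return true;
            if (*s == '\0') return false;
        }
    }
    if (*p == '_') return *t != '\0' && likeMatch(p + 1, t + 1);
    return *t == *p && likeMatch(p + 1, t + 1);
}
```

Starting from the C source "'\"val\\\\%'", these steps yield '"val\\%', then "val\\%, then "val\%, which matches only the literal text "val% and therefore not the stored value "\"val%\"". If one more backslash were stripped, the bare % would act as a wildcard and the same query would match every row.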
>>>>
>>>> However, it also happened to be a query that segfaulted in r507, and
>>>> my guess is that the crash was related to one of these characteristics:
>>>>  * this column only has a single value
>>>>  * there are no nulls
>>>>  * the query returns no results
>>>>
>>>> To give you a simpler example, the query:
>>>>
>>>> SELECT single_value_category FROM <partition> WHERE single_value_category 
>>>> LIKE 'missing'
>>>>
>>>> segfaulted as well in r507 for the same reasons. This example doesn't
>>>> involve any fancy de-escaping, but it shares the same characteristics
>>>> (single value, no nulls, no results).
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu
>>>> Sent: Thursday, March 29, 2012 12:41 PM
>>>> To: Dominique Prunier
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>
>>>> Hi, Dominique,
>>>>
>>>> I see what you are trying to do.  The weird_category column only has the value
>>>> "\"val%\"", therefore, the where clause "weird_category LIKE
>>>> '\"val\\%'" should match every row, but "weird_category LIKE 'val\\%'"
>>>> should match no row.
>>>>
>>>> If you step into category::patternSearch, you will see that
>>>> "val\\%" has been stripped to "val%", which will have the same outcome
>>>> as you intended, but only because too many backslashes were stripped.
>>>> You intend category::patternSearch to see "val\%" so that it matches the
>>>> literal percent sign (%); however, because the backslash was stripped
>>>> twice, you only have a bare percent sign left, which is a wildcard
>>>> character, not a literal character as you intended.
>>>>
>>>> My theory is this.  The string "val\\%" becomes "val\%" once it gets
>>>> inside the C code: the compiler has already stripped away the first
>>>> backslash.  The constructor of ibis::qLike then takes away the second one.
>>>>
>>>> Since we have gone back and forth many times on this, I will wait for
>>>> your confirmation before doing anything about it.
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/29/12 8:09 AM, Dominique Prunier wrote:
>>>>> Hey John,
>>>>>
>>>>> When I have some time, I'll make my test suite dump the queries and
>>>>> expected results so that you can try them yourself. It only tests the
>>>>> category (and, to a small extent, long) data types for the things I'm
>>>>> doing, but it could be useful.
>>>>>
>>>>> In the meantime, here is my test partition (see attached) and the query 
>>>>> that generated the segfault with r507 was:
>>>>>
>>>>> SELECT weird_category FROM <partition> WHERE weird_category LIKE 
>>>>> '"val\\%' (which is not supposed to return any result)
>>>>>
>>>>> It might help you understand what could have happened.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu
>>>>> Sent: Thursday, March 29, 2012 10:55 AM
>>>>> To: Dominique Prunier
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>
>>>>> Good to know.  The problem, then, was that I did not check the array
>>>>> bounds.  Odd, though; I did not think the values could be out of bounds.
>>>>>
>>>>> On 3/29/12 7:24 AM, Dominique Prunier wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> I ran the query with r507, and apparently, the problem was there:
>>>>>>
>>>>>> ==14864== Invalid read of size 8
>>>>>> ==14864==    at 0x550067A: ibis::index::sumBins(ibis::array_t<unsigned int> const&, ibis::bitvector&) const (index.cpp:6371)
>>>>>> ==14864==    by 0x576E76D: ibis::category::patternSearch(char const*, ibis::bitvector&) const (category.cpp:871)
>>>>>> ==14864==    by 0x509933C: ibis::part::patternSearch(ibis::qLike const&, ibis::bitvector&) const (part.cpp:3260)
>>>>>> ==14864==    by 0x545FF03: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3962)
>>>>>> ==14864==    by 0x545B63D: ibis::query::computeHits() (query.cpp:2771)
>>>>>> ==14864==    by 0x5452413: ibis::query::evaluate(bool) (query.cpp:847)
>>>>>> ==14864==    by 0x587328C: fastbit_build_query (capi.cpp:477)
>>>>>> ==14864==    by 0x4030F8: main (main.cpp:38)
>>>>>>
>>>>>> In r508, the problem is gone!
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu
>>>>>> Sent: Thursday, March 29, 2012 12:33 AM
>>>>>> To: Dominique Prunier
>>>>>> Cc: FastBit Users
>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>
>>>>>> Hi, Dominique,
>>>>>>
>>>>>> The stack trace shows that it was invoking the copy constructor of the
>>>>>> ibis::bitvector class when it encountered the seg fault.  Not sure
>>>>>> what the problem is here.  I have tried to reproduce the problem by
>>>>>> modifying an existing test suite, check-maurel.  However, the code
>>>>>> seems to work.
>>>>>>
>>>>>> There is a minor change to ibis::index::sumBins to check that the
>>>>>> incoming array contains only values less than bits.size() (the number
>>>>>> of bitvectors stored in an index object; ibis::direkte is an index
>>>>>> object).  This might prevent out-of-bounds accesses.  The
>>>>>> change is in SVN Revision 508.
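A minimal sketch of a bounds-checked bin summation in the spirit of the r508 change. It assumes each bin's bitmap is an uncompressed std::vector<bool> of equal length and that a bin whose bitmap is not loaded is represented by a null pointer; real FastBit bitvectors are compressed, and the actual ibis::index::sumBins also decides when to decompress. sumBinsChecked and Bitmap are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

using Bitmap = std::vector<bool>; // uncompressed stand-in for a bitvector

// Hypothetical stand-in for index::sumBins: OR together the bitmaps named
// in `codes`, skipping any code that does not address a loaded bitmap.
// Returns the number of codes that were skipped.
std::size_t sumBinsChecked(const std::vector<std::unique_ptr<Bitmap>>& bits,
                           const std::vector<uint32_t>& codes,
                           std::size_t nrows, Bitmap& res) {
    res.assign(nrows, false);
    std::size_t skipped = 0;
    for (uint32_t c : codes) {
        if (c >= bits.size() || !bits[c]) { // out of bounds or not loaded
            ++skipped;
            continue;
        }
        const Bitmap& b = *bits[c];
        for (std::size_t i = 0; i < nrows && i < b.size(); ++i)
            if (b[i]) res[i] = true;
    }
    return skipped;
}
```

With two bins and the stray code 3 from the dictionary bug, the function returns an empty result instead of reading past the end of bits.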
>>>>>>
>>>>>> If you are able to find more information, please let me know.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/28/12 2:07 PM, Dominique Prunier wrote:
>>>>>>> Whoops, r507 segfaults right away in:
>>>>>>>
>>>>>>> C  [libfastbit.so.0.0.9+0x800fbb]  ibis::bitvector::bitvector(ibis::bitvector const&)+0x23
>>>>>>> C  [libfastbit.so.0.0.9+0x6d16c1]  ibis::index::sumBins(ibis::array_t<unsigned> const&, ibis::bitvector&) const+0x103
>>>>>>> C  [libfastbit.so.0.0.9+0x93f76e]  ibis::category::patternSearch(char const*, ibis::bitvector&) const+0x3c8
>>>>>>> C  [libfastbit.so.0.0.9+0x26a33d]  ibis::part::patternSearch(ibis::qLike const&, ibis::bitvector&) const+0xcf
>>>>>>> C  [libfastbit.so.0.0.9+0x630f04]  ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const+0xe1a
>>>>>>> C  [libfastbit.so.0.0.9+0x62c63e]  ibis::query::computeHits()+0x356
>>>>>>> C  [libfastbit.so.0.0.9+0x623414]  ibis::query::evaluate(bool)+0x4a6
>>>>>>> C  [libfastbit.so.0.0.9+0xa4428d]  float+0x3c0
>>>>>>> C  [jna1347076628273574295.tmp+0x11b20]  float+0x4c
>>>>>>>
>>>>>>> I'll investigate this later, but the last query from my tests that it
>>>>>>> executed was
>>>>>>>
>>>>>>> SELECT weird_category FROM /tmp/junit8378310225719591578 WHERE 
>>>>>>> weird_category LIKE '"val\\%'
>>>>>>>
>>>>>>> My guess is that it might be related to the fact that
>>>>>>> this query is supposed to return no results.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected] 
>>>>>>> [mailto:[email protected]] On Behalf Of Dominique 
>>>>>>> Prunier
>>>>>>> Sent: Wednesday, March 28, 2012 4:58 PM
>>>>>>> To: K. John Wu
>>>>>>> Cc: FastBit Users
>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>
>>>>>>> Hey John,
>>>>>>>
>>>>>>> I'll have a look at r507, probably tomorrow. To limit the risk, I've
>>>>>>> chosen the well-tested r506 as my stable version, using the
>>>>>>> FASTBIT_EMPTY_STRING_AS_NULL define.
>>>>>>>
>>>>>>> My problem with null masks is that I write my partitions in pure Java
>>>>>>> code. It is quite straightforward to write data files and -part.txt,
>>>>>>> but writing a NULL mask is something else. Besides, I don't really need
>>>>>>> to distinguish between NULL and empty anyway, since treating empty
>>>>>>> strings as NULLs is what some RDBMS engines that we support already
>>>>>>> do (Oracle does that, for example).
>>>>>>>
>>>>>>> I have one quick question about category columns. Do they really have a
>>>>>>> separate NULL mask, or do they use the bitmap stored at key 0 as their
>>>>>>> NULL mask? If the latter, keeping
>>>>>>> FASTBIT_EMPTY_STRING_AS_NULL would allow me to indirectly build some
>>>>>>> real NULL values that would work with the NOT NULL syntax (which, by
>>>>>>> the way, is written IS NOT NULL/IS NULL in SQL; it might be worth
>>>>>>> changing).
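To illustrate the idea behind this question, here is a minimal sketch that derives a NULL mask from per-row category codes, assuming (as with FASTBIT_EMPTY_STRING_AS_NULL) that code 0 is the code reserved for NULL/empty strings. Whether FastBit category columns actually use the key-0 bitmap this way is exactly what is being asked; maskFromCodes and countNulls are hypothetical helpers, and std::vector<bool> stands in for a compressed bitvector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bit i is true when row i holds a real value
// (code != 0), i.e. the rows an "IS NOT NULL" predicate would select.
std::vector<bool> maskFromCodes(const std::vector<uint32_t>& codes) {
    std::vector<bool> mask(codes.size());
    for (std::size_t i = 0; i < codes.size(); ++i)
        mask[i] = (codes[i] != 0);
    return mask;
}

// Counts the rows an "IS NULL" predicate would select (the complement).
std::size_t countNulls(const std::vector<bool>& mask) {
    std::size_t n = 0;
    for (bool valid : mask)
        if (!valid) ++n;
    return n;
}
```

Under this assumption, the mask is simply the complement of the key-0 bitmap, so no separate mask file would be needed for category columns.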
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: K. John Wu
>>>>>>> Sent: Wednesday, March 28, 2012 3:26 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Cc: FastBit Users
>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>
>>>>>>> Hi, Dominique,
>>>>>>>
>>>>>>> I have added code to accept "colname NOT NULL" in the where clauses.
>>>>>>> The new code is in SVN Revision 507.
>>>>>>>
>>>>>>> The new revision should also consolidate the handling of many
>>>>>>> bitvectors in category::patternSearch and address the issue of
>>>>>>> possibly missing calls to index::activate (which leads to incorrect
>>>>>>> answers).
>>>>>>>
>>>>>>> You can input null values through tablex::readCSV.  There is an
>>>>>>> example in tests/Makefile.am in the way it generates data partition
>>>>>>> w7.  Test case 15 of really-small also makes use of the new
>>>>>>> expression "NOT NULL".
>>>>>>>
>>>>>>> Let me know if you spot any problems.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/28/12 10:45 AM, Dominique Prunier wrote:
>>>>>>>> Hey John,
>>>>>>>>
>>>>>>>> My guess is that empty strings are used more often than we'd think for
>>>>>>>> the same use case: as a NULL marker, because it is the easiest
>>>>>>>> way to both insert and query (which is exactly what I use them for).
>>>>>>>> Even if it is not an exact synonym of the SQL NULL (especially for
>>>>>>>> propagation), it is close enough for most use cases. The fact that
>>>>>>>> its key is fixed and well known provides a valid workaround to
>>>>>>>> simulate IS NULL/IS NOT NULL predicates, and since it is excluded from
>>>>>>>> most string-related predicates (most importantly, a LIKE '%' won't
>>>>>>>> select it), it is quite easy and intuitive to use.
>>>>>>>>
>>>>>>>> Besides, the column data file for strings can't represent NULL values 
>>>>>>>> so you would have to deal with null masks, which makes things
>>>>>>>> significantly more complex (especially, to create a partition from 
>>>>>>>> scratch without using the FastBit library) and potentially more 
>>>>>>>> resource hungry (have to read/write null mask, ...).
>>>>>>>>
>>>>>>>> That being said, my only problem with making the distinction between 
>>>>>>>> NULL and empty in the dictionary is the current revision's inability 
>>>>>>>> to query empty strings and/or NULLs (which is currently worked around 
>>>>>>>> by the uint alternative, =0) and the fact that it matches a pattern 
>>>>>>>> like '%'. Since there is no guarantee on the empty string's key, I don't
>>>>>>>> have any workaround for now.
>>>>>>>>
>>>>>>>> For the time being, I'll stick with the FASTBIT_EMPTY_STRING_AS_NULL
>>>>>>>> define until the ambiguity is resolved.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: K. John Wu
>>>>>>>> Sent: Wednesday, March 28, 2012 1:22 PM
>>>>>>>> To: Dominique Prunier
>>>>>>>> Cc: FastBit Users
>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>>
>>>>>>>> The ability to distinguish between empty string and null string can be
>>>>>>>> useful.  The ambiguity in the current code is related to some earlier
>>>>>>>> design oversight, which we intend to correct.  My hope is that not too
>>>>>>>> many people have empty strings they want to do something with, so
>>>>>>>> breaking this compatibility is not a big deal.
>>>>>>>>
>>>>>>>> There is also a problem with the implementation of
>>>>>>>> category::patternSearch that neglects to read the bitmaps from index
>>>>>>>> files (neglecting to call index::activate).  I am consolidating some
>>>>>>>> code to make use of the existing strategies involving summation of a
>>>>>>>> large number of bitmaps in ibis::index::sumBits and
>>>>>>>> ibis::index::sumBins.  These functions should encapsulate the idea of
>>>>>>>> when to decompress bitmaps a lot better than the simple version
>>>>>>>> currently in category::patternSearch.
>>>>>>>>
>>>>>>>> I am doing tests now and will let everyone know when I am ready to
>>>>>>>> check the code in.
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/28/12 8:44 AM, Dominique Prunier wrote:
>>>>>>>>> Thanks, John, for your help. It is always much appreciated.
>>>>>>>>>
>>>>>>>>> With the macro FASTBIT_EMPTY_STRING_AS_NULL enabled, all my test
>>>>>>>>> cases now work. I'll test performance next, but it could be a good
>>>>>>>>> candidate for my stable version.
>>>>>>>>>
>>>>>>>>> About FASTBIT_EMPTY_STRING_AS_NULL, what do you think would be
>>>>>>>>> the best default? For my use case, it is obviously to enable it, but
>>>>>>>>> for everybody else I don't know.  My concern here is backward
>>>>>>>>> compatibility, especially the fact that it influences index
>>>>>>>>> creation, not usage. This means that somebody who upgrades won't
>>>>>>>>> notice this change until they regenerate their indexes. What's your
>>>>>>>>> opinion on this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: K. John Wu
>>>>>>>>> Sent: Wednesday, March 28, 2012 11:20 AM
>>>>>>>>> To: Dominique Prunier
>>>>>>>>> Cc: FastBit Users
>>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>>>
>>>>>>>>> Hi, Dominique,
>>>>>>>>>
>>>>>>>>> Thanks for the suggestions and test cases.  Just checked in a set
>>>>>>>>> of changes as SVN Revision 506.  Here is a bit more explanation.
>>>>>>>>>
>>>>>>>>> On item 1, I have taken the option you've suggested in the first
>>>>>>>>> message, i.e., use the macro FASTBIT_EMPTY_STRING_AS_NULL.
>>>>>>>>>
>>>>>>>>> On item 2, I have restored escaping with backslashes as you
>>>>>>>>> requested.
>>>>>>>>>
>>>>>>>>> I will go through your test cases next and see what else I need to
>>>>>>>>> change.
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/28/12 8:02 AM, Dominique Prunier wrote:
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> Here are two simple test cases for my two issues with r505.
>>>>>>>>>>
>>>>>>>>>> 1. empty-strings.zip: there is no way to select empty strings anymore
>>>>>>>>>>
>>>>>>>>>> **** FAILED (bad result count) with where clause << a='' >>
>>>>>>>>>>
>>>>>>>>>> **** FAILED (bad result count) with where clause << a=0 >>
>>>>>>>>>>
>>>>>>>>>> **** FAILED (hard) with where clause << a LIKE '' >>
>>>>>>>>>>
>>>>>>>>>> 2. de-escaping.zip: there is no way to select a string with a 
>>>>>>>>>> reserved char in it
>>>>>>>>>>
>>>>>>>>>> **** FAILED (bad result count) with where clause << a='it\'s good' >>
>>>>>>>>>>
>>>>>>>>>> Warning -- ibis::whereParser encountered syntax error, unexpected 
>>>>>>>>>> name string at location a='it's good':1.7
>>>>>>>>>> Warning -- query[96w0-3xn8Tg----1]::setWhereClause -- failed to 
>>>>>>>>>> parse the WHERE clause "a='it's good'"
>>>>>>>>>> Warning -- fastbit_build_query failed to assign conditions (a='it's 
>>>>>>>>>> good') to a query
>>>>>>>>>> **** FAILED (hard) with where clause << a='it's good' >>
>>>>>>>>>>
>>>>>>>>>> Warning -- ibis::whereParser encountered syntax error, unexpected 
>>>>>>>>>> $undefined at location a=it's good:1.5
>>>>>>>>>> Warning -- query[96w0-3xn8Tg----2]::setWhereClause -- failed to 
>>>>>>>>>> parse the WHERE clause "a=it's good"
>>>>>>>>>> Warning -- fastbit_build_query failed to assign conditions (a=it's 
>>>>>>>>>> good) to a query
>>>>>>>>>> **** FAILED (hard) with where clause << a=it's good >>
>>>>>>>>>>
>>>>>>>>>> The attached patch fixes my test cases (except a LIKE '', which
>>>>>>>>>> didn't work before, and a='it's good' and a=it's good, which are
>>>>>>>>>> invalid where clauses).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: [email protected] 
>>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique 
>>>>>>>>>> Prunier
>>>>>>>>>> Sent: Wednesday, March 28, 2012 10:04 AM
>>>>>>>>>> To: K. John Wu
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>>>>
>>>>>>>>>> Hey John,
>>>>>>>>>>
>>>>>>>>>> I'm trying r505 right now; I have two questions/remarks:
>>>>>>>>>>
>>>>>>>>>>  1. I saw a change that lets the dictionary accept empty
>>>>>>>>>> strings and treat them as normal strings instead of NULLs. This
>>>>>>>>>> breaks quite a lot of my test cases, specifically those that used
>>>>>>>>>> to test category=0 or !=0 to simulate IS NULL/IS NOT NULL,
>>>>>>>>>> because testing for an empty string doesn't work (='' or LIKE ''
>>>>>>>>>> used to fail or return invalid results). What would you
>>>>>>>>>> recommend for that use case? Could we add a define for those
>>>>>>>>>> relying on this behavior? Something like:
>>>>>>>>>>
>>>>>>>>>> -    //if (*str == 0) return 0;
>>>>>>>>>> +#ifdef FASTBIT_EMPTY_STRING_AS_NULL
>>>>>>>>>> +    if (*str == 0) return 0;
>>>>>>>>>> +#endif
>>>>>>>>>>
>>>>>>>>>>  2. I can't find the de-escaping patch anywhere in r505; am I
>>>>>>>>>> missing something?
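The effect of the define proposed in item 1 on dictionary insertion can be sketched with a toy model. It assumes codes start at 1, code 0 means NULL, and it only mirrors the `if (*str == 0) return 0;` guard from the patch; ToyDict is a hypothetical class, not ibis::dictionary, and the macro is defined at the top purely so the example exercises that branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

#define FASTBIT_EMPTY_STRING_AS_NULL // enabled here only for the demonstration

// Hypothetical toy dictionary insert (not ibis::dictionary::insert).
class ToyDict {
public:
    uint32_t insert(const char* str) {
#ifdef FASTBIT_EMPTY_STRING_AS_NULL
        if (*str == 0) return 0; // empty string -> reserved NULL code
#endif
        auto it = codes_.find(str);
        if (it != codes_.end()) return it->second; // already known
        const uint32_t code = static_cast<uint32_t>(codes_.size()) + 1;
        codes_.emplace(str, code);
        return code;
    }
    std::size_t size() const { return codes_.size(); }
private:
    std::map<std::string, uint32_t> codes_; // string -> code (1-based)
};
```

Without the define, "" is stored and gets an ordinary code like any other string, which is why tests written as category=0 stop selecting the rows that hold empty strings.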
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: K. John Wu
>>>>>>>>>> Sent: Wednesday, March 28, 2012 1:55 AM
>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: perf boost on top of r501
>>>>>>>>>>
>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>
>>>>>>>>>> There are some updates involving the merging of dictionaries and
>>>>>>>>>> exercising operations with unmatched quotes.  The new code is in
>>>>>>>>>> SVN R505.
>>>>>>>>>>
>>>>>>>>>> Please let me know if you have any additional questions.
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/27/12 1:03 PM, Dominique Prunier wrote:
>>>>>>>>>>> Hey John,
>>>>>>>>>>>
>>>>>>>>>>> There is definitely a need for FastBit escaping. The escaping
>>>>>>>>>>> problem is not at the shell level (though we could have one there),
>>>>>>>>>>> since in pure C/C++ code there's no shell involved when I'm
>>>>>>>>>>> building a where clause from a string. The problem is at the where
>>>>>>>>>>> clause parsing level (in the lexer, to be more precise): we need to
>>>>>>>>>>> be able to express string literals, among other things (and not
>>>>>>>>>>> only metas; white space too, ...).
>>>>>>>>>>> For example, my failing test is as simple as calling
>>>>>>>>>>> fastbit_build_query(..., ..., "a='it\\'s good'"). This is expected
>>>>>>>>>>> to create a qString << a = it's good >>, but now it creates a
>>>>>>>>>>> qString << a = it\'s good >>, which is wrong. The attached patch
>>>>>>>>>>> restores the de-escaping, but _not_ the double quote stripping
>>>>>>>>>>> (because that is already handled in the lexer). All my test cases
>>>>>>>>>>> work after applying it to r503.
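A minimal sketch of what lexer-level de-escaping has to do for a case like a='it\'s good': scan a single-quoted literal, treat a backslash as escaping the next character (so \' does not end the literal), and return the text with one level of escaping removed. readQuotedLiteral is a hypothetical helper, not FastBit's actual lexer; error handling is reduced to returning false for an unterminated literal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: parses a single-quoted literal starting at s[pos].
// On success, stores the de-escaped text in `out`, advances `pos` past the
// closing quote and returns true; returns false if the literal is unterminated.
bool readQuotedLiteral(const std::string& s, std::size_t& pos, std::string& out) {
    if (pos >= s.size() || s[pos] != '\'') return false;
    out.clear();
    for (std::size_t i = pos + 1; i < s.size(); ++i) {
        if (s[i] == '\\' && i + 1 < s.size()) {
            out += s[++i];      // keep the escaped character, drop the backslash
        } else if (s[i] == '\'') {
            pos = i + 1;        // closing quote found
            return true;
        } else {
            out += s[i];
        }
    }
    return false;               // ran off the end: unterminated literal
}
```

With this rule, the C string "a='it\\'s good'" (that is, a='it\'s good' at run time) yields the literal it's good, the qString value expected above; the double-quoted form "a=\"it's good\"" avoids the escape altogether.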
>>>>>>>>>>
>>>>>>>>>> The constructors for ibis::qString and ibis::qLike really should not
>>>>>>>>>> strip away anything.  In your case, you should be able to do the 
>>>>>>>>>> following
>>>>>>>>>>
>>>>>>>>>> .../tcapi data-dir "a=\"it's good\""
>>>>>>>>>>
>>>>>>>>>> If you are using fastbit_build_query, you can use the same string
>>>>>>>>>> fastbit_build_query(..., ..., "a=\"it's good\"");
>>>>>>>>>>
>>>>>>>>>> Since FastBit regular expressions only support four meta characters
>>>>>>>>>> (? * _ %), there is no need to escape anything.  It is probably
>>>>>>>>>> cleaner not to strip anything special (except the outermost
>>>>>>>>>> quotes, which should only be done once).
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> About the decompression, thanks for the link; this is very
>>>>>>>>>>> interesting stuff! But my point here is not to question the
>>>>>>>>>>> fact that decompression can be better in some cases. I was just
>>>>>>>>>>> under the impression that the hit vector given to
>>>>>>>>>>> category::patternSearch was _always_ already decompressed, since
>>>>>>>>>>> it is ultimately a bitvector that has been created from scratch for
>>>>>>>>>>> query evaluation (this would need verification, though). My guess is
>>>>>>>>>>> that the few percent of performance I'm losing here are attributable
>>>>>>>>>>> to the check (hits.isCompressed() && hits.bytes()*mult + bv->bytes() >
>>>>>>>>>>> hits.size()), since it gets executed _A LOT_ of times. I'll try to
>>>>>>>>>>> investigate it a little bit further.
>>>>>>>>>>
>>>>>>>>>> I have rearranged the tests for decompression in layers which
>>>>>>>>>> hopefully will eliminate the need to perform more expensive tests in
>>>>>>>>>> your case that presumably involve a fairly small number of values.
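A minimal sketch of the "tests in layers" idea, assuming a hypothetical BitmapInfo summary that reports whether a bitmap is compressed, its size in bytes and its length in bits. The cheapest checks run first, so with only a few bins the size estimate (the expression quoted earlier in this message, reproduced with the same units) is never evaluated. shouldDecompress, kFewBins and the threshold value are illustrative assumptions, not what FastBit actually uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of a bitmap: enough to decide on decompression.
struct BitmapInfo {
    bool compressed;     // stored in compressed form?
    std::size_t bytes;   // current storage size in bytes
    std::size_t nbits;   // number of bits (rows) it represents
};

// Decide whether to decompress the running result `hits` before OR-ing in
// `bv`, given how many bins will be summed. Layers go cheapest first.
bool shouldDecompress(const BitmapInfo& hits, const BitmapInfo& bv,
                      std::size_t nbins, double mult) {
    const std::size_t kFewBins = 8;      // illustrative threshold
    if (nbins <= kFewBins) return false; // layer 1: few bins, stay compressed
    if (!hits.compressed) return false;  // layer 2: nothing to decompress
    // Layer 3: the size estimate, only reached when the cheap tests pass.
    return hits.bytes * mult + bv.bytes > hits.nbits;
}
```

The design choice is simply ordering: the per-call cost for a small number of values drops to one integer comparison.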
>>>>>>>>>> _______________________________________________
>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users