Hey John,

I'm testing r500 right now and i have a few issues:

1. In category::stringSearch:
    else if (ind == 1 && dic.size() == 1) { // special case with one value
        getNullMask(hits); // all valid entries
    }
We can't do this shortcut, this is not nulls that make an error but also empty 
strings (which is what i have in my test case). When i remove these 3 lines, it 
works again

2. Most of my select distinct (using bundles) that returns empty strings (key 
0) on categories were broken. I think this is related to the fact that the 
bundle used to write "<NULL>" for key 0 instead of "". Unfortunately, in the C 
API (fastbit_result_set_getString), the empty string is the mark of an error, 
so there is no way to distinguish them anymore. In this particular case, this 
is probably the CAPI which is wrong, maybe we should return a null pointer on 
error instead of an empty string. If i restore the else in colUInts::write 
(else out << "<NULL>";), my queries works again

3. This is not really an issue for me, but it seems also that the quoting of 
string in colUInts::write got removed. I'm perfectly fine with that, i used to 
try to strip quotes. It actually happened at r468 but my code was checking if 
the string was quoted or not before trying to remove them, so it didn't break 
anything.

Except these, it seems all good !

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]] 
Sent: Friday, March 23, 2012 2:11 PM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
ibis::bundle::getUInt for CATEGORY

I would presume that you can jump to r500 and see how it goes..

On 3/23/12 11:03 AM, Dominique Prunier wrote:
> Thanks John, I'll try that soon. Right know i'm working on revision 497 
> because i though the sloppyCount thing was not finished yet. Should i jump to 
> r500 or simply apply the r500 on top of r497 ?
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]] 
> Sent: Friday, March 23, 2012 1:45 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
> ibis::bundle::getUInt for CATEGORY
> 
> Hi, Dominique,
> 
> Just changed in edits to items 1 and 2 I mentioned earlier.  The
> updated code is SVN revision 500.  Need to work on something else for
> the rest of the day.  Please give new code a try when you get the
> chance and let us know how it goes.
> 
> Thanks.
> 
> John
> 
> 
> On 3/23/12 9:57 AM, Dominique Prunier wrote:
>> Hi John,
>>
>> Thank you very much for considering my use case valid, even if it means more 
>> work to fix Andrew's one. I appreciate it.
>>
>> On top of my head, i think getKey needs a bit more bullet proofing: check 
>> dic == 0, call prepareMembers, ... But i'm pretty sure it'll work and won't 
>> break write spec.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]] 
>> Sent: Friday, March 23, 2012 12:48 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
>> ibis::bundle::getUInt for CATEGORY
>>
>> Hi, Dominique,
>>
>> Thanks for the clarification.  Looks like here are my immediate
>> actionable items.
>>
>> - allow ibis::bundle to create integer representation of categorical
>> values (because each bundle can only be associated with one data
>> partition and therefore no possibility of inconsistent dictionaries)
>>
>> - in colUInts::write allow categorical values to make use of
>> dictionaries more directly (through getKey instead of getString)
>>
>> - add function to synchronize the dictionaries when multiple data
>> partitions are involved
>>
>> The first two items should take a couple of hours to implement and
>> test, the last one might take longer.  Stay tuned.
>>
>> John
>>
>>
>>
>> On 3/22/12 4:58 PM, Dominique Prunier wrote:
>>> Hey John,
>>>
>>> Don't worry, that is exactly why i'm building my query test suite. To make 
>>> sure i catch these sooner than later.
>>>
>>> About the use of uints for categories, this is actually a very convenient 
>>> feature that i would not want to see disappear (the issue i raised only 
>>> applies to bundles (select distinct) since retrieving uints with 
>>> category::selectUInts still works fine).

>>>
>>> I understand that it raises concerns for the multi partition scenario, but 
>>> i'm pretty sure there is a way reconcialiate both use cases.
>>>
>>> I have two concerns with the change introduced in colValues.cpp in r495: 
>>> the potential performance hit in bundles for not using a colUInts (even 
>>> though this is pure guess, i have not measured it yet) and the inability to 
>>> get a string key for category columns when using bundles (which ultimately 
>>> has a performance impact for me).
>>>
>>> Using uints allows an application to handle smaller objects much more 
>>> efficiently. This applies to FastBit itself where the select distinct based 
>>> on uints is extremely efficient. In my application, i retrieves uints 
>>> instead of strings too because the this is much faster if i have processing 
>>> to do on the keys before retrieving the actual string. 
>>>
>>> About the change in colUInts::write i don't think i'm breaking the spec. 
>>> I'm writing getKey((*array)[i]), not getKey(i) so this is still the ith row 
>>> as promised.
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]] 
>>> Sent: Thursday, March 22, 2012 7:37 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
>>> ibis::bundle::getUInt for CATEGORY
>>>
>>> Hi, Dominique,
>>>
>>> Thanks for your patience while we work through these issues.
>>>
>>> SVN Revision 495 has to be replaced because the strategy of using
>>> integer representation for categorical values does not work in the
>>> general case.  That was the main reason for moving all the cases of
>>> ibis::CATEGORY back together with ibis::TEXT.  This was triggered by
>>> the test case from Andrew Olson last week.
>>>
>>> When there are multiple data partitions in a data table, the
>>> dictionaries from all partitions must agree, otherwise, the integer
>>> value of 1 for on data partition can mean a different thing in the
>>> next data partition.  The current code allows one to work with string
>>> values as order by keys no matter whether the dictionaries agree or not.
>>>
>>> The particular function, colStrings::write, you've patched can not be
>>> modified this way.  It would violate the promise of writing the ith
>>> row of the string-valued column.  Your change will print the ith word
>>> in the dictionary.
>>>
>>> If you can describe a little bit more of the issues you are having, we
>>> will see how we might resolve them.
>>>
>>> John
>>>
>>>
>>>
>>> On 3/22/12 3:50 PM, Dominique Prunier wrote:
>>>> Oh, sorry, it was r495, not line 495. I reverted the change introduced in 
>>>> src/colValues.cpp at revision 495 to restore functionality.
>>>>
>>>> However, it ended up being a bit more complex than that because of r496 
>>>> which changed the behavior of category::getString (use row index, not 
>>>> key). The problem, after i reverted colValues, is that the bundle call the 
>>>> colUInt::getString which call category::getString with the key, not the 
>>>> row index. The follwoing change should fix it but prevent use of colUINT 
>>>> with TEXT columns (i don't think it was used, not sure it even makes 
>>>> sense):
>>>>
>>>> diff --git a/src/colValues.h b/src/colValues.h
>>>> index ff7c1c3..232be61 100755
>>>> --- a/src/colValues.h
>>>> +++ b/src/colValues.h
>>>> @@ -239,10 +239,9 @@ public:
>>>>      }
>>>>      /// Write the ith element as text.
>>>>      virtual void write(std::ostream& out, uint32_t i) const {
>>>> -       if (col->type() == ibis::CATEGORY || col->type() == ibis::TEXT) {
>>>> -           std::string str;
>>>> -           col->getString((*array)[i], str);
>>>> -           if (str.empty()) {
>>>> +       if (col->type() == ibis::CATEGORY) {
>>>> +           const char * str = reinterpret_cast<const 
>>>> ibis::category*>(col)->getKey((*array)[i]);
>>>> +           if (str == 0 || (*str) == 0) {
>>>>                 out << "<NULL>";
>>>>             }
>>>>             else {
>>>>
>>>> All my test case works after this final change. I have attached the 
>>>> complete patch i'm applying on top of r497 to make everything work. I'm 
>>>> not sure i'm not breaking something else though (especially the things i 
>>>> reverted from r495).
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]] 
>>>> Sent: Thursday, March 22, 2012 6:43 PM
>>>> To: FastBit Users
>>>> Cc: Dominique Prunier
>>>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
>>>> ibis::bundle::getUInt for CATEGORY
>>>>
>>>> Hi, Dominique,
>>>>
>>>> I see the problem of assign a nil pointer to a string object.  I would
>>>> prefer to handle this case a little earlier in the code.  The change
>>>> will appear in the next SVN check in.
>>>>
>>>> Would you mind check the file and line number in the original message?
>>>>  I don't see any problem with src/colValues.cpp line 495 in my copy of
>>>> the code..
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>> On 3/22/12 3:04 PM, Dominique Prunier wrote:
>>>>> While i’m at it, the following patch fixes the NUL string case (key 0)
>>>>>
>>>>>  
>>>>>
>>>>> diff --git a/src/category.cpp b/src/category.cpp
>>>>>
>>>>> index d8c1a9f..e7d28a1 100644
>>>>>
>>>>> --- a/src/category.cpp
>>>>>
>>>>> +++ b/src/category.cpp
>>>>>
>>>>> @@ -715,7 +715,7 @@ void ibis::category::getString(uint32_t i,
>>>>> std::string &str) const {
>>>>>
>>>>>         ibis::array_t<uint32_t> ints;
>>>>>
>>>>>         ierr = ibis::fileManager::instance().getFile(fnm.c_str(), ints);
>>>>>
>>>>>         if (ierr >= 0 && ints.size() == thePart->nRows()) {
>>>>>
>>>>> -           if (i < ints.size()) {
>>>>>
>>>>> +           if (i < ints.size() && ints[i] > 0) {
>>>>>
>>>>>                 str = dic[ints[i]];
>>>>>
>>>>>             }
>>>>>
>>>>>             return;
>>>>>
>>>>>  
>>>>>
>>>>> Thanks,
>>>>>
>>>>>  
>>>>>
>>>>> *From:*[email protected]
>>>>> [mailto:[email protected]] *On Behalf Of *Dominique
>>>>> Prunier
>>>>> *Sent:* Thursday, March 22, 2012 5:59 PM
>>>>> *To:* FastBit Users
>>>>> *Subject:* [FastBit-users] Changes in src/colValues.cpp@495 breaks
>>>>> ibis::bundle::getUInt for CATEGORY
>>>>>
>>>>>  
>>>>>
>>>>> Hey John,
>>>>>
>>>>>  
>>>>>
>>>>> The title says pretty much everything. I’m not sure why this has been
>>>>> changed (this is a huge changeset in a part of the lib i don’t really
>>>>> know), but it breaks the ability to access a CATEGORY string key
>>>>> though bundles. If i revert only this file, it seems to work.
>>>>>
>>>>>  
>>>>>
>>>>> Thanks,
>>>>>
>>>>>  
>>>>>
>>>>> */Dominique Prunier/**//*
>>>>>
>>>>>  APG Lead Developper
>>>>>
>>>>> Logo-W4N-100dpi
>>>>>
>>>>>  4388, rue Saint-Denis
>>>>>
>>>>>  Bureau 309
>>>>>
>>>>>  Montreal (Quebec)  H2J 2L1
>>>>>
>>>>>  Tel. +1 514-842-6767  x310
>>>>>
>>>>>  Fax +1 514-842-3989
>>>>>
>>>>>  [email protected] <mailto:[email protected]>
>>>>>
>>>>>  www.watch4net.com <http://www.watch4net.com/>
>>>>>
>>>>> / /
>>>>>
>>>>> /This message is for the designated recipient only and may contain
>>>>> privileged, proprietary, or otherwise private information. If you have
>>>>> received it in error, please notify the sender immediately and delete
>>>>> the original. Any other use of this electronic mail by you is prohibited.
>>>>>
>>>>> //Ce message est pour le récipiendaire désigné seulement et peut
>>>>> contenir des informations privilégiées, propriétaires ou autrement
>>>>> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur
>>>>> immédiatement et effacez l'original. Toute autre utilisation de ce
>>>>> courrier électronique par vous est prohibée.///
>>>>>
>>>>>  
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> FastBit-users mailing list
>>>>> [email protected]
>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to