I would presume that you can jump to r500 and see how it goes..

On 3/23/12 11:03 AM, Dominique Prunier wrote:
> Thanks John, I'll try that soon. Right know i'm working on revision 497 
> because i though the sloppyCount thing was not finished yet. Should i jump to 
> r500 or simply apply the r500 on top of r497 ?
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]] 
> Sent: Friday, March 23, 2012 1:45 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
> ibis::bundle::getUInt for CATEGORY
> 
> Hi, Dominique,
> 
> Just changed in edits to items 1 and 2 I mentioned earlier.  The
> updated code is SVN revision 500.  Need to work on something else for
> the rest of the day.  Please give new code a try when you get the
> chance and let us know how it goes.
> 
> Thanks.
> 
> John
> 
> 
> On 3/23/12 9:57 AM, Dominique Prunier wrote:
>> Hi John,
>>
>> Thank you very much for considering my use case valid, even if it means more 
>> work to fix Andrew's one. I appreciate it.
>>
>> On top of my head, i think getKey needs a bit more bullet proofing: check 
>> dic == 0, call prepareMembers, ... But i'm pretty sure it'll work and won't 
>> break write spec.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]] 
>> Sent: Friday, March 23, 2012 12:48 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
>> ibis::bundle::getUInt for CATEGORY
>>
>> Hi, Dominique,
>>
>> Thanks for the clarification.  Looks like here are my immediate
>> actionable items.
>>
>> - allow ibis::bundle to create integer representation of categorical
>> values (because each bundle can only be associated with one data
>> partition and therefore no possibility of inconsistent dictionaries)
>>
>> - in colUInts::write allow categorical values to make use of
>> dictionaries more directly (through getKey instead of getString)
>>
>> - add function to synchronize the dictionaries when multiple data
>> partitions are involved
>>
>> The first two items should take a couple of hours to implement and
>> test, the last one might take longer.  Stay tuned.
>>
>> John
>>
>>
>>
>> On 3/22/12 4:58 PM, Dominique Prunier wrote:
>>> Hey John,
>>>
>>> Don't worry, that is exactly why i'm building my query test suite. To make 
>>> sure i catch these sooner than later.
>>>
>>> About the use of uints for categories, this is actually a very convenient 
>>> feature that i would not want to see disappear (the issue i raised only 
>>> applies to bundles (select distinct) since retrieving uints with 
>>> category::selectUInts still works fine).
>>>
>>> I understand that it raises concerns for the multi partition scenario, but 
>>> i'm pretty sure there is a way reconcialiate both use cases.
>>>
>>> I have two concerns with the change introduced in colValues.cpp in r495: 
>>> the potential performance hit in bundles for not using a colUInts (even 
>>> though this is pure guess, i have not measured it yet) and the inability to 
>>> get a string key for category columns when using bundles (which ultimately 
>>> has a performance impact for me).
>>>
>>> Using uints allows an application to handle smaller objects much more 
>>> efficiently. This applies to FastBit itself where the select distinct based 
>>> on uints is extremely efficient. In my application, i retrieves uints 
>>> instead of strings too because the this is much faster if i have processing 
>>> to do on the keys before retrieving the actual string. 
>>>
>>> About the change in colUInts::write i don't think i'm breaking the spec. 
>>> I'm writing getKey((*array)[i]), not getKey(i) so this is still the ith row 
>>> as promised.
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]] 
>>> Sent: Thursday, March 22, 2012 7:37 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
>>> ibis::bundle::getUInt for CATEGORY
>>>
>>> Hi, Dominique,
>>>
>>> Thanks for your patience while we work through these issues.
>>>
>>> SVN Revision 495 has to be replaced because the strategy of using
>>> integer representation for categorical values does not work in the
>>> general case.  That was the main reason for moving all the cases of
>>> ibis::CATEGORY back together with ibis::TEXT.  This was triggered by
>>> the test case from Andrew Olson last week.
>>>
>>> When there are multiple data partitions in a data table, the
>>> dictionaries from all partitions must agree, otherwise, the integer
>>> value of 1 for on data partition can mean a different thing in the
>>> next data partition.  The current code allows one to work with string
>>> values as order by keys no matter whether the dictionaries agree or not.
>>>
>>> The particular function, colStrings::write, you've patched can not be
>>> modified this way.  It would violate the promise of writing the ith
>>> row of the string-valued column.  Your change will print the ith word
>>> in the dictionary.
>>>
>>> If you can describe a little bit more of the issues you are having, we
>>> will see how we might resolve them.
>>>
>>> John
>>>
>>>
>>>
>>> On 3/22/12 3:50 PM, Dominique Prunier wrote:
>>>> Oh, sorry, it was r495, not line 495. I reverted the change introduced in 
>>>> src/colValues.cpp at revision 495 to restore functionality.
>>>>
>>>> However, it ended up being a bit more complex than that because of r496 
>>>> which changed the behavior of category::getString (use row index, not 
>>>> key). The problem, after i reverted colValues, is that the bundle call the 
>>>> colUInt::getString which call category::getString with the key, not the 
>>>> row index. The follwoing change should fix it but prevent use of colUINT 
>>>> with TEXT columns (i don't think it was used, not sure it even makes 
>>>> sense):
>>>>
>>>> diff --git a/src/colValues.h b/src/colValues.h
>>>> index ff7c1c3..232be61 100755
>>>> --- a/src/colValues.h
>>>> +++ b/src/colValues.h
>>>> @@ -239,10 +239,9 @@ public:
>>>>      }
>>>>      /// Write the ith element as text.
>>>>      virtual void write(std::ostream& out, uint32_t i) const {
>>>> -       if (col->type() == ibis::CATEGORY || col->type() == ibis::TEXT) {
>>>> -           std::string str;
>>>> -           col->getString((*array)[i], str);
>>>> -           if (str.empty()) {
>>>> +       if (col->type() == ibis::CATEGORY) {
>>>> +           const char * str = reinterpret_cast<const 
>>>> ibis::category*>(col)->getKey((*array)[i]);
>>>> +           if (str == 0 || (*str) == 0) {
>>>>                 out << "<NULL>";
>>>>             }
>>>>             else {
>>>>
>>>> All my test case works after this final change. I have attached the 
>>>> complete patch i'm applying on top of r497 to make everything work. I'm 
>>>> not sure i'm not breaking something else though (especially the things i 
>>>> reverted from r495).
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]] 
>>>> Sent: Thursday, March 22, 2012 6:43 PM
>>>> To: FastBit Users
>>>> Cc: Dominique Prunier
>>>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
>>>> ibis::bundle::getUInt for CATEGORY
>>>>
>>>> Hi, Dominique,
>>>>
>>>> I see the problem of assign a nil pointer to a string object.  I would
>>>> prefer to handle this case a little earlier in the code.  The change
>>>> will appear in the next SVN check in.
>>>>
>>>> Would you mind check the file and line number in the original message?
>>>>  I don't see any problem with src/colValues.cpp line 495 in my copy of
>>>> the code..
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>> On 3/22/12 3:04 PM, Dominique Prunier wrote:
>>>>> While i’m at it, the following patch fixes the NUL string case (key 0)
>>>>>
>>>>>  
>>>>>
>>>>> diff --git a/src/category.cpp b/src/category.cpp
>>>>>
>>>>> index d8c1a9f..e7d28a1 100644
>>>>>
>>>>> --- a/src/category.cpp
>>>>>
>>>>> +++ b/src/category.cpp
>>>>>
>>>>> @@ -715,7 +715,7 @@ void ibis::category::getString(uint32_t i,
>>>>> std::string &str) const {
>>>>>
>>>>>         ibis::array_t<uint32_t> ints;
>>>>>
>>>>>         ierr = ibis::fileManager::instance().getFile(fnm.c_str(), ints);
>>>>>
>>>>>         if (ierr >= 0 && ints.size() == thePart->nRows()) {
>>>>>
>>>>> -           if (i < ints.size()) {
>>>>>
>>>>> +           if (i < ints.size() && ints[i] > 0) {
>>>>>
>>>>>                 str = dic[ints[i]];
>>>>>
>>>>>             }
>>>>>
>>>>>             return;
>>>>>
>>>>>  
>>>>>
>>>>> Thanks,
>>>>>
>>>>>  
>>>>>
>>>>> *From:*[email protected]
>>>>> [mailto:[email protected]] *On Behalf Of *Dominique
>>>>> Prunier
>>>>> *Sent:* Thursday, March 22, 2012 5:59 PM
>>>>> *To:* FastBit Users
>>>>> *Subject:* [FastBit-users] Changes in src/colValues.cpp@495 breaks
>>>>> ibis::bundle::getUInt for CATEGORY
>>>>>
>>>>>  
>>>>>
>>>>> Hey John,
>>>>>
>>>>>  
>>>>>
>>>>> The title says pretty much everything. I’m not sure why this has been
>>>>> changed (this is a huge changeset in a part of the lib i don’t really
>>>>> know), but it breaks the ability to access a CATEGORY string key
>>>>> though bundles. If i revert only this file, it seems to work.
>>>>>
>>>>>  
>>>>>
>>>>> Thanks,
>>>>>
>>>>>  
>>>>>
>>>>> */Dominique Prunier/**//*
>>>>>
>>>>>  APG Lead Developper
>>>>>
>>>>> Logo-W4N-100dpi
>>>>>
>>>>>  4388, rue Saint-Denis
>>>>>
>>>>>  Bureau 309
>>>>>
>>>>>  Montreal (Quebec)  H2J 2L1
>>>>>
>>>>>  Tel. +1 514-842-6767  x310
>>>>>
>>>>>  Fax +1 514-842-3989
>>>>>
>>>>>  [email protected] <mailto:[email protected]>
>>>>>
>>>>>  www.watch4net.com <http://www.watch4net.com/>
>>>>>
>>>>> / /
>>>>>
>>>>> /This message is for the designated recipient only and may contain
>>>>> privileged, proprietary, or otherwise private information. If you have
>>>>> received it in error, please notify the sender immediately and delete
>>>>> the original. Any other use of this electronic mail by you is prohibited.
>>>>>
>>>>> //Ce message est pour le récipiendaire désigné seulement et peut
>>>>> contenir des informations privilégiées, propriétaires ou autrement
>>>>> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur
>>>>> immédiatement et effacez l'original. Toute autre utilisation de ce
>>>>> courrier électronique par vous est prohibée.///
>>>>>
>>>>>  
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> FastBit-users mailing list
>>>>> [email protected]
>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to