Thanks John, I'll try that soon. Right know i'm working on revision 497 because 
i though the sloppyCount thing was not finished yet. Should i jump to r500 or 
simply apply the r500 on top of r497 ?

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]] 
Sent: Friday, March 23, 2012 1:45 PM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
ibis::bundle::getUInt for CATEGORY

Hi, Dominique,

Just changed in edits to items 1 and 2 I mentioned earlier.  The
updated code is SVN revision 500.  Need to work on something else for
the rest of the day.  Please give new code a try when you get the
chance and let us know how it goes.

Thanks.

John


On 3/23/12 9:57 AM, Dominique Prunier wrote:
> Hi John,
> 
> Thank you very much for considering my use case valid, even if it means more 
> work to fix Andrew's one. I appreciate it.
> 
> On top of my head, i think getKey needs a bit more bullet proofing: check dic 
> == 0, call prepareMembers, ... But i'm pretty sure it'll work and won't break 
> write spec.
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]] 
> Sent: Friday, March 23, 2012 12:48 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
> ibis::bundle::getUInt for CATEGORY
> 
> Hi, Dominique,
> 
> Thanks for the clarification.  Looks like here are my immediate
> actionable items.
> 
> - allow ibis::bundle to create integer representation of categorical
> values (because each bundle can only be associated with one data
> partition and therefore no possibility of inconsistent dictionaries)
> 
> - in colUInts::write allow categorical values to make use of
> dictionaries more directly (through getKey instead of getString)
> 
> - add function to synchronize the dictionaries when multiple data
> partitions are involved
> 
> The first two items should take a couple of hours to implement and
> test, the last one might take longer.  Stay tuned.
> 
> John
> 
> 
> 
> On 3/22/12 4:58 PM, Dominique Prunier wrote:
>> Hey John,
>>
>> Don't worry, that is exactly why i'm building my query test suite. To make 
>> sure i catch these sooner than later.
>>
>> About the use of uints for categories, this is actually a very convenient 
>> feature that i would not want to see disappear (the issue i raised only 
>> applies to bundles (select distinct) since retrieving uints with 
>> category::selectUInts still works fine).
>>
>> I understand that it raises concerns for the multi partition scenario, but 
>> i'm pretty sure there is a way reconcialiate both use cases.
>>
>> I have two concerns with the change introduced in colValues.cpp in r495: the 
>> potential performance hit in bundles for not using a colUInts (even though 
>> this is pure guess, i have not measured it yet) and the inability to get a 
>> string key for category columns when using bundles (which ultimately has a 
>> performance impact for me).
>>
>> Using uints allows an application to handle smaller objects much more 
>> efficiently. This applies to FastBit itself where the select distinct based 
>> on uints is extremely efficient. In my application, i retrieves uints 
>> instead of strings too because the this is much faster if i have processing 
>> to do on the keys before retrieving the actual string. 
>>
>> About the change in colUInts::write i don't think i'm breaking the spec. I'm 
>> writing getKey((*array)[i]), not getKey(i) so this is still the ith row as 
>> promised.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]] 
>> Sent: Thursday, March 22, 2012 7:37 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
>> ibis::bundle::getUInt for CATEGORY
>>
>> Hi, Dominique,
>>
>> Thanks for your patience while we work through these issues.
>>
>> SVN Revision 495 has to be replaced because the strategy of using
>> integer representation for categorical values does not work in the
>> general case.  That was the main reason for moving all the cases of
>> ibis::CATEGORY back together with ibis::TEXT.  This was triggered by
>> the test case from Andrew Olson last week.
>>
>> When there are multiple data partitions in a data table, the
>> dictionaries from all partitions must agree, otherwise, the integer
>> value of 1 for on data partition can mean a different thing in the
>> next data partition.  The current code allows one to work with string
>> values as order by keys no matter whether the dictionaries agree or not.
>>
>> The particular function, colStrings::write, you've patched can not be
>> modified this way.  It would violate the promise of writing the ith
>> row of the string-valued column.  Your change will print the ith word
>> in the dictionary.
>>
>> If you can describe a little bit more of the issues you are having, we
>> will see how we might resolve them.
>>
>> John
>>
>>
>>
>> On 3/22/12 3:50 PM, Dominique Prunier wrote:
>>> Oh, sorry, it was r495, not line 495. I reverted the change introduced in 
>>> src/colValues.cpp at revision 495 to restore functionality.
>>>
>>> However, it ended up being a bit more complex than that because of r496 
>>> which changed the behavior of category::getString (use row index, not key). 
>>> The problem, after i reverted colValues, is that the bundle call the 
>>> colUInt::getString which call category::getString with the key, not the row 
>>> index. The follwoing change should fix it but prevent use of colUINT with 
>>> TEXT columns (i don't think it was used, not sure it even makes sense):
>>>
>>> diff --git a/src/colValues.h b/src/colValues.h
>>> index ff7c1c3..232be61 100755
>>> --- a/src/colValues.h
>>> +++ b/src/colValues.h
>>> @@ -239,10 +239,9 @@ public:
>>>      }
>>>      /// Write the ith element as text.
>>>      virtual void write(std::ostream& out, uint32_t i) const {
>>> -       if (col->type() == ibis::CATEGORY || col->type() == ibis::TEXT) {
>>> -           std::string str;
>>> -           col->getString((*array)[i], str);
>>> -           if (str.empty()) {
>>> +       if (col->type() == ibis::CATEGORY) {
>>> +           const char * str = reinterpret_cast<const 
>>> ibis::category*>(col)->getKey((*array)[i]);
>>> +           if (str == 0 || (*str) == 0) {
>>>                 out << "<NULL>";
>>>             }
>>>             else {
>>>
>>> All my test case works after this final change. I have attached the 
>>> complete patch i'm applying on top of r497 to make everything work. I'm not 
>>> sure i'm not breaking something else though (especially the things i 
>>> reverted from r495).
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]] 
>>> Sent: Thursday, March 22, 2012 6:43 PM
>>> To: FastBit Users
>>> Cc: Dominique Prunier
>>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
>>> ibis::bundle::getUInt for CATEGORY
>>>
>>> Hi, Dominique,
>>>
>>> I see the problem of assign a nil pointer to a string object.  I would
>>> prefer to handle this case a little earlier in the code.  The change
>>> will appear in the next SVN check in.
>>>
>>> Would you mind check the file and line number in the original message?
>>>  I don't see any problem with src/colValues.cpp line 495 in my copy of
>>> the code..
>>>
>>> John
>>>
>>>
>>>
>>> On 3/22/12 3:04 PM, Dominique Prunier wrote:
>>>> While i’m at it, the following patch fixes the NUL string case (key 0)
>>>>
>>>>  
>>>>
>>>> diff --git a/src/category.cpp b/src/category.cpp
>>>>
>>>> index d8c1a9f..e7d28a1 100644
>>>>
>>>> --- a/src/category.cpp
>>>>
>>>> +++ b/src/category.cpp
>>>>
>>>> @@ -715,7 +715,7 @@ void ibis::category::getString(uint32_t i,
>>>> std::string &str) const {
>>>>
>>>>         ibis::array_t<uint32_t> ints;
>>>>
>>>>         ierr = ibis::fileManager::instance().getFile(fnm.c_str(), ints);
>>>>
>>>>         if (ierr >= 0 && ints.size() == thePart->nRows()) {
>>>>
>>>> -           if (i < ints.size()) {
>>>>
>>>> +           if (i < ints.size() && ints[i] > 0) {
>>>>
>>>>                 str = dic[ints[i]];
>>>>
>>>>             }
>>>>
>>>>             return;
>>>>
>>>>  
>>>>
>>>> Thanks,
>>>>
>>>>  
>>>>
>>>> *From:*[email protected]
>>>> [mailto:[email protected]] *On Behalf Of *Dominique
>>>> Prunier
>>>> *Sent:* Thursday, March 22, 2012 5:59 PM
>>>> *To:* FastBit Users
>>>> *Subject:* [FastBit-users] Changes in src/colValues.cpp@495 breaks
>>>> ibis::bundle::getUInt for CATEGORY
>>>>
>>>>  
>>>>
>>>> Hey John,
>>>>
>>>>  
>>>>
>>>> The title says pretty much everything. I’m not sure why this has been
>>>> changed (this is a huge changeset in a part of the lib i don’t really
>>>> know), but it breaks the ability to access a CATEGORY string key
>>>> though bundles. If i revert only this file, it seems to work.
>>>>
>>>>  
>>>>
>>>> Thanks,
>>>>
>>>>  
>>>>
>>>> */Dominique Prunier/**//*
>>>>
>>>>  APG Lead Developper
>>>>
>>>> Logo-W4N-100dpi
>>>>
>>>>  4388, rue Saint-Denis
>>>>
>>>>  Bureau 309
>>>>
>>>>  Montreal (Quebec)  H2J 2L1
>>>>
>>>>  Tel. +1 514-842-6767  x310
>>>>
>>>>  Fax +1 514-842-3989
>>>>
>>>>  [email protected] <mailto:[email protected]>
>>>>
>>>>  www.watch4net.com <http://www.watch4net.com/>
>>>>
>>>> / /
>>>>
>>>> /This message is for the designated recipient only and may contain
>>>> privileged, proprietary, or otherwise private information. If you have
>>>> received it in error, please notify the sender immediately and delete
>>>> the original. Any other use of this electronic mail by you is prohibited.
>>>>
>>>> //Ce message est pour le récipiendaire désigné seulement et peut
>>>> contenir des informations privilégiées, propriétaires ou autrement
>>>> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur
>>>> immédiatement et effacez l'original. Toute autre utilisation de ce
>>>> courrier électronique par vous est prohibée.///
>>>>
>>>>  
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> FastBit-users mailing list
>>>> [email protected]
>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to