Hi John,

Thank you very much for considering my use case valid, even if it means more 
work to fix Andrew's one. I appreciate it.

On top of my head, i think getKey needs a bit more bullet proofing: check dic 
== 0, call prepareMembers, ... But i'm pretty sure it'll work and won't break 
write spec.

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]] 
Sent: Friday, March 23, 2012 12:48 PM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
ibis::bundle::getUInt for CATEGORY

Hi, Dominique,

Thanks for the clarification.  Looks like here are my immediate
actionable items.

- allow ibis::bundle to create integer representation of categorical
values (because each bundle can only be associated with one data
partition and therefore no possibility of inconsistent dictionaries)

- in colUInts::write allow categorical values to make use of
dictionaries more directly (through getKey instead of getString)

- add function to synchronize the dictionaries when multiple data
partitions are involved

The first two items should take a couple of hours to implement and
test, the last one might take longer.  Stay tuned.

John



On 3/22/12 4:58 PM, Dominique Prunier wrote:
> Hey John,
> 
> Don't worry, that is exactly why i'm building my query test suite. To make 
> sure i catch these sooner than later.
> 
> About the use of uints for categories, this is actually a very convenient 
> feature that i would not want to see disappear (the issue i raised only 
> applies to bundles (select distinct) since retrieving uints with 
> category::selectUInts still works fine).
> 
> I understand that it raises concerns for the multi partition scenario, but 
> i'm pretty sure there is a way reconcialiate both use cases.
> 
> I have two concerns with the change introduced in colValues.cpp in r495: the 
> potential performance hit in bundles for not using a colUInts (even though 
> this is pure guess, i have not measured it yet) and the inability to get a 
> string key for category columns when using bundles (which ultimately has a 
> performance impact for me).
> 
> Using uints allows an application to handle smaller objects much more 
> efficiently. This applies to FastBit itself where the select distinct based 
> on uints is extremely efficient. In my application, i retrieves uints instead 
> of strings too because the this is much faster if i have processing to do on 
> the keys before retrieving the actual string. 
> 
> About the change in colUInts::write i don't think i'm breaking the spec. I'm 
> writing getKey((*array)[i]), not getKey(i) so this is still the ith row as 
> promised.
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]] 
> Sent: Thursday, March 22, 2012 7:37 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
> ibis::bundle::getUInt for CATEGORY
> 
> Hi, Dominique,
> 
> Thanks for your patience while we work through these issues.
> 
> SVN Revision 495 has to be replaced because the strategy of using
> integer representation for categorical values does not work in the
> general case.  That was the main reason for moving all the cases of
> ibis::CATEGORY back together with ibis::TEXT.  This was triggered by
> the test case from Andrew Olson last week.
> 
> When there are multiple data partitions in a data table, the
> dictionaries from all partitions must agree, otherwise, the integer
> value of 1 for on data partition can mean a different thing in the
> next data partition.  The current code allows one to work with string
> values as order by keys no matter whether the dictionaries agree or not.
> 
> The particular function, colStrings::write, you've patched can not be
> modified this way.  It would violate the promise of writing the ith
> row of the string-valued column.  Your change will print the ith word
> in the dictionary.
> 
> If you can describe a little bit more of the issues you are having, we
> will see how we might resolve them.
> 
> John
> 
> 
> 
> On 3/22/12 3:50 PM, Dominique Prunier wrote:
>> Oh, sorry, it was r495, not line 495. I reverted the change introduced in 
>> src/colValues.cpp at revision 495 to restore functionality.
>>
>> However, it ended up being a bit more complex than that because of r496 
>> which changed the behavior of category::getString (use row index, not key). 
>> The problem, after i reverted colValues, is that the bundle call the 
>> colUInt::getString which call category::getString with the key, not the row 
>> index. The follwoing change should fix it but prevent use of colUINT with 
>> TEXT columns (i don't think it was used, not sure it even makes sense):
>>
>> diff --git a/src/colValues.h b/src/colValues.h
>> index ff7c1c3..232be61 100755
>> --- a/src/colValues.h
>> +++ b/src/colValues.h
>> @@ -239,10 +239,9 @@ public:
>>      }
>>      /// Write the ith element as text.
>>      virtual void write(std::ostream& out, uint32_t i) const {
>> -       if (col->type() == ibis::CATEGORY || col->type() == ibis::TEXT) {
>> -           std::string str;
>> -           col->getString((*array)[i], str);
>> -           if (str.empty()) {
>> +       if (col->type() == ibis::CATEGORY) {
>> +           const char * str = reinterpret_cast<const 
>> ibis::category*>(col)->getKey((*array)[i]);
>> +           if (str == 0 || (*str) == 0) {
>>                 out << "<NULL>";
>>             }
>>             else {
>>
>> All my test case works after this final change. I have attached the complete 
>> patch i'm applying on top of r497 to make everything work. I'm not sure i'm 
>> not breaking something else though (especially the things i reverted from 
>> r495).
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]] 
>> Sent: Thursday, March 22, 2012 6:43 PM
>> To: FastBit Users
>> Cc: Dominique Prunier
>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks 
>> ibis::bundle::getUInt for CATEGORY
>>
>> Hi, Dominique,
>>
>> I see the problem of assign a nil pointer to a string object.  I would
>> prefer to handle this case a little earlier in the code.  The change
>> will appear in the next SVN check in.
>>
>> Would you mind check the file and line number in the original message?
>>  I don't see any problem with src/colValues.cpp line 495 in my copy of
>> the code..
>>
>> John
>>
>>
>>
>> On 3/22/12 3:04 PM, Dominique Prunier wrote:
>>> While i’m at it, the following patch fixes the NUL string case (key 0)
>>>
>>>  
>>>
>>> diff --git a/src/category.cpp b/src/category.cpp
>>>
>>> index d8c1a9f..e7d28a1 100644
>>>
>>> --- a/src/category.cpp
>>>
>>> +++ b/src/category.cpp
>>>
>>> @@ -715,7 +715,7 @@ void ibis::category::getString(uint32_t i,
>>> std::string &str) const {
>>>
>>>         ibis::array_t<uint32_t> ints;
>>>
>>>         ierr = ibis::fileManager::instance().getFile(fnm.c_str(), ints);
>>>
>>>         if (ierr >= 0 && ints.size() == thePart->nRows()) {
>>>
>>> -           if (i < ints.size()) {
>>>
>>> +           if (i < ints.size() && ints[i] > 0) {
>>>
>>>                 str = dic[ints[i]];
>>>
>>>             }
>>>
>>>             return;
>>>
>>>  
>>>
>>> Thanks,
>>>
>>>  
>>>
>>> *From:*[email protected]
>>> [mailto:[email protected]] *On Behalf Of *Dominique
>>> Prunier
>>> *Sent:* Thursday, March 22, 2012 5:59 PM
>>> *To:* FastBit Users
>>> *Subject:* [FastBit-users] Changes in src/colValues.cpp@495 breaks
>>> ibis::bundle::getUInt for CATEGORY
>>>
>>>  
>>>
>>> Hey John,
>>>
>>>  
>>>
>>> The title says pretty much everything. I’m not sure why this has been
>>> changed (this is a huge changeset in a part of the lib i don’t really
>>> know), but it breaks the ability to access a CATEGORY string key
>>> though bundles. If i revert only this file, it seems to work.
>>>
>>>  
>>>
>>> Thanks,
>>>
>>>  
>>>
>>> */Dominique Prunier/**//*
>>>
>>>  APG Lead Developper
>>>
>>> Logo-W4N-100dpi
>>>
>>>  4388, rue Saint-Denis
>>>
>>>  Bureau 309
>>>
>>>  Montreal (Quebec)  H2J 2L1
>>>
>>>  Tel. +1 514-842-6767  x310
>>>
>>>  Fax +1 514-842-3989
>>>
>>>  [email protected] <mailto:[email protected]>
>>>
>>>  www.watch4net.com <http://www.watch4net.com/>
>>>
>>> / /
>>>
>>> /This message is for the designated recipient only and may contain
>>> privileged, proprietary, or otherwise private information. If you have
>>> received it in error, please notify the sender immediately and delete
>>> the original. Any other use of this electronic mail by you is prohibited.
>>>
>>> //Ce message est pour le récipiendaire désigné seulement et peut
>>> contenir des informations privilégiées, propriétaires ou autrement
>>> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur
>>> immédiatement et effacez l'original. Toute autre utilisation de ce
>>> courrier électronique par vous est prohibée.///
>>>
>>>  
>>>
>>>
>>>
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to