Hi, Dominique, Just changed in edits to items 1 and 2 I mentioned earlier. The updated code is SVN revision 500. Need to work on something else for the rest of the day. Please give new code a try when you get the chance and let us know how it goes.
Thanks. John On 3/23/12 9:57 AM, Dominique Prunier wrote: > Hi John, > > Thank you very much for considering my use case valid, even if it means more > work to fix Andrew's one. I appreciate it. > > On top of my head, i think getKey needs a bit more bullet proofing: check dic > == 0, call prepareMembers, ... But i'm pretty sure it'll work and won't break > write spec. > > Thanks, > > -----Original Message----- > From: K. John Wu [mailto:[email protected]] > Sent: Friday, March 23, 2012 12:48 PM > To: Dominique Prunier > Cc: FastBit Users > Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks > ibis::bundle::getUInt for CATEGORY > > Hi, Dominique, > > Thanks for the clarification. Looks like here are my immediate > actionable items. > > - allow ibis::bundle to create integer representation of categorical > values (because each bundle can only be associated with one data > partition and therefore no possibility of inconsistent dictionaries) > > - in colUInts::write allow categorical values to make use of > dictionaries more directly (through getKey instead of getString) > > - add function to synchronize the dictionaries when multiple data > partitions are involved > > The first two items should take a couple of hours to implement and > test, the last one might take longer. Stay tuned. > > John > > > > On 3/22/12 4:58 PM, Dominique Prunier wrote: >> Hey John, >> >> Don't worry, that is exactly why i'm building my query test suite. To make >> sure i catch these sooner than later. >> >> About the use of uints for categories, this is actually a very convenient >> feature that i would not want to see disappear (the issue i raised only >> applies to bundles (select distinct) since retrieving uints with >> category::selectUInts still works fine). >> >> I understand that it raises concerns for the multi partition scenario, but >> i'm pretty sure there is a way reconcialiate both use cases. >> >> I have two concerns with the change introduced in colValues.cpp in r495: the >> potential performance hit in bundles for not using a colUInts (even though >> this is pure guess, i have not measured it yet) and the inability to get a >> string key for category columns when using bundles (which ultimately has a >> performance impact for me). >> >> Using uints allows an application to handle smaller objects much more >> efficiently. This applies to FastBit itself where the select distinct based >> on uints is extremely efficient. In my application, i retrieves uints >> instead of strings too because the this is much faster if i have processing >> to do on the keys before retrieving the actual string. >> >> About the change in colUInts::write i don't think i'm breaking the spec. I'm >> writing getKey((*array)[i]), not getKey(i) so this is still the ith row as >> promised. >> >> Thanks, >> >> -----Original Message----- >> From: K. John Wu [mailto:[email protected]] >> Sent: Thursday, March 22, 2012 7:37 PM >> To: Dominique Prunier >> Cc: FastBit Users >> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks >> ibis::bundle::getUInt for CATEGORY >> >> Hi, Dominique, >> >> Thanks for your patience while we work through these issues. >> >> SVN Revision 495 has to be replaced because the strategy of using >> integer representation for categorical values does not work in the >> general case. That was the main reason for moving all the cases of >> ibis::CATEGORY back together with ibis::TEXT. This was triggered by >> the test case from Andrew Olson last week. >> >> When there are multiple data partitions in a data table, the >> dictionaries from all partitions must agree, otherwise, the integer >> value of 1 for on data partition can mean a different thing in the >> next data partition. The current code allows one to work with string >> values as order by keys no matter whether the dictionaries agree or not. >> >> The particular function, colStrings::write, you've patched can not be >> modified this way. It would violate the promise of writing the ith >> row of the string-valued column. Your change will print the ith word >> in the dictionary. >> >> If you can describe a little bit more of the issues you are having, we >> will see how we might resolve them. >> >> John >> >> >> >> On 3/22/12 3:50 PM, Dominique Prunier wrote: >>> Oh, sorry, it was r495, not line 495. I reverted the change introduced in >>> src/colValues.cpp at revision 495 to restore functionality. >>> >>> However, it ended up being a bit more complex than that because of r496 >>> which changed the behavior of category::getString (use row index, not key). >>> The problem, after i reverted colValues, is that the bundle call the >>> colUInt::getString which call category::getString with the key, not the row >>> index. The follwoing change should fix it but prevent use of colUINT with >>> TEXT columns (i don't think it was used, not sure it even makes sense): >>> >>> diff --git a/src/colValues.h b/src/colValues.h >>> index ff7c1c3..232be61 100755 >>> --- a/src/colValues.h >>> +++ b/src/colValues.h >>> @@ -239,10 +239,9 @@ public: >>> } >>> /// Write the ith element as text. >>> virtual void write(std::ostream& out, uint32_t i) const { >>> - if (col->type() == ibis::CATEGORY || col->type() == ibis::TEXT) { >>> - std::string str; >>> - col->getString((*array)[i], str); >>> - if (str.empty()) { >>> + if (col->type() == ibis::CATEGORY) { >>> + const char * str = reinterpret_cast<const >>> ibis::category*>(col)->getKey((*array)[i]); >>> + if (str == 0 || (*str) == 0) { >>> out << "<NULL>"; >>> } >>> else { >>> >>> All my test case works after this final change. I have attached the >>> complete patch i'm applying on top of r497 to make everything work. I'm not >>> sure i'm not breaking something else though (especially the things i >>> reverted from r495). >>> >>> Thanks, >>> >>> -----Original Message----- >>> From: K. John Wu [mailto:[email protected]] >>> Sent: Thursday, March 22, 2012 6:43 PM >>> To: FastBit Users >>> Cc: Dominique Prunier >>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks >>> ibis::bundle::getUInt for CATEGORY >>> >>> Hi, Dominique, >>> >>> I see the problem of assign a nil pointer to a string object. I would >>> prefer to handle this case a little earlier in the code. The change >>> will appear in the next SVN check in. >>> >>> Would you mind check the file and line number in the original message? >>> I don't see any problem with src/colValues.cpp line 495 in my copy of >>> the code.. >>> >>> John >>> >>> >>> >>> On 3/22/12 3:04 PM, Dominique Prunier wrote: >>>> While i’m at it, the following patch fixes the NUL string case (key 0) >>>> >>>> >>>> >>>> diff --git a/src/category.cpp b/src/category.cpp >>>> >>>> index d8c1a9f..e7d28a1 100644 >>>> >>>> --- a/src/category.cpp >>>> >>>> +++ b/src/category.cpp >>>> >>>> @@ -715,7 +715,7 @@ void ibis::category::getString(uint32_t i, >>>> std::string &str) const { >>>> >>>> ibis::array_t<uint32_t> ints; >>>> >>>> ierr = ibis::fileManager::instance().getFile(fnm.c_str(), ints); >>>> >>>> if (ierr >= 0 && ints.size() == thePart->nRows()) { >>>> >>>> - if (i < ints.size()) { >>>> >>>> + if (i < ints.size() && ints[i] > 0) { >>>> >>>> str = dic[ints[i]]; >>>> >>>> } >>>> >>>> return; >>>> >>>> >>>> >>>> Thanks, >>>> >>>> >>>> >>>> *From:*[email protected] >>>> [mailto:[email protected]] *On Behalf Of *Dominique >>>> Prunier >>>> *Sent:* Thursday, March 22, 2012 5:59 PM >>>> *To:* FastBit Users >>>> *Subject:* [FastBit-users] Changes in src/colValues.cpp@495 breaks >>>> ibis::bundle::getUInt for CATEGORY >>>> >>>> >>>> >>>> Hey John, >>>> >>>> >>>> >>>> The title says pretty much everything. I’m not sure why this has been >>>> changed (this is a huge changeset in a part of the lib i don’t really >>>> know), but it breaks the ability to access a CATEGORY string key >>>> though bundles. If i revert only this file, it seems to work. >>>> >>>> >>>> >>>> Thanks, >>>> >>>> >>>> >>>> */Dominique Prunier/**//* >>>> >>>> APG Lead Developper >>>> >>>> Logo-W4N-100dpi >>>> >>>> 4388, rue Saint-Denis >>>> >>>> Bureau 309 >>>> >>>> Montreal (Quebec) H2J 2L1 >>>> >>>> Tel. +1 514-842-6767 x310 >>>> >>>> Fax +1 514-842-3989 >>>> >>>> [email protected] <mailto:[email protected]> >>>> >>>> www.watch4net.com <http://www.watch4net.com/> >>>> >>>> / / >>>> >>>> /This message is for the designated recipient only and may contain >>>> privileged, proprietary, or otherwise private information. If you have >>>> received it in error, please notify the sender immediately and delete >>>> the original. Any other use of this electronic mail by you is prohibited. >>>> >>>> //Ce message est pour le récipiendaire désigné seulement et peut >>>> contenir des informations privilégiées, propriétaires ou autrement >>>> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur >>>> immédiatement et effacez l'original. Toute autre utilisation de ce >>>> courrier électronique par vous est prohibée./// >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> FastBit-users mailing list >>>> [email protected] >>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
