I would presume that you can jump to r500 and see how it goes..
On 3/23/12 11:03 AM, Dominique Prunier wrote: > Thanks John, I'll try that soon. Right know i'm working on revision 497 > because i though the sloppyCount thing was not finished yet. Should i jump to > r500 or simply apply the r500 on top of r497 ? > > Thanks, > > -----Original Message----- > From: K. John Wu [mailto:[email protected]] > Sent: Friday, March 23, 2012 1:45 PM > To: Dominique Prunier > Cc: FastBit Users > Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks > ibis::bundle::getUInt for CATEGORY > > Hi, Dominique, > > Just changed in edits to items 1 and 2 I mentioned earlier. The > updated code is SVN revision 500. Need to work on something else for > the rest of the day. Please give new code a try when you get the > chance and let us know how it goes. > > Thanks. > > John > > > On 3/23/12 9:57 AM, Dominique Prunier wrote: >> Hi John, >> >> Thank you very much for considering my use case valid, even if it means more >> work to fix Andrew's one. I appreciate it. >> >> On top of my head, i think getKey needs a bit more bullet proofing: check >> dic == 0, call prepareMembers, ... But i'm pretty sure it'll work and won't >> break write spec. >> >> Thanks, >> >> -----Original Message----- >> From: K. John Wu [mailto:[email protected]] >> Sent: Friday, March 23, 2012 12:48 PM >> To: Dominique Prunier >> Cc: FastBit Users >> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks >> ibis::bundle::getUInt for CATEGORY >> >> Hi, Dominique, >> >> Thanks for the clarification. Looks like here are my immediate >> actionable items. >> >> - allow ibis::bundle to create integer representation of categorical >> values (because each bundle can only be associated with one data >> partition and therefore no possibility of inconsistent dictionaries) >> >> - in colUInts::write allow categorical values to make use of >> dictionaries more directly (through getKey instead of getString) >> >> - add function to synchronize the dictionaries when multiple data >> partitions are involved >> >> The first two items should take a couple of hours to implement and >> test, the last one might take longer. Stay tuned. >> >> John >> >> >> >> On 3/22/12 4:58 PM, Dominique Prunier wrote: >>> Hey John, >>> >>> Don't worry, that is exactly why i'm building my query test suite. To make >>> sure i catch these sooner than later. >>> >>> About the use of uints for categories, this is actually a very convenient >>> feature that i would not want to see disappear (the issue i raised only >>> applies to bundles (select distinct) since retrieving uints with >>> category::selectUInts still works fine). >>> >>> I understand that it raises concerns for the multi partition scenario, but >>> i'm pretty sure there is a way reconcialiate both use cases. >>> >>> I have two concerns with the change introduced in colValues.cpp in r495: >>> the potential performance hit in bundles for not using a colUInts (even >>> though this is pure guess, i have not measured it yet) and the inability to >>> get a string key for category columns when using bundles (which ultimately >>> has a performance impact for me). >>> >>> Using uints allows an application to handle smaller objects much more >>> efficiently. This applies to FastBit itself where the select distinct based >>> on uints is extremely efficient. In my application, i retrieves uints >>> instead of strings too because the this is much faster if i have processing >>> to do on the keys before retrieving the actual string. >>> >>> About the change in colUInts::write i don't think i'm breaking the spec. >>> I'm writing getKey((*array)[i]), not getKey(i) so this is still the ith row >>> as promised. >>> >>> Thanks, >>> >>> -----Original Message----- >>> From: K. John Wu [mailto:[email protected]] >>> Sent: Thursday, March 22, 2012 7:37 PM >>> To: Dominique Prunier >>> Cc: FastBit Users >>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks >>> ibis::bundle::getUInt for CATEGORY >>> >>> Hi, Dominique, >>> >>> Thanks for your patience while we work through these issues. >>> >>> SVN Revision 495 has to be replaced because the strategy of using >>> integer representation for categorical values does not work in the >>> general case. That was the main reason for moving all the cases of >>> ibis::CATEGORY back together with ibis::TEXT. This was triggered by >>> the test case from Andrew Olson last week. >>> >>> When there are multiple data partitions in a data table, the >>> dictionaries from all partitions must agree, otherwise, the integer >>> value of 1 for on data partition can mean a different thing in the >>> next data partition. The current code allows one to work with string >>> values as order by keys no matter whether the dictionaries agree or not. >>> >>> The particular function, colStrings::write, you've patched can not be >>> modified this way. It would violate the promise of writing the ith >>> row of the string-valued column. Your change will print the ith word >>> in the dictionary. >>> >>> If you can describe a little bit more of the issues you are having, we >>> will see how we might resolve them. >>> >>> John >>> >>> >>> >>> On 3/22/12 3:50 PM, Dominique Prunier wrote: >>>> Oh, sorry, it was r495, not line 495. I reverted the change introduced in >>>> src/colValues.cpp at revision 495 to restore functionality. >>>> >>>> However, it ended up being a bit more complex than that because of r496 >>>> which changed the behavior of category::getString (use row index, not >>>> key). The problem, after i reverted colValues, is that the bundle call the >>>> colUInt::getString which call category::getString with the key, not the >>>> row index. The follwoing change should fix it but prevent use of colUINT >>>> with TEXT columns (i don't think it was used, not sure it even makes >>>> sense): >>>> >>>> diff --git a/src/colValues.h b/src/colValues.h >>>> index ff7c1c3..232be61 100755 >>>> --- a/src/colValues.h >>>> +++ b/src/colValues.h >>>> @@ -239,10 +239,9 @@ public: >>>> } >>>> /// Write the ith element as text. >>>> virtual void write(std::ostream& out, uint32_t i) const { >>>> - if (col->type() == ibis::CATEGORY || col->type() == ibis::TEXT) { >>>> - std::string str; >>>> - col->getString((*array)[i], str); >>>> - if (str.empty()) { >>>> + if (col->type() == ibis::CATEGORY) { >>>> + const char * str = reinterpret_cast<const >>>> ibis::category*>(col)->getKey((*array)[i]); >>>> + if (str == 0 || (*str) == 0) { >>>> out << "<NULL>"; >>>> } >>>> else { >>>> >>>> All my test case works after this final change. I have attached the >>>> complete patch i'm applying on top of r497 to make everything work. I'm >>>> not sure i'm not breaking something else though (especially the things i >>>> reverted from r495). >>>> >>>> Thanks, >>>> >>>> -----Original Message----- >>>> From: K. John Wu [mailto:[email protected]] >>>> Sent: Thursday, March 22, 2012 6:43 PM >>>> To: FastBit Users >>>> Cc: Dominique Prunier >>>> Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks >>>> ibis::bundle::getUInt for CATEGORY >>>> >>>> Hi, Dominique, >>>> >>>> I see the problem of assign a nil pointer to a string object. I would >>>> prefer to handle this case a little earlier in the code. The change >>>> will appear in the next SVN check in. >>>> >>>> Would you mind check the file and line number in the original message? >>>> I don't see any problem with src/colValues.cpp line 495 in my copy of >>>> the code.. >>>> >>>> John >>>> >>>> >>>> >>>> On 3/22/12 3:04 PM, Dominique Prunier wrote: >>>>> While i’m at it, the following patch fixes the NUL string case (key 0) >>>>> >>>>> >>>>> >>>>> diff --git a/src/category.cpp b/src/category.cpp >>>>> >>>>> index d8c1a9f..e7d28a1 100644 >>>>> >>>>> --- a/src/category.cpp >>>>> >>>>> +++ b/src/category.cpp >>>>> >>>>> @@ -715,7 +715,7 @@ void ibis::category::getString(uint32_t i, >>>>> std::string &str) const { >>>>> >>>>> ibis::array_t<uint32_t> ints; >>>>> >>>>> ierr = ibis::fileManager::instance().getFile(fnm.c_str(), ints); >>>>> >>>>> if (ierr >= 0 && ints.size() == thePart->nRows()) { >>>>> >>>>> - if (i < ints.size()) { >>>>> >>>>> + if (i < ints.size() && ints[i] > 0) { >>>>> >>>>> str = dic[ints[i]]; >>>>> >>>>> } >>>>> >>>>> return; >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> >>>>> >>>>> *From:*[email protected] >>>>> [mailto:[email protected]] *On Behalf Of *Dominique >>>>> Prunier >>>>> *Sent:* Thursday, March 22, 2012 5:59 PM >>>>> *To:* FastBit Users >>>>> *Subject:* [FastBit-users] Changes in src/colValues.cpp@495 breaks >>>>> ibis::bundle::getUInt for CATEGORY >>>>> >>>>> >>>>> >>>>> Hey John, >>>>> >>>>> >>>>> >>>>> The title says pretty much everything. I’m not sure why this has been >>>>> changed (this is a huge changeset in a part of the lib i don’t really >>>>> know), but it breaks the ability to access a CATEGORY string key >>>>> though bundles. If i revert only this file, it seems to work. >>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> >>>>> >>>>> */Dominique Prunier/**//* >>>>> >>>>> APG Lead Developper >>>>> >>>>> Logo-W4N-100dpi >>>>> >>>>> 4388, rue Saint-Denis >>>>> >>>>> Bureau 309 >>>>> >>>>> Montreal (Quebec) H2J 2L1 >>>>> >>>>> Tel. +1 514-842-6767 x310 >>>>> >>>>> Fax +1 514-842-3989 >>>>> >>>>> [email protected] <mailto:[email protected]> >>>>> >>>>> www.watch4net.com <http://www.watch4net.com/> >>>>> >>>>> / / >>>>> >>>>> /This message is for the designated recipient only and may contain >>>>> privileged, proprietary, or otherwise private information. If you have >>>>> received it in error, please notify the sender immediately and delete >>>>> the original. Any other use of this electronic mail by you is prohibited. >>>>> >>>>> //Ce message est pour le récipiendaire désigné seulement et peut >>>>> contenir des informations privilégiées, propriétaires ou autrement >>>>> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur >>>>> immédiatement et effacez l'original. Toute autre utilisation de ce >>>>> courrier électronique par vous est prohibée./// >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> FastBit-users mailing list >>>>> [email protected] >>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
