Hey John, Don't worry, that is exactly why i'm building my query test suite. To make sure i catch these sooner than later.
About the use of uints for categories, this is actually a very convenient feature that i would not want to see disappear (the issue i raised only applies to bundles (select distinct) since retrieving uints with category::selectUInts still works fine). I understand that it raises concerns for the multi partition scenario, but i'm pretty sure there is a way reconcialiate both use cases. I have two concerns with the change introduced in colValues.cpp in r495: the potential performance hit in bundles for not using a colUInts (even though this is pure guess, i have not measured it yet) and the inability to get a string key for category columns when using bundles (which ultimately has a performance impact for me). Using uints allows an application to handle smaller objects much more efficiently. This applies to FastBit itself where the select distinct based on uints is extremely efficient. In my application, i retrieves uints instead of strings too because the this is much faster if i have processing to do on the keys before retrieving the actual string. About the change in colUInts::write i don't think i'm breaking the spec. I'm writing getKey((*array)[i]), not getKey(i) so this is still the ith row as promised. Thanks, -----Original Message----- From: K. John Wu [mailto:[email protected]] Sent: Thursday, March 22, 2012 7:37 PM To: Dominique Prunier Cc: FastBit Users Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks ibis::bundle::getUInt for CATEGORY Hi, Dominique, Thanks for your patience while we work through these issues. SVN Revision 495 has to be replaced because the strategy of using integer representation for categorical values does not work in the general case. That was the main reason for moving all the cases of ibis::CATEGORY back together with ibis::TEXT. This was triggered by the test case from Andrew Olson last week. When there are multiple data partitions in a data table, the dictionaries from all partitions must agree, otherwise, the integer value of 1 for on data partition can mean a different thing in the next data partition. The current code allows one to work with string values as order by keys no matter whether the dictionaries agree or not. The particular function, colStrings::write, you've patched can not be modified this way. It would violate the promise of writing the ith row of the string-valued column. Your change will print the ith word in the dictionary. If you can describe a little bit more of the issues you are having, we will see how we might resolve them. John On 3/22/12 3:50 PM, Dominique Prunier wrote: > Oh, sorry, it was r495, not line 495. I reverted the change introduced in > src/colValues.cpp at revision 495 to restore functionality. > > However, it ended up being a bit more complex than that because of r496 which > changed the behavior of category::getString (use row index, not key). The > problem, after i reverted colValues, is that the bundle call the > colUInt::getString which call category::getString with the key, not the row > index. The follwoing change should fix it but prevent use of colUINT with > TEXT columns (i don't think it was used, not sure it even makes sense): > > diff --git a/src/colValues.h b/src/colValues.h > index ff7c1c3..232be61 100755 > --- a/src/colValues.h > +++ b/src/colValues.h > @@ -239,10 +239,9 @@ public: > } > /// Write the ith element as text. > virtual void write(std::ostream& out, uint32_t i) const { > - if (col->type() == ibis::CATEGORY || col->type() == ibis::TEXT) { > - std::string str; > - col->getString((*array)[i], str); > - if (str.empty()) { > + if (col->type() == ibis::CATEGORY) { > + const char * str = reinterpret_cast<const > ibis::category*>(col)->getKey((*array)[i]); > + if (str == 0 || (*str) == 0) { > out << "<NULL>"; > } > else { > > All my test case works after this final change. I have attached the complete > patch i'm applying on top of r497 to make everything work. I'm not sure i'm > not breaking something else though (especially the things i reverted from > r495). > > Thanks, > > -----Original Message----- > From: K. John Wu [mailto:[email protected]] > Sent: Thursday, March 22, 2012 6:43 PM > To: FastBit Users > Cc: Dominique Prunier > Subject: Re: [FastBit-users] Changes in src/colValues.cpp@495 breaks > ibis::bundle::getUInt for CATEGORY > > Hi, Dominique, > > I see the problem of assign a nil pointer to a string object. I would > prefer to handle this case a little earlier in the code. The change > will appear in the next SVN check in. > > Would you mind check the file and line number in the original message? > I don't see any problem with src/colValues.cpp line 495 in my copy of > the code.. > > John > > > > On 3/22/12 3:04 PM, Dominique Prunier wrote: >> While i’m at it, the following patch fixes the NUL string case (key 0) >> >> >> >> diff --git a/src/category.cpp b/src/category.cpp >> >> index d8c1a9f..e7d28a1 100644 >> >> --- a/src/category.cpp >> >> +++ b/src/category.cpp >> >> @@ -715,7 +715,7 @@ void ibis::category::getString(uint32_t i, >> std::string &str) const { >> >> ibis::array_t<uint32_t> ints; >> >> ierr = ibis::fileManager::instance().getFile(fnm.c_str(), ints); >> >> if (ierr >= 0 && ints.size() == thePart->nRows()) { >> >> - if (i < ints.size()) { >> >> + if (i < ints.size() && ints[i] > 0) { >> >> str = dic[ints[i]]; >> >> } >> >> return; >> >> >> >> Thanks, >> >> >> >> *From:*[email protected] >> [mailto:[email protected]] *On Behalf Of *Dominique >> Prunier >> *Sent:* Thursday, March 22, 2012 5:59 PM >> *To:* FastBit Users >> *Subject:* [FastBit-users] Changes in src/colValues.cpp@495 breaks >> ibis::bundle::getUInt for CATEGORY >> >> >> >> Hey John, >> >> >> >> The title says pretty much everything. I’m not sure why this has been >> changed (this is a huge changeset in a part of the lib i don’t really >> know), but it breaks the ability to access a CATEGORY string key >> though bundles. If i revert only this file, it seems to work. >> >> >> >> Thanks, >> >> >> >> */Dominique Prunier/**//* >> >> APG Lead Developper >> >> Logo-W4N-100dpi >> >> 4388, rue Saint-Denis >> >> Bureau 309 >> >> Montreal (Quebec) H2J 2L1 >> >> Tel. +1 514-842-6767 x310 >> >> Fax +1 514-842-3989 >> >> [email protected] <mailto:[email protected]> >> >> www.watch4net.com <http://www.watch4net.com/> >> >> / / >> >> /This message is for the designated recipient only and may contain >> privileged, proprietary, or otherwise private information. If you have >> received it in error, please notify the sender immediately and delete >> the original. Any other use of this electronic mail by you is prohibited. >> >> //Ce message est pour le récipiendaire désigné seulement et peut >> contenir des informations privilégiées, propriétaires ou autrement >> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur >> immédiatement et effacez l'original. Toute autre utilisation de ce >> courrier électronique par vous est prohibée./// >> >> >> >> >> >> _______________________________________________ >> FastBit-users mailing list >> [email protected] >> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
