The assumption that within a CF only IColumns of the same type (Column or SuperColumn) will be compared is baked in pretty deeply.
-Jonathan

On Wed, Aug 12, 2009 at 11:23 PM, Evan Weaver <[email protected]> wrote:
> Incidentally, is there any specific reason the collation has to be
> pre-defined at the CF? What if any column could be an optional
> supercolumn with a collation set at runtime? Then all CFs would be the
> same.
>
> Evan
>
> On Wed, Aug 12, 2009 at 10:02 PM, Jonathan Ellis <[email protected]> wrote:
>> If thrift were sane it would look something like
>>
>> struct Column {
>>     byte[] name,
>>     optional list<Column> subcolumns,
>>     optional int64 timestamp,
>>     optional byte[] value
>> }
>>
>> "you can either have the subcolumns, or the timestamp and value" seems
>> reasonable to me.
>>
>> of course in the real world, thrift can't do recursive structures, so
>> we'd have to go with Column/SubColumn like SuperColumn/Column today.
>> So... maybe not really an improvement after all. :)
>>
>> (Why am I not surprised to find out that protocol buffers does support
>> this? Sigh.)
>>
>> On Wed, Aug 12, 2009 at 8:51 PM, Evan Weaver <[email protected]> wrote:
>>> Hmm, my Ruby client internally refers to columns and subcolumns,
>>> rather than supercolumns and columns... mainly because the subcolumn
>>> position is optional, but the column_or_supercolumn position is not.
>>> So there is something we agree on.
>>>
>>> Do you think the lack of a timestamp in the supercolumn is confusing?
>>> It's still not exactly a kind of column.
>>>
>>> Evan
>>>
>>> On Wed, Aug 12, 2009 at 9:47 PM, Jonathan Ellis <[email protected]> wrote:
>>>> I agree with the proposition that the SuperColumn name is weak.
>>>> (Although not, as I mentioned, Column or ColumnFamily.) And I could
>>>> go with schema over keyspace.
>>>>
>>>> One option to deal with SC would be to excise the term SC (and SCF
>>>> from the config) and instead just have Columns, which may or may not
>>>> have SubColumns.
>>>> You would define this as
>>>>
>>>> <ColumnFamily withSubColumns="true" .../>
>>>>
>>>> "Insert a subcolumn named A into the Column named B" fits pretty well
>>>> with how I think of things working. And now you just have Rows and
>>>> Columns! Just like an RDB! :P
>>>>
>>>> -Jonathan
>>>>
>>>> On Wed, Aug 12, 2009 at 8:34 PM, Evan Weaver <[email protected]> wrote:
>>>>> Points taken, and I agree, except in my experience the current names
>>>>> are not Pretty Good but rather Pretty Weird; the primary issues being
>>>>> column family and super column.
>>>>>
>>>>> If we go by the shorter-is-better principle, we might get:
>>>>>
>>>>> Cluster
>>>>> Schema
>>>>> Row set
>>>>> Row w/key
>>>>> Field set
>>>>> Field
>>>>>
>>>>> "You take the user's key, and use that to insert into the Row Set
>>>>> 'user_associations' at Field Set 'user_timeline,' a field named with a
>>>>> time-based UUID representing now, and with a value of the new tweet's
>>>>> key."
>>>>>
>>>>> But let me study for a while and come up with a more researched proposal.
>>>>>
>>>>> Evan
>>>>>
>>>>> On Wed, Aug 12, 2009 at 9:21 PM, Jonathan Ellis <[email protected]> wrote:
>>>>>> On Wed, Aug 12, 2009 at 7:52 PM, Michael Koziarski <[email protected]> wrote:
>>>>>>> However, I think it's worth considering this from a strategic
>>>>>>> perspective, looking at how we want the project to grow and change,
>>>>>>> rather than just as it is right now. The key to successful adoption
>>>>>>> is having a successful elevator pitch: you can start using a database
>>>>>>> without understanding relational algebra because 'table' and 'column'
>>>>>>> are such simple ways to reason about the tool. As it stands,
>>>>>>> Cassandra takes a whiteboard and 15 minutes before people get what
>>>>>>> you're talking about.
>>>>>>
>>>>>> If you want to explain it as "sort of like a relational db" then
>>>>>>
>>>>>> table -> CF
>>>>>> column -> column
>>>>>> key -> key
>>>>>> row -> row
>>>>>>
>>>>>> That's the simple case; then all you have is "supercolumns can contain
>>>>>> a list of simple columns."
>>>>>>
>>>>>> That really doesn't seem so hard to me. I have explained this to
>>>>>> *managers*.
>>>>>>
>>>>>>> Assuming the project gets anything like the adoption it deserves, the
>>>>>>> users we have today will be a *tiny minority* of the users we have in
>>>>>>> the future. So imposing costs on the current userbase that will give
>>>>>>> huge benefits to future users should be something we're willing to
>>>>>>> do. In fact, it's something that has been done repeatedly over the
>>>>>>> last few weeks.
>>>>>>
>>>>>> I agree. But as I said before, I just don't see this as being an
>>>>>> improvement.
>>>>>>
>>>>>>> Given those changes went in without debate, I'm not sure what the
>>>>>>> reluctance is for making changes to the nomenclature for the project.
>>>>>>
>>>>>> As above.
>>>>>>
>>>>>>> Speaking as someone who's only been doing this a month, the naming is
>>>>>>> *still* confusing, and when I talk with people who wonder what
>>>>>>> Cassandra is all about I get blank looks when telling them what things
>>>>>>> are called. If you step back and want to tell someone how you'd
>>>>>>> insert a tweet into someone's timeline using Evan's weblog post:
>>>>>>>
>>>>>>> "You just take the user's key, and use that to insert into the
>>>>>>> SuperColumnFamily 'UserAssociations' at SubColumn 'user_timeline', a
>>>>>>> ColumnName of a time based uuid representing now, and a value of the
>>>>>>> new tweet's key"
>>>>>>>
>>>>>>> Column is in the name of 3 of the 5 concepts expressed, and in each
>>>>>>> case it's different.
>>>>>>
>>>>>> When you're inserting something nested 3 levels deep, a certain amount
>>>>>> of verbosity is unavoidable.
>>>>>> With Evan's nomenclature,
>>>>>>
>>>>>> "You take the user's record ID, and use that to insert into the Record
>>>>>> Collection 'user associations' at Attribute Collection
>>>>>> 'user_timeline,' an Attribute named with a time based uuid
>>>>>> representing now, and with a value of the new tweet's key."
>>>>>>
>>>>>> I think that is a negative improvement. Yay, now we are talking about
>>>>>> Attribute Collections and Attributes instead of SuperColumns and
>>>>>> Columns. The same objections ("one object's name contains the
>>>>>> other's!") apply, plus the new one of sounding so generic that it could
>>>>>> apply to practically any system.
>>>>>>
>>>>>> -Jonathan
>>>>>
>>>>> --
>>>>> Evan Weaver
>>>>
>>>
>>> --
>>> Evan Weaver
>>
>
> --
> Evan Weaver
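Jonathan's "if thrift were sane" struct in the thread describes a column that holds either a list of subcolumns or a timestamp and a value, never both. Below is a minimal Java sketch of that shape, assuming byte-array names and values. The class `Column` and its `leaf`, `nested`, `hasSubcolumns` and `BYTES` members are hypothetical names for illustration, not Cassandra's or Thrift's API. Following Evan's collation question, the sketch takes the ordering as a comparator when a column with subcolumns is built; that is an assumption for illustration, not how a CF behaves today.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical recursive column: a name plus EITHER a list of subcolumns
 * OR a timestamp and a value, mirroring the struct sketched in the thread.
 */
final class Column {
    final byte[] name;
    final List<Column> subcolumns; // null for a leaf (plain) column
    final long timestamp;          // meaningful only for a leaf column
    final byte[] value;            // null for a column that holds subcolumns

    private Column(byte[] name, List<Column> subcolumns, long timestamp, byte[] value) {
        this.name = name;
        this.subcolumns = subcolumns;
        this.timestamp = timestamp;
        this.value = value;
    }

    /** Leaf column: carries a timestamp and a value, no subcolumns. */
    static Column leaf(byte[] name, long timestamp, byte[] value) {
        if (value == null) {
            throw new IllegalArgumentException("a leaf column needs a value");
        }
        return new Column(name, null, timestamp, value);
    }

    /**
     * Column with subcolumns: no timestamp or value of its own. Children are
     * stored sorted by name using the collation supplied at build time
     * (an illustrative assumption, not current Cassandra behavior).
     */
    static Column nested(byte[] name, List<Column> children, Comparator<byte[]> collation) {
        List<Column> sorted = new ArrayList<>(children);
        sorted.sort((a, b) -> collation.compare(a.name, b.name));
        return new Column(name, Collections.unmodifiableList(sorted), 0L, null);
    }

    boolean hasSubcolumns() {
        return subcolumns != null;
    }

    /** One possible collation: unsigned lexicographic byte order (Java 9+). */
    static final Comparator<byte[]> BYTES = Arrays::compareUnsigned;
}
```

Nothing in the sketch stops a subcolumn from having subcolumns of its own. As the thread notes, Thrift could not express that kind of self-referencing structure, which is why the API falls back to two fixed levels (SuperColumn/Column).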
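The "sort of like a relational db" mapping in the thread, together with the tweet example (row key = the user's key, super column 'user_timeline', column name = a time-based id, value = the new tweet's key), can be pictured as nested sorted maps. The following is a hypothetical in-memory illustration only, not Cassandra's storage engine or client API. `SuperColumnFamily`, `insert` and `slice` are made-up names, and a `long` timestamp stands in for the time-based UUID column names, since `java.util.UUID` has no generator for time-based UUIDs.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of a super column family:
 * row key -> super column name -> column name -> value,
 * with every level kept sorted by name (TreeMap).
 */
final class SuperColumnFamily {
    private final Map<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Insert one column under (rowKey, superColumn), creating levels as needed. */
    void insert(String rowKey, String superColumn, long columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(superColumn, k -> new TreeMap<>())
            .put(columnName, value);
    }

    /** Read-only view of one super column's columns, in column-name order (empty if absent). */
    NavigableMap<Long, String> slice(String rowKey, String superColumn) {
        NavigableMap<String, NavigableMap<Long, String>> row = rows.get(rowKey);
        if (row == null) {
            return Collections.emptyNavigableMap();
        }
        NavigableMap<Long, String> columns = row.get(superColumn);
        return columns == null
                ? Collections.emptyNavigableMap()
                : Collections.unmodifiableNavigableMap(columns);
    }
}
```

Because each level is a `TreeMap`, inserting tweets out of order still reads them back in time order. The natural ordering of `Long` plays the role that the CF's collation plays in the thread.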
