Re: Fixing the data model names

Jonathan Ellis Thu, 13 Aug 2009 06:20:04 -0700

The assumption that within a CF only IColumns of the same type (C or
SC) will be compared is baked in pretty deeply.
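
To illustrate that assumption, here is a minimal sketch in Python. It is not Cassandra's actual code: the Column and ColumnFamily classes, the with_subcolumns flag (borrowed from the config idea quoted below), and sort_key are all hypothetical names. It shows a column family whose collation and column kind are fixed when it is defined, so its sorted insert never has to compare a plain column against one that has subcolumns.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def bytes_order(name: bytes) -> bytes:
    # Default collation for this sketch: plain byte-wise ordering of names.
    return name


@dataclass
class Column:
    # Hypothetical unified column: either subcolumns, or timestamp and value.
    name: bytes
    subcolumns: Optional[List["Column"]] = None
    timestamp: Optional[int] = None
    value: Optional[bytes] = None

    @property
    def is_super(self) -> bool:
        return self.subcolumns is not None


@dataclass
class ColumnFamily:
    # Column kind and collation are chosen once, when the CF is defined.
    with_subcolumns: bool
    sort_key: Callable[[bytes], object] = bytes_order
    columns: List[Column] = field(default_factory=list)

    def insert(self, column: Column) -> None:
        # The sort below only ever compares columns of one kind,
        # so a column of the other kind is rejected up front.
        if column.is_super != self.with_subcolumns:
            raise TypeError("column kind does not match this column family")
        self.columns.append(column)
        self.columns.sort(key=lambda c: self.sort_key(c.name))
```

Allowing any column to be an optional supercolumn with a collation set at runtime would mean dropping the check in insert and teaching every comparison to handle both kinds, which is the part that is baked in.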


-Jonathan

On Wed, Aug 12, 2009 at 11:23 PM, Evan Weaver<[email protected]> wrote:
> Incidentally, is there any specific reason the collation has to be
> pre-defined at the CF? What if any column could be an optional
> supercolumn with a collation set at runtime? Then all CFs would be the
> same.
>
> Evan
>
> On Wed, Aug 12, 2009 at 10:02 PM, Jonathan Ellis<[email protected]> wrote:
>> If thrift were sane it would look something like
>>
>> struct Column {
>>  byte[] name,
>>  optional list<Column> subcolumns,
>>  optional int64 timestamp,
>>  optional byte[] value
>> }
>>
>> "you can either have the subcolumns, or the timestamp and value" seems
>> reasonable to me.
>>
>> of course in the real world, thrift can't do recursive structures, so
>> we'd have to go with Column/SubColumn like SuperColumn/Column today.
>> So... maybe not really an improvement after all. :)
>>
>> (Why am I not surprised to find out that protocol buffers does support
>> this?  Sigh.)
>>
>> On Wed, Aug 12, 2009 at 8:51 PM, Evan Weaver<[email protected]> wrote:
>>> Hmm, my Ruby client internally refers to columns and subcolumns,
>>> rather than supercolumns and columns...mainly because the subcolumn
>>> position is optional, but the column_or_supercolumn position is not.
>>> So there is something we agree on.
>>>
>>> Do you think the lack of a timestamp in the supercolumn is confusing?
>>> It's still not exactly a kind of column.
>>>
>>> Evan
>>>
>>> On Wed, Aug 12, 2009 at 9:47 PM, Jonathan Ellis<[email protected]> wrote:
>>>> I agree with the proposition that the SuperColumn name is weak.
>>>> (Although not, as I mentioned, Column or ColumnFamily.)  And I could
>>>> go with schema over keyspace.
>>>>
>>>> One option to deal with SC would be to excise the term SC (and SCF
>>>> from the config) and instead just have Columns, which may or may not
>>>> have SubColumns.  You would define this as
>>>>
>>>> <ColumnFamily withSubColumns="true" .../>
>>>>
>>>> "Insert a subcolumn named A into the Column named B" fits pretty well
>>>> with how I think of things working.  And now you just have Rows and
>>>> Columns!  Just like a RDB! :P
>>>>
>>>> -Jonathan
>>>>
>>>> On Wed, Aug 12, 2009 at 8:34 PM, Evan Weaver<[email protected]> wrote:
>>>>> Points taken, and I agree, except in my experience the current names
>>>>> are not Pretty Good but rather Pretty Weird; the primary issues being
>>>>> column family and super column.
>>>>>
>>>>> If we go by the shorter-is-better principle, we might get:
>>>>>
>>>>> Cluster
>>>>> Schema
>>>>> Row set
>>>>> Row w/key
>>>>> Field set
>>>>> Field
>>>>>
>>>>> "You take the user's key, and use that to insert into the Row Set
>>>>> 'user_associations' at Field Set 'user_timeline,' a field named with a
>>>>> time-based UUID representing now, and with a value of the new tweet's
>>>>> key."
>>>>>
>>>>> But let me study for a while and come up with a more researched proposal.
>>>>>
>>>>> Evan
>>>>>
>>>>> On Wed, Aug 12, 2009 at 9:21 PM, Jonathan Ellis<[email protected]> wrote:
>>>>>> On Wed, Aug 12, 2009 at 7:52 PM, Michael Koziarski<[email protected]> wrote:
>>>>>>> However I think it's worth considering this from a strategic
>>>>>>> perspective, looking at how we want the project to grow and change,
>>>>>>> rather than just as it is right now.  The key to successful adoption
>>>>>>> is having a successful elevator pitch,  you can start using a database
>>>>>>> without understanding relational-algebra because 'table' and 'column'
>>>>>>> are such simple ways to reason about the tool.  As it stands
>>>>>>> cassandra takes a whiteboard and 15 minutes, before people get what
>>>>>>> you're talking about.
>>>>>>
>>>>>> If you want to explain it as "sort of like a relational db" then
>>>>>>
>>>>>> table -> CF
>>>>>> column -> column
>>>>>> key -> key
>>>>>> row -> row
>>>>>>
>>>>>> That's the simple case, then all you have is "supercolumns can contain
>>>>>> a list of simple columns."
>>>>>>
>>>>>> That really doesn't seem so hard to me.  I have explained this to
>>>>>> *managers*.
>>>>>>
>>>>>>> Assuming the project gets anything like the adoption it deserves, the
>>>>>>> users we have today will be a *tiny minority* of the users we have in
>>>>>>> the future.  So imposing costs on the current userbase which will give
>>>>>>> huge benefits to future users, should be something we're willing to
>>>>>>> do.  In fact it's something that has been done repeatedly over the
>>>>>>> last few weeks.
>>>>>>
>>>>>> I agree.  But as I said before I just don't see this as being an
>>>>>> improvement.
>>>>>>
>>>>>>> Given those changes went in without debate, I'm not sure what the
>>>>>>> reluctance is for making changes to the nomenclature for the project.
>>>>>>
>>>>>> As above.
>>>>>>
>>>>>>> Speaking as someone who's only been doing this a month, the naming is
>>>>>>> *still* confusing, and when I talk with people who wonder what
>>>>>>> cassandra is all about I get blank looks when telling them what things
>>>>>>> are called.  If you step back and want to tell someone how you'd
>>>>>>> insert a tweet into someone's timeline using evan's weblog post:
>>>>>>>
>>>>>>>  "You just take the user's key, and use that to insert into the
>>>>>>> SuperColumnFamily 'UserAssociations' at SubColumn 'user_timeline', a
>>>>>>> ColumnName of a time based uuid representing now, and a value of the
>>>>>>> new tweet's key"
>>>>>>>
>>>>>>> Column is in the name of 3 of the 5 concepts expressed, and in each
>>>>>>> case it's different.
>>>>>>
>>>>>> When you're inserting something nested 3 levels deep a certain amount
>>>>>> of verbosity is unavoidable.  With Evan's nomenclature,
>>>>>>
>>>>>> "You take the user's record ID, and use that to insert into the Record
>>>>>> Collection 'user associations' at Attribute Collection
>>>>>> 'user_timeline,' an Attribute named with a time based uuid
>>>>>> representing now, and with a value of the new tweet's key."
>>>>>>
>>>>>> I think that is a negative improvement.  Yay, now we are talking about
>>>>>> Attribute Collections and Attributes instead of SuperColumns and
>>>>>> Columns.  The same objections ("one object's name contains the
>>>>>> other's!") apply, plus the new one of sounding so generic that it could
>>>>>> apply to practically any system.
>>>>>>
>>>>>> -Jonathan
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Evan Weaver
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Evan Weaver
>>>
>>
>
>
>
> --
> Evan Weaver
>
