Re: Fixing the data model names

Mark McBride Tue, 11 Aug 2009 22:37:30 -0700

It seems to me that what would be most helpful, regardless of changes,
is having a document that describes the data model in more detail than
the current data model wiki page.  I can take a stab at creating a new
page that includes examples if that would be useful.
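
As a rough starting point, here is a minimal sketch of the kind of blog example suggested in the quoted message below. It uses plain Python dicts, not the Thrift API, and every name in it (the "Entries" and "Comments" column families, the keys, the helper functions) is made up for illustration. The one point it tries to show is that a key is looked up per column family, so a key does not point to a single "row" holding one of each column family.

```python
# Toy in-memory picture of the data model; this is NOT the Cassandra/Thrift API.
# All names below are hypothetical and exist only for illustration.

keyspace = {
    "Entries": {                      # column family
        "first-post": {               # key
            "title": "Hello",         # column name -> column value
            "body": "foo and more",
            "posted_at": "2009-08-01",
        },
        "second-post": {
            "title": "Again",
            "body": "nothing here",
            "posted_at": "2009-08-05",
        },
    },
    "Comments": {                     # a second column family
        "first-post": {               # same key, different set of columns
            "2009-08-02T10:00": "Nice post",
            "2009-08-03T09:30": "Agreed",
        },
        # "second-post" has no columns in this column family at all
    },
}


def get_columns(ks, column_family, key):
    """Return the columns stored under `key` in one column family."""
    return ks.get(column_family, {}).get(key, {})


def comments_for(ks, entry_key):
    """Comments for a blog entry, ordered by column name (a timestamp here)."""
    cols = get_columns(ks, "Comments", entry_key)
    return [cols[name] for name in sorted(cols)]


def entries_in_time_order(ks):
    """Entry keys ordered by their (hypothetical) posted_at column."""
    entries = ks["Entries"]
    return sorted(entries, key=lambda k: entries[k]["posted_at"])
```

In this toy model, comments_for(keyspace, "second-post") is simply an empty list: the key exists in "Entries" but has nothing under "Comments".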


On Tue, Aug 11, 2009 at 10:34 PM, Arin Sarkissian<[email protected]> wrote:
> I agree that the names are pretty horrible for a newbie...
>
> I'll echo the concerns that the RDBMS vernacular messes with a
> newcomer's head. I feel like the words "Row" and "Column" are way too
> loaded since most people have an RDBMS background... BUT
>
> In the BigTable paper we've got the term "Column Family". This term is
> also used in HBase and Hypertable. Since the term's out there in the
> wild I wouldn't feel comfortable ditching it and making something up
> to fill its spot. That would lead to a scenario where folks with
> experience with Hbase, Hypertable and Bigtable get confused (or think
> the naming is dumb) but would lessen the confusion for RDBMS peeps.
> Doesn't sound like the right tradeoff: 4 sets of folks have something
> new to digest instead of 1.
>
> The "bad" terms are "column" and "row". That's where the real issues
> arise... but given the fact that I believe we should keep "column
> family" I have no idea what we'd call the things inside the CF? It
> would be odd as hell to have a CF contain "records" etc. Does that
> mean we should keep it called "column"? IMO w/o an awesome
> alternative, yes.
>
> The word "row" should go away tho...
> When I first started using cassandra I thought that: a key pointed to
> a row and that row had one of each column family. This isn't the case
> but the RDBMS terms + SQL-ish thinking caused me and many others to
> assume as much. Took us a while to figure that out...
>
> But realistically how much of this confusion could be avoided with a
> legit example? Once you see a good example you start getting it. A lot
> of people have been pointed towards the ThriftInterface page on the
> wiki which clears up next to nothing:
> http://wiki.apache.org/cassandra/ThriftInterface . There's stuff like
> "edges", "base_attributes" etc. It's next door to nonsensical..
>
> What if we had a real example that people could relate to... a model of
> a blog or something along those lines & update the
> http://wiki.apache.org/cassandra/ThriftInterface page to show how each
> of the API methods would be used to accomplish basic tasks... ex: get
> all comments for a blog entry, list entries in time order, list
> entries tagged "bar", find all entries with "foo" in the body (kinda
> like the Facebook mail search example).
>
> -Arin
>
>
>
> On Tue, Aug 11, 2009 at 10:09 PM, Curt Micol<[email protected]> wrote:
>> Hello,
>>
>> I am hardly a developer, so this isn't directly addressed to me, but
>> if I may comment on a couple of things from an outsider's
>> (non-developer, new to this scale of database) perspective.
>>
>> On Wed, Aug 12, 2009 at 12:38 AM, Eric Evans<[email protected]> wrote:
>>> On Tue, 2009-08-11 at 10:37 -0700, Evan Weaver wrote:
>>>> In my experience, the naming of the data model has been a huge barrier
>>>> to entry for users of Cassandra. This goes both for people familiar
>>>> with SQL, and for people familiar with BigTable. I would like to
>>>> change this before 0.4, since the 0.3 to 0.4 transition is the Great
>>>> API Breakening.
>>
>> I agree that there is a barrier, specifically because most people have
>> no experience with this type of data structure and as you mention are
>> coming from SQL.  Clearer names along with more documentation/examples
>> will help grow the user base of Cassandra quite a bit.
>>
>>>> So technically this is not a bikeshed, because I'm happy to do all the
>>>> work. I'll even submit a patch for Digg's Python client. Since there
>>>> are no production deployments of ASF, and only a couple
>>>> well-maintained clients, now is the time to break the world. A few
>>>> hours of work now will pay off richly in terms of community
>>>> involvement and reduced noob-explanation-time.
>>
>> I would offer my services here also if a change were accepted.
>>
>> And while I don't know what the exact names should be (nor am I
>> qualified tbh), I think they should be clearer than they are. At this
>> point they seem to be a mixture of RDBMS and Document DB terms.  The
>> change to 'keyspace' from 'table' I think was a first step in this
>> process, but it should be taken further and all names normalized
>> across the board to properly represent their relationship with each
>> other. At least that's my very humble opinion.
>>
>> In response to Mr. Evan's comment regarding the Bigtable paper, does
>> the Cassandra community want this to be a requirement for using the
>> software? I would think not.  Sure, most early adopters are coming
>> from that paper, but it shouldn't be a source of entry to use the
>> database, but rather to develop it.
>>
>> Again, my opinion carries little weight, but +1 from this user.
>>
>> Thanks for everyone's hard work, I am really excited to see how this
>> project continues to progress.
>>
>> --
>> # Curt Micol
>>
>
