It seems to me that what would be most helpful, regardless of changes, is having a document that describes the data model in more detail than the current data model wiki page. I can take a stab at creating a new page that includes examples if that would be useful.
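As a rough idea of the kind of example such a page could open with, here is a toy sketch of the blog case suggested below, written as plain nested Python dictionaries. It is only an in-memory picture of the model as I understand it (keyspace, then column family, then key, then column name mapped to a value and timestamp). It is not the Thrift API, and every name in it (Entries, Comments, the keys, the column names, get_columns) is made up for illustration.

```python
# Toy in-memory picture of the data model, NOT the Thrift API.
# All names (Entries, Comments, the keys, the column names) are invented.

# keyspace -> column family -> key -> column name -> (value, timestamp)
blog = {
    "Entries": {
        "first-post": {
            "title": ("Hello world", 1),
            "body": ("foo, and then some", 1),
        },
        "second-post": {
            "title": ("Naming things", 2),
            "body": ("bar", 2),
        },
    },
    "Comments": {
        # Column names are timestamps here, so sorting by name gives time order.
        "first-post": {
            "2009-08-11T22:09": ("Nice post", 3),
            "2009-08-11T22:34": ("Agreed", 4),
        },
        # "second-post" has no comments, so the key is simply absent here.
    },
}


def get_columns(keyspace, column_family, key):
    """Return [(column name, value)] for one key in one column family,
    ordered by column name. A missing key yields an empty list."""
    columns = keyspace.get(column_family, {}).get(key, {})
    return [(name, value) for name, (value, _timestamp) in sorted(columns.items())]


# "Get all comments for a blog entry", in time order.
comments = get_columns(blog, "Comments", "first-post")
```

The point of the sketch is that a key is looked up inside one column family at a time, and each key carries its own set of columns, which is where the "row" intuition from SQL tends to mislead.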
On Tue, Aug 11, 2009 at 10:34 PM, Arin Sarkissian<[email protected]> wrote:
> I agree that the names are pretty horrible for a newbie...
>
> I'll echo the concerns that the RDBMS vernacular messes with a
> newcomer's head. I feel like the words "Row" and "Column" are way too
> loaded since most people have an RDBMS background... BUT
>
> In the BigTable paper we've got the term "Column Family". This term is
> also used in HBase and Hypertable. Since the term's out there in the
> wild I wouldn't feel comfortable ditching it and making something up
> to fill its spot. That would lead to a scenario where folks with
> experience with HBase, Hypertable and Bigtable get confused (or think
> the naming is dumb) but would lessen the confusion for RDBMS peeps.
> Doesn't sound like the right tradeoff: 4 sets of folks have something
> new to digest instead of 1.
>
> The "bad" terms are "column" and "row". That's where the real issues
> arise... but given the fact that I believe we should keep "column
> family" I have no idea what we'd call the things inside the CF. It
> would be odd as hell to have a CF contain "records" etc. Does that
> mean we should keep it called "column"? IMO w/o an awesome
> alternative, yes.
>
> The word "row" should go away tho...
> When I first started using Cassandra I thought that: a key pointed to
> a row and that row had one of each column family. This isn't the case
> but the RDBMS terms + SQL-ish thinking caused me and many others to
> assume as much. Took us a while to figure that out...
>
> But realistically how much of this confusion could be avoided with a
> legit example? Once you see a good example you start getting it. A lot
> of people have been pointed towards the ThriftInterface page on the
> wiki which clears up next to nothing:
> http://wiki.apache.org/cassandra/ThriftInterface . There's stuff like
> "edges", "base_attributes" etc. It's next door to nonsensical..
>
> What if we had a real example that people could relate to... model a
> blog or something along those lines & update the
> http://wiki.apache.org/cassandra/ThriftInterface page to show how each
> of the API methods would be used to accomplish basic tasks... ex: get
> all comments for a blog entry, list entries in time order, list
> entries tagged "bar", find all entries with "foo" in the body (kinda
> like the Facebook mail search example).
>
> -Arin
>
>
>
> On Tue, Aug 11, 2009 at 10:09 PM, Curt Micol<[email protected]> wrote:
>> Hello,
>>
>> I am hardly a developer, so this isn't directly addressed to me, but
>> if I may comment on a couple of things from an outsider's
>> (non-developer, new to this scale of database) perspective.
>>
>> On Wed, Aug 12, 2009 at 12:38 AM, Eric Evans<[email protected]> wrote:
>>> On Tue, 2009-08-11 at 10:37 -0700, Evan Weaver wrote:
>>>> In my experience, the naming of the data model has been a huge barrier
>>>> to entry for users of Cassandra. This goes both for people familiar
>>>> with SQL, and for people familiar with BigTable. I would like to
>>>> change this before 0.4, since the 0.3 to 0.4 transition is the Great
>>>> API Breakening.
>>
>> I agree that there is a barrier, specifically because most people have
>> no experience with this type of data structure and as you mention are
>> coming from SQL. Clearer names along with more documentation/examples
>> will help grow the user base of Cassandra quite a bit.
>>
>>>> So technically this is not a bikeshed, because I'm happy to do all the
>>>> work. I'll even submit a patch for Digg's Python client. Since there
>>>> are no production deployments of ASF, and only a couple
>>>> well-maintained clients, now is the time to break the world. A few
>>>> hours of work now will pay off richly in terms of community
>>>> involvement and reduced noob-explanation-time.
>>
>> I would offer my services here also if a change were accepted.
>>
>> And while I don't know what the exact names should be (nor am I
>> qualified tbh), I think they should be clearer than they are. At this
>> point they seem to be a mixture of RDBMS and Document DB terms. The
>> change to 'keyspace' from 'table' I think was a first step in this
>> process, but it should be taken further and all names normalized
>> across the board to properly represent their relationship with each
>> other. At least that's my very humble opinion.
>>
>> In response to Mr. Evan's comment regarding the Bigtable paper, does
>> the Cassandra community want this to be a requirement for using the
>> software? I would think not. Sure, most early adopters are coming
>> from that paper, but it shouldn't be a source of entry to use the
>> database, but rather to develop it.
>>
>> Again, my opinion carries little weight, but +1 from this user.
>>
>> Thanks for everyone's hard work, I am really excited to see how this
>> project continues to progress.
>>
>> --
>> # Curt Micol
>>
>
