Did you read the previous thread about this? http://markmail.org/thread/qbocotgkan4mg73w
I don't think your proposals are too good... I have a new proposal, based on feedback in the previous thread, that I will send soon. But I wanted some comments on the misconceptions themselves.

Evan

On Tue, Aug 18, 2009 at 1:33 AM, Curt Micol <[email protected]> wrote:
> I've been thinking about this for a number of days, and again, while I am
> not a developer, I thought I might toss in a proposal if that's okay.
>
> Since putting together a schema diagram and having a number of people
> review it, I think a change is warranted. Too many people are coming from
> the RDBMS world, and the terms used by Cassandra conflict with the terms
> they are already familiar with.
>
> The TLDR version is as follows:
>
> Object (Column)
> ObjectFamily (ColumnFamily)
> Directory (Row)
> ObjectContainer (SuperColumn)
> Namespace (Keyspace)
>
> The long version...
>
> Object (Column)
> As Evan has stated repeatedly, 'column' is a bit misleading, especially
> when compared to other types of database systems. I think this is probably
> the most important change to the data model names, and exactly where I
> started, since this is the 'core' of Cassandra. 'Object' gives the
> impression that this is a piece of data that is relatively structured, but
> the name gives no hint of how strict that structure is. 'Objects' have
> names, values, and timestamps. Simple and to the point. 'Object' doesn't
> come with the preconceived notions that 'column' comes with, and it leaves
> room for Cassandra to define what an 'object' is without any conflict with
> preexisting data structures.
>
> By changing this, we can move up the ladder to other data types and easily
> rename them to something that 'contains objects' or 'accesses objects'.
> This allows us to describe the data model in the name structure without
> having to get too deep into the definition.
>
> Directory (Row)
> 'row' is currently unnamed, but still a structure that exists in the model.
> It's not specifically data itself, but more of a mapping of how to get to
> objects (using keys). 'Directory' fills this void quite well. It is easily
> explained as a path to data rather than data itself.
>
> ObjectFamily (ColumnFamily)
> There's no argument that the one direct link to the BigTable paper is
> 'column families'. It's perhaps the only structure that is virtually the
> same in both pieces of software. Considering this, I think we need to avoid
> too drastic a change. That said, I think a change is necessary due to the
> differences in columns between the two databases. 'Object family' is
> descriptive of the relation between objects and removes any reference to
> tabular structures, while keeping a loose relationship to 'column family'
> in the BigTable paper.
>
> ObjectContainer (SuperColumn)
> I could see this being shortened to 'container' in everyday conversation.
> However, 'objectcontainer' fits nicely with the rest of the data model
> names and is descriptive of its purpose and use. Ultimately, a
> 'supercolumn' is nothing more than a named container of columns (and I've
> seen the word 'container' used to describe supercolumns on at least three
> different occasions). 'Supercolumn' has no real connection to what exactly
> it defines, but with 'object container' we have a clear understanding that
> we are naming the structure that holds objects. Or, as I explained it to a
> friend, we are naming the 'jar' and not the 'honey'. :)
>
> Namespace (Keyspace)
> This one I go back and forth on. I know it's been changed from 'Table' to
> 'keyspace', and Evan proposed 'database', but I think that 'namespace' is
> really what we are talking about. Wikipedia has this as the first line
> describing 'namespace':
>
>   A namespace is an abstract container or environment created to hold a
>   logical grouping of unique identifiers or symbols (i.e., names).
>
> Originally I thought 'objectspace' would fit better, but I think
> 'namespace' comes with a better history and is clearer about what this
> structure really is, especially when you relate the name to how namespaces
> are used in Ruby, Python, and Java. Ultimately, though, I think I prefer
> 'keyspace' over 'table' or 'database'.
>
> The only issue I see with all of these names is the potential conflict
> with programming languages and their objects. I know next to nothing about
> Java, so I don't know if there would be a conflict there. I've run the
> Google search 'reserved words in *', where '*' is Ruby, Python, Java, and
> C++, and found no mention of 'object' being a reserved word in any of those
> languages.
>
> I also grepped through the current source code, and there don't seem to be
> any real conflicts that couldn't be renamed so as not to clash with this
> naming structure.
>
> In the end, I think it's a good idea to look at this and work out a
> solution. Documentation and tutorials are going to help, but I think people
> are so entrenched in the RDBMS world that there is something of a barrier
> to understanding Cassandra's data model.
>
> Thanks for your time,
>
> --
> # Curt Micol

--
Evan Weaver
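For readers trying to map the proposed names onto the structure being renamed, here is a minimal sketch in Python of the nesting described in the proposal: Namespace > ObjectFamily > Directory > (ObjectContainer >) Object. The classes, the `put` helper, and the sample data are hypothetical illustrations, not Cassandra's API; the current name of each level is given in its docstring. As a simplification, a directory here may hold either objects or containers, whereas in Cassandra a given family is either standard (columns) or super (supercolumns).

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# Hypothetical illustration only: these classes are not Cassandra's API.


@dataclass
class Object:
    """Proposed name for Column: a name, a value, and a timestamp."""
    name: str
    value: bytes
    timestamp: int


@dataclass
class ObjectContainer:
    """Proposed name for SuperColumn: a named container of objects."""
    name: str
    objects: Dict[str, Object] = field(default_factory=dict)


@dataclass
class Directory:
    """Proposed name for Row: the key-addressed path to objects.

    Simplification: entries may be objects or containers here; in Cassandra
    a given family holds either columns or supercolumns, not both.
    """
    key: str
    entries: Dict[str, Union[Object, ObjectContainer]] = field(default_factory=dict)


@dataclass
class ObjectFamily:
    """Proposed name for ColumnFamily: maps row keys to directories."""
    name: str
    directories: Dict[str, Directory] = field(default_factory=dict)

    def put(self, key: str, entry: Union[Object, ObjectContainer]) -> None:
        # Create the directory for this key on first write, then store the
        # entry under its own name (a later put with the same name replaces it).
        directory = self.directories.setdefault(key, Directory(key))
        directory.entries[entry.name] = entry


@dataclass
class Namespace:
    """Proposed name for Keyspace: the top-level grouping of families."""
    name: str
    families: Dict[str, ObjectFamily] = field(default_factory=dict)


# Usage: one family of plain objects and one family of containers.
ns = Namespace("Blog")

users = ns.families.setdefault("Users", ObjectFamily("Users"))
users.put("user1", Object("name", b"Curt", 1))

posts = ns.families.setdefault("Posts", ObjectFamily("Posts"))
jar = ObjectContainer("post-1")  # the 'jar'...
jar.objects["title"] = Object("title", b"Naming", 2)  # ...and the 'honey'
posts.put("user1", jar)

print(ns.families["Users"].directories["user1"].entries["name"].value)  # prints b'Curt'
```

Reading a value walks the same path the proposal describes: namespace, then family, then directory (row key), then object name, with one extra hop through a container in a super family.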
