[Cassandra Wiki] Trivial Update of "ThomasBoose/EERD model components to Cassandra Column family's" by ThomasBoose

Apache Wiki Fri, 10 Dec 2010 06:34:07 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "ThomasBoose/EERD model components to Cassandra Column family's" page has 
been changed by ThomasBoose.
http://wiki.apache.org/cassandra/ThomasBoose/EERD%20model%20components%20to%20Cassandra%20Column%20family%27s?action=diff&rev1=1&rev2=2

--------------------------------------------------

  This page describes model transformations from EERD concepts into Cassandra 
ColumnFamily concepts. All input is welcome.
  
  == DBMS layer ==
- At several spots in this document you will find suggestions to implement 
trivial DBMS functionality by hand. At this stage, I would suggest that 
programmers implement at least 4 tiers when using cassandra as a backend 
server: the database layer provided by cassandra itself, a tier implementing 
DBMS rules, a tier for business rules, and finally an application tier.
+ At several spots in this document you will find suggestions to implement 
trivial DBMS functionality by hand. At this stage, I would suggest that 
programmers implement at least 4 tiers when using Cassandra as a backend 
server: the database layer provided by Cassandra itself, a tier implementing 
DBMS rules, a tier for business rules, and finally an application tier.
  
  In this DBMS tier, functions should be available for keeping data consistent 
based on data rules, and the tier would throw exceptions when indexes are 
changed or when a request is made to delete keys against DBMS rules.
  
  If this does not make sense yet, read on.
  
  == Indexing ==
- In order to add an index to a column other than the columnfamily's key, we 
should create a second columnfamily. On every insert into the original 
columnfamily (which in cassandra can be either an insert or an update), we 
update the corresponding index.
+ In order to add an index to a column other than the ColumnFamily key, we 
should create a second ColumnFamily. On every insert into the original 
ColumnFamily (which in Cassandra can be either an insert or an update), we 
update the corresponding index.
  
- Think of a columnfamily cf_Person (examples in Python using pycassa)
+ Think of a ColumnFamily cf_Person (examples in Python using pycassa)
  
+ {{{
  cf_Person.insert('234', {'name':'Karel','City':'Haarlem'})
- cfi_Person_City.insert (Haarlem', {'234':''})
+ cfi_Person_City.insert ('Haarlem', {'234':''})
- 
+ }}}
- This way a row is created per City, containing one column for the key of 
every person who lives in that City. The ColumnFamily architecture of Cassandra 
can store an effectively unlimited number of columns for each key. Keeping this 
index means that when deleting a person, its reference in the cfi_Person_City 
index should be removed first. When updating a person, for example one moving 
to another City, we have to remove the element from cfi_Person_City first and 
then store it under the new City.
+ This way a row is created per City, containing one column for the key of 
every person who lives in that City. The ColumnFamily architecture of Cassandra 
can store an effectively unlimited number of columns for each key. Keeping this 
index means that when deleting a person, its reference in the cfi_Person_City 
index should be removed first. When updating a person, for example one moving 
to another City, we have to remove the element from cfi_Person_City first and 
then store it under the new City.'' ''
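The index maintenance described above could be sketched in the DBMS tier roughly as follows. This is only a sketch under stated assumptions: `InMemoryColumnFamily` is a hypothetical stand-in that loosely mimics the `insert`/`get`/`remove` calls used on a pycassa ColumnFamily so the example runs without a cluster, and `save_person` and `delete_person` are illustrative names that do not appear on the wiki page.

```python
class InMemoryColumnFamily:
    """Hypothetical stand-in for a ColumnFamily: row key -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        # Creates the row or overwrites the given columns (insert == update).
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key):
        # Raises KeyError for a missing row; a real client raises its own error.
        return dict(self.rows[key])

    def remove(self, key, columns=None):
        if key not in self.rows:
            return
        if columns is None:
            del self.rows[key]
            return
        for column in columns:
            self.rows[key].pop(column, None)
        if not self.rows[key]:
            # A row without columns is treated as non-existent.
            del self.rows[key]


def save_person(cf_person, cfi_person_city, key, columns):
    """Insert or update a person and keep the City index consistent."""
    try:
        old_city = cf_person.get(key).get('City')
    except KeyError:
        old_city = None
    new_city = columns.get('City', old_city)
    # Remove the stale index entry first, then store the new one.
    if old_city is not None and old_city != new_city:
        cfi_person_city.remove(old_city, columns=[key])
    cf_person.insert(key, columns)
    if new_city is not None:
        cfi_person_city.insert(new_city, {key: ''})


def delete_person(cf_person, cfi_person_city, key):
    """Remove the index reference first, then the person itself."""
    try:
        city = cf_person.get(key).get('City')
    except KeyError:
        return
    if city is not None:
        cfi_person_city.remove(city, columns=[key])
    cf_person.remove(key)


cf_Person = InMemoryColumnFamily()
cfi_Person_City = InMemoryColumnFamily()
save_person(cf_Person, cfi_Person_City, '234', {'name': 'Karel', 'City': 'Haarlem'})
save_person(cf_Person, cfi_Person_City, '234', {'City': 'Amsterdam'})  # Karel moves
```

Note that the two writes are not atomic, so a failure between them can still leave the index and the data out of step; the sketch only shows the ordering.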
  
  == Relations ==
  === 1 on 1 ===
  Typically you'll find three kinds of 1 on 1 relations in a relational model. I 
will address them one at a time.
  
  ==== Equal elements ====
- Sometimes all the elements are part of both collections on either side of the 
relationship. The reasons these collections are modeled separately are most 
often based on security issues or functional differences. One solution in a 
Cassandra database would be the same as you would use to implement such a 
relation in an RDBMS: simply share the same key in both columnfamilies. 
Inserting a key in one of these columnfamilies would insert the same key in the 
other, and vice versa. Updating an existing key in either columnfamily would 
not result in any change in the other. Deleting a key from one columnfamily 
will result in deleting the same key in the other family as well, provided 
this is allowed.
+ Sometimes all the elements are part of both collections on either side of the 
relationship. The reasons these collections are modeled separately are most 
often based on security issues or functional differences. One solution in a 
Cassandra database would be the same as you would use to implement such a 
relation in an RDBMS: simply share the same key in both ColumnFamilies. 
Inserting a key in one of these ColumnFamilies would insert the same key in the 
other, and vice versa. Updating an existing key in either ColumnFamily would 
not result in any change in the other. Deleting a key from one ColumnFamily 
will result in deleting the same key in the other family as well, provided 
this is allowed.
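A minimal sketch of these shared-key rules, assuming plain dictionaries as stand-ins for the two ColumnFamilies. The names `insert_equal`, `delete_equal`, `cf_public`, `cf_private` and the `_exists` marker column are all hypothetical. The marker is there because a Cassandra row without any columns is not visible as a row, so the "other" side needs at least one column for the key to exist.

```python
# Hypothetical marker column so the key exists on the other side as well.
PLACEHOLDER = {'_exists': ''}


def insert_equal(cf_target, cf_other, key, columns):
    """Insert or update in cf_target; make sure the key also exists in cf_other."""
    cf_target.setdefault(key, {}).update(columns)
    if key not in cf_other:
        # New key: mirror it. An existing row on the other side is left untouched.
        cf_other[key] = dict(PLACEHOLDER)


def delete_equal(cf_a, cf_b, key):
    """Deleting the key from one ColumnFamily deletes it from the other as well."""
    cf_a.pop(key, None)
    cf_b.pop(key, None)


cf_public = {}
cf_private = {}
insert_equal(cf_public, cf_private, '234', {'name': 'Karel'})
insert_equal(cf_private, cf_public, '234', {'salary': '10.000'})
```

Whether a delete is allowed at all would be decided by the DBMS tier before `delete_equal` is called; that check is left out here.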
  
  ''I'm not sure to what level of detail security rules can apply in a 
Cassandra database. At least I know that one can create logins per cluster.''
  
@@ -38, +39 @@

  
  In Cassandra modeling you are forced either to cross-link both keys, so that 
each key is stored as a foreign key in the other columnfamily, or to create a 
third columnfamily in which you store both keys, each preceded by a token 
indicating which columnfamily it refers to. Let's focus on the first option. 
Say we hand out phones to our employees and we agree that every employee will 
always have one phone, and that phones that are not in use are not stored in 
our columnfamily. The phone has a phone number as key, whereas the employee 
has a social security number. In order to know which number to dial when 
looking for employee X, and who is calling given a specific phone number, we 
need to store both keys as foreign keys in both columnfamilies.
  
  -- CF_Employee -----------------------------
  |             | name | phone       | salary |
  | 123-12-1234 | John | 0555-123456 | 10.000 |
  --------------------------------------------
  |             | name | phone       | salary |
  | 321-21-4321 | Jane | 0555-654321 | 12.000 |
  --------------------------------------------
  
  -- CF_Phone ---------------------------
  |             | employee    | credit |
  | 0555-123456 | 123-12-1234 | 10     |
  ---------------------------------------
  |             | employee    | credit |
  | 0555-654321 | 321-21-4321 | 5      |
  ---------------------------------------
  
  Using a static column name, requiring input in the foreign key fields, 
checking the existence of the key in the other columnfamily, and processing 
updates and deletes are all subject to programming in the DBMS layer. Cassandra 
itself does not, and probably will not, provide foreign key logic. One could 
imagine a process that makes sure the cross references stay consistent:
  
  cf_Employee.insert('321-21-4321', {'name':'Jane', 'phone':'0555-654321'})
  
  # multiget takes a list of keys and returns {} when none of them exist
  if cf_Phone.multiget(['0555-654321'], columns=['employee']) == {}:
     cf_Phone.insert('0555-654321', {'employee':'321-21-4321'})
  else:
     if cf_Phone.get('0555-654321', columns=['employee'])['employee'] != '321-21-4321':
        # the phone belongs to someone else: raise an error here,
        # or alternatively delete the employee that was just inserted
        raise ValueError('phone 0555-654321 is assigned to another employee')
  
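The cross-reference rule for the employee and phone example could be wrapped in a single DBMS-tier function along these lines. This is a sketch under assumptions: plain dictionaries stand in for CF_Employee and CF_Phone, and `assign_phone` and `ForeignKeyError` are hypothetical names. Unlike the snippet above, it checks the phone's owner before writing anything, so a rejected assignment leaves no half-written employee behind.

```python
class ForeignKeyError(Exception):
    """Raised when a cross reference would become inconsistent."""


def assign_phone(cf_employee, cf_phone, ssn, employee_columns, number):
    """Store an employee and cross-link it with exactly one phone."""
    owner = cf_phone.get(number, {}).get('employee')
    if owner is not None and owner != ssn:
        raise ForeignKeyError(
            'phone %s is already assigned to employee %s' % (number, owner))

    # Phones that are not in use are not stored, so release the old one.
    old_number = cf_employee.get(ssn, {}).get('phone')
    if old_number is not None and old_number != number:
        cf_phone.pop(old_number, None)

    # Write both sides of the relation: employee -> phone and phone -> employee.
    row = cf_employee.setdefault(ssn, {})
    row.update(employee_columns)
    row['phone'] = number
    cf_phone.setdefault(number, {})['employee'] = ssn


cf_Employee = {}
cf_Phone = {}
assign_phone(cf_Employee, cf_Phone, '321-21-4321', {'name': 'Jane'}, '0555-654321')
```

Against a real cluster the check and the two writes are separate operations, so concurrent callers could still race; the sketch only illustrates the rule itself.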
  ==== Subset elements ====
- ....
+ ''.... ''
