Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "ThomasBoose/EERD model components to Cassandra Column family's" page has been changed by ThomasBoose. http://wiki.apache.org/cassandra/ThomasBoose/EERD%20model%20components%20to%20Cassandra%20Column%20family%27s?action=diff&rev1=10&rev2=11 -------------------------------------------------- If this is not yet making sence, read on. + + - == Indexing == + === Indexing === In order to add an index to a column, other then the ColumnFamily key, we should to create a second ColumnFamily. Every insert, which can be either an insert or update in Cassandra, on the original ColumnFamily we will update the corresponding index. Think of a ColumnFamily cf_Person (examples in Python using pycassa) @@ -25, +27 @@ cfi_Person_City.insert ('Haarlem', {'234':''}) }}} This way a hash will be created containing columns for every person's key that lives in a specific City. The ColumnFamily architecture of Cassandra can store a unlimited number of columns for each key. This meens that when deleting a person it's reference in the cfi_Person_City index should be removed first. When updating a person, maybe moving to anothor City, we have to remove the element from the cfi_Person_City first and then store it with the corresponding new City.'' '' + + === Deleting values === + Because of the way Cassandra clusters operate it is nearly impossible to delete values and know for sure the values are deleted on everynode. If values would simply be deleted and afterwards a node turns up which still holds the given value it would replicate ths value back to the existing nodes. Read more about deleting values at: DistributedDeletes + + Cassandra on the other hand is isanely fast at inserting and updating values. This is why I would advise any programmer trying to build DBMS logic to introduce a value that "meens " deleted and have you DBMS tier respond "Does not exist" in both cases, when values actualy do not exist or contain the "deleted" value. Cassandra makes no distinction between updates and inserts so updating from "deleted" and inserting can be achieved the same way. == Relations == === 1 on 1 === @@ -45, +52 @@ ||John ||0555-123456 ||10.000 || ||<style="text-align: center;" |2>321-21-4321 ||name ||phone ||salary || ||Jane ||0555-654321 ||12.000 || - ||||||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">CF_Phone ''' ''' || @@ -92, +98 @@ ||Jane ||SE ||Amsterdam || - ||||||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">CF_School_Unit ''' ''' || ||<style="text-align: center;" |2>SE ||name ||loc || ||software engineering ||hsl || - ||||||<tablewidth="400px" tablestyle="text-align: left;"style="text-align: center;">CFK_School_Unit_Student ''' ''' || ||<style="text-align: center;" |2>SE ||123-12-1234 ||321-21-4321 || - || || || + || || || @@ -118, +122 @@ As it is perfectly valid in a relational database to have a key value composed of several columns, in cassandra there is only one key per ColumnFamily. I did already discuss the need to create seperate ColumnFamily's for one to many relationships given the fact that you can never tell for sure whether or not maybe in the future a new relation will popup to another entity sharing the same keyvalues. This means that we will need 5 ColumnFamily's to implement the model above: CF_Order is a straight forward ColumnFamily, Modeled after the design - ||||||||<style="text-align: center;">CF_Order ''' ''' || ||<style="text-align: center;" |2>1234 ||order_date ||order_discount ||customer_id || ||20100808 ||0.4 ||1234 || - ||<style="text-align: center;" |2>4321 ||customer_id ||order_date || || + ||<style="text-align: center;" |2>4321 ||customer_id ||order_date || || - ||3451 ||20100802 || || + ||3451 ||20100802 || || - ||<style="text-align: center;" |2>1354 ||order_discount ||order_date || customer_id|| + ||<style="text-align: center;" |2>1354 ||order_discount ||order_date ||customer_id || - ||3 ||20100802 || 3451|| + ||3 ||20100802 ||3451 || The same for CF_Product - ||||||||<style="text-align: center;">CF_Product ''' ''' || ||<style="text-align: center;" |2>DSK_SGT_5GB_7200 ||description ||image ||list_price || - ||5 GB Seagate harddisk || ||130 || + ||5 GB Seagate harddisk || ||130 || ||<style="text-align: center;" |2>KBD_LGT_WRL_350 ||description ||image ||list_price || - ||Wireless keyboard || ||75 || + ||Wireless keyboard || ||75 || As we cannot itterate throu a ColumnFamily based on any attribute except its key and we wan't to use a solution to relations that is generic we create a seperate ColumnFamily for each one to many relationship. As we know that a many to many relationship is typicly solved by introducing 2 one to many relationships to a newly created ColumnFamily. So there will be one ColumnFamily to store which orders a product can be found on. - ||||||||<style="text-align: center;">CF_Product_Order ''' ''' || - ||<style="text-align: center;" |2>DSK_SGT_5GB_7200 ||1234 ||4321 || 1354|| + ||<style="text-align: center;" |2>DSK_SGT_5GB_7200 ||1234 ||4321 ||1354 || - || || || || + || || || || - ||<style="text-align: center;" |2>KBD_LGT_WRL_350 ||4321 || || || + ||<style="text-align: center;" |2>KBD_LGT_WRL_350 ||4321 || || || - || || || || + || || || || And we'll create one ColumnFamily that stores which products can be found on each order. We can see that this solution tends to invite anomalies. product that have 3 order columns but no reference to the product in the CF_Order_Product ColumnFamily. It up to the programmers of the DBMS tier to make sure genic code is available to keep the ColumnFamily's consistent al of the time. - - ||||||||<style="text-align: center;">CF_Order_Product|| + ||||||||<style="text-align: center;">CF_Order_Product || ||<style="text-align: center;" |2>1234 ||DSK_SGT_5GB_7200 || || || || || || || ||<style="text-align: center;" |2>4321 ||KBD_LGT_WRL_350 ||DSK_SGT_5GB_7200 || || @@ -163, +163 @@ || || || || + + ''' '''Last but not least we'll introduce a ColumnFamily to store the attributes connected to the relationship between Product and Order. As mentioned it is not possible to create a key containing 2 seperate values. So we'll have to introduce a new value that conatenates both values using a strikt format. These formats should be part of a design if implementing a (E)ERD in Cassandra. - ||||||||<style="text-align: center;">CF_order_line''' ''' || - ||<style="text-align: center;" |2>1234_DSK_SGT_5GB_7200 ||number ||delivery_date || || + ||<style="text-align: center;" |2>1234_DSK_SGT_5GB_7200 ||number ||delivery_date || || - ||5 || 20101215|| || + ||5 ||20101215 || || ||<style="text-align: center;" |2>4321_KBD_LGT_WRL_350 ||product_discount ||number ||delivery_date || - ||0.3 || 1||20100901 || + ||0.3 ||1 ||20100901 || ||<style="text-align: center;" |2>4321_DSK_SGT_5GB_7200 ||product_discount ||number ||delivery_date || - ||5 ||2||20100901 || + ||5 ||2 ||20100901 || ||<style="text-align: center;" |2>1354_DSK_SGT_5GB_7200 ||product_discount ||number ||delivery_date || - ||5 ||8||20100901 || + ||5 ||8 ||20100901 ||