Re: No Transactions: An Example
Hi all,

Making transactions work is my main concern at the moment. My need is to:
- update data in column family #1
- insert data in column family #2

I need these operations to happen in a single transaction because the data is tightly coupled. I use ZooKeeper/Cages for distributed locking, to prevent multiple clients from inserting or updating the same data. But there is a problem: if the insert into column family #2 fails, I have to roll back the data already updated in column family #1.

From my reading on the subject, the options for handling the failure are:
- Can we really consider that a write never fails in Cassandra once the mutation has been executed on a node? What could cause a failure at that point? So is it important to think about this potential problem? (Yes, in my opinion, but I'm not completely sure.)
- Retry first. Is there really a chance that a second attempt will succeed if the first one failed?
- Keep the transaction data so that it can be rolled back programmatically by deleting the inserted data. The problem is rolling back the updated data, because the old values are lost. I have read that reading before writing, to save the old values before the update, is a bad idea. So the problem remains: how should this be done?

Do you have any feedback on this topic?

Regards,
Jérémy

On Thursday, 23 June 2011 at 16:23 -0700, Les Hazlewood wrote:
> Thanks for the pointer Ryan! Regards, Les
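A minimal sketch of the retry-then-compensate approach described above, using in-memory dicts as hypothetical stand-ins for the two column families (no real Cassandra client is used). It assumes the caller already holds the ZooKeeper/Cages lock for the key, and it saves the old value with a read before the write, which is exactly the trade-off questioned in the message:

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical in-memory stand-ins for column family #1 and #2; a real
# client would send Cassandra mutations (e.g. at QUORUM) instead.
cf1: Dict[str, str] = {}
cf2: Dict[str, str] = {}

InsertFn = Callable[[str, str], None]


def retry(fn: Callable[[], None], attempts: int = 3, delay_s: float = 0.0) -> bool:
    """Call fn up to `attempts` times; return True on the first success."""
    for attempt in range(attempts):
        try:
            fn()
            return True
        except Exception:
            if attempt + 1 < attempts:
                time.sleep(delay_s)
    return False


def update_then_insert(key: str, new_value: str, row: str, row_value: str,
                       insert: InsertFn, attempts: int = 3) -> bool:
    # Saving the old value means reading before writing (the questioned trade-off).
    old_value: Optional[str] = cf1.get(key)
    cf1[key] = new_value                                  # step 1: update CF #1
    if retry(lambda: insert(row, row_value), attempts):   # step 2: insert CF #2, with retries
        return True
    # Compensation: restore CF #1 and delete anything CF #2 may hold for the row.
    if old_value is None:
        cf1.pop(key, None)
    else:
        cf1[key] = old_value
    cf2.pop(row, None)
    return False
```

The compensating writes can themselves fail in a real cluster, so they would need the same retry treatment as the original insert.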
Re: No Transactions: An Example
On 6/22/2011 9:18 AM, Trevor Smith wrote:
> Right -- that's the part that I am more interested in fleshing out in this post.

Here is one way: use MVCC (http://en.wikipedia.org/wiki/Multiversion_concurrency_control). A single global clean-up process would be acceptable. It is not a single point of failure, only a single point where backlogged work accumulates. It will not affect availability as long as you are notified if the process terminates and you restart it within a reasonable amount of time, and even while it is down, subsequent reads remain valid.

So, you would have a balance column, and each update would create a balance_timestamp column with a positive or negative value indicating a credit or debit. Clients then read the latest value by doing a slice from balance to balance_~ (i.e. all balance* columns). (You would have to work out your column naming conventions so that your slices return only the pertinent columns.) The client then applies all the credits and debits to the balance to get the current balance. This handles the lost update problem.

For the dirty read and incorrect summary problems, where others read data in the middle of a transaction that hasn't committed yet, I would add a final transaction column to a Transactions CF. The key would be cf.key.column, e.g., Accounts.1234.balance, where 1234 is the account # and Accounts is the CF owning the balance column. A new column would then be added for each successful transaction (e.g., after debiting and crediting the two accounts), using the same timestamp used in balance_timestamp. So a client wanting the current balance would slice all of the transactions for that column and only apply the balance updates up to the latest transaction. Note that you might have to do something else with the transaction naming scheme to make sure names are guaranteed to be unique, but you get the idea.
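A rough in-memory sketch of the read path described above, assuming dicts in place of the Accounts and Transactions column families. The zero-padded `balance_<timestamp>` naming and the helper names are hypothetical, and this version applies only deltas whose timestamp has a matching transaction column, which is one reading of "only apply the balance updates up to the latest transaction". QUORUM consistency is not modeled:

```python
from typing import Dict, Set

# Hypothetical in-memory stand-ins for the two column families.
accounts: Dict[str, Dict[str, int]] = {}   # Accounts CF: row key -> {column: value}
transactions: Dict[str, Set[int]] = {}     # Transactions CF: "Accounts.<key>.balance" -> committed timestamps


def delta_column(ts: int) -> str:
    # Zero-padded so that lexical column order matches timestamp order.
    return f"balance_{ts:020d}"


def txn_row(account: str) -> str:
    return f"Accounts.{account}.balance"


def add_delta(account: str, ts: int, amount: int) -> None:
    """Write a credit (positive) or debit (negative) as its own column."""
    accounts.setdefault(account, {"balance": 0})[delta_column(ts)] = amount


def transfer(src: str, dst: str, amount: int, ts: int) -> None:
    add_delta(src, ts, -amount)
    add_delta(dst, ts, amount)
    # Commit: add a transaction column (same timestamp) for each account touched.
    for account in (src, dst):
        transactions.setdefault(txn_row(account), set()).add(ts)


def current_balance(account: str) -> int:
    row = accounts.get(account, {"balance": 0})
    committed = transactions.get(txn_row(account), set())
    total = row.get("balance", 0)
    # Equivalent of slicing balance..balance_~ and applying committed deltas only.
    for name, amount in row.items():
        if name.startswith("balance_") and int(name[len("balance_"):]) in committed:
            total += amount
    return total
```

A delta written by a transaction that never commits stays invisible to readers until the clean-up process removes it.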
If the transaction fails, the client simply does not add a transaction column to Transactions and deletes any balance_timestamp columns it added to the Accounts CF (or lets the clean-up process do it... carefully). This should avoid the need for locks, and as long as each account doesn't have a crazy number of updates, the slices shouldn't be so large as to be a significant performance hit.

A note about the updates: you have to make sure the clean-up process processes the updates in order and exactly once. If you can't guarantee this, then you'll have to make sure your updates are idempotent and commutative. Oh yeah, and you must use QUORUM reads/writes, of course.

Any critiques?
aj
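A sketch of the clean-up process under similar in-memory assumptions (the `Account` class, `committed` map, and `grace` window are hypothetical). It folds committed deltas into the base balance in timestamp order, records which timestamps it has folded so that a re-run does not apply them twice, and deletes uncommitted deltas only once they are older than a grace period, so in-flight transactions are not rolled back by mistake:

```python
from typing import Dict, Set


class Account:
    """Hypothetical in-memory account row."""

    def __init__(self, balance: int = 0) -> None:
        self.balance = balance              # base "balance" column
        self.deltas: Dict[int, int] = {}    # balance_<ts> columns: ts -> signed amount
        self.applied: Set[int] = set()      # timestamps already folded into balance


# Transactions CF stand-in: account key -> committed timestamps.
committed: Dict[str, Set[int]] = {}


def read_balance(key: str, acct: Account) -> int:
    done = committed.get(key, set())
    return acct.balance + sum(amount for ts, amount in acct.deltas.items()
                              if ts in done and ts not in acct.applied)


def clean_up(accounts: Dict[str, Account], now: int, grace: int) -> None:
    for key, acct in accounts.items():
        done = committed.get(key, set())
        for ts in sorted(acct.deltas):      # process updates in timestamp order
            if ts in done:
                if ts not in acct.applied:  # fold each committed delta at most once
                    acct.balance += acct.deltas[ts]
                    acct.applied.add(ts)
                del acct.deltas[ts]
            elif now - ts > grace:
                # Uncommitted and older than the grace period: treat the
                # transaction as failed and remove its delta.
                del acct.deltas[ts]
```

The `applied` set grows without bound in this sketch; a real design would need some way to prune it.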
Re: No Transactions: An Example
Dominic,

Thank you for your answer. I enjoy how in your day-to-day work you are concerned with who has the monster. It must be fun to read your production logs (User[Shelly] received vampire).

I looked into Cages and it does seem interesting. I need to do more reading to get a better take on it. I am wondering, though: are there other situations, not as business critical as the trading of monsters, that still need transactions but where you have decided not to use them? If so, do you have jobs running that check data integrity on some timed basis?

Trevor

On Wed, Jun 22, 2011 at 1:04 PM, Dominic Williams dwilli...@system7.co.uk wrote: [...]
Re: No Transactions: An Example
AJ,

Thanks for your input. I don't fully follow though how this would work with a bank scenario. Could you explain in more detail?

Thanks.
Trevor

On Wed, Jun 22, 2011 at 6:34 PM, AJ a...@dude.podzone.net wrote: [...]
Re: No Transactions: An Example
Hi Dominic,

Thanks so much for providing this information. I was unaware of Cages, and it looks like it could be used effectively for certain things.

> This is because Cassandra uses the timestamps of columns that have been written during reconciliation to determine which should be persisted when they conflict.

Aren't vector clocks available in 0.8 now? Or, more accurately, increment counters? https://issues.apache.org/jira/browse/CASSANDRA-1072 This would imply that an artificial delay is no longer necessary. Or am I missing something?

Regards,
Les
Re: No Transactions: An Example
On Thu, Jun 23, 2011 at 2:05 PM, Les Hazlewood l...@katasoft.com wrote:
> Aren't vector clocks available in 0.8 now? Or more accurately, increment counters? https://issues.apache.org/jira/browse/CASSANDRA-1072 This would imply that an artificial delay is no longer necessary. Or am I missing something?

The counters implementation that made it in doesn't use vector clocks. See https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf

-ryan
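To illustrate why counters can avoid vector clocks, here is a simplified sketch of the partitioned-counter idea: each replica increments only its own shard, each shard carries its own clock, merging keeps the shard with the higher clock, and the counter's value is the sum of the shards. It is only loosely modeled on the design document linked above, not on Cassandra's actual code, and all names are hypothetical. It says nothing about ordinary columns, which Cassandra still reconciles by timestamp:

```python
from typing import Dict, Tuple

Shard = Tuple[int, int]        # (clock, count) owned by one replica
Context = Dict[str, Shard]     # replica id -> its shard


def increment(ctx: Context, replica: str, delta: int) -> Context:
    """A replica only ever updates its own shard, bumping that shard's clock."""
    clock, count = ctx.get(replica, (0, 0))
    updated = dict(ctx)
    updated[replica] = (clock + 1, count + delta)
    return updated


def merge(a: Context, b: Context) -> Context:
    """Per replica, keep the shard with the higher clock."""
    out = dict(a)
    for replica, shard in b.items():
        if replica not in out or shard[0] > out[replica][0]:
            out[replica] = shard
    return out


def value(ctx: Context) -> int:
    return sum(count for _, count in ctx.values())
```

Because concurrent increments land in different shards, they never overwrite each other, and merging the same state twice changes nothing.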
Re: No Transactions: An Example
Thanks for the pointer Ryan! Regards, Les
Re: No Transactions: An Example
On 6/23/2011 7:37 AM, Trevor Smith wrote:
> AJ, Thanks for your input. I don't fully follow though how this would work with a bank scenario. Could you explain in more detail? Thanks. Trevor

I don't know yet; I'll be researching that. My working procedure is to figure out a way to handle each class of problem that ACID addresses and see if there is an acceptable way to compensate for or manage it on the client or business side, following the ideas in the article. I bet solutions exist somewhere. In short, the developer needs to be fully versed in the potential problems that could arise and have ways to deal with them. It's added responsibility for the developer, but if it keeps the infrastructure simple, with reduced maintenance costs from not having to integrate another service such as ZK/Cages (as useful as they indeed are), then it may be worth it. I'll let you know what I conclude.
No Transactions: An Example
Hello,

I was wondering if anyone had thoughts on the architecture of a simple bank account program that does not use transactions. I think an example project like this would be a good reference for the many discussions that pop up about transactions and Cassandra (and non-transactional datastores in general).

Consider a simple system that has accounts, where users can transfer money between the accounts. Some interesting background papers are linked below.

Thank you.
Trevor Smith

http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf
http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-components-postattachments/00-09-20-52-14/BuildingOnQuicksand_2D00_V3_2D00_081212h_2D00_pdf.pdf
http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
Re: No Transactions: An Example
I'd implement the concept of a bank account using counters in a counter column family: one row per account, with columns for the transaction data and one column for the actual balance. Just so long as you use whole numbers... no one needs pennies anymore.

-sd

On Wed, Jun 22, 2011 at 4:18 PM, Trevor Smith tre...@knewton.com wrote: [...]
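A minimal in-memory sketch of this layout, assuming a dict per account row in which every column is a counter: a `balance` column plus one column per transaction id (names hypothetical). Amounts are kept as integer cents here to satisfy the whole-numbers constraint. A transfer is two separate increments on two rows, and counter increments are not idempotent, so blindly retrying one after a timeout can double-count:

```python
from collections import defaultdict
from typing import Dict

# Hypothetical stand-in for a counter column family: one row per account,
# every column is a counter ("balance" plus one column per transaction id).
rows: Dict[str, Dict[str, int]] = defaultdict(dict)


def increment(account: str, column: str, delta: int) -> None:
    row = rows[account]
    row[column] = row.get(column, 0) + delta


def transfer(src: str, dst: str, txn_id: str, amount_cents: int) -> None:
    # Each leg records the transaction and adjusts that account's balance counter.
    for account, delta in ((src, -amount_cents), (dst, amount_cents)):
        increment(account, txn_id, delta)
        increment(account, "balance", delta)
    # Nothing makes the two legs atomic: a crash between them leaves a
    # debit without the matching credit.
```

If the process dies between the two legs, the result is exactly the half-completed transfer asked about in the next message.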
Re: No Transactions: An Example
Sasha,

How would you deal with a transfer between accounts in which only one half of the operation was successfully completed?

Thank you.
Trevor

On Wed, Jun 22, 2011 at 10:23 AM, Sasha Dolgy sdo...@gmail.com wrote: [...]
Re: No Transactions: An Example
I would still maintain a record of the transaction... so that I can do analysis afterwards to determine if/when problems occurred...

On Wed, Jun 22, 2011 at 4:31 PM, Trevor Smith tre...@knewton.com wrote:
> Sasha, How would you deal with a transfer between accounts in which only one half of the operation was successfully completed? Thank you. Trevor
Re: No Transactions: An Example
Right -- that's the part that I am more interested in fleshing out in this post. Must one have background jobs checking the integrity of all transactions at some time interval? This gets hairy pretty quickly with bank transactions (one rolled-back transaction could cause many others to be rolled back...). And maybe that is all fine. I am interested in hearing the pros and cons of different approaches.

Thanks.

On Wed, Jun 22, 2011 at 11:14 AM, Sasha Dolgy sdo...@gmail.com wrote:
> I would still maintain a record of the transaction ... so that I can do analysis post to determine if/when problems occurred ... [...]
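One way such a background job could look, assuming a transfer log written before any balance changes (the "record of the transaction" Sasha suggests) and, per account, the set of transaction ids whose leg has been applied. The `Transfer` record and `audit` function are hypothetical. The job only detects incomplete transfers; whether to retry, compensate, or flag them is the business decision under discussion:

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Transfer:
    """Hypothetical transfer-log entry, written before touching balances."""
    txn_id: str
    src: str
    dst: str
    amount: int


def audit(log: List[Transfer], applied: Dict[str, Set[str]]) -> Dict[str, str]:
    """Return txn_id -> problem for every logged transfer not fully applied.

    `applied` maps an account to the txn ids whose leg on that account has
    been written (e.g. the per-transaction columns on the account row).
    """
    report: Dict[str, str] = {}
    for t in log:
        debited = t.txn_id in applied.get(t.src, set())
        credited = t.txn_id in applied.get(t.dst, set())
        if debited and credited:
            continue
        if debited:
            report[t.txn_id] = "debit only"
        elif credited:
            report[t.txn_id] = "credit only"
        else:
            report[t.txn_id] = "not applied"
    return report
```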
Re: No Transactions: An Example
Hi Trevor,

I hope to post on my practical experiences in this area soon; we rely heavily on complex serialized operations in FightMyMonster.com. Probably the simplest serialized operation we do is updating nugget balances when, for example, there has been a trade of monsters.

Currently we use ZooKeeper/Cages (github.com/s7) to serialize our distributed ops. We don't implement transactions with rollback/commit. Rather, we lock some paths, for example /Users/bank/dominic and /Users/bank/ben, and then write with QUORUM through our Java client library Pelops. Pelops will make several attempts to retry the operation if it fails at first, and in our line of business the fact that redundancy in the cluster means it will nearly always complete eventually is enough. Of course, in a real-world money scenario that is not enough, and data inconsistency caused by, say, a sudden power outage during the retry phase is not acceptable. To handle this case, I would like to extend Cages at some point so that commit/rollback transactions stored inside ZooKeeper are associated with the distributed locks (which are stored persistently and survive power loss, for example). There is an old blog post that talks about it, although it needs updating: http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/

One interesting point not discussed, which I have also not heard mentioned elsewhere, is that for serialization to work every time, before you release a lock after performing an update you must wait for a brief period equal to the maximum variance between the clocks on the application nodes updating the database, e.g. 1-2 ms. This is because Cassandra uses the timestamps of columns that have been written during reconciliation to determine which should be persisted when they conflict.

As far as scaling goes, ZooKeeper can be scaled by having several clusters and hashing lock paths to them.
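A sketch of the lock, write, wait, release pattern described above, with local `threading.Lock` objects as hypothetical stand-ins for ZooKeeper/Cages locks and a callback in place of a Pelops QUORUM write. It retries the write a few times and, before releasing the locks, sleeps for an assumed maximum clock skew of 2 ms so that the next lock holder's column timestamps are later than ours. Acquiring locks in sorted path order (to avoid deadlock) and using SHA-1 to hash lock paths across ZooKeeper clusters are choices made for this sketch, not part of the original description:

```python
import hashlib
import threading
import time
from typing import Callable, Dict, List

# Assumed maximum clock variance between application nodes (1-2 ms above).
MAX_CLOCK_SKEW_S = 0.002

# Hypothetical stand-ins for ZooKeeper/Cages locks: one local lock per path.
_locks: Dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()


def _lock_for(path: str) -> threading.Lock:
    with _locks_guard:
        return _locks.setdefault(path, threading.Lock())


def cluster_for(path: str, clusters: List[str]) -> str:
    """Deterministically map a lock path to one of several ZooKeeper clusters."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return clusters[int.from_bytes(digest[:8], "big") % len(clusters)]


def serialized_update(paths: List[str], write: Callable[[], None],
                      attempts: int = 3) -> bool:
    """Lock all paths, run `write` with retries, wait out clock skew, unlock."""
    locks = [_lock_for(p) for p in sorted(set(paths))]  # fixed order avoids deadlock
    for lock in locks:
        lock.acquire()
    try:
        succeeded = False
        for _ in range(attempts):
            try:
                write()  # stand-in for a QUORUM write through the client library
                succeeded = True
                break
            except Exception:
                continue
        # Before releasing, wait so that the next holder's (possibly slower)
        # clock has passed the timestamps we just wrote.
        time.sleep(MAX_CLOCK_SKEW_S)
        return succeeded
    finally:
        for lock in reversed(locks):
            lock.release()
```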
Alternatively, Lamport's bakery algorithm could be investigated, as it shows you can have locking without a central coordinator service.

Best,
Dominic

On 22 June 2011 15:18, Trevor Smith tre...@knewton.com wrote: [...]
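For reference, here is Lamport's bakery algorithm in its classic shared-memory form, using Python threads. Each participant takes a ticket one higher than any it sees, then waits for every participant holding a smaller ticket (ties broken by index). It relies on individual list reads and writes being atomic, which holds in CPython. Adapting it to Cassandra would require each participant's `choosing`/`number` registers to be reliably readable by all the others, which this sketch does not address:

```python
import threading
import time
from typing import List


class Bakery:
    """Lamport's bakery lock for n participants using only shared reads/writes."""

    def __init__(self, n: int) -> None:
        self.choosing: List[bool] = [False] * n
        self.number: List[int] = [0] * n

    def lock(self, i: int) -> None:
        self.choosing[i] = True
        self.number[i] = 1 + max(self.number)   # take a ticket
        self.choosing[i] = False
        for j in range(len(self.number)):
            while self.choosing[j]:             # wait while j is picking a ticket
                time.sleep(0)
            # Wait while j holds a smaller ticket (ties broken by index).
            while self.number[j] != 0 and (self.number[j], j) < (self.number[i], i):
                time.sleep(0)

    def unlock(self, i: int) -> None:
        self.number[i] = 0
```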
Re: No Transactions: An Example
I think Sasha's idea is worth studying more. Here is a supporting read, referenced in the O'Reilly Cassandra book, that talks about alternatives to two-phase commit and synchronous transactions: http://www.eaipatterns.com/ramblings/18_starbucks.html

If it can be done without locks and the business can handle a rare incomplete transaction, then this might be acceptable.

On 6/22/2011 9:14 AM, Sasha Dolgy wrote:
> I would still maintain a record of the transaction ... so that I can do analysis post to determine if/when problems occurred ... [...]
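A minimal sketch of the compensating-action style the article discusses as an alternative to two-phase commit, under hypothetical in-memory names (`balances`, `apply`, `transfer`): the credit leg is retried a few times, and if it still fails, the debit is reversed by a new write rather than rolled back. Between the debit and the refund, other readers can see the intermediate state, and in a real system the refund is itself a write that could fail:

```python
from typing import Callable, Dict

# Hypothetical in-memory balances standing in for account rows.
balances: Dict[str, int] = {}


def attempt(fn: Callable[[], None], attempts: int = 3) -> bool:
    """Retry fn a few times; return True on the first success."""
    for _ in range(attempts):
        try:
            fn()
            return True
        except Exception:
            pass
    return False


def apply(account: str, amount: int) -> None:
    balances[account] = balances.get(account, 0) + amount


def transfer(src: str, dst: str, amount: int,
             credit: Callable[[str, int], None] = apply) -> str:
    apply(src, -amount)                          # step 1: debit
    if attempt(lambda: credit(dst, amount)):     # step 2: credit, with retries
        return "done"
    apply(src, amount)                           # compensating action: refund the debit
    return "compensated"
```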