Re: No Transactions: An Example

2011-07-28 Thread Jeremy Sevellec
Hi All,

Transactions are my main concern at the moment.

My need is:

- update data in column family #1
- insert data in column family #2

I need these operations to happen in a single transaction because the
data is tightly coupled.

I use ZooKeeper/Cages for distributed locking, so that multiple clients
cannot insert or update the same data at the same time.

But there is a problem when the insert into column family #2 fails,
because I then have to roll back the data already updated in column
family #1.


From my reading on the subject, these are the options for handling the
failure:
- Can we really consider that a write never fails in Cassandra once
the mutation has been executed on a node? What could cause a failure
at that point? So is it important to think about this potential
problem? (Yes in my opinion, but I'm not totally sure.)
- Retry first. Is there really a chance that the second try succeeds
if the first one failed?
- Keep the transaction data so that I can roll back programmatically
by deleting the inserted data. The problem is rolling back the updated
data, because the old values are lost. I have read that read-before-write
(to save the old values before the update) is a bad idea. So the
problem remains: how should this be done?
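
To make the third option concrete, here is a minimal sketch of a programmatic rollback. It is only an illustration: plain Python dicts stand in for the two column families, the function name and the insert_fails flag are hypothetical, and it does perform the read before write whose cost is questioned above.

```python
def update_then_insert(cf1, cf2, key, column, new_value, new_key, new_row,
                       insert_fails=False):
    """Update cf1, then insert into cf2; undo the cf1 update if the insert fails."""
    row = cf1.setdefault(key, {})
    had_old = column in row
    old_value = row.get(column)          # the read before write
    row[column] = new_value              # step 1: update column family #1
    try:
        if insert_fails:                 # simulated failure, for illustration only
            raise IOError("insert into column family #2 failed")
        cf2[new_key] = dict(new_row)     # step 2: insert into column family #2
        return True
    except IOError:
        # Compensate: restore the saved value, or remove the column if it was new.
        if had_old:
            row[column] = old_value
        else:
            del row[column]
        return False
```

Note that the compensation itself is a write that can fail, so this narrows the window for inconsistency rather than closing it.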


Do you have any feedback on this topic?

Regards,

Jérémy



On Thursday, 23 June 2011 at 16:23 -0700, Les Hazlewood wrote:

 Thanks for the pointer Ryan!

 Regards,

 Les


Re: No Transactions: An Example

2011-06-29 Thread AJ


On 6/22/2011 9:18 AM, Trevor Smith wrote:
Right -- that's the part that I am more interested in fleshing out in 
this post.




Here is one way.  Use MVCC 
(http://en.wikipedia.org/wiki/Multiversion_concurrency_control).  A 
single global clean-up process would be acceptable, since it is not a 
single point of failure, only a single point where back-logged work 
accumulates.  It will not affect availability, or the validity of 
subsequent reads, as long as you are notified when that process 
terminates and restart it within a reasonable amount of time.


So, you would have a balance column.  Each update creates a 
balance_timestamp column (the name balance_ followed by the update's 
timestamp) holding a positive or negative value, i.e. a credit or a 
debit.  Subsequent clients read the latest value by doing a slice from 
balance to balance_~ (i.e. all balance* columns).  (You would have to 
work out your column naming conventions so that your slices return only 
the pertinent columns.)  The clients then apply all the credits and 
debits to the balance to get the current balance.


This handles the lost update problem.
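
A minimal sketch of that read path, assuming a plain Python dict stands in for one Accounts row and that column names are balance_ followed by an integer timestamp (both are illustrative assumptions, not a Cassandra client API):

```python
def add_update(row, timestamp, amount):
    """Record a credit (positive) or debit (negative) as its own column."""
    row["balance_%d" % timestamp] = amount

def current_balance(row):
    """Apply every balance_<timestamp> delta to the base balance column."""
    base = row.get("balance", 0)
    deltas = [value for name, value in row.items() if name.startswith("balance_")]
    return base + sum(deltas)

row = {"balance": 100}
add_update(row, 1001, -30)   # a debit written by one client
add_update(row, 1002, 50)    # a credit written by another client
print(current_balance(row))  # 120
```

Because each writer adds a new column instead of overwriting balance, two concurrent updates do not clobber each other, provided their timestamps (and so their column names) differ.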

For the dirty read and incorrect summary problems, where others read data 
that is in the middle of a transaction that hasn't committed yet, I 
would add a final transaction column to a Transactions CF.  The key 
would be cf.key.column, e.g., Accounts.1234.balance, 1234 being 
the account # and Accounts being the CF owning the balance column.  
Then, a new column would be added for each successful transaction (e.g., 
after debiting and crediting the two accounts) using the same timestamp 
used in balance_timestamp.  So, now, a client wanting the current 
balance would have to do a slice for all of the transactions for that 
column and only apply the balance updates up to the latest transaction.  
Note, you might have to do something else with the transaction naming 
schemes to make sure they are guaranteed to be unique, but you get the 
idea.  If the transaction fails, the client simply does not add a 
transaction column to Transactions and deletes any balance_timestamp 
columns it added to the Accounts CF (or lets the clean-up process do 
it... carefully).
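
A rough sketch of the commit-marker read, again with dicts standing in for the two CFs. One deliberate deviation, stated plainly: instead of applying every update "up to the latest transaction", this version applies a delta only when a marker with the same timestamp exists, which is a slightly stricter reading of the scheme. All names and values are illustrative.

```python
accounts = {"1234": {"balance": 100, "balance_1001": -30, "balance_1002": 50}}
# Key is cf.key.column; one column per committed transaction timestamp.
transactions = {"Accounts.1234.balance": {1001: "committed"}}

def committed_balance(accounts, transactions, account_id):
    """Sum the base balance and only those deltas that have a commit marker."""
    row = accounts[account_id]
    markers = transactions.get("Accounts.%s.balance" % account_id, {})
    total = row.get("balance", 0)
    for name, delta in row.items():
        if name.startswith("balance_"):
            timestamp = int(name[len("balance_"):])
            if timestamp in markers:      # uncommitted deltas are ignored
                total += delta
    return total

print(committed_balance(accounts, transactions, "1234"))  # 70
```

The update at timestamp 1002 has no marker yet, so a reader sees 70 rather than 120 until that transaction writes its column to Transactions.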


This should avoid the need for locks and as long as each account doesn't 
have a crazy amount of updates, the slices shouldn't be so large as to 
be a significant perf hit.


A note about the updates.  You have to make sure the clean-up process 
processes the updates in order and only 1 time.  If you can't guarantee 
these, then you'll have to make sure your updates are idempotent and 
commutative.
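
As an illustration of what idempotent and commutative can mean here, a small sketch of a clean-up pass. It assumes the same dict-as-row stand-in as before and a hypothetical applied_ids set that the clean-up process would have to persist somewhere.

```python
def compact(row, applied_ids):
    """Fold delta columns into the base balance, each delta at most once."""
    for name in sorted(n for n in row if n.startswith("balance_")):
        if name not in applied_ids:
            # Addition is commutative, so the order of the deltas does not matter.
            row["balance"] = row.get("balance", 0) + row[name]
            applied_ids.add(name)
        # Remove the delta whether it was applied now or on an earlier pass.
        del row[name]
    return row
```

If a delta column is delivered or seen a second time, the applied_ids check makes the second application a no-op, which is what allows the process to be re-run safely after a crash.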


Oh yeah, and you must use QUORUM read/writes, of course.
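
The reason QUORUM matters can be checked with a little arithmetic: a read is guaranteed to overlap the latest successful write when replicas read plus replicas written exceeds the replication factor. A small sketch (the function names are mine):

```python
def quorum(replication_factor):
    """Number of replicas that make up a quorum."""
    return replication_factor // 2 + 1

def read_overlaps_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor

rf = 3
print(quorum(rf))                                        # 2
print(read_overlaps_write(rf, quorum(rf), quorum(rf)))   # True
print(read_overlaps_write(rf, 1, 1))                     # False
```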

Any critiques?

aj


Re: No Transactions: An Example

2011-06-23 Thread Trevor Smith
Dominic,

Thank you for your answer. I enjoy how in your day to day work you are
concerned with who has the monster. It must be fun to read your
production logs (User[Shelly] received vampire).

I looked into Cages and this does seem interesting. I need to do more
reading to have a better take. I am wondering though -- are there other
situations, not as business critical as the trading of monsters, that still
need transactions, but you have decided to not use them? If so, do you have
jobs running which check data integrity on some timed basis?

Trevor



On Wed, Jun 22, 2011 at 1:04 PM, Dominic Williams
dwilli...@system7.co.uk wrote:

 Hi Trevor,

 I hope to post on my practical experiences in this area soon - we rely
 heavily on complex serialized operations in FightMyMonster.com. Probably the
 most simple serialized operation we do is updating nugget balances when, for
 example, there has been a trade of monsters.

 Currently we use ZooKeeper/Cages (github.com/s7) to serialize our
 distributed ops.

 We don't implement transactions with rollback/commit. Rather, we lock some
 paths, for example /Users/bank/dominic and /Users/bank/ben, and then write
 with QUORUM through our Java client library Pelops. This will make several
 efforts to retry the operation if it fails at first, and in our line of
 business the fact that redundancy in the cluster means it will nearly always
 complete eventually is enough.

 Of course, in a real world money scenario that is not enough and data
 inconsistency caused by, say, a sudden power outage during the retry phase
 is not acceptable. To handle this case I would like to extend Cages at some
 point so that commit/rollback transactions that would be stored inside
 ZooKeeper are associated with the distributed locks (which are stored
 persistently and survive power loss for example). There is an old blog post
 here which talks about it
 http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/
 although this needs updating.

 One interesting point not discussed which I have also not heard mentioned
 elsewhere is that in order for serialization to work every time, before you
 release a lock after performing an update you must wait for a brief period
 = max variance between the clocks on the application nodes updating the
 database e.g. 1-2ms.

 This is because Cassandra uses the timestamps of columns that have been
 written during reconciliation to determine which should be persisted when
 they conflict.

 As far as scaling goes, ZooKeeper can be scaled by having several clusters
 and hashing lock paths to them. Alternatively, Lamport's bakery algorithm
 could be investigated as this shows you can have locking without a central
 coordinator service.

 Best, Dominic







Re: No Transactions: An Example

2011-06-23 Thread Trevor Smith
AJ,

Thanks for your input. I don't fully follow though how this would work with
a bank scenario. Could you explain in more detail?

Thanks.

Trevor

On Wed, Jun 22, 2011 at 6:34 PM, AJ a...@dude.podzone.net wrote:

 I think Sasha's idea is worth studying more.  Here is a supporting read
 referenced in the O'Reilly Cassandra book that talks about alternatives to
 2-phase commit and synchronous transactions:

 http://www.eaipatterns.com/ramblings/18_starbucks.html

 If it can be done without locks and the business can handle a rare
 incomplete transaction, then this might be acceptable.



 On 6/22/2011 9:14 AM, Sasha Dolgy wrote:

 I would still maintain a record of the transaction ... so that I can
 do analysis post to determine if/when problems occurred ...

 On Wed, Jun 22, 2011 at 4:31 PM, Trevor Smith tre...@knewton.com wrote:

 Sasha,
 How would you deal with a transfer between accounts in which only one half
 of the operation was successfully completed?
 Thank you.
 Trevor





Re: No Transactions: An Example

2011-06-23 Thread Les Hazlewood
Hi Dominic,

Thanks so much for providing this information.  I was unaware of Cages and
this looks like it could be used effectively for certain things.

 This is because Cassandra uses the timestamps of columns that have been
 written during reconciliation to determine which should be persisted when
 they conflict.


Aren't vector clocks available in 0.8 now? Or more accurately, increment
counters?

https://issues.apache.org/jira/browse/CASSANDRA-1072

This would imply that an artificial delay is no longer necessary.  Or am I
missing something?

Regards,

Les


Re: No Transactions: An Example

2011-06-23 Thread Ryan King
On Thu, Jun 23, 2011 at 2:05 PM, Les Hazlewood l...@katasoft.com wrote:
 Hi Dominic,
 Thanks so much for providing this information.  I was unaware of Cages and
 this looks like it could be used effectively for certain things.

 This is because Cassandra uses the timestamps of columns that have been
 written during reconciliation to determine which should be persisted when
 they conflict.

 Aren't vector clocks available in 0.8 now? Or more accurately, increment
 counters?
 https://issues.apache.org/jira/browse/CASSANDRA-1072
 This would imply that an artificial delay is no longer necessary.  Or am I
 missing something?

The counters implementation that made it in doesn't use vector clocks.
See 
https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf

-ryan


Re: No Transactions: An Example

2011-06-23 Thread Les Hazlewood
Thanks for the pointer Ryan!

Regards,

Les


Re: No Transactions: An Example

2011-06-23 Thread AJ

On 6/23/2011 7:37 AM, Trevor Smith wrote:

AJ,

Thanks for your input. I don't fully follow though how this would work 
with a bank scenario. Could you explain in more detail?


Thanks.

Trevor


I don't know yet.  I'll be researching that.  My working procedure is to 
figure out a way to handle each class of problem that ACID addresses and 
see if there is an acceptable way to compensate or manage it on the 
client or business side, following the ideas in the article.  I bet 
solutions exist somewhere.  In short, the developer needs to be fully 
versed in the potential problems that could arise and have ways to deal 
with them.  It's added responsibility for the developer, but if it keeps 
the infrastructure simple with reduced maintenance costs by not having 
to integrate another service such as ZK/Cages (as useful as they indeed 
are) then it may be worth it.  I'll let you know what I conclude.




No Transactions: An Example

2011-06-22 Thread Trevor Smith
Hello,

I was wondering if anyone had architecture thoughts of creating a simple
bank account program that does not use transactions. I think creating an
example project like this would be a good thing to have for a lot of the
discussions that pop up about transactions and Cassandra (and
non-transactional datastores in general).

Consider the simple system that has accounts, and users can transfer money
between the accounts.

There are these interesting papers as background (links below).

Thank you.

Trevor Smith

http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-components-postattachments/00-09-20-52-14/BuildingOnQuicksand_2D00_V3_2D00_081212h_2D00_pdf.pdf

http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf


Re: No Transactions: An Example

2011-06-22 Thread Sasha Dolgy
I'd implement the concept of a bank account using counters in a
counter column family.  one row per account ... each column for
transaction data and one column for the actual balance.
just so long as you use whole numbers ... no one needs pennies anymore.
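
A toy sketch of that layout, with nested dicts standing in for a counter column family (one row per account, one counter column per transaction plus a balance counter). The names are hypothetical and this is not a Cassandra client API.

```python
from collections import defaultdict

# account id -> column name -> counter value
counters = defaultdict(lambda: defaultdict(int))

def record(account, transaction_id, amount):
    """Increment the per-transaction column and the balance column."""
    if not isinstance(amount, int):
        raise TypeError("amounts must be whole numbers")
    counters[account][transaction_id] += amount
    counters[account]["balance"] += amount

record("1234", "tx1", 100)
record("1234", "tx2", -30)
print(counters["1234"]["balance"])  # 70
```

The two increments inside record are separate writes, so a failure between them, or between the two accounts of a transfer, still leaves the data half-updated.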
-sd



Re: No Transactions: An Example

2011-06-22 Thread Trevor Smith
Sasha,

How would you deal with a transfer between accounts in which only one half
of the operation was successfully completed?

Thank you.

Trevor

On Wed, Jun 22, 2011 at 10:23 AM, Sasha Dolgy sdo...@gmail.com wrote:

 I'd implement the concept of a bank account using counters in a
 counter column family.  one row per account ... each column for
 transaction data and one column for the actual balance.
 just so long as you use whole numbers ... no one needs pennies anymore.
 -sd




Re: No Transactions: An Example

2011-06-22 Thread Sasha Dolgy
I would still maintain a record of the transaction ... so that I can
do analysis post to determine if/when problems occurred ...

On Wed, Jun 22, 2011 at 4:31 PM, Trevor Smith tre...@knewton.com wrote:
 Sasha,
 How would you deal with a transfer between accounts in which only one half
 of the operation was successfully completed?
 Thank you.
 Trevor


Re: No Transactions: An Example

2011-06-22 Thread Trevor Smith
Right -- that's the part that I am more interested in fleshing out in this
post.

Must one have background jobs checking the integrity of all transactions at
some time interval? This gets hairy pretty quick with bank transactions (one
unrolled transaction could cause many others to become unrolled...) And
maybe that is all fine. I am interested in hearing the pros and cons of
different approaches.
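
One possible shape for such a background check, as a sketch only. It assumes a hypothetical ledger in which every transfer writes two legs (a debit and a credit) tagged with the same transfer id; a transfer is flagged when a leg is missing or the legs do not cancel out.

```python
def find_incomplete_transfers(ledger):
    """ledger: iterable of (transfer_id, account, amount) tuples."""
    totals = {}
    legs = {}
    for transfer_id, _account, amount in ledger:
        totals[transfer_id] = totals.get(transfer_id, 0) + amount
        legs[transfer_id] = legs.get(transfer_id, 0) + 1
    # A complete transfer has exactly two legs that sum to zero.
    return sorted(t for t in totals if legs[t] != 2 or totals[t] != 0)

ledger = [
    ("t1", "alice", -50), ("t1", "bob", 50),   # complete
    ("t2", "alice", -20),                      # credit leg never written
]
print(find_incomplete_transfers(ledger))  # ['t2']
```

This only detects half-completed transfers. Deciding whether to finish or reverse them, and what to do about later transfers that depended on them, is the hairy part.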

Thanks.

On Wed, Jun 22, 2011 at 11:14 AM, Sasha Dolgy sdo...@gmail.com wrote:

 I would still maintain a record of the transaction ... so that I can
 do analysis post to determine if/when problems occurred ...

 On Wed, Jun 22, 2011 at 4:31 PM, Trevor Smith tre...@knewton.com wrote:
  Sasha,
  How would you deal with a transfer between accounts in which only one half
  of the operation was successfully completed?
  Thank you.
  Trevor



Re: No Transactions: An Example

2011-06-22 Thread Dominic Williams
Hi Trevor,

I hope to post on my practical experiences in this area soon - we rely
heavily on complex serialized operations in FightMyMonster.com. Probably the
most simple serialized operation we do is updating nugget balances when, for
example, there has been a trade of monsters.

Currently we use ZooKeeper/Cages (github.com/s7) to serialize our
distributed ops.

We don't implement transactions with rollback/commit. Rather, we lock some
paths, for example /Users/bank/dominic and /Users/bank/ben, and then write
with QUORUM through our Java client library Pelops. This will make several
efforts to retry the operation if it fails at first, and in our line of
business the fact that redundancy in the cluster means it will nearly always
complete eventually is enough.
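
The lock-then-retry pattern can be sketched as follows. This is not the Cages or Pelops API: threading.Lock objects stand in for ZooKeeper lock paths, the retry loop is written out by hand, and acquiring the paths in sorted order is my own assumption, added to keep the sketch deadlock-free.

```python
import threading
import time

locks = {}                          # lock path -> lock (stand-in for ZooKeeper)
_registry_guard = threading.Lock()

def lock_for(path):
    with _registry_guard:
        return locks.setdefault(path, threading.Lock())

def with_locks(paths, operation, retries=3, delay=0.01):
    """Hold every path lock while running operation, retrying on IOError."""
    held = []
    try:
        for path in sorted(set(paths)):      # fixed order avoids deadlock
            lock = lock_for(path)
            lock.acquire()
            held.append(lock)
        for attempt in range(1, retries + 1):
            try:
                return operation()
            except IOError:
                if attempt == retries:
                    raise                    # give up; the data may be half-written
                time.sleep(delay)
    finally:
        for lock in reversed(held):
            lock.release()
```

A call such as with_locks(["/Users/bank/dominic", "/Users/bank/ben"], do_transfer) would then run a hypothetical do_transfer function under both locks.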

Of course, in a real world money scenario that is not enough and data
inconsistency caused by, say, a sudden power outage during the retry phase
is not acceptable. To handle this case I would like to extend Cages at some
point so that commit/rollback transactions that would be stored inside
ZooKeeper are associated with the distributed locks (which are stored
persistently and survive power loss for example). There is an old blog post
here which talks about it
http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/
although this needs updating.

One interesting point not discussed which I have also not heard mentioned
elsewhere is that in order for serialization to work every time, before you
release a lock after performing an update you must wait for a brief period
= max variance between the clocks on the application nodes updating the
database e.g. 1-2ms.

This is because Cassandra uses the timestamps of columns that have been
written during reconciliation to determine which should be persisted when
they conflict.
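
A small simulation of why the wait is needed. The numbers are made up: client A's clock runs 2 ms ahead, client B's is exact, and reconciliation is simplified to "highest timestamp wins" (ties go to the incoming column here, which is a simplification).

```python
def reconcile(existing, incoming):
    """Each column is (timestamp, value); keep the one with the higher timestamp."""
    return incoming if incoming[0] >= existing[0] else existing

def final_value(wait_before_release_ms, skew_ms=2):
    a_real_time = 1000
    stored = (a_real_time + skew_ms, "written by A")      # A's clock is ahead
    # B gets the lock once A releases it, and writes 1 ms later.
    b_real_time = a_real_time + wait_before_release_ms + 1
    return reconcile(stored, (b_real_time, "written by B"))[1]

print(final_value(0))  # written by A  (B's later write is lost)
print(final_value(2))  # written by B
```

Without the wait, B's write happens later in real time but carries a lower timestamp, so A's older value survives. Waiting for the maximum clock variance before releasing the lock removes that case.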

As far as scaling goes, ZooKeeper can be scaled by having several clusters
and hashing lock paths to them. Alternatively, Lamport's bakery algorithm
could be investigated as this shows you can have locking without a central
coordinator service.
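
Hashing lock paths to clusters could look roughly like this. The cluster addresses are placeholders and the choice of SHA-256 is arbitrary; any stable hash would do.

```python
import hashlib

def cluster_for(path, clusters):
    """Pick the ZooKeeper cluster responsible for a given lock path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]

clusters = ["zk-a.example:2181", "zk-b.example:2181", "zk-c.example:2181"]
chosen = cluster_for("/Users/bank/dominic", clusters)
```

Every client must use the same cluster list in the same order, otherwise two clients could take the "same" lock on different clusters. Changing the number of clusters remaps most paths, so that would have to be done while no locks are held.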

Best, Dominic




Re: No Transactions: An Example

2011-06-22 Thread AJ
I think Sasha's idea is worth studying more.  Here is a supporting read 
referenced in the O'Reilly Cassandra book that talks about alternatives 
to 2-phase commit and synchronous transactions:


http://www.eaipatterns.com/ramblings/18_starbucks.html

If it can be done without locks and the business can handle a rare 
incomplete transaction, then this might be acceptable.



On 6/22/2011 9:14 AM, Sasha Dolgy wrote:

I would still maintain a record of the transaction ... so that I can
do analysis post to determine if/when problems occurred ...

On Wed, Jun 22, 2011 at 4:31 PM, Trevor Smith tre...@knewton.com wrote:

Sasha,
How would you deal with a transfer between accounts in which only one half
of the operation was successfully completed?
Thank you.
Trevor