[jira] [Commented] (CASSANDRA-10011) I want to develop transactions for Cassandra and I want your feedback

2015-08-07 Thread Marek Lewandowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14661432#comment-14661432
 ] 

Marek Lewandowski commented on CASSANDRA-10011:
---

thanks, done that
http://www.mail-archive.com/dev@cassandra.apache.org/msg08175.html

 I want to develop transactions for Cassandra and I want your feedback
 -

 Key: CASSANDRA-10011
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10011
 Project: Cassandra
  Issue Type: New Feature
Reporter: Marek Lewandowski
  Labels: features

 Hello everyone,
 *TL;DR;* I want to develop transactions (similar to those relational ones) 
 for Cassandra, I have some ideas and I'd like to hear your feedback.
 *Long story short:* I want to develop prototype of solution that features 
 transactions spanning multiple Cassandra partitions resembling those in 
 relational databases. I understand that in Cassandra's world such 
 transactions will be quite a bit different than in relational db world. That 
 prototype or research if you will, is subject of my master's thesis. 
 It's been some time since I've been dancing around that subject and 
 postponing it, but what I understood in the process is that I cannot do 
 anything useful being hidden from community's feedback. I want you to provide 
 feedback on my ideas. You also have a power of changing it completely because 
 first and foremost _I want to develop something actually useful, not only get 
 my master's degree._
 To not scare anyone with huge wall of text, for now I'll just post very brief 
 description of my ideas and longer block of text describing how I can see 
 committing and rolling back data. I have more detailed descriptions prepared 
 already, but I don't want to share 4 pages of text in one go.
 Scope of the project (assuming I'll do it alone) is an experiment, not 
 production ready solution.
 Such experiment can use any tools possible to actually perform it.
 Baseline for any idea is:
 - asynchronous execution - think Akka and actors with non blocking execution 
 and message passing 
 - doesn't have to be completely transparent for end user - solution may 
 enforce certain workflow of interacting with it and/or introduce new concepts 
 (like akka messages instead of CQL and binary protocol).
 So far after reading a lot and thinking even more I have two ideas that I'd 
 like to share.
 h3. Ideas are (with brief description, details will come later):
 h4. Idea 1: Event streaming 
 - Imagine that every modiifcation is represented by an _Event_. 
 - Imagine you can group these events into _Event Groups_. 
 - Imagine that such groups are time series data
 - Imagine you can read such groups as a stream of events (think reactive 
 stream)
 Idea is that: you don't need to lock data when you are sure there is no one 
 else to compete with.
 There is 1 guy called _Cursor_ that reads Event Stream and executes Event 
 Groups one by one advacing its position on the stream when Event Group has 
 been executed.
 Seems like a system where you have only 1 transaction at any given time, but 
 there are many areas to optimize that and to allow more than that. However 
 I'll stop here.
 h4. Idea 2: Locking data 
 - uses additional tables to acquire read/write locks
 - seperate tables to append modifications - as in Rollback: Appending to 
 seperate table.  
 - supports different isolation levels. 
 - more traditional approach, kind of translation of traditional locking to 
 cassandra reality.
 ---
 Common part of two is approach to doing commit and rollback.
 h3. Doing Rollback and commit
 I have two ideas for rollback. I like 2nd one more, because it is simpler and 
 potentially faster.
 h4. Rollback: Query rewriting
 It modifies original data, but before that it copies original data so that 
 state can be restored. Then when failure is detected, modification query can 
 be rewritten so that original data can be restored.
 Query rewriting seems like a complex functionality. I tried few simple and a 
 little bit more complex statements and in general for basic stuff algorithm 
 is not that complicated, but to support everything CQL has to offer it might 
 be hard. 
 Still such transactional system might have some restrictions over CQL 
 statements used, because first of all when someone wants to have these 
 transactions they already want something non standard.
 I will skip details of that approach for now.
 h4. Rollback: Appending to seperate table.
 Image we have table A that we want to have transactions on.
 This requires another table A_tx which has same schema as A, but has *1 more 
 clustering column* and few new columns. A_tx will be additionally clustered 
 by 

[jira] [Created] (CASSANDRA-10011) I want to develop transactions for Cassandra and I want your feedback

2015-08-06 Thread Marek Lewandowski (JIRA)
Marek Lewandowski created CASSANDRA-10011:
-

 Summary: I want to develop transactions for Cassandra and I want 
your feedback
 Key: CASSANDRA-10011
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10011
 Project: Cassandra
  Issue Type: New Feature
Reporter: Marek Lewandowski


Hello everyone,

*TL;DR;* I want to develop transactions (similar to those relational ones) for 
Cassandra, I have some ideas and I'd like to hear your feedback.

*Long story short:* I want to develop prototype of solution that features 
transactions spanning multiple Cassandra partitions resembling those in 
relational databases. I understand that in Cassandra's world such transactions 
will be quite a bit different than in relational db world. That prototype or 
research if you will, is subject of my master's thesis. 

It's been some time since I've been dancing around that subject and postponing 
it, but what I understood in the process is that I cannot do anything useful 
being hidden from community's feedback. I want you to provide feedback on my 
ideas. You also have a power of changing it completely because first and 
foremost _I want to develop something actually useful, not only get my master's 
degree._

To not scare anyone with huge wall of text, for now I'll just post very brief 
description of my ideas and longer block of text describing how I can see 
committing and rolling back data. I have more detailed descriptions prepared 
already, but I don't want to share 4 pages of text in one go.

Scope of the project (assuming I'll do it alone) is an experiment, not 
production ready solution.
Such experiment can use any tools possible to actually perform it.

Baseline for any idea is:
- asynchronous execution - think Akka and actors with non blocking execution 
and message passing 
- doesn't have to be completely transparent for end user - solution may enforce 
certain workflow of interacting with it and/or introduce new concepts (like 
akka messages instead of CQL and binary protocol).

So far after reading a lot and thinking even more I have two ideas that I'd 
like to share.

h3. Ideas are (with brief description, details will come later):

h4. Idea 1: Event streaming 
- Imagine that every modiifcation is represented by an _Event_. 
- Imagine you can group these events into _Event Groups_. 
- Imagine that such groups are time series data
- Imagine you can read such groups as a stream of events (think reactive stream)

Idea is that: you don't need to lock data when you are sure there is no one 
else to compete with.

There is 1 guy called _Cursor_ that reads Event Stream and executes Event 
Groups one by one advacing its position on the stream when Event Group has been 
executed.

Seems like a system where you have only 1 transaction at any given time, but 
there are many areas to optimize that and to allow more than that. However I'll 
stop here.

h4. Idea 2: Locking data 
- uses additional tables to acquire read/write locks
- seperate tables to append modifications - as in Rollback: Appending to 
seperate table.  
- supports different isolation levels. 
- more traditional approach, kind of translation of traditional locking to 
cassandra reality.


---
Common part of two is approach to doing commit and rollback.
h3. Doing Rollback and commit
I have two ideas for rollback. I like 2nd one more, because it is simpler and 
potentially faster.

h4. Rollback: Query rewriting
It modifies original data, but before that it copies original data so that 
state can be restored. Then when failure is detected, modification query can be 
rewritten so that original data can be restored.

Query rewriting seems like a complex functionality. I tried few simple and a 
little bit more complex statements and in general for basic stuff algorithm is 
not that complicated, but to support everything CQL has to offer it might be 
hard. 

Still such transactional system might have some restrictions over CQL 
statements used, because first of all when someone wants to have these 
transactions they already want something non standard.

I will skip details of that approach for now.

h4. Rollback: Appending to seperate table.
Image we have table A that we want to have transactions on.
This requires another table A_tx which has same schema as A, but has *1 more 
clustering column* and few new columns. A_tx will be additionally clustered by 
transaction id. 
New columns are:
- {{is_committed boolean}}
- {{is_rolledback boolean}}
- {{is_applied boolean}}

General idea is: 
1. During transaction append changes to XXX_tx tables.
2. For rollback: nothing needs to be done (besides cleaning XXX_tx 
tables of useless data scheduled for rollback)
3. For commit: rows in each XXX_tx are marked as committed. This can be 
done