Re: expiring + counter column?

2011-05-29 Thread Yang
Sorry to beat a dead horse.

I looked at the link referred to from #2103:
https://issues.apache.org/jira/browse/CASSANDRA-2101
I agree with the reasoning in #2101 that the root issue is that deletes and
counter adds are not commutative. Since by definition we can't achieve
predictable behavior with deletes + counters, can we redefine the behavior of
counter deletes so that we can always guarantee the declared behavior?
Specifically:


*We define that once a counter column is deleted, you can never add to it
again.* An attempt to add to a dead counter either throws an exception or is
simply ignored. I.e., a counter column has only one life, until all tombstones
are purged from the system, after which it is possible for the counter to have
a new incarnation. Basically, instead of solving the problem raised in #2103,
we declare openly that it's unsolvable (which is true) and make the code
reflect this fact.



I think this behavior would satisfy most use cases of counters. So instead of
relying on advice to developers ("do not do updates for a period after
deletes, otherwise it probably won't work"), we enforce this in the code.


The same logic can be carried over to expiring columns, since they are
essentially automatically inserted deletes. That way #2103 could be solved.


I'm attaching an example below; you can refer to it if needed.

Thanks  a lot
Yang


example:
For simplicity, assume there is only one column family and one column, so we
omit the column name and CF name in our notation. Assume all counter adds have
a delta value of 1, so we only mark their timestamps: c(123) means a counter
add (delta 1) with timestamp=123, and d(456) means a tombstone with
timestamp=456.

Then we can have the following operations:

operation    result after operation
-------------------------------------------------------
c(1)         count=1
d(2)         count=null (counter not present)
c(3)         count=null (add on dead counter ignored)
-------------------------------------------------------


If the two adds arrive out of order, we would still guarantee eventual
consistency:

operation    result after operation
-------------------------------------------------------
c(1)         count=1
c(3)         count=2    (we have 2 adds, each with delta=1)
d(2)         count=null (deleted)
-------------------------------------------------------
At the end of both scenarios, the result is guaranteed to be null.
Note that in the second scenario, the second row shows a snapshot with
count=2, which scenario 1 never sees. This is fine, since even regular
columns can have this situation (just consider if the counter adds were
inserts/overwrites instead).
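The convergence argument above can be modeled with a toy sketch (plain Java
with invented names, not Cassandra code): once a tombstone has been applied,
every subsequent add is silently ignored, so both arrival orders of the same
three operations end in the same null state.

```java
// Minimal sketch of the proposed "one life" counter semantics.
// All names here are made up for illustration.
public class OneLifeCounter {
    private long count = 0;
    private boolean dead = false;   // set once a delete tombstone is applied

    public void add(long delta) {
        if (dead) return;           // adds to a dead counter are silently ignored
        count += delta;
    }

    public void delete() {
        dead = true;                // the counter never comes back to life
    }

    // null means "counter not present", matching the tables above
    public Long value() {
        return dead ? null : count;
    }

    public static void main(String[] args) {
        // scenario 1: c(1), d(2), c(3) arrive in timestamp order
        OneLifeCounter a = new OneLifeCounter();
        a.add(1); a.delete(); a.add(1);

        // scenario 2: the two adds arrive before the delete
        OneLifeCounter b = new OneLifeCounter();
        b.add(1); b.add(1); b.delete();

        // both orders converge to the deleted (null) state
        System.out.println(a.value() + " " + b.value()); // prints "null null"
    }
}
```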



On Fri, May 27, 2011 at 5:57 PM, Jonathan Ellis jbel...@gmail.com wrote:
 No. See comments to https://issues.apache.org/jira/browse/CASSANDRA-2103

 On Fri, May 27, 2011 at 7:29 PM, Yang tedd...@gmail.com wrote:
 Is this combination feature available, or on track?

 thanks
 Yang




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: expiring + counter column?

2011-05-29 Thread Yang
errata:
where I wrote "c(123) means a counter column of ttl=1", I meant "c(123) means
a counter column of ttl=123".





Re: expiring + counter column?

2011-05-29 Thread Yang
Sorry, in the notation, instead of "ttl" I mean "timestamp".





Re: expiring + counter column?

2011-05-29 Thread aaron morton
Without commenting on the other parts of the design, this part is not
possible: "attempts to add to a dead counter throws an exception".

All write operations are "no look" operations (write to the log, update the
memtables); we never look at the SSTables. It goes against the architecture of
the write path to require a read from disk.
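A toy sketch of that "no look" write path (invented names, not Cassandra's
actual code): a write appends to the commit log and updates the in-memory
table, and nothing on the path consults on-disk SSTables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a log-structured "no look" write path.
// All class and method names are invented for illustration.
public class NoLookWritePath {
    private final List<String> commitLog = new ArrayList<>();
    private final Map<String, Long> memtable = new HashMap<>();

    // The write touches only the log and the memtable, which is why a rule
    // like "throw if the counter was deleted" would force an extra disk
    // read that this architecture deliberately avoids.
    public void addToCounter(String key, long delta) {
        commitLog.add(key + "+=" + delta);      // 1. durable append
        memtable.merge(key, delta, Long::sum);  // 2. in-memory update
        // 3. no SSTable read happens anywhere on this path
    }

    public long memtableValue(String key) {
        return memtable.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        NoLookWritePath path = new NoLookWritePath();
        path.addToCounter("hits", 1);
        path.addToCounter("hits", 1);
        System.out.println(path.memtableValue("hits")); // prints "2"
    }
}
```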

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com




Re: expiring + counter column?

2011-05-29 Thread Yang
Yeah, then maybe we can make that a silent omission: less desirable, but
still better than unpredictable behavior. (This is not that bad: currently you
can't know whether a write really reached a quorum, i.e. became effective,
anyway.)


Regarding "we never look at SSTables": I think right now counter adds do
require a read of the SSTables, although asynchronously:

StorageProxy.java:

    private static void applyCounterMutation(final IMutation mutation,
            final Multimap<InetAddress, InetAddress> hintedEndpoints,
            final IWriteResponseHandler responseHandler,
            final String localDataCenter,
            final ConsistencyLevel consistency_level,
            boolean executeOnMutationStage)
    {
        ..
        sendToHintedEndpoints(cm.makeReplicationMutation(), hintedEndpoints,
                responseHandler, localDataCenter, false, consistency_level);
        ..
    }

CounterMutation.java:

    public RowMutation makeReplicationMutation() throws IOException
    {
        ..
        Table table = Table.open(readCommand.table);
        Row row = readCommand.getRow(table);
        ..
    }


I think the getRow() call above does what the .pdf design doc in the JIRA
describes: replication to other replicas (non-leaders) replicates only the
**sum** that I own, not the individual delta I just received. Actually, I'm
not quite sure why this approach was chosen, since it makes each write into a
read-then-write (when getReplicateOnWrite()), which can be slow. I'm still
trying to understand that.
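A toy model of that read-then-write (invented names; just my reading of the
design doc, not actual Cassandra code): the local add merges the delta
blindly, but building the replication mutation requires reading back the
leader's current total, because the sum, not the delta, is what gets shipped.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of replicate-on-write for partitioned counters.
// All names are invented for illustration.
public class ReplicateOnWriteModel {
    // per-leader partial sums, as in the counters design doc
    private final Map<String, Long> shards = new HashMap<>();

    // the local add itself is still "no look": just merge the delta
    public void applyDelta(String leaderId, long delta) {
        shards.merge(leaderId, delta, Long::sum);
    }

    // building the replication mutation requires reading the leader's
    // current total -- this is the read that runs asynchronously
    public long buildReplicationValue(String leaderId) {
        return shards.getOrDefault(leaderId, 0L);  // read-before-replicate
    }

    public static void main(String[] args) {
        ReplicateOnWriteModel m = new ReplicateOnWriteModel();
        m.applyDelta("leaderA", 1);
        m.applyDelta("leaderA", 1);
        m.applyDelta("leaderA", 1);
        // the replicated value is the accumulated sum, not the last delta
        System.out.println(m.buildReplicationValue("leaderA")); // prints "3"
    }
}
```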


Thanks
Yang


Re: expiring + counter column?

2011-05-29 Thread aaron morton
The comment around line 448 in StorageProxy:

    // We do the replication on another stage because it involves a read
    // (see CM.makeReplicationMutation) and we want to avoid blocking
    // too much the MUTATION stage

The read happens on another stage; it is not part of the mutation.

And the test before that checks shouldReplicateOnWrite for the CFs involved
in the mutation, which defaults to false.

See also the comments for StorageProxy.mutateCounter() and this comment,
which I *think* is still valid:
https://issues.apache.org/jira/browse/CASSANDRA-1909?focusedCommentId=12976727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12976727


Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com


Re: expiring + counter column?

2011-05-28 Thread Utku Can Topçu
How about implementing a freezing mechanism on counter columns?

If there are no more increments within "freeze" seconds after the last
increment (it would be on the order of a day or so), the column would lock
itself and stop accepting increments.

And after this freeze period, the TTL should work fine. The column will be
gone forever after freeze + ttl seconds.
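A rough sketch of that freeze rule (hypothetical API, not an existing
Cassandra feature): an increment is accepted only while the counter is
"warm", i.e. within freezeMillis of the last accepted increment; after that
the counter locks, and after a further TTL it can be safely expired.

```java
// Sketch of the proposed freezing counter. All names are hypothetical.
public class FreezingCounter {
    private final long freezeMillis;  // quiet period after which it locks
    private long count = 0;
    private long lastIncrementAt;

    public FreezingCounter(long freezeMillis, long now) {
        this.freezeMillis = freezeMillis;
        this.lastIncrementAt = now;
    }

    // returns false once the counter has frozen: no increment arrived
    // within freezeMillis of the previous one
    public boolean increment(long delta, long now) {
        if (now - lastIncrementAt > freezeMillis)
            return false;             // frozen: reject the increment
        count += delta;
        lastIncrementAt = now;
        return true;
    }

    public long value() { return count; }

    public static void main(String[] args) {
        FreezingCounter c = new FreezingCounter(100, 0);
        System.out.println(c.increment(1, 50));   // within the window
        System.out.println(c.increment(1, 120));  // 70ms gap: still warm
        System.out.println(c.increment(1, 300));  // 180ms gap: frozen
        System.out.println(c.value());            // prints "2"
    }
}
```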



Re: expiring + counter column?

2011-05-27 Thread Jonathan Ellis
No. See comments to https://issues.apache.org/jira/browse/CASSANDRA-2103

On Fri, May 27, 2011 at 7:29 PM, Yang tedd...@gmail.com wrote:
 Is this combination feature available, or on track?

 thanks
 Yang




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com