[jira] [Issue Comment Edited] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-17 Thread Yang Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051350#comment-13051350
 ] 

Yang Yang edited comment on CASSANDRA-2735 at 6/17/11 10:06 PM:


true, I don't really care about expired data; I'm happy as long as we have an 
expiring counter that "mostly" works, or works with certain cautions.

but it seems the "changing order" can come not only from compaction (which 
this patch fixes for realistic scenarios) but also from message drops.

"compact(compact(Add1, delete), Add2)" is the same as receiving Add1, delete, 
Add2 as messages.

but we know messages can easily be dropped. say the delete is dropped and we 
replay it later (through repair, for example): now we have Add1, Add2, delete, 
and the same issue appears. 
I think this latter issue can be fixed by changing the TTL reconcile rule so 
that the reconciled death time is the older death time, not timestamp + new_TTL.

anyhow, users of the counters API need to understand that placing a delete 
shortly after the last update, or an update shortly after a delete, is most 
likely not going to work.  this patch fixes half of the issue, but the other 
half remains. 
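The proposed "older death time" reconcile rule can be sketched abstractly. This is a hypothetical illustration, not Cassandra code: `older_death_wins` and `newest_wins` are made-up names, and the timestamps/TTLs are invented. The point is that taking the minimum death time is commutative and associative, so the reconciled result does not depend on message arrival order, while a "latest write sets timestamp + new_TTL" rule does:

```python
from functools import reduce

# each delete/expiration carries a death time = timestamp + TTL
# (illustrative numbers, not real data)
events = [(1, 10), (5, 2), (3, 4)]        # (timestamp, TTL) pairs
deaths = [ts + ttl for ts, ttl in events]  # [11, 7, 7]

def newest_wins(a, b):
    # current behaviour as described: the most recent reconcile
    # result (timestamp + new_TTL) simply replaces the old one
    return b

def older_death_wins(a, b):
    # proposed rule: keep the older (earlier) death time
    return min(a, b)

# min is order-independent: replaying the delete late changes nothing
assert reduce(older_death_wins, deaths) == \
       reduce(older_death_wins, list(reversed(deaths)))

# "newest wins" depends entirely on arrival order
assert reduce(newest_wins, deaths) != \
       reduce(newest_wins, list(reversed(deaths)))
```

Under the min rule, Add1, delete, Add2 and Add1, Add2, delete reconcile to the same death time, which is exactly the order-independence the replay-through-repair scenario needs.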

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0004-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-17 Thread Yang Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051345#comment-13051345
 ] 

Yang Yang edited comment on CASSANDRA-2735 at 6/17/11 9:52 PM:
---

there could be a problem with trying to rely on forcing compaction order to 
make counter expiration work:

if you base the intended order on the max timestamp of each sstable, the 
timestamp is not trustworthy: a single malicious client request can bump its 
timestamp into the future and arbitrarily change the compaction order, 
rendering the approach in 2735 useless.

you can't base the order on the physical sstable flush time either, since 
different nodes have different flush times.

overall I think trying to fix the compaction order is not the right direction 
to attack this problem. the issue here is the changing order between 
*individual* counter adds/deletes (auto-expire is the same as a delete). this 
order can differ between counters, so you have to fix the order between the 
updates within each counter, not the order between *ensembles of counters*. 
such ensembles guarantee no order at all, due to randomness in flush time or 
message delivery (they have similar effects).

the problem with the current counter+delete implementation is that counters 
use timestamp() to represent their order, but when they are merged they lose 
their *individual order* and retain only a max timestamp(), which supposedly 
represents the order of the ensemble. this is meaningless because the order 
of the ensemble differs from the true order.
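To make the "ensemble order vs. individual order" point concrete, here is a toy sketch (hypothetical data structures, not Cassandra internals): two sstables each hold updates for two counters, and sorting the sstables by their max timestamp matches one counter's true update order while reversing the other's.

```python
# toy model: sstable -> {counter: [(op, timestamp), ...]}
sstable_a = {"counter1": [("add", 1)], "counter2": [("delete", 9)]}
sstable_b = {"counter1": [("delete", 8)], "counter2": [("add", 2)]}

def max_ts(table):
    # the "ensemble" timestamp an sstable retains after merging
    return max(ts for ops in table.values() for _, ts in ops)

# ensemble order: sort sstables by max timestamp -> B (8) before A (9)
order = [name for name, t in sorted([("A", sstable_a), ("B", sstable_b)],
                                    key=lambda p: max_ts(p[1]))]
assert order == ["B", "A"]

# counter2's true order is add@2 (in B) then delete@9 (in A), which
# happens to agree with B-then-A; but counter1's true order is
# add@1 (in A) then delete@8 (in B), which B-then-A reverses
```

No single ordering of the two sstables can be correct for both counters at once, which is why the per-counter update order, not the sstable order, has to carry the semantics.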







[jira] [Issue Comment Edited] (CASSANDRA-2735) Timestamp Based Compaction Strategy

2011-06-09 Thread Alan Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046750#comment-13046750
 ] 

Alan Liang edited comment on CASSANDRA-2735 at 6/9/11 8:01 PM:
---

Splitting out the capturing of the max client-supplied timestamp into a 
separate ticket (#2753) so that other tickets can benefit.

> Timestamp Based Compaction Strategy
> ---
>
> Key: CASSANDRA-2735
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Alan Liang
>Assignee: Alan Liang
>Priority: Minor
>  Labels: compaction
> Attachments: 0002-timestamp-bucketed-compaction-strategy.patch, 
> 0003-implemented-timestamp-bucketed-compaction-strategy-a.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.
