[jira] [Updated] (CASSANDRA-9111) SSTables originated from the same incremental repair session have different repairedAt timestamps

2015-04-08 Thread prmg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

prmg updated CASSANDRA-9111:

Attachment: CASSANDRA-9111-v1.txt

I just noticed that I forgot to include the new timestamp field when computing 
the message size in PrepareMessage.serializedSize. Attached v1 of the patch 
with the fix.
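
The bug shape is easy to show in isolation. Below is a minimal, self-contained 
sketch (toy class and field names, not the actual Cassandra PrepareMessage 
source): serializedSize must account for every byte that serialize writes, 
including the new 8-byte timestamp, or the receiver mis-frames the message.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy message with the same bug shape as the v0 patch: serialize() writes a
// new long field, so serializedSize() must count its 8 bytes as well.
public class PrepareMessageSketch
{
    final String keyspace;   // stand-in for the message's existing fields
    final long repairedAt;   // the new timestamp field the patch adds

    PrepareMessageSketch(String keyspace, long repairedAt)
    {
        this.keyspace = keyspace;
        this.repairedAt = repairedAt;
    }

    byte[] serialize() throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(keyspace);
        out.writeLong(repairedAt);   // new field written on the wire...
        out.flush();
        return bytes.toByteArray();
    }

    long serializedSize()
    {
        long size = 2 + keyspace.length();   // writeUTF: 2-byte length + bytes (ASCII-only here)
        size += 8;                           // ...so the v1 fix counts its 8 bytes here too
        return size;
    }

    public static void main(String[] args) throws IOException
    {
        PrepareMessageSketch m = new PrepareMessageSketch("foo", System.currentTimeMillis());
        System.out.println("actual=" + m.serialize().length + ", declared=" + m.serializedSize());
    }
}
{code}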






[jira] [Commented] (CASSANDRA-9111) SSTables originated from the same incremental repair session have different repairedAt timestamps

2015-04-08 Thread prmg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486174#comment-14486174 ]

prmg commented on CASSANDRA-9111:

Oops, sorry about that. For some reason I used git diff --color to generate 
the patch (hence the strange escape characters). I have now created and 
attached a proper patch with git format-patch. :)






[jira] [Updated] (CASSANDRA-9111) SSTables originated from the same incremental repair session have different repairedAt timestamps

2015-04-08 Thread prmg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

prmg updated CASSANDRA-9111:

Attachment: CASSANDRA-9111-v0.txt






[jira] [Updated] (CASSANDRA-9111) SSTables originated from the same incremental repair session have different repairedAt timestamps

2015-04-08 Thread prmg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

prmg updated CASSANDRA-9111:

Attachment: (was: CASSANDRA-9111-v0.txt)






[jira] [Updated] (CASSANDRA-9111) SSTables originated from the same incremental repair session have different repairedAt timestamps

2015-04-02 Thread prmg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

prmg updated CASSANDRA-9111:

Attachment: CASSANDRA-9111-v0.txt

The proposed solution passes the initiator's timestamp in the repair's 
PrepareMessage. That timestamp is then used to create the ParentRepairSession 
on all replicas, and finally the ParentRepairSession's repairedAt field is used 
to populate the repairedAt field of the SSTable metadata after anticompaction.
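
In outline, the flow looks like the following condensed sketch (hypothetical 
class and method names, not the actual Cassandra code paths): the coordinator 
picks a single timestamp when it prepares the repair and ships it to every 
replica, so no replica reads its own clock when stamping the result.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the patch's flow: one coordinator-chosen timestamp
// travels with the prepare message and is reused by every replica.
public class RepairTimestampSketch
{
    // Per-replica registry standing in for the ParentRepairSession map.
    static final Map<UUID, Long> parentSessions = new HashMap<>();

    // Coordinator side: choose the session-wide timestamp exactly once.
    static long prepareRepair(UUID parentSessionId)
    {
        return System.currentTimeMillis();   // carried in the PrepareMessage
    }

    // Replica side: record the coordinator's timestamp, not the local clock.
    static void onPrepareReceived(UUID parentSessionId, long repairedAt)
    {
        parentSessions.put(parentSessionId, repairedAt);
    }

    // After anticompaction, stamp the SSTable metadata from the session value.
    static long repairedAtFor(UUID parentSessionId)
    {
        return parentSessions.get(parentSessionId);
    }

    public static void main(String[] args)
    {
        UUID session = UUID.randomUUID();
        long ts = prepareRepair(session);
        onPrepareReceived(session, ts);   // same call arrives on node1/2/3
        System.out.println("all replicas stamp repairedAt=" + repairedAtFor(session));
    }
}
{code}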

Re-running the test from the ticket description with this patch applied, the 
SSTables originating from the same incremental repair session now share the 
same timestamp:

{code}
sstablemetadata 
~/.ccm/test/node1/data/foo/bar-104e25c0d99c11e4b2509107aba35569/foo-bar-ka-1-Statistics.db
 
~/.ccm/test/node2/data/foo/bar-104e25c0d99c11e4b2509107aba35569/foo-bar-ka-1-Statistics.db
 
~/.ccm/test/node3/data/foo/bar-104e25c0d99c11e4b2509107aba35569/foo-bar-ka-1-Statistics.db
 | grep Repaired

Repaired at: 1428022567736
Repaired at: 1428022567736
Repaired at: 1428022567736
{code}






[jira] [Created] (CASSANDRA-9111) SSTables originated from the same incremental repair session have different repairedAt timestamps

2015-04-02 Thread prmg (JIRA)
prmg created CASSANDRA-9111:

 Summary: SSTables originated from the same incremental repair 
session have different repairedAt timestamps
 Key: CASSANDRA-9111
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9111
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: prmg


CASSANDRA-7168 optimizes QUORUM reads by skipping incrementally repaired 
SSTables on other replicas that were repaired on or before the maximum 
repairedAt timestamp of the coordinating replica's SSTables for the queried 
partition.

One assumption of that optimization is that SSTables originating from the same 
repair session on different nodes will have the same repairedAt timestamp, 
since the objective is to skip reading SSTables originating in the same repair 
session (or earlier).

However, each node currently timestamps SSTables from the same repair session 
independently, so they almost never share the same timestamp.
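
To make the failure mode concrete, here is the skip check in miniature (a 
simplified, hypothetical sketch, not the actual CASSANDRA-7168 code): with the 
per-node stamps produced in the reproduction below, only an SSTable carrying 
the coordinator's own timestamp ever qualifies for skipping.

{code:java}
// Simplified skip predicate: a replica may skip an SSTable repaired on or
// before the coordinator's max repairedAt for the partition. Stamps taken
// milliseconds apart on each node defeat the check.
public class SkipPredicateSketch
{
    static boolean canSkip(long sstableRepairedAt, long coordinatorMaxRepairedAt)
    {
        return sstableRepairedAt <= coordinatorMaxRepairedAt;
    }

    public static void main(String[] args)
    {
        long coordinator = 1428023050318L;   // node1's stamp from the repro below
        System.out.println(canSkip(1428023050318L, coordinator));   // true: same stamp
        System.out.println(canSkip(1428023050322L, coordinator));   // false: node2 stamped 4 ms later
    }
}
{code}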

Steps to reproduce the problem:
{code}
ccm create test
ccm populate -n 3
ccm start
ccm node1 cqlsh;
{code}

{code:sql}
CREATE KEYSPACE foo WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 3};
CREATE TABLE foo.bar ( key int, col int, PRIMARY KEY (key) ) ;
INSERT INTO foo.bar (key, col) VALUES (1, 1);
exit;
{code}

{code}
ccm node1 flush;
ccm node2 flush;
ccm node3 flush;

nodetool -h 127.0.0.1 -p 7100 repair -par -inc foo bar

[2015-04-02 21:56:07,726] Starting repair command #1, repairing 3 ranges for 
keyspace foo (parallelism=PARALLEL, full=false)
[2015-04-02 21:56:07,816] Repair session 3655b670-d99c-11e4-b250-9107aba35569 
for range (3074457345618258602,-9223372036854775808] finished
[2015-04-02 21:56:07,816] Repair session 365a4a50-d99c-11e4-b250-9107aba35569 
for range (-9223372036854775808,-3074457345618258603] finished
[2015-04-02 21:56:07,818] Repair session 365bf800-d99c-11e4-b250-9107aba35569 
for range (-3074457345618258603,3074457345618258602] finished
[2015-04-02 21:56:07,995] Repair command #1 finished

sstablemetadata 
~/.ccm/test/node1/data/foo/bar-377b5540d99d11e49cc09107aba35569/foo-bar-ka-1-Statistics.db
 
~/.ccm/test/node2/data/foo/bar-377b5540d99d11e49cc09107aba35569/foo-bar-ka-1-Statistics.db
 
~/.ccm/test/node3/data/foo/bar-377b5540d99d11e49cc09107aba35569/foo-bar-ka-1-Statistics.db
 | grep Repaired

Repaired at: 1428023050318
Repaired at: 1428023050322
Repaired at: 1428023050340
{code}





[jira] [Commented] (CASSANDRA-7168) Add repair aware consistency levels

2015-03-31 Thread prmg (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389957#comment-14389957 ]

prmg commented on CASSANDRA-7168:

[~tjake] I'm giving this ticket a try for learning purposes. So far I have 
calculated maxPartitionRepairTime on the coordinator, sent it over to the 
replicas via the MessagingService on the ReadCommand, and skipped sstables with 
repairedAt <= maxPartitionRepairTime in the CollationController. One part of 
your description was not clear to me:
bq. We will also need to include tombstones in the results of the non-repaired 
column family result since they need to be merged with the repaired result.
Is that tombstone inclusion already handled by the normal flow of the collation 
controller, or is it necessary to add some post-processing after repaired 
sstables with repairedAt <= maxPartitionRepairTime are skipped? It would be 
great if you could clarify that a bit. Thanks!
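
For context, the concern behind the question can be shown with a toy 
reconciliation (hypothetical types, not Cassandra's internal cell model): a 
tombstone that exists only in the unrepaired data must still win the merge 
against the repaired result, otherwise the deleted value would resurrect.

{code:java}
// Toy cell reconciliation: newest timestamp wins, and a tombstone from the
// unrepaired set must be able to shadow a value from the repaired set.
public class TombstoneMergeSketch
{
    record Cell(long timestamp, Integer value)   // value == null => tombstone
    {
        boolean isTombstone() { return value == null; }
    }

    static Cell merge(Cell repaired, Cell unrepaired)
    {
        if (unrepaired == null)
            return repaired;
        return unrepaired.timestamp() > repaired.timestamp() ? unrepaired : repaired;
    }

    public static void main(String[] args)
    {
        Cell repaired = new Cell(100, 42);    // value present in repaired sstables
        Cell deletion = new Cell(200, null);  // later DELETE, not yet repaired
        Cell result = merge(repaired, deletion);
        // Without the tombstone in the unrepaired result, 42 would resurrect.
        System.out.println(result.isTombstone() ? "row is deleted" : "row = " + result.value());
    }
}
{code}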

> Add repair aware consistency levels
> -----------------------------------
>
> Key: CASSANDRA-7168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7168
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: T Jake Luciani
>  Labels: performance
> Fix For: 3.1
>
>
> With CASSANDRA-5351 and CASSANDRA-2424 I think there is an opportunity to 
> avoid a lot of extra disk I/O when running queries with higher consistency 
> levels.  
> Since repaired data is by definition consistent and we know which sstables 
> are repaired, we can optimize the read path by having a REPAIRED_QUORUM which 
> breaks reads into two phases:
>  
>   1) Read from one replica the result from the repaired sstables. 
>   2) Read from a quorum only the un-repaired data.
> For the node performing 1) we can pipeline the call so it's a single hop.
> In the long run (assuming data is repaired regularly) we will end up with 
> much closer to CL.ONE performance while maintaining consistency.
> Some things to figure out:
>   - If repairs fail on some nodes we can have a situation where we don't have 
> a consistent repaired state across the replicas.  
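
The two-phase read described above can be sketched as follows (hypothetical 
interfaces and a placeholder reconcile, not Cassandra's actual read path): 
repaired data is consistent by definition, so phase 1 needs only one replica, 
and only the unrepaired remainder needs quorum agreement in phase 2.

{code:java}
import java.util.List;
import java.util.Map;

// Loose sketch of the proposed REPAIRED_QUORUM read path.
public class RepairedQuorumSketch
{
    record Replica(Map<String, String> repaired, Map<String, String> unrepaired) {}

    static String read(String key, List<Replica> replicas, int quorum)
    {
        // Phase 1: the repaired result, from a single replica.
        String repairedResult = replicas.get(0).repaired().get(key);

        // Phase 2: only the un-repaired data, from a quorum of replicas.
        String unrepairedResult = null;
        for (int i = 0; i < quorum; i++)
        {
            String r = replicas.get(i).unrepaired().get(key);
            if (r != null)
                unrepairedResult = r;   // placeholder; the real merge is timestamp-based
        }
        return unrepairedResult != null ? unrepairedResult : repairedResult;
    }

    public static void main(String[] args)
    {
        Replica r1 = new Replica(Map.of("k", "v-repaired"), Map.of());
        Replica r2 = new Replica(Map.of("k", "v-repaired"), Map.of("k", "v-newer"));
        Replica r3 = new Replica(Map.of("k", "v-repaired"), Map.of());
        System.out.println(read("k", List.of(r1, r2, r3), 2));   // prints v-newer
    }
}
{code}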


