[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2015-08-05 Thread Marcus Olsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658262#comment-14658262
 ] 

Marcus Olsson edited comment on CASSANDRA-5220 at 8/5/15 2:15 PM:
--

While looking at CASSANDRA-5839 I realized that this might break something 
during an upgrade from 2.2 to 3.0: with this patch the _repair_history_ table 
changes to have a set of ranges instead of a start and end range.

Should I change the table back and do one insert per range instead?
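
For illustration, here is a minimal, hypothetical sketch of the two alternatives 
(table and column names below are illustrative, not the exact repair_history 
schema): a single row carrying a set of ranges, versus keeping separate 
start/end columns and doing one insert per range.

{code:java}
import java.util.Set;

// Hypothetical sketch; execute() just prints the statement and stands in for a real session.
public class RepairHistorySketch
{
    static void execute(String cql) { System.out.println(cql); }

    public static void main(String[] args)
    {
        Set<String> ranges = Set.of("(0,100]", "(100,200]", "(200,300]");

        // Alternative 1 (this patch): one row per repair, with the ranges in a set column.
        execute("INSERT INTO repair_history (id, ranges) VALUES (1, " + ranges + ")");

        // Alternative 2 (upgrade-friendly): keep start/end columns and write one row per range.
        for (String range : ranges)
        {
            String[] bounds = range.substring(1, range.length() - 1).split(",");
            execute("INSERT INTO repair_history (id, range_begin, range_end) VALUES (1, '"
                    + bounds[0] + "', '" + bounds[1] + "')");
        }
    }
}
{code}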


was (Author: molsson):
While looking at CASSANDRA-5839 I realized that this might break something 
during upgrade from 2.2->3.0, with this patch the table _repair_history_ 
changes to have a set of ranges instead of a start and end range. (This patch 
was first done when

Should I change the table back and do one insert per range instead?

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Marcus Olsson
>  Labels: performance, repair
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2, 
> cassandra-3.0-5220-1.patch, cassandra-3.0-5220-2.patch, 
> cassandra-3.0-5220.patch
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.
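
As a rough, hypothetical illustration of the behaviour described in the summary 
above (the names below are made up and are not the actual repair code path): one 
session per range, each awaited before the next one starts, so 256 vnode ranges 
mean 256 strictly sequential sessions and 256 rounds of log output.

{code:java}
import java.util.List;
import java.util.concurrent.*;

// Minimal sketch of "a session per range, processed sequentially"; the task body is a
// placeholder for Merkle tree validation plus streaming on a single range.
public class SequentialRepairSketch
{
    public static void main(String[] args) throws Exception
    {
        List<String> ranges = List.of("(-9223372036854775808,-3074457345618258603]",
                                      "(-3074457345618258603,3074457345618258602]",
                                      "(3074457345618258602,-9223372036854775808]");
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (String range : ranges)
        {
            Future<?> session = executor.submit(() -> System.out.println("repair session for " + range));
            session.get(); // block until this range is done before starting the next one
        }
        executor.shutdown();
    }
}
{code}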



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2014-04-16 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971574#comment-13971574
 ] 

Ryan McGuire edited comment on CASSANDRA-5220 at 4/16/14 4:05 PM:
--

This is a YourKit comparison between the two runs of the dtest. "Old Time" is 
without vnodes, "New Time" is with vnodes. I turned off internode_compression 
because that run showed all the time being spent in the Snappy compressor code, 
but even without compression the repair time didn't improve (much):

!5220-yourkit.png!


was (Author: enigmacurry):
I don't have a lot of experience with java profiling, but this is a yourkit 
comparison between the two runs of the dtest. Old Time is without vnodes, New 
Time is with vnodes. I turned off internode_compression because that run showed 
all the time being spent in Snappy compressor code, but even without 
compression the repair time didn't improve (much):

!5220-yourkit.png!

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
>  Labels: performance, repair
> Fix For: 2.1 beta2
>
> Attachments: 5220-yourkit.png, 5220-yourkit.tar.bz2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2014-03-27 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949909#comment-13949909
 ] 

Ryan McGuire edited comment on CASSANDRA-5220 at 3/27/14 9:08 PM:
--

As of today, on cassandra-2.0 HEAD

repair_test.TestRepair.simple_repair_test:

bq.without vnodes: Repair time: 5.10s
bq.with vnodes: Repair time: 562.97s

That's over 100x slower than without vnodes, so I'm not sure what has happened 
here since [~brandon.williams] ran this in November.

Seeing the same slowdown in 2.1 HEAD.

EDIT: Actually, this isn't the test length; it's the time the actual repair 
command took in that same dtest.


was (Author: enigmacurry):
As of today, on cassandra-2.0 HEAD

repair_test.TestRepair.simple_repair_test:

bq.without vnodes: Repair time: 5.10s
bq.with vnodes: Repair time: 562.97s

100x slower than without vnodes. So I'm not sure what happened here since 
[~brandon.williams] ran this in November.

EDIT: Actually, this isn't the test length, this is the time the actual repair 
command took in that same dtest.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2014-03-27 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949909#comment-13949909
 ] 

Ryan McGuire edited comment on CASSANDRA-5220 at 3/27/14 8:48 PM:
--

As of today, on cassandra-2.0 HEAD

repair_test.TestRepair.simple_repair_test:

bq.without vnodes: Repair time: 5.10s
bq.with vnodes: Repair time: 562.97s

That's over 100x slower than without vnodes, so I'm not sure what has happened 
here since [~brandon.williams] ran this in November.

EDIT: Actually, this isn't the test length; it's the time the actual repair 
command took in that same dtest.


was (Author: enigmacurry):
As of today, on cassandra-2.0 HEAD

repair_test.TestRepair.simple_repair_test:

bq.without vnodes: Repair time: 5.10s
bq.with vnodes: Repair time: 562.97s

100x slower than without vnodes. So I'm not sure what happened here since 
[~brandon.williams] ran this in November.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2014-03-27 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949909#comment-13949909
 ] 

Ryan McGuire edited comment on CASSANDRA-5220 at 3/27/14 8:47 PM:
--

As of today, on cassandra-2.0 HEAD

repair_test.TestRepair.simple_repair_test:

bq.without vnodes: Repair time: 5.10s
bq.with vnodes: Repair time: 562.97s

That's over 100x slower than without vnodes, so I'm not sure what has happened 
here since [~brandon.williams] ran this in November.


was (Author: enigmacurry):
As of today, on cassandra-2.0 HEAD

repair_test.TestRepair.simple_repair_test:

bq.without vnodes: Repair time: 5.10s
bq.with vnodes: Repair time: 562.97s

100x slower than without vnodes. So I'm not sure what happened here since 
@driftx ran this in November.

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2014-03-06 Thread Robert Coli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923229#comment-13923229
 ] 

Robert Coli edited comment on CASSANDRA-5220 at 3/6/14 11:20 PM:
-

{quote}So we're 3-3.5x slower in the simple case.{quote}
So, if:

1) the default for gc_grace_seconds is how frequently we want people to repair
2) and vnodes make repair 3-3.5x slower in the simple case
3) and vnodes are enabled by default

... why has the default for gc_grace_seconds not been increased by 3-3.5x? 
(CASSANDRA-5850)


was (Author: rcoli):
{quote}So we're 3-3.5x slower in the simple case.{quote}
So, if :

1) the default for gc_grace_seconds is how frequently we want people to repair
2) and vnodes make repair 3-3.5x slower in the simple case
3) and vnodes are enabled by default
4) why has the default for gc_grace_seconds not been increased by 3-3.5x? 
(CASSANDRA-5850)

> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 2.1 beta2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2013-12-23 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855774#comment-13855774
 ] 

Donald Smith edited comment on CASSANDRA-5220 at 12/23/13 8:49 PM:
---

 We ran "nodetool repair" on a 3 node cassandra cluster with production-quality 
hardware, using version 2.0.3. Each node had about 1TB of data. This is still 
testing.  After 5 days the repair job still hasn't finished. I can see it's 
still running.

Here's the process:
{noformat}
root     30835 30774  0 Dec17 pts/0    00:03:53 /usr/bin/java -cp 
/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.3.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/thrift-server-0.3.2.jar
 -Xmx32m -Dlog4j.configuration=log4j-tools.properties 
-Dstorage-config=/etc/cassandra/conf org.apache.cassandra.tools.NodeCmd -p 7199 
repair -pr as_reports
{noformat}

The log output has just:
{noformat}
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[2013-12-17 23:26:48,144] Starting repair command #1, repairing 256 ranges for 
keyspace as_reports
{noformat}

Here's the output of "nodetool tpstats":
{noformat}
cass3 /tmp> nodetool tpstats
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         0       38083403         0                 0
RequestResponseStage              0         0     1951200451         0                 0
MutationStage                     0         0     2853354069         0                 0
ReadRepairStage                   0         0        3794926         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        4880147         0                 0
AntiEntropyStage                  1         3              9         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0            115         0                 0
MemtablePostFlusher               0         0          75121         0                 0
FlushWriter                       0         0          49934         0                52
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              7         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               1         1              1         0                 0
InternalResponseStage             0         0              9         0                 0
HintedHandoff                     0         0           1141         0                 0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
PAGED_RANGE  0
BINARY   0
READ   884
MUTATION   1407711
_TRACE   0
REQUEST_RESPONSE 0
{noformat}
The cluster has some write traffic to it. We decided to test it under load.
This is the busiest c


[jira] [Comment Edited] (CASSANDRA-5220) Repair improvements when using vnodes

2013-02-04 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570751#comment-13570751
 ] 

Yuki Morishita edited comment on CASSANDRA-5220 at 2/4/13 11:53 PM:


The reason the repair is done almost sequentially is to synchronize Merkle tree 
creation across the nodes (CASSANDRA-2816). If we could form groups of nodes 
that do not overlap for several ranges, we would be able to parallelize Merkle 
tree creation/validation across those groups.
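
For illustration, a sketch of that grouping idea (made-up types and names, not 
the actual repair code): ranges are packed into batches so that no node appears 
twice within a batch, which lets all ranges in a batch build and validate their 
Merkle trees concurrently while each node still handles only one range at a time.

{code:java}
import java.util.*;

// Greedy sketch: put each range into the first batch whose nodes are disjoint from the
// range's replicas; otherwise open a new batch. Ranges within a batch can repair in parallel.
public class RepairBatchSketch
{
    public static List<List<String>> batches(Map<String, Set<String>> replicasByRange)
    {
        List<List<String>> batches = new ArrayList<>();
        List<Set<String>> nodesInBatch = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : replicasByRange.entrySet())
        {
            int target = -1;
            for (int i = 0; i < batches.size() && target < 0; i++)
                if (Collections.disjoint(nodesInBatch.get(i), e.getValue()))
                    target = i;
            if (target < 0)
            {
                batches.add(new ArrayList<>());
                nodesInBatch.add(new HashSet<>());
                target = batches.size() - 1;
            }
            batches.get(target).add(e.getKey());
            nodesInBatch.get(target).addAll(e.getValue());
        }
        return batches;
    }

    public static void main(String[] args)
    {
        Map<String, Set<String>> replicasByRange = new LinkedHashMap<>();
        replicasByRange.put("r1", Set.of("node1", "node2"));
        replicasByRange.put("r2", Set.of("node3", "node4")); // disjoint from r1 -> same batch
        replicasByRange.put("r3", Set.of("node2", "node3")); // overlaps both -> new batch
        System.out.println(batches(replicasByRange));         // prints [[r1, r2], [r3]]
    }
}
{code}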

  was (Author: yukim):
The reason the repair is done almost sequentially is to synchronize merkle 
tree creation across the nodes(CASSANDRA-2816). If we could form the groups of 
nodes that do not overlap for several ranges, we would be able to parallelize 
create/validate merkle tree.
  
> Repair improvements when using vnodes
> -
>
> Key: CASSANDRA-5220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.2.0 beta 1
>Reporter: Brandon Williams
>Assignee: Yuki Morishita
> Fix For: 1.2.2
>
>
> Currently when using vnodes, repair takes much longer to complete than 
> without them.  This appears at least in part because it's using a session per 
> range and processing them sequentially.  This generates a lot of log spam 
> with vnodes, and while being gentler and lighter on hard disk deployments, 
> ssd-based deployments would often prefer that repair be as fast as possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira