[ https://issues.apache.org/jira/browse/CASSANDRA-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855774#comment-13855774 ]
Donald Smith edited comment on CASSANDRA-5220 at 12/23/13 8:22 PM:
-------------------------------------------------------------------

We ran "nodetool repair" on a 3-node Cassandra cluster with production-quality hardware, running version 2.0.3. Each node had about 1 TB of data. This is still a test environment. After 5 days the repair job still hasn't finished; I can see it's still running. Here's the process:
{noformat}
root 30835 30774 0 Dec17 pts/0 00:03:53 /usr/bin/java -cp /etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-2.0.3.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-2.0.3.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar:/usr/share/cassandra/lib/thrift-server-0.3.2.jar -Xmx32m -Dlog4j.configuration=log4j-tools.properties -Dstorage-config=/etc/cassandra/conf org.apache.cassandra.tools.NodeCmd -p 7199 repair -pr as_reports
{noformat}
The log output has just:
{noformat}
xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
[2013-12-17 23:26:48,144] Starting repair command #1, repairing 256 ranges for keyspace as_reports
{noformat}
Here's the output of "nodetool tpstats":
{noformat}
cass3 /tmp> nodetool tpstats
xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         0       38083403         0                 0
RequestResponseStage              0         0     1951200451         0                 0
MutationStage                     0         0     2853354069         0                 0
ReadRepairStage                   0         0        3794926         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        4880147         0                 0
AntiEntropyStage                  1         3              9         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0            115         0                 0
MemtablePostFlusher               0         0          75121         0                 0
FlushWriter                       0         0          49934         0                52
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              7         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               1         1              1         0                 0
InternalResponseStage             0         0              9         0                 0
HintedHandoff                     0         0           1141         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                       884
MUTATION               1407711
_TRACE                       0
REQUEST_RESPONSE             0
{noformat}
The cluster has some write traffic.
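In case it's useful to anyone else watching a long repair, here's roughly how we've been checking whether it is still making progress. The commands are plain nodetool; the log path is whatever your install uses, /var/log/cassandra/system.log is just assumed below:
{noformat}
# Merkle-tree building for repair shows up as "Validation" compactions
nodetool -p 7199 compactionstats

# Streaming of out-of-sync ranges between replicas shows up here
nodetool -p 7199 netstats

# Repair session messages in the system log (path assumed; adjust for your install)
grep -i repair /var/log/cassandra/system.log | tail -20
{noformat}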
We decided to test it under load. This is the busiest column family, as reported by "nodetool cfstats":
{noformat}
Read Count: 38084316
Read Latency: 9.409910464927346 ms.
Write Count: 2850436738
Write Latency: 0.8083138546641199 ms.
Pending Tasks: 0
....
    Table: data_report_details
    SSTable count: 592
    Space used (live), bytes: 160644106183
    Space used (total), bytes: 160663248847
    SSTable Compression Ratio: 0.5296494510512617
    Number of keys (estimate): 51015040
    Memtable cell count: 311180
    Memtable data size, bytes: 46275953
    Memtable switch count: 6100
    Local read count: 6147
    Local read latency: 154.539 ms
    Local write count: 750865416
    Local write latency: 0.029 ms
    Pending tasks: 0
    Bloom filter false positives: 265
    Bloom filter false ratio: 0.06009
    Bloom filter space used, bytes: 64690104
    Compacted partition minimum bytes: 30
    Compacted partition maximum bytes: 10090808
    Compacted partition mean bytes: 5267
    Average live cells per slice (last five minutes): 1.0
    Average tombstones per slice (last five minutes): 0.0
{noformat}
We're going to restart the node. We rarely do deletes or updates (only when a report is re-uploaded), so we suspect we can get by without running repairs. Correct us if we're wrong about that.

"nodetool compactionstats" outputs:
{noformat}
xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M -Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
pending tasks: 166
compaction type   keyspace     table                                   completed    total        unit   progress
Compaction        as_reports   data_report_details_below_threshold     971187148   1899419306   bytes   51.13%
Compaction        as_reports   data_report_details_below_threshold     950086203   1941500979   bytes   48.94%
Compaction        as_reports   data_hierarchy_details                 2968934609   5808990354   bytes   51.11%
Compaction        as_reports   data_report_details_below_threshold     945816183   1900166474   bytes   49.78%
Compaction        as_reports   data_report_details_below_threshold     899143344   1943534395   bytes   46.26%
Compaction        as_reports   data_report_details_below_threshold     856329840   1946566670   bytes   43.99%
Compaction        as_reports   data_report_details                     195235688    915395763   bytes   21.33%
Compaction        as_reports   data_report_details_below_threshold     982460217   1931001761   bytes   50.88%
Compaction        as_reports   data_report_details_below_threshold     896609409   1931075688   bytes   46.43%
Compaction        as_reports   data_report_details_below_threshold     869219044   1928977382   bytes   45.06%
Compaction        as_reports   data_report_details_below_threshold     870931112   1901729646   bytes   45.80%
Compaction        as_reports   data_report_details_below_threshold     879343635   1939491280   bytes   45.34%
Compaction        as_reports   data_report_details_below_threshold     981888944   1893024439   bytes   51.87%
Compaction        as_reports   data_report_details_below_threshold     871785587   1884652607   bytes   46.26%
Compaction        as_reports   data_report_details_below_threshold     902340327   1913280943   bytes   47.16%
Compaction        as_reports   data_report_details_below_threshold    1025069846   1901568674   bytes   53.91%
Compaction        as_reports   data_report_details_below_threshold     920112020   1893272832   bytes   48.60%
Compaction        as_reports   data_hierarchy_details                 2962138268   5774762866   bytes   51.29%
Compaction        as_reports   data_report_details_below_threshold     790782860   1918640911   bytes   41.22%
Compaction        as_reports   data_hierarchy_details                 2972501409   5885217724   bytes   50.51%
Compaction        as_reports   data_report_details_below_threshold    1611697659   1939040337   bytes   83.12%
Compaction        as_reports   data_report_details_below_threshold     943130526   1943713837   bytes   48.52%
Compaction        as_reports   data_report_details_below_threshold     911127302   1952885196   bytes   46.66%
Compaction        as_reports   data_report_details_below_threshold     911230087   1927967871   bytes   47.26%
{noformat}
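If it turns out we do need repair after all, one workaround we're considering is driving it one token range at a time, so progress is visible and a failure doesn't throw away days of work. This is only a sketch: it assumes nodetool in 2.0.x accepts the -st/--start-token and -et/--end-token options, and that ranges.txt (a made-up file) holds this node's token ranges, e.g. derived from "nodetool ring":
{noformat}
#!/bin/sh
# Sketch: repair the as_reports keyspace one token range at a time.
# ranges.txt: one "start_token end_token" pair per line (hypothetical file).
while read start end; do
    echo "repairing range ($start, $end]"
    nodetool -p 7199 repair -st "$start" -et "$end" as_reports || exit 1
done < ranges.txt
{noformat}
That wouldn't make the per-range sessions any faster (which is what this ticket is about), but it would let us resume from where a run stopped instead of starting the whole 256-range job over.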
> Repair improvements when using vnodes
> -------------------------------------
>
>                 Key: CASSANDRA-5220
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5220
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>             Fix For: 2.1
>
>
> Currently when using vnodes, repair takes much longer to complete than
> without them. This appears at least in part because it's using a session per
> range and processing them sequentially. This generates a lot of log spam
> with vnodes, and while being gentler and lighter on hard disk deployments,
> ssd-based deployments would often prefer that repair be as fast as possible.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)