[jira] [Updated] (CASSANDRA-14884) Move TWCS message "No compaction necessary for bucket size" to Trace level
[ https://issues.apache.org/jira/browse/CASSANDRA-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-14884:
--------------------------------------
     Assignee: J.B. Langston
   Attachment: CASSANDRA-14884.patch
       Status: Patch Available  (was: Open)

> Move TWCS message "No compaction necessary for bucket size" to Trace level
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14884
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: J.B. Langston
>            Assignee: J.B. Langston
>            Priority: Trivial
>         Attachments: CASSANDRA-14884.patch
>
> When using TWCS, this message sometimes spams the debug logs:
>
> DEBUG [CompactionExecutor:4993|https://datastax.jira.com/wiki/display/CompactionExecutor/4993] 2018-04-20 00:41:13,795 TimeWindowCompactionStrategy.java:304 - No compaction necessary for bucket size 1 , key 152176320, now 152418240
>
> A similar message is already at trace level for LCS, so this patch changes the message from TWCS to trace as well.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
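For context, the change amounts to switching the logger call from debug to trace. A minimal sketch of the pattern, assuming SLF4J (which Cassandra uses); the surrounding class and method are illustrative, not the actual TWCS code:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TimeWindowCompactionExample
{
    private static final Logger logger = LoggerFactory.getLogger(TimeWindowCompactionExample.class);

    void checkBucket(long bucketSize, long key, long now)
    {
        // Before: emitted for every bucket on every compaction check, spamming debug logs
        // logger.debug("No compaction necessary for bucket size {} , key {}, now {}", bucketSize, key, now);

        // After: only emitted when trace logging is explicitly enabled
        logger.trace("No compaction necessary for bucket size {} , key {}, now {}", bucketSize, key, now);
    }
}
{code}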
[jira] [Created] (CASSANDRA-14884) Move TWCS message "No compaction necessary for bucket size" to Trace level
J.B. Langston created CASSANDRA-14884:
------------------------------------------

             Summary: Move TWCS message "No compaction necessary for bucket size" to Trace level
                 Key: CASSANDRA-14884
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14884
             Project: Cassandra
          Issue Type: Improvement
          Components: Compaction
            Reporter: J.B. Langston


When using TWCS, this message sometimes spams the debug logs:

DEBUG [CompactionExecutor:4993|https://datastax.jira.com/wiki/display/CompactionExecutor/4993] 2018-04-20 00:41:13,795 TimeWindowCompactionStrategy.java:304 - No compaction necessary for bucket size 1 , key 152176320, now 152418240

A similar message is already at trace level for LCS, so this patch changes the message from TWCS to trace as well.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14522) sstableloader options assume the rpc/native interface is the same as the internode interface
[ https://issues.apache.org/jira/browse/CASSANDRA-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566897#comment-16566897 ]

J.B. Langston commented on CASSANDRA-14522:
-------------------------------------------

Yes, it does appear to be fixed in trunk.

> sstableloader options assume the rpc/native interface is the same as the internode interface
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14522
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14522
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy
>            Priority: Major
>              Labels: lhf
>         Attachments: CASSANDRA-14522.patch
>
> Currently, in the LoaderOptions for the BulkLoader, the user can give a list of initial host addresses. That list is used both for the initial connection to the cluster and for streaming the sstables. If you have two physical interfaces, one for rpc, the other for internode traffic, then the bulk loader won't currently work. It will throw an error such as:
> {quote}
> sstableloader -v -u cassadmin -pw xxx -d 10.133.210.101,10.133.210.102,10.133.210.103,10.133.210.104 /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl
> Established connection to initial hosts
> Opening sstables and calculating sections to stream
> Streaming relevant part of /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl/mc-1-big-Data.db /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl/mc-2-big-Data.db to [/10.133.210.101, /10.133.210.103, /10.133.210.102, /10.133.210.104]
> progress: total: 100% 0 MB/s(avg: 0 MB/s)ERROR 10:16:05,311 [Stream #9ed00130-6ff6-11e8-965c-93a78bf96e60] Streaming error occurred
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_101]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_101]
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_101]
>     at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:266) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:253) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212) [cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-all-4.0.54.Final.jar:4.0.54.Final]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
> ERROR 10:16:05,312 [Stream #9ed00130-6ff6-11e8-965c-93a78bf96e60] Streaming error occurred
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_101]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_101]
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_101]
>     at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:266) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:253) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212)
[jira] [Updated] (CASSANDRA-14522) sstableloader options assume the rpc/native interface is the same as the internode interface
[ https://issues.apache.org/jira/browse/CASSANDRA-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-14522:
--------------------------------------
   Attachment: CASSANDRA-14522.patch
       Status: Patch Available  (was: Open)

There is a simpler fix. The Host object being iterated over in that loop has methods to get the listen address directly. You just need to change endpoint.getAddress to endpoint.getBroadcastAddress. I have also attached a patch. The patch is against Cassandra 3.0 and should merge forward cleanly.

Note: there is also a Host.getListenAddress method which returns the local listen address, but we want to use the broadcast address in case sstableloader is run in a different DC that cannot communicate with a remote node over the local listen address.

I also noticed that you checked in the cassandra.yaml changes you were using to test this. That should be reverted.

> sstableloader options assume the rpc/native interface is the same as the internode interface
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14522
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14522
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy
>            Priority: Major
>              Labels: lhf
>         Attachments: CASSANDRA-14522.patch
>
> Currently, in the LoaderOptions for the BulkLoader, the user can give a list of initial host addresses. That list is used both for the initial connection to the cluster and for streaming the sstables. If you have two physical interfaces, one for rpc, the other for internode traffic, then the bulk loader won't currently work. It will throw an error such as:
> {quote}
> sstableloader -v -u cassadmin -pw xxx -d 10.133.210.101,10.133.210.102,10.133.210.103,10.133.210.104 /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl
> Established connection to initial hosts
> Opening sstables and calculating sections to stream
> Streaming relevant part of /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl/mc-1-big-Data.db /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl/mc-2-big-Data.db to [/10.133.210.101, /10.133.210.103, /10.133.210.102, /10.133.210.104]
> progress: total: 100% 0 MB/s(avg: 0 MB/s)ERROR 10:16:05,311 [Stream #9ed00130-6ff6-11e8-965c-93a78bf96e60] Streaming error occurred
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_101]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_101]
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_101]
>     at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:266) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:253) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212) [cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-all-4.0.54.Final.jar:4.0.54.Final]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
> ERROR 10:16:05,312 [Stream #9ed00130-6ff6-11e8-965c-93a78bf96e60] Streaming error occurred
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_101]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_101]
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_101]
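For reference, a minimal sketch of the one-line change described in the comment above. It assumes the DataStax Java driver's Host API; the surrounding loop and the hosts collection are illustrative, not the exact BulkLoader/LoaderOptions source:

{code}
// Hedged sketch of the fix, not the attached patch itself.
for (Host endpoint : cluster.getMetadata().getAllHosts())
{
    // Before: uses the rpc/native address, which internode streaming
    // cannot necessarily reach when the two interfaces differ:
    // hosts.add(endpoint.getAddress());

    // After: use the broadcast (internode) address so streaming connects
    // on the interface the nodes actually listen on. getBroadcastAddress
    // is preferred over getListenAddress so that sstableloader run from
    // another DC can still reach the node.
    hosts.add(endpoint.getBroadcastAddress());
}
{code}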
[jira] [Created] (CASSANDRA-12197) Integrate top threads command in nodetool
J.B. Langston created CASSANDRA-12197:
------------------------------------------

             Summary: Integrate top threads command in nodetool
                 Key: CASSANDRA-12197
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12197
             Project: Cassandra
          Issue Type: Improvement
            Reporter: J.B. Langston
            Priority: Minor


SJK (https://github.com/aragozin/jvm-tools) has a command called ttop that displays the top threads within the JVM, sorted either by CPU utilization or heap allocation rate. When diagnosing garbage collection or high CPU utilization, this is very helpful information. It would be great if users could get this directly with nodetool without having to download something else. SJK is Apache 2.0 licensed so it might be possible to leverage its code.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
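Until something like this lands in nodetool, SJK can be run standalone against a Cassandra process. A hedged example; flag names are per the SJK README at the time of writing, so verify against your SJK version:

{code}
# Show the top 20 threads in the Cassandra JVM ordered by CPU use,
# refreshing in place. <pid> is the Cassandra process id.
java -jar sjk.jar ttop -p <pid> -o CPU -n 20

# Order by heap allocation rate instead, useful when hunting GC pressure.
java -jar sjk.jar ttop -p <pid> -o ALLOC -n 20
{code}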
[jira] [Updated] (CASSANDRA-11939) Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
[ https://issues.apache.org/jira/browse/CASSANDRA-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-11939:
--------------------------------------
    Description: 
It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly aligned. The sstables column should be moved to the 3rd column to make way.

  was:
It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly comparable. The sstables column should be moved to the 3rd column to make way.


> Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11939
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11939
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: J.B. Langston
>            Priority: Minor
>
> It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
> {code}
> Percentile SSTables Write Latency Read
[jira] [Updated] (CASSANDRA-11939) Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
[ https://issues.apache.org/jira/browse/CASSANDRA-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-11939:
--------------------------------------
    Description: 
It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly comparable. The sstables column should be moved to the 3rd column to make way.

  was:
It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guesst the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly comparable. The sstables column should be moved to the 3rd column to make way.


> Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11939
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11939
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: J.B. Langston
>            Priority: Minor
>
> It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
> {code}
> Percentile SSTables Write Latency
[jira] [Created] (CASSANDRA-11939) Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
J.B. Langston created CASSANDRA-11939:
------------------------------------------

             Summary: Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
                 Key: CASSANDRA-11939
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11939
             Project: Cassandra
          Issue Type: Bug
          Components: Tools
            Reporter: J.B. Langston
            Priority: Minor


It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly comparable. The sstables column should be moved to the 3rd column to make way.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-11664) Tab completion in cqlsh doesn't work for capitalized letters
J.B. Langston created CASSANDRA-11664:
------------------------------------------

             Summary: Tab completion in cqlsh doesn't work for capitalized letters
                 Key: CASSANDRA-11664
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11664
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston
            Priority: Minor


Tab completion in cqlsh doesn't work for capitalized letters, either in keyspace names or table names. Typing quotes and a corresponding capital letter should complete the table/keyspace name and the closing quote.

{code}
cqlsh> create keyspace "Test" WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use "Tes
cqlsh> use tes
cqlsh> use Test;
InvalidRequest: code=2200 [Invalid query] message="Keyspace 'test' does not exist"
cqlsh> use "Test";
cqlsh:Test> drop keyspace "Test"
cqlsh:Test> create table "TestTable" (a text primary key, b text);
cqlsh:Test> select * from "TestTable";

 a | b
---+---

(0 rows)
cqlsh:Test> select * from "Test
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8969) Add indication in cassandra.yaml that rpc timeouts going too high will cause memory build up
[ https://issues.apache.org/jira/browse/CASSANDRA-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208691#comment-15208691 ]

J.B. Langston commented on CASSANDRA-8969:
------------------------------------------

I agree this could be a good warning to have. I've seen a lot of customers naively increase the timeout. Usually the timeouts are caused by I/O not keeping up with requests, but a lot of users won't take the time to figure that out. They just see their application timing out and they see something in cassandra.yaml called timeout, so they increase it without thinking of the cost. Now they have a GC death spiral and OOM to contend with in addition to the original problem.

> Add indication in cassandra.yaml that rpc timeouts going too high will cause memory build up
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8969
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8969
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: lhf
>             Fix For: 3.x
>
>         Attachments: 8969.txt
>
> It would be helpful to communicate that setting the rpc timeouts too high may cause memory problems on the server, as it can become overloaded and has to retain the in-flight requests in memory. I'll get this done but just adding the ticket as a placeholder for memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
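The attached 8969.txt presumably adds wording along these lines next to the timeout settings. A hedged sketch of the kind of comment being proposed, using the real cassandra.yaml option names and defaults but not the exact patch text:

{code}
# How long the coordinator should wait for read operations to complete.
# Caution: raising timeouts rarely fixes timeouts. While a request is
# waiting it is held in memory, so very high values let in-flight
# requests accumulate on an overloaded node, driving GC pressure and
# eventually OOM. Investigate the underlying slowness (often I/O)
# before increasing these.
read_request_timeout_in_ms: 5000
# How long the coordinator should wait for writes to complete.
write_request_timeout_in_ms: 2000
{code}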
[jira] [Comment Edited] (CASSANDRA-8969) Add indication in cassandra.yaml that rpc timeouts going too high will cause memory build up
[ https://issues.apache.org/jira/browse/CASSANDRA-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208691#comment-15208691 ]

J.B. Langston edited comment on CASSANDRA-8969 at 3/23/16 4:19 PM:
-------------------------------------------------------------------

I agree this could be a good warning to have. I've seen a lot of users naively increase the timeout. Usually the timeouts are caused by I/O not keeping up with requests, but a lot of users won't take the time to figure that out. They just see their application timing out and they see something in cassandra.yaml called timeout, so they increase it without thinking of the cost. Now they have a GC death spiral and OOM to contend with in addition to the original problem.

was (Author: jblangs...@datastax.com):
I agree this could be a good warning to have. I've seen a lot of customers naively increase the timeout. Usually the timeouts are caused by I/O not keeping up with requests, but a lot of users won't take the time to figure that out. They just see their application timing out and they see something in cassandra.yaml called timeout, so they increase it without thinking of the cost. Now they have a GC death spiral and OOM to contend with in addition to the original problem.

> Add indication in cassandra.yaml that rpc timeouts going too high will cause memory build up
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8969
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8969
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: lhf
>             Fix For: 3.x
>
>         Attachments: 8969.txt
>
> It would be helpful to communicate that setting the rpc timeouts too high may cause memory problems on the server, as it can become overloaded and has to retain the in-flight requests in memory. I'll get this done but just adding the ticket as a placeholder for memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10593) Unintended interactions between commitlog archiving and commitlog recycling
[ https://issues.apache.org/jira/browse/CASSANDRA-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-10593:
--------------------------------------
    Description: 
Currently the comments in commitlog_archiving.properties suggest using either cp or ln for the archive_command. Using ln is problematic because commitlog recycling marks segments as recycled once the corresponding memtables are flushed, and Cassandra will no longer replay them. This means it's only possible to do PITR on records that were written since the last flush. Using cp works, and this is currently how OpsCenter does PITR; however, [~brandon.williams] has pointed out this could have some performance impact because of the additional I/O overhead of copying the commitlog segments.

Starting in 2.1, we can disable commit log recycling in cassandra.yaml, so I thought this would allow me to do PITR without the extra overhead of using cp. However, when I disable commitlog recycling and try to do a PITR, Cassandra blows up when trying to replay the restored commit logs:

{code}
ERROR 16:56:42 Exception encountered during startup
java.lang.IllegalStateException: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:207) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:116) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:352) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335) ~[dse-core-4.8.0.jar:4.8.0]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.DseModule.main(DseModule.java:75) [dse-core-4.8.0.jar:4.8.0]
java.lang.IllegalStateException: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:207)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:116)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:352)
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
    at com.datastax.bdp.DseModule.main(DseModule.java:75)
Exception encountered during startup: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
INFO  16:56:42 DSE shutting down...
INFO  16:56:42 All plugins are stopped.
ERROR 16:56:42 Exception in thread Thread[Thread-2,5,main]
java.lang.AssertionError: null
    at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1403) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:196) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon.preStop(DseDaemon.java:426) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon.safeStop(DseDaemon.java:436) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:676) ~[dse-core-4.8.0.jar:4.8.0]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_31]
{code}

For the sake of completeness, I also tested using cp for the archive_command with commitlog recycling disabled, and PITR works as expected, but this of course defeats the point.

It would be good to have some guidance on what is supported here. If ln isn't expected to work at all, it shouldn't be documented as an acceptable option for the archive_command in commitlog_archiving.properties. If it should work with commitlog recycling disabled, the bug causing the IllegalStateException needs to be fixed. It would also be good to do some testing and quantify the performance impact of enabling commitlog archiving using cp as the archive_command.

I realize there are several different issues described here, so maybe they should be separate JIRAs, but first I wanted to just clarify whether we want to support ln at all, and we can go from there.

  was:
Currently the comments in commitlog_archiving.properties suggest using either cp or ln for the archive_command. Using ln is problematic because commitlog recycling marks segments as recycled once the corresponding memtables are flushed and Cassandra will
[jira] [Created] (CASSANDRA-10593) Unintended interactions between commitlog archiving and commitlog recycling
J.B. Langston created CASSANDRA-10593:
------------------------------------------

             Summary: Unintended interactions between commitlog archiving and commitlog recycling
                 Key: CASSANDRA-10593
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10593
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston


Currently the comments in commitlog_archiving.properties suggest using either cp or ln for the archive_command. Using ln is problematic because commitlog recycling marks segments as recycled once the corresponding memtables are flushed, and Cassandra will no longer replay them. This means it's only possible to do PITR on records that were written since the last flush. Using cp works, and this is currently how OpsCenter does PITR; however, [~brandon.williams] has pointed out this could have some performance impact because of the additional I/O overhead of copying the commitlog segments.

Starting in 2.1, we can disable commit log recycling in cassandra.yaml, so I thought this would allow me to do PITR without the extra overhead of using cp. However, when I disable commitlog recycling and try to do a PITR, Cassandra blows up when trying to replay the restored commit logs:

{code}
ERROR 16:56:42 Exception encountered during startup
java.lang.IllegalStateException: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:207) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:116) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:352) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335) ~[dse-core-4.8.0.jar:4.8.0]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.DseModule.main(DseModule.java:75) [dse-core-4.8.0.jar:4.8.0]
java.lang.IllegalStateException: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:207)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:116)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:352)
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
    at com.datastax.bdp.DseModule.main(DseModule.java:75)
Exception encountered during startup: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
INFO  16:56:42 DSE shutting down...
INFO  16:56:42 All plugins are stopped.
ERROR 16:56:42 Exception in thread Thread[Thread-2,5,main]
java.lang.AssertionError: null
    at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1403) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:196) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon.preStop(DseDaemon.java:426) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon.safeStop(DseDaemon.java:436) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:676) ~[dse-core-4.8.0.jar:4.8.0]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_31]
{code}

For the sake of completeness, I also tested using cp for the archive_command with commitlog recycling disabled, and PITR works as expected, but this of course defeats the point.

It would be good to have some guidance on what is supported here. If ln isn't expected to work at all, it shouldn't be documented as an acceptable option for the archive_command in commitlog_archiving.properties. If it should work with commitlog recycling disabled, the bug causing the IllegalStateException needs to be fixed. It would also be good to do some testing and quantify the performance impact of enabling commitlog archiving using cp as the archive_command.

I realize there are several different issues described here, so maybe they should be separate JIRAs, but first I wanted to just clarify whether we want to support ln at all, and we can go from there.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
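For readers unfamiliar with the file in question, the relevant knobs look roughly like this. A sketch only: the %path/%name and %from/%to substitutions are described in the file's own comments, the backup directory is arbitrary, and the cp-based command is the style the ticket says works:

{code}
# commitlog_archiving.properties (sketch)
# %path is the fully qualified path of the segment to archive;
# %name is the file name. Copying (rather than hard-linking with ln)
# preserves the segment contents even after Cassandra recycles the
# original, so PITR keeps working at an extra I/O cost:
archive_command=/bin/cp -f %path /backup/%name

# Restore settings used when replaying for point-in-time restore:
restore_command=/bin/cp -f %from %to
restore_directories=/backup
restore_point_in_time=2015:10:26 16:00:00
{code}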
[jira] [Commented] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709984#comment-14709984 ]

J.B. Langston commented on CASSANDRA-8720:
------------------------------------------

Specifically, what I would like to see: a command-line tool that will list partition keys of partitions over a specified number of cells and/or bytes, along with the size of each partition in cells and bytes. This can be an offline tool if it's easier to implement that way.


Provide tools for finding wide row/partition keys
-------------------------------------------------

                Key: CASSANDRA-8720
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8720
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston


Multiple users have requested some sort of tool to help identify wide row keys. They get into a situation where they know a wide row/partition has been inserted and it's causing problems for them, but they have no idea what the row key is in order to remove it. Maintaining the widest row key currently encountered and displaying it in cfstats would be one possible approach. Another would be an offline tool (possibly an enhancement to sstablekeys) to show the number of columns/bytes per key in each sstable. If a tool to aggregate the information at a CF level could be provided that would be a bonus, but it shouldn't be too hard to write a script wrapper to aggregate them if not.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
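To make the request concrete, a hypothetical invocation of the kind of tool being asked for. The tool name and flags are illustrative only, not an existing utility:

{code}
# Hypothetical: list partitions in an sstable larger than 100 MB or
# 100,000 cells, reporting key, size in bytes, and cell count.
sstablewidekeys --min-bytes 104857600 --min-cells 100000 \
    /var/lib/cassandra/data/ks/tbl/ks-tbl-ka-1-Data.db
{code}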
[jira] [Commented] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710030#comment-14710030 ]

J.B. Langston commented on CASSANDRA-8720:
------------------------------------------

Looks like we crossed over each other's comments. I think if this offline tool needs to go through the motions of compacting without actually writing out new files or deleting the old ones, then that would be fine. Of course it would require lots of I/O and people would need to be aware of that, but in some cases I think they'd be willing to accept that in order to identify large partitions.


Provide tools for finding wide row/partition keys
-------------------------------------------------

                Key: CASSANDRA-8720
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8720
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston


Multiple users have requested some sort of tool to help identify wide row keys. They get into a situation where they know a wide row/partition has been inserted and it's causing problems for them, but they have no idea what the row key is in order to remove it. Maintaining the widest row key currently encountered and displaying it in cfstats would be one possible approach. Another would be an offline tool (possibly an enhancement to sstablekeys) to show the number of columns/bytes per key in each sstable. If a tool to aggregate the information at a CF level could be provided that would be a bonus, but it shouldn't be too hard to write a script wrapper to aggregate them if not.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-9585) Make truncate table X an alias for truncate X
J.B. Langston created CASSANDRA-9585:
-----------------------------------------

             Summary: Make truncate table X an alias for truncate X
                 Key: CASSANDRA-9585
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9585
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston


CQL syntax is inconsistent: it's "drop table X" but "truncate X". It used to trip me up all the time until I wrapped my brain around this inconsistency, and it still triggers a tiny bout of OCD every time I type it. I realize it's too late to change it, but why not have both? "truncate table X" is also consistent with the syntax in SQL.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
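For clarity, the two spellings in question, with the proposed alias last (the table name is illustrative):

{code}
DROP TABLE users;       -- existing DDL that names the object type
TRUNCATE users;         -- current CQL truncate syntax
TRUNCATE TABLE users;   -- proposed alias, matching SQL
{code}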
[jira] [Updated] (CASSANDRA-9585) Make truncate table X an alias for truncate X
[ https://issues.apache.org/jira/browse/CASSANDRA-9585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-9585:
-------------------------------------
    Priority: Trivial  (was: Major)


Make truncate table X an alias for truncate X
---------------------------------------------

                Key: CASSANDRA-9585
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9585
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston
           Priority: Trivial


CQL syntax is inconsistent: it's "drop table X" but "truncate X". It used to trip me up all the time until I wrapped my brain around this inconsistency, and it still triggers a tiny bout of OCD every time I type it. I realize it's too late to change it, but why not have both? "truncate table X" is also consistent with the syntax in SQL.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-9325) cassandra-stress requires keystore but provides no way to configure it
J.B. Langston created CASSANDRA-9325:
-----------------------------------------

             Summary: cassandra-stress requires keystore but provides no way to configure it
                 Key: CASSANDRA-9325
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston


Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. These should be optional unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
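As a workaround until stress grows the options, you can create the dummy keystore it expects using the JDK's keytool. A hedged example; the alias and dname are arbitrary, and the location and password are the defaults the ticket describes:

{code}
# Run from the directory you launch cassandra-stress in, so that
# conf/.keystore exists where the tool looks for it.
mkdir -p conf
keytool -genkeypair -keyalg RSA -alias dummy \
        -keystore conf/.keystore -storepass cassandra -keypass cassandra \
        -dname "CN=stress, OU=none, O=none, C=US"
{code}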
[jira] [Updated] (CASSANDRA-9325) cassandra-stress requires keystore for SSL but provides no way to configure it
[ https://issues.apache.org/jira/browse/CASSANDRA-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-9325:
-------------------------------------
    Description: 
Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. Ideally, these should be optional and stress shouldn't require the keystore unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.

  was:
Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. These should be optional unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.


cassandra-stress requires keystore for SSL but provides no way to configure it
-------------------------------------------------------------------------------

                Key: CASSANDRA-9325
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston


Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. Ideally, these should be optional and stress shouldn't require the keystore unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9325) cassandra-stress requires keystore for SSL but provides no way to configure it
[ https://issues.apache.org/jira/browse/CASSANDRA-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-9325:
-------------------------------------
    Summary: cassandra-stress requires keystore for SSL but provides no way to configure it  (was: cassandra-stress requires keystore but provides no way to configure it)


cassandra-stress requires keystore for SSL but provides no way to configure it
-------------------------------------------------------------------------------

                Key: CASSANDRA-9325
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston


Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. These should be optional unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9308) Decouple streaming from secondary index rebuild
[ https://issues.apache.org/jira/browse/CASSANDRA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529067#comment-14529067 ]

J.B. Langston edited comment on CASSANDRA-9308 at 5/5/15 7:03 PM:
------------------------------------------------------------------

3.0 is a long way off for a lot of production users though. And for the scenario Brandon brings up, I think logging a warning for the user to run repair may be preferable to forcing them to go through the whole bootstrap and reindex process again, which in some cases can take days.

was (Author: jblangs...@datastax.com):
3.0 is a long way off for a lot of production users though.


Decouple streaming from secondary index rebuild
-----------------------------------------------

                Key: CASSANDRA-9308
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9308
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston


Currently, streaming is not considered complete until any secondary indexes on the table being streamed have been rebuilt. If any source replicas go down after streaming completes, but before the secondary indexes have been rebuilt, it will cause the bootstrap to fail, requiring the user to go through the whole bootstrap process again. Ideally, the two should be decoupled so that once the streaming is complete, the new node can complete the secondary index rebuild and successfully bootstrap regardless of the status of the source replicas at that point.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9308) Decouple streaming from secondary index rebuild
[ https://issues.apache.org/jira/browse/CASSANDRA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529067#comment-14529067 ]

J.B. Langston commented on CASSANDRA-9308:
------------------------------------------

3.0 is a long way off for a lot of production users though.


Decouple streaming from secondary index rebuild
-----------------------------------------------

                Key: CASSANDRA-9308
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9308
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston


Currently, streaming is not considered complete until any secondary indexes on the table being streamed have been rebuilt. If any source replicas go down after streaming completes, but before the secondary indexes have been rebuilt, it will cause the bootstrap to fail, requiring the user to go through the whole bootstrap process again. Ideally, the two should be decoupled so that once the streaming is complete, the new node can complete the secondary index rebuild and successfully bootstrap regardless of the status of the source replicas at that point.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-9308) Decouple streaming from secondary index rebuild
J.B. Langston created CASSANDRA-9308:
-----------------------------------------

             Summary: Decouple streaming from secondary index rebuild
                 Key: CASSANDRA-9308
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9308
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston


Currently, streaming is not considered complete until any secondary indexes on the table being streamed have been rebuilt. If any source replicas go down after streaming completes, but before the secondary indexes have been rebuilt, it will cause the bootstrap to fail, requiring the user to go through the whole bootstrap process again. Ideally, the two should be decoupled so that once the streaming is complete, the new node can complete the secondary index rebuild and successfully bootstrap regardless of the status of the source replicas at that point.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320585#comment-14320585 ]

J.B. Langston commented on CASSANDRA-8730:
------------------------------------------

Looks like these changes are fairly isolated to two classes... would this be feasible to backport to 2.0?


Optimize UUIDType comparisons
-----------------------------

                Key: CASSANDRA-8730
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8730
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston
           Assignee: Benedict
            Fix For: 2.1.4


Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
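A CPU-bound byte-at-a-time comparison can often be replaced by comparing the two 64-bit halves of each UUID. A hedged sketch of that general idea, not the committed patch: UUIDType's real ordering also special-cases version-1 timestamps, which this sketch ignores, so it only matches the lexical byte order used for same-version, non-time-based UUIDs.

{code}
import java.nio.ByteBuffer;

public final class UuidWordCompare
{
    // Compare two 16-byte UUID buffers as two big-endian longs each.
    // Unsigned comparison of big-endian words gives the same result as
    // comparing the 16 bytes one by one, with far fewer branches.
    public static int compareUnsignedWords(ByteBuffer b1, ByteBuffer b2)
    {
        long m1 = b1.getLong(b1.position());      // most significant 8 bytes
        long m2 = b2.getLong(b2.position());
        if (m1 != m2)
            return Long.compareUnsigned(m1, m2);  // one branch replaces up to 8 byte compares

        long l1 = b1.getLong(b1.position() + 8);  // least significant 8 bytes
        long l2 = b2.getLong(b2.position() + 8);
        return Long.compareUnsigned(l1, l2);
    }
}
{code}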
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312474#comment-14312474 ]

J.B. Langston commented on CASSANDRA-8730:
------------------------------------------

12.88MB/sec from the latest code.


Optimize UUIDType comparisons
-----------------------------

                Key: CASSANDRA-8730
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8730
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston
           Assignee: Benedict
            Fix For: 2.1.4


Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312266#comment-14312266 ]

J.B. Langston commented on CASSANDRA-8730:
------------------------------------------

[~iamaleksey] This is the schema I am testing against. It uses uuid and timestamp (not timeuuid):

{code}
CREATE TABLE x (
    a bigint,
    b bigint,
    c timestamp,
    d uuid,
    e text,
    f text,
    g text,
    h float,
    PRIMARY KEY ((a, b), c, d)
) WITH CLUSTERING ORDER BY (c DESC, d DESC)
    AND bloom_filter_fp_chance=0.01
    AND caching='KEYS_ONLY'
    AND comment=''
    AND dclocal_read_repair_chance=0.10
    AND gc_grace_seconds=0
    AND index_interval=128
    AND read_repair_chance=0.00
    AND replicate_on_write='true'
    AND populate_io_cache_on_flush='false'
    AND default_time_to_live=0
    AND speculative_retry='99.0PERCENTILE'
    AND memtable_flush_period_in_ms=0
    AND compaction={'class': 'SizeTieredCompactionStrategy'}
    AND compression={'sstable_compression': 'LZ4Compressor'};
{code}


Optimize UUIDType comparisons
-----------------------------

                Key: CASSANDRA-8730
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8730
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston
           Assignee: Benedict
            Fix For: 2.1.4


Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309901#comment-14309901 ]

J.B. Langston edited comment on CASSANDRA-8730 at 2/6/15 8:59 PM:
------------------------------------------------------------------

Looks like a bit of improvement: 12.63MB/sec vs 10.19MB/sec. It also threw away more data this time; I guess some tombstones passed gc grace since I last tested. Therefore, I'm not sure how apples-to-apples the comparison is, so I'm going to try again while setting my clock back to the date when I ran it before.

Before:
{code}
INFO  15:19:05 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 9,183,829,489 bytes to 9,180,536,394 (~99% of original) in 901,172ms = 9.715395MB/s. 311,495 total partitions merged to 253,490. Partition merge counts were {1:195485, 2:58005, }
{code}

After:
{code}
INFO  20:47:24 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 8,152,562,772 bytes to 4,659,100,313 (~57% of original) in 615,577ms = 7.218048MB/s. 311,495 total partitions merged to 80,012. Partition merge counts were {1:195485, 2:58005, }
{code}

was (Author: jblangs...@datastax.com):
Looks like a bit of improvement: 12.63MB/sec vs 10.19MB/sec. It also threw away more data this time; I guess some tombstones passed gc grace since I last tested. Therefore, I'm not sure how apples-to-apples the comparison is, so I'm going to try again while setting my clock back to the date when I ran it before.

Before:
{code}
INFO  15:19:05 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 9,183,829,489 bytes to 9,180,536,394 (~99% of original) in 901,172ms = 9.715395MB/s. 311,495 total partitions merged to 253,490. Partition merge counts were {1:195485, 2:58005, }
{code}

After:
{code}
INFO  20:47:24 Completed flushing /Users/jblangston/repos/cassandra/bin/./../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-48-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1423254980101, position=758851)
INFO  20:47:24 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 8,152,562,772 bytes to 4,659,100,313 (~57% of original) in 615,577ms = 7.218048MB/s. 311,495 total partitions merged to 80,012. Partition merge counts were {1:195485, 2:58005, }
{code}


Optimize UUIDType comparisons
-----------------------------

                Key: CASSANDRA-8730
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8730
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston
           Assignee: Benedict
            Fix For: 2.1.4


Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309901#comment-14309901 ] J.B. Langston commented on CASSANDRA-8730: -- Looks like a bit of improvement: 12.63MB/sec vs 10.19MB/sec. Looks like it threw away more data this time. I guess some tombstones passed gc grace since I last tested. Therefore, I'm not sure how apples-to-apples the comparison is, so I'm going to try again while setting my clock back to the date when I ran it before. Before: {code} INFO 15:19:05 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 9,183,829,489 bytes to 9,180,536,394 (~99% of original) in 901,172ms = 9.715395MB/s. 311,495 total partitions merged to 253,490. Partition merge counts were {1:195485, 2:58005, } {code} After: {code} INFO 20:47:24 Completed flushing /Users/jblangston/repos/cassandra/bin/./../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-48-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1423254980101, position=758851) INFO 20:47:24 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 8,152,562,772 bytes to 4,659,100,313 (~57% of original) in 615,577ms = 7.218048MB/s. 311,495 total partitions merged to 80,012. Partition merge counts were {1:195485, 2:58005, } {code} Optimize UUIDType comparisons - Key: CASSANDRA-8730 URL: https://issues.apache.org/jira/browse/CASSANDRA-8730 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Assignee: Benedict Fix For: 2.1.4 Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309953#comment-14309953 ] J.B. Langston commented on CASSANDRA-8730: -- Hmm, setting my clock back didn't help. I still got the same results as before. I'm not sure why it did not compact away almost half the data before. Optimize UUIDType comparisons - Key: CASSANDRA-8730 URL: https://issues.apache.org/jira/browse/CASSANDRA-8730 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Assignee: Benedict Fix For: 2.1.4 Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8559) OOM caused by large tombstone warning.
[ https://issues.apache.org/jira/browse/CASSANDRA-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305113#comment-14305113 ] J.B. Langston commented on CASSANDRA-8559: -- I have seen another user hit this. In this case, the log message was triggered by a bad query and an even worse data model, but it's too easy for users new to Cassandra to stumble across this. If someone shoots themselves in the foot, we should try not to blow their whole leg off. So I'm +1 on having a limit to the amount of information we log. OOM caused by large tombstone warning. -- Key: CASSANDRA-8559 URL: https://issues.apache.org/jira/browse/CASSANDRA-8559 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.0.11 / 2.1 Reporter: Dominic Letz Labels: tombstone Fix For: 2.0.13 Attachments: Selection_048.png, cassandra-2.0.11-8559.txt, stacktrace.log When running with a high amount of tombstones, the error message generation from CASSANDRA-6117 can lead to an out-of-memory situation with the default setting. Attached is a heap dump, viewed in VisualVM, showing how this construct created two 777MB strings to print the error message for a read query and then crashed OOM.
{code}
if (respectTombstoneThresholds() && columnCounter.ignored() > DatabaseDescriptor.getTombstoneWarnThreshold())
{
    StringBuilder sb = new StringBuilder();
    CellNameType type = container.metadata().comparator;
    for (ColumnSlice sl : slices)
    {
        assert sl != null;
        sb.append('[');
        sb.append(type.getString(sl.start));
        sb.append('-');
        sb.append(type.getString(sl.finish));
        sb.append(']');
    }
    logger.warn("Read {} live and {} tombstoned cells in {}.{} (see tombstone_warn_threshold). {} columns was requested, slices={}, delInfo={}",
                columnCounter.live(), columnCounter.ignored(), container.metadata().ksName, container.metadata().cfName, count, sb, container.deletionInfo());
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
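The limit being +1'd here could be as simple as capping the StringBuilder before the giant string is ever materialized. A minimal sketch of that idea, with an illustrative MAX_LOGGED_LENGTH constant and simplified types — this is not the committed 2.0.13 patch:
{code}
// Sketch: cap the logged slice description so a pathological query cannot
// allocate multi-hundred-megabyte strings just to emit a warning.
// MAX_LOGGED_LENGTH and the String-based slices are illustrative only.
import java.util.List;

public class TombstoneWarning
{
    private static final int MAX_LOGGED_LENGTH = 1024;

    static String describeSlices(List<String> slices)
    {
        StringBuilder sb = new StringBuilder();
        for (String sl : slices)
        {
            sb.append('[').append(sl).append(']');
            if (sb.length() > MAX_LOGGED_LENGTH)
            {
                sb.setLength(MAX_LOGGED_LENGTH);
                sb.append("... (truncated)");
                break; // stop building; the tail adds no diagnostic value
            }
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(describeSlices(List.of("a-b", "c-d")));
    }
}
{code}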
[jira] [Created] (CASSANDRA-8730) Optimize UUIDType comparisons
J.B. Langston created CASSANDRA-8730: Summary: Optimize UUIDType comparisons Key: CASSANDRA-8730 URL: https://issues.apache.org/jira/browse/CASSANDRA-8730 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Assignee: Benedict Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
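For a sense of what such an optimization might look like: a 16-byte UUID compared lexicographically as unsigned bytes is equivalent to comparing its two big-endian halves as unsigned 64-bit longs, replacing up to 16 byte comparisons with at most two. A sketch of that fast path only (the real UUIDType also has to order version-1 UUIDs by timestamp, which this deliberately omits):
{code}
// Sketch: compare two serialized UUIDs as two unsigned longs instead of
// byte by byte. Illustrates the idea only; not the committed patch.
import java.nio.ByteBuffer;

public class UuidCompare
{
    static int compare(ByteBuffer b1, ByteBuffer b2)
    {
        // Unsigned comparison of big-endian longs == lexicographic unsigned
        // byte comparison, but touches 8 bytes at a time.
        long msb1 = b1.getLong(b1.position());
        long msb2 = b2.getLong(b2.position());
        if (msb1 != msb2)
            return Long.compareUnsigned(msb1, msb2);
        return Long.compareUnsigned(b1.getLong(b1.position() + 8),
                                    b2.getLong(b2.position() + 8));
    }

    public static void main(String[] args)
    {
        ByteBuffer a = ByteBuffer.allocate(16).putLong(0, 1L).putLong(8, 2L);
        ByteBuffer b = ByteBuffer.allocate(16).putLong(0, 1L).putLong(8, 3L);
        System.out.println(compare(a, b)); // negative: a sorts before b
    }
}
{code}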
[jira] [Created] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
J.B. Langston created CASSANDRA-8720: Summary: Provide tools for finding wide row/partition keys Key: CASSANDRA-8720 URL: https://issues.apache.org/jira/browse/CASSANDRA-8720 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Multiple users have requested some sort of tool to help identify wide row keys. They get into a situation where they know a wide row/partition has been inserted and it's causing problems for them but they have no idea what the row key is in order to remove it. Maintaining the widest row key currently encountered and displaying it in cfstats would be one possible approach. Another would be an offline tool (possibly an enhancement to sstablekeys) to show the number of columns/bytes per key in each sstable. If a tool to aggregate the information at a CF-level could be provided that would be a bonus, but it shouldn't be too hard to write a script wrapper to aggregate them if not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
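To make the "script wrapper" idea concrete: a sketch that aggregates hypothetical per-sstable key<TAB>bytes listings into CF-level totals and prints the widest partitions. The input format is assumed purely for illustration; sstablekeys does not emit it today:
{code}
// Sketch: merge per-sstable "key<TAB>bytes" listings (hypothetical format)
// read from stdin into per-key totals and report the ten widest partitions.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class WidestKeys
{
    public static void main(String[] args) throws Exception
    {
        Map<String, Long> totals = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in)))
        {
            String line;
            while ((line = in.readLine()) != null)
            {
                String[] parts = line.split("\t");
                if (parts.length == 2)
                    totals.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
            }
        }
        totals.entrySet().stream()
              .sorted((a, b) -> Long.compare(b.getValue(), a.getValue()))
              .limit(10)
              .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
{code}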
[jira] [Commented] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302211#comment-14302211 ] J.B. Langston commented on CASSANDRA-8720: -- Better than nothing, but logs can get rotated, deleted, etc. and it would good to have a way to get this information on demand without having to wait for a compaction to occur. Provide tools for finding wide row/partition keys - Key: CASSANDRA-8720 URL: https://issues.apache.org/jira/browse/CASSANDRA-8720 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Multiple users have requested some sort of tool to help identify wide row keys. They get into a situation where they know a wide row/partition has been inserted and it's causing problems for them but they have no idea what the row key is in order to remove it. Maintaining the widest row key currently encountered and displaying it in cfstats would be one possible approach. Another would be an offline tool (possibly an enhancement to sstablekeys) to show the number of columns/bytes per key in each sstable. If a tool to aggregate the information at a CF-level could be provided that would be a bonus, but it shouldn't be too hard to write a script wrapper to aggregate them if not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8606) sstablesplit does not remove original sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298760#comment-14298760 ] J.B. Langston commented on CASSANDRA-8606: -- This also affects offline sstableupgrade. I'd say this should be a higher priority since people could fill up their disks during an upgrade. sstablesplit does not remove original sstable - Key: CASSANDRA-8606 URL: https://issues.apache.org/jira/browse/CASSANDRA-8606 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Priority: Minor Fix For: 2.1.3 sstablesplit leaves the original file on disk, it should not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8615) Create -D flag to disable speculative retry by default
J.B. Langston created CASSANDRA-8615: Summary: Create -D flag to disable speculative retry by default Key: CASSANDRA-8615 URL: https://issues.apache.org/jira/browse/CASSANDRA-8615 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Some clusters have shown increased latency with speculative retry enabled. Speculative retry is enabled by default when upgrading from 1.2 to 2.0, and for large clusters it can take a long time to complete a rolling upgrade, during which time speculative retry will be enabled. Therefore it would be helpful to have a -D flag that will disable it by default during an upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
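A minimal sketch of how such a flag could gate the default; the property name and method below are hypothetical illustrations, not an actual Cassandra option or API:
{code}
// Sketch: let -Dcassandra.disable_speculative_retry=true (hypothetical
// property) force the default back to NONE during a rolling upgrade.
public class SpeculativeRetryDefault
{
    static String defaultSpeculativeRetry()
    {
        if (Boolean.getBoolean("cassandra.disable_speculative_retry"))
            return "NONE";
        return "99.0PERCENTILE"; // the 2.0 default
    }

    public static void main(String[] args)
    {
        System.out.println(defaultSpeculativeRetry());
    }
}
{code}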
[jira] [Created] (CASSANDRA-8448) Comparison method violates its general contract in AbstractEndpointSnitch
J.B. Langston created CASSANDRA-8448: Summary: Comparison method violates its general contract in AbstractEndpointSnitch Key: CASSANDRA-8448 URL: https://issues.apache.org/jira/browse/CASSANDRA-8448 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Seen in both 1.2 and 2.0. The error is occurring here: https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/locator/AbstractEndpointSnitch.java#L49 ERROR [Thrift:9] 2014-12-04 20:12:28,732 CustomTThreadPoolServer.java (line 219) Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: Comparison method violates its general contract! at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2199) at com.google.common.cache.LocalCache.get(LocalCache.java:3932) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936) at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806) at org.apache.cassandra.service.ClientState.authorize(ClientState.java:352) at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:224) at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:218) at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:202) at org.apache.cassandra.thrift.CassandraServer.createMutationList(CassandraServer.java:822) at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:954) at com.datastax.bdp.server.DseServer.batch_mutate(DseServer.java:576) at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3922) at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3906) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:201) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract! 
at java.util.TimSort.mergeHi(TimSort.java:868) at java.util.TimSort.mergeAt(TimSort.java:485) at java.util.TimSort.mergeCollapse(TimSort.java:410) at java.util.TimSort.sort(TimSort.java:214) at java.util.TimSort.sort(TimSort.java:173) at java.util.Arrays.sort(Arrays.java:659) at java.util.Collections.sort(Collections.java:217) at org.apache.cassandra.locator.AbstractEndpointSnitch.sortByProximity(AbstractEndpointSnitch.java:49) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithScore(DynamicEndpointSnitch.java:157) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithBadness(DynamicEndpointSnitch.java:186) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximity(DynamicEndpointSnitch.java:151) at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1408) at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1402) at org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:148) at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1223) at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1165) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:255) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:225) at org.apache.cassandra.auth.Auth.selectUser(Auth.java:243) at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84) at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50) at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:69) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:338) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:335) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193) ...
[jira] [Updated] (CASSANDRA-8448) Comparison method violates its general contract in AbstractEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-8448: - Description: Seen in both 1.2 and 2.0. The error is occurring here: https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/locator/AbstractEndpointSnitch.java#L49 {code} ERROR [Thrift:9] 2014-12-04 20:12:28,732 CustomTThreadPoolServer.java (line 219) Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: Comparison method violates its general contract! at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2199) at com.google.common.cache.LocalCache.get(LocalCache.java:3932) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936) at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806) at org.apache.cassandra.service.ClientState.authorize(ClientState.java:352) at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:224) at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:218) at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:202) at org.apache.cassandra.thrift.CassandraServer.createMutationList(CassandraServer.java:822) at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:954) at com.datastax.bdp.server.DseServer.batch_mutate(DseServer.java:576) at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3922) at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3906) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:201) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract! 
at java.util.TimSort.mergeHi(TimSort.java:868) at java.util.TimSort.mergeAt(TimSort.java:485) at java.util.TimSort.mergeCollapse(TimSort.java:410) at java.util.TimSort.sort(TimSort.java:214) at java.util.TimSort.sort(TimSort.java:173) at java.util.Arrays.sort(Arrays.java:659) at java.util.Collections.sort(Collections.java:217) at org.apache.cassandra.locator.AbstractEndpointSnitch.sortByProximity(AbstractEndpointSnitch.java:49) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithScore(DynamicEndpointSnitch.java:157) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithBadness(DynamicEndpointSnitch.java:186) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximity(DynamicEndpointSnitch.java:151) at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1408) at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1402) at org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:148) at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1223) at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1165) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:255) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:225) at org.apache.cassandra.auth.Auth.selectUser(Auth.java:243) at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84) at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50) at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:69) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:338) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:335) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193) ... 18 more {code} Workaround: Setting -Djava.util.Arrays.useLegacyMergeSort=true causes the error to go away. was: Seen in both 1.2 and 2.0. The
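The classic cause of this TimSort failure is a comparator whose ordering changes while the sort is running — here the dynamic snitch scores are updated concurrently, so transitivity can be violated mid-sort, which is also why the legacy merge sort workaround masks it. A sketch of the standard remedy, snapshotting the scores before sorting; the names and types are illustrative, not Cassandra's actual classes:
{code}
// Sketch: sort endpoints against a one-time snapshot of their scores so
// every comparison during a single sort sees the same values.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StableProximitySort
{
    // live scores, mutated concurrently by the score updater elsewhere
    static final Map<String, Double> scores = new ConcurrentHashMap<>();

    static void sortByProximity(List<String> endpoints)
    {
        Map<String, Double> snapshot = new HashMap<>();
        for (String ep : endpoints)
            snapshot.put(ep, scores.getOrDefault(ep, 0.0));
        endpoints.sort(Comparator.comparingDouble(snapshot::get));
    }

    public static void main(String[] args)
    {
        scores.put("10.0.0.1", 0.3);
        scores.put("10.0.0.2", 0.1);
        List<String> eps = new ArrayList<>(List.of("10.0.0.1", "10.0.0.2"));
        sortByProximity(eps);
        System.out.println(eps); // [10.0.0.2, 10.0.0.1]
    }
}
{code}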
[jira] [Created] (CASSANDRA-8329) LeveledCompactionStrategy should split large files across data directories when compacting
J.B. Langston created CASSANDRA-8329: Summary: LeveledCompactionStrategy should split large files across data directories when compacting Key: CASSANDRA-8329 URL: https://issues.apache.org/jira/browse/CASSANDRA-8329 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Because we fall back to STCS for L0 when LCS gets behind, the sstables in L0 can get quite large during sustained periods of heavy writes. This can result in large imbalances between data volumes when using JBOD support. Eventually these large files get broken up as L0 sstables are moved up into higher levels; however, because LCS only chooses a single volume on which to write all of the sstables created during a single compaction, the imbalance is persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
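A sketch of the behavior being requested: choose a destination directory per output sstable (here, simply the volume with the most usable space) rather than once per compaction, so one large L0 compaction cannot pin all of its output to a single volume. Purely illustrative; this is not Cassandra's Directories API:
{code}
// Sketch: re-pick the data directory for each output sstable instead of
// choosing one directory for the whole compaction. Illustrative only.
import java.io.File;
import java.util.List;

public class OutputDirectoryPicker
{
    static File pickFor(List<File> dataDirs)
    {
        File best = dataDirs.get(0);
        for (File dir : dataDirs)
            if (dir.getUsableSpace() > best.getUsableSpace())
                best = dir;
        return best; // call once per output sstable, not once per compaction
    }

    public static void main(String[] args)
    {
        System.out.println(pickFor(List.of(new File("/tmp"), new File("/var/tmp"))));
    }
}
{code}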
[jira] [Commented] (CASSANDRA-7386) JBOD threshold to prevent unbalanced disk utilization
[ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208119#comment-14208119 ] J.B. Langston commented on CASSANDRA-7386: -- I've seen a lot of users hitting this issue lately, so the sooner we can get a patch the better. This also needs to be backported to 2.0 if at all possible. In several cases I've seen severe imbalances like the ones described, where some drives are completely full and others are at 10-20% utilization. JBOD threshold to prevent unbalanced disk utilization - Key: CASSANDRA-7386 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Chris Lohfink Assignee: Alan Boudreault Priority: Minor Fix For: 2.1.3 Attachments: 7386-v1.patch, 7386v2.diff, Mappe1.ods, mean-writevalue-7disks.png, patch_2_1_branch_proto.diff, sstable-count-second-run.png Currently, disks are picked first by number of current tasks, then by free space. This helps with performance but can lead to large differences in utilization in some (unlikely but possible) scenarios. I've seen 55% to 10% and heard reports of 90% to 10% on IRC, with both LCS and STCS (although my suspicion is that STCS makes it worse, since it's harder to keep balanced). I propose the algorithm change a little to have some maximum range of utilization beyond which it will pick by free space over load (acknowledging it can be slower). So if disk A is 30% full and disk B is 5% full, it will never pick A over B until they balance out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
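The proposed algorithm is easy to state in code: keep picking by pending tasks, but once utilization between disks diverges past some threshold, pick by free space until they converge. A sketch under that reading, with an assumed 20% threshold (the actual patch may differ):
{code}
// Sketch of the proposal: the least-loaded disk wins normally, but when
// the utilization spread exceeds a threshold, the emptiest disk wins.
import java.util.Comparator;
import java.util.List;

public class JbodPicker
{
    record Disk(String path, int pendingTasks, double utilization, long freeBytes) {}

    static final double MAX_UTILIZATION_SPREAD = 0.20; // assumed threshold

    static Disk pick(List<Disk> disks)
    {
        double min = disks.stream().mapToDouble(Disk::utilization).min().orElse(0);
        double max = disks.stream().mapToDouble(Disk::utilization).max().orElse(0);
        Comparator<Disk> byTasks = Comparator.comparingInt(Disk::pendingTasks);
        Comparator<Disk> bySpace = Comparator.comparingLong(Disk::freeBytes).reversed();
        return disks.stream()
                    .min(max - min > MAX_UTILIZATION_SPREAD ? bySpace : byTasks)
                    .orElseThrow();
    }

    public static void main(String[] args)
    {
        System.out.println(pick(List.of(
            new Disk("/data1", 1, 0.30, 100L),
            new Disk("/data2", 3, 0.05, 900L)))); // spread 25% -> /data2 wins
    }
}
{code}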
[jira] [Comment Edited] (CASSANDRA-7386) JBOD threshold to prevent unbalanced disk utilization
[ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208119#comment-14208119 ] J.B. Langston edited comment on CASSANDRA-7386 at 11/12/14 3:31 PM: I've seen a lot of users hitting this issue lately, so the sooner we can get a patch the better. This also needs to be back ported to 2.0 if at all possible. In several cases I've seen severe imbalances like the ones described where there are some drives completely full and others at 10-20% utilization. Here are a couple of stack traces. It happens both during flushes and compactions. {code} ERROR [FlushWriter:6241] 2014-09-07 08:27:35,298 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:6241,5,main] FSWriteError in /data6/system/compactions_in_progress/system-compactions_in_progress-tmp-jb-8222-Index.db at org.apache.cassandra.io.util.SequentialWriter.flushData(SequentialWriter.java:267) at org.apache.cassandra.io.util.SequentialWriter.flushInternal(SequentialWriter.java:219) at org.apache.cassandra.io.util.SequentialWriter.syncInternal(SequentialWriter.java:191) at org.apache.cassandra.io.util.SequentialWriter.close(SequentialWriter.java:381) at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:481) at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212) at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301) at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:417) at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: No space left on device at java.io.RandomAccessFile.writeBytes0(Native Method) at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520) at java.io.RandomAccessFile.write(RandomAccessFile.java:550) at org.apache.cassandra.io.util.SequentialWriter.flushData(SequentialWriter.java:263) ... 
13 more ERROR [CompactionExecutor:9166] 2014-09-06 16:09:14,786 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:9166,1,main] FSWriteError in /data6/keyspace_1/data/keyspace_1-data-tmp-jb-13599-Filter.db at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475) at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212) at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: No space left on device at java.io.FileOutputStream.write(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:295) at java.io.DataOutputStream.writeInt(DataOutputStream.java:197) at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34) at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44) at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41) at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468) ... 13 more {code} was (Author: jblangs...@datastax.com): I've seen a lot of users hitting this issue lately, so the sooner we can get a patch the better. This also needs to be back ported to 2.0 if at all possible. In several cases
[jira] [Created] (CASSANDRA-8253) cassandra-stress 2.1 doesn't support LOCAL_ONE
J.B. Langston created CASSANDRA-8253: Summary: cassandra-stress 2.1 doesn't support LOCAL_ONE Key: CASSANDRA-8253 URL: https://issues.apache.org/jira/browse/CASSANDRA-8253 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Looks like a simple oversight in argument parsing: ➜ bin ./cassandra-stress write cl=LOCAL_ONE Invalid value LOCAL_ONE; must match pattern ONE|QUORUM|LOCAL_QUORUM|EACH_QUORUM|ALL|ANY Also, CASSANDRA-7077 argues that it should be using LOCAL_ONE by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
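The root cause is a hardcoded validation pattern that predates LOCAL_ONE. Deriving the accepted values from the consistency-level enum itself avoids this whole class of oversight; a sketch with a local stand-in enum (not the driver's actual class):
{code}
// Sketch: validate cl= against an enum instead of a hardcoded regex, so
// newly added consistency levels are accepted automatically.
import java.util.Arrays;

public class ClOption
{
    enum ConsistencyLevel { ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL, ANY, LOCAL_ONE }

    static ConsistencyLevel parse(String value)
    {
        try
        {
            return ConsistencyLevel.valueOf(value.toUpperCase());
        }
        catch (IllegalArgumentException e)
        {
            throw new IllegalArgumentException("Invalid value " + value
                + "; must be one of " + Arrays.toString(ConsistencyLevel.values()));
        }
    }

    public static void main(String[] args)
    {
        System.out.println(parse("LOCAL_ONE"));
    }
}
{code}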
[jira] [Updated] (CASSANDRA-8253) cassandra-stress 2.1 doesn't support LOCAL_ONE
[ https://issues.apache.org/jira/browse/CASSANDRA-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-8253: - Reproduced In: 2.1.1 cassandra-stress 2.1 doesn't support LOCAL_ONE -- Key: CASSANDRA-8253 URL: https://issues.apache.org/jira/browse/CASSANDRA-8253 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Looks like a simple oversight in argument parsing: ➜ bin ./cassandra-stress write cl=LOCAL_ONE Invalid value LOCAL_ONE; must match pattern ONE|QUORUM|LOCAL_QUORUM|EACH_QUORUM|ALL|ANY Also, CASSANDRA-7077 argues that it should be using LOCAL_ONE by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175588#comment-14175588 ] J.B. Langston commented on CASSANDRA-8084: -- I don't think sstableloader is working right. Here is the output for sstableloader itself: {code} automaton@ip-172-31-7-50:~/Keyspace1/Standard1$ sstableloader -d localhost `pwd` Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-320-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-326-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-325-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-283-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-267-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-211-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-301-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-316-Data.db to [/54.183.192.248, /54.215.139.161, /54.165.222.3, /54.172.118.222] Streaming session ID: ac5dd440-5645-11e4-a813-3d13c3d3c540 progress: [/54.172.118.222 8/8 (100%)] [/54.183.192.248 8/8 (100%)] [/54.165.222.3 8/8 (100%)] [/54.215.139.161 8/8 (100%)] [total: 100% - 2147483647MB/s (avg: 30MB/s) {code} Here is netstats on the node where it is running: {code} Responses n/a 0812 automaton@ip-172-31-7-50:~$ nodetool netstats Mode: NORMAL Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540 /172.31.7.50 (using /54.183.192.248) Receiving 8 files, 1059673728 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-10-Data.db 56468194/164372226 bytes(34%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-4-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-3-Data.db 50674396/50674396 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-5-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-7-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-6-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-9-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-8-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Commandsn/a 0 0 Responses n/a 0970 {code} Here's netstats on the other node in the same DC: {code} automaton@ip-172-31-40-169:~$ nodetool netstats Mode: NORMAL Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540 /172.31.7.50 (using /54.183.192.248) Receiving 8 files, 1059673728 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-239-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-245-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-246-Data.db 43078602/50674396 bytes(85%) received from 
/172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-240-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-241-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-243-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-242-Data.db 164372226/164372226 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-244-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Commandsn/a 0 249589 Responses
[jira] [Comment Edited] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175588#comment-14175588 ] J.B. Langston edited comment on CASSANDRA-8084 at 10/17/14 9:43 PM: I don't think sstableloader is working right. Here is the output for sstableloader itself: {code} automaton@ip-172-31-7-50:~/Keyspace1/Standard1$ sstableloader -d localhost `pwd` Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-320-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-326-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-325-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-283-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-267-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-211-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-301-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-316-Data.db to [/54.183.192.248, /54.215.139.161, /54.165.222.3, /54.172.118.222] Streaming session ID: ac5dd440-5645-11e4-a813-3d13c3d3c540 progress: [/54.172.118.222 8/8 (100%)] [/54.183.192.248 8/8 (100%)] [/54.165.222.3 8/8 (100%)] [/54.215.139.161 8/8 (100%)] [total: 100% - 2147483647MB/s (avg: 30MB/s) {code} Here is netstats on the node where it is running (54.183.192.248): {code} Responses n/a 0812 automaton@ip-172-31-7-50:~$ nodetool netstats Mode: NORMAL Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540 /172.31.7.50 (using /54.183.192.248) Receiving 8 files, 1059673728 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-10-Data.db 56468194/164372226 bytes(34%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-4-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-3-Data.db 50674396/50674396 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-5-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-7-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-6-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-9-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-8-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Commandsn/a 0 0 Responses n/a 0970 {code} Here's netstats on the other node in the same DC (54.215.139.161): {code} automaton@ip-172-31-40-169:~$ nodetool netstats Mode: NORMAL Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540 /172.31.7.50 (using /54.183.192.248) Receiving 8 files, 1059673728 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-239-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-245-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-246-Data.db 43078602/50674396 bytes(85%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-240-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-241-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-243-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-242-Data.db 164372226/164372226 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-244-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173821#comment-14173821 ] J.B. Langston commented on CASSANDRA-8084: -- Test v3; nodetool netstats looks good as well as actual ports used via netstat -an. In the logs, I only see the internal IP mentioned in one place. Is this the INFO line you were talking about? {code} INFO [STREAM-INIT-/172.31.5.143:43953] 2014-10-16 14:36:11,292 StreamResultFuture.java (line 121) [Stream #c5fbdb90-5541-11e4-8eb3-c9fac3589773] Received streaming plan for Repair INFO [STREAM-INIT-/172.31.5.143:43994] 2014-10-16 14:38:16,120 StreamResultFuture.java (line 121) [Stream #10424ae0-5542-11e4-8eb3-c9fac3589773] Received streaming plan for Repair {code} GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair - Key: CASSANDRA-8084 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084 Project: Cassandra Issue Type: Bug Components: Config Environment: Tested this in GCE and AWS clusters. Created multi region and multi dc cluster once in GCE and once in AWS and ran into the same problem. DISTRIB_ID=Ubuntu DISTRIB_RELEASE=12.04 DISTRIB_CODENAME=precise DISTRIB_DESCRIPTION=Ubuntu 12.04.3 LTS NAME=Ubuntu VERSION=12.04.3 LTS, Precise Pangolin ID=ubuntu ID_LIKE=debian PRETTY_NAME=Ubuntu precise (12.04.3 LTS) VERSION_ID=12.04 Tried to install Apache Cassandra version ReleaseVersion: 2.0.10 and also latest DSE version which is 4.5 and which corresponds to 2.0.8.39. Reporter: Jana Assignee: Yuki Morishita Labels: features Fix For: 2.0.11 Attachments: 8084-2.0-v2.txt, 8084-2.0-v3.txt, 8084-2.0.txt Neither of these snitches(GossipFilePropertySnitch and EC2MultiRegionSnitch ) used the PRIVATE IPS for communication between INTRA-DC nodes in my multi-region multi-dc cluster in cloud(on both AWS and GCE) when I ran nodetool repair -local. It works fine during regular reads. Here are the various cluster flavors I tried and failed- AWS + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. AWS + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. I am expecting with the above setup all of my nodes in a given DC all communicate via private ips since the cloud providers dont charge us for using the private ips and they charge for using public ips. But they can use PUBLIC IPs for INTER-DC communications which is working as expected. 
Here is a snippet from my log files when I ran the nodetool repair -local - Node responding to 'node running repair' INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/sessions INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/events Node running repair - INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for events from /54.172.118.222 Note: The IPs its communicating is all PUBLIC Ips and it should have used the PRIVATE IPs starting with 172.x.x.x YAML file values : The listen address is set to: PRIVATE IP The broadcast address is set to: PUBLIC IP The SEEDs address is set to: PUBLIC IPs from both DCs The SNITCHES tried: GPFS and EC2MultiRegionSnitch RACK-DC: Had prefer_local set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173927#comment-14173927 ] J.B. Langston commented on CASSANDRA-8084: -- Confirmed the log messages: {code} INFO [StreamConnectionEstablisher:1] 2014-10-16 14:36:11,277 StreamSession.java (line 218) [Stream #c5fbdb90-5541-11e4-8eb3-c9fac3589773] Starting streaming to /54.183.192.248 through /172.31.7.50 INFO [StreamConnectionEstablisher:2] 2014-10-16 14:38:16,083 StreamSession.java (line 218) [Stream #10424ae0-5542-11e4-8eb3-c9fac3589773] Starting streaming to /54.183.192.248 through /172.31.7.50 INFO [StreamConnectionEstablisher:1] 2014-10-16 14:39:53,600 StreamSession.java (line 218) [Stream #4a9133f0-5542-11e4-8eb3-c9fac3589773] Starting streaming to /54.183.192.248 through /172.31.7.50 INFO [StreamConnectionEstablisher:2] 2014-10-16 14:40:50,476 StreamSession.java (line 218) [Stream #6c5b4200-5542-11e4-8eb3-c9fac3589773] Starting streaming to /54.183.192.248 through /172.31.7.50 {code} Everything looks like it's working as expected. I haven't tested sstableloader as suggested by [~jjordan]. GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair - Key: CASSANDRA-8084 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084 Project: Cassandra Issue Type: Bug Components: Config Environment: Tested this in GCE and AWS clusters. Created multi region and multi dc cluster once in GCE and once in AWS and ran into the same problem. DISTRIB_ID=Ubuntu DISTRIB_RELEASE=12.04 DISTRIB_CODENAME=precise DISTRIB_DESCRIPTION=Ubuntu 12.04.3 LTS NAME=Ubuntu VERSION=12.04.3 LTS, Precise Pangolin ID=ubuntu ID_LIKE=debian PRETTY_NAME=Ubuntu precise (12.04.3 LTS) VERSION_ID=12.04 Tried to install Apache Cassandra version ReleaseVersion: 2.0.10 and also latest DSE version which is 4.5 and which corresponds to 2.0.8.39. Reporter: Jana Assignee: Yuki Morishita Labels: features Fix For: 2.0.11 Attachments: 8084-2.0-v2.txt, 8084-2.0-v3.txt, 8084-2.0.txt Neither of these snitches(GossipFilePropertySnitch and EC2MultiRegionSnitch ) used the PRIVATE IPS for communication between INTRA-DC nodes in my multi-region multi-dc cluster in cloud(on both AWS and GCE) when I ran nodetool repair -local. It works fine during regular reads. Here are the various cluster flavors I tried and failed- AWS + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. AWS + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. I am expecting with the above setup all of my nodes in a given DC all communicate via private ips since the cloud providers dont charge us for using the private ips and they charge for using public ips. But they can use PUBLIC IPs for INTER-DC communications which is working as expected. 
Here is a snippet from my log files when I ran the nodetool repair -local - Node responding to 'node running repair' INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/sessions INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/events Node running repair - INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for events from /54.172.118.222 Note: The IPs its communicating is all PUBLIC Ips and it should have used the PRIVATE IPs starting with 172.x.x.x YAML file values : The listen address is set to: PRIVATE IP The broadcast address is set to: PUBLIC IP The SEEDs address is set to: PUBLIC IPs from both DCs The SNITCHES tried: GPFS and EC2MultiRegionSnitch RACK-DC: Had prefer_local set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169555#comment-14169555 ] J.B. Langston commented on CASSANDRA-8084: -- I think it is most important to show the private IP in netstats, and my vote would be to show both the public and private IP in that case. On the logs, I can see that would be more work to fix and I don't necessarily think it needs to show the private IP everywhere, but maybe on the messages that specifically concern streaming, we could show both. GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair - Key: CASSANDRA-8084 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084 Project: Cassandra Issue Type: Bug Components: Config Environment: Tested this in GCE and AWS clusters. Created multi region and multi dc cluster once in GCE and once in AWS and ran into the same problem. DISTRIB_ID=Ubuntu DISTRIB_RELEASE=12.04 DISTRIB_CODENAME=precise DISTRIB_DESCRIPTION=Ubuntu 12.04.3 LTS NAME=Ubuntu VERSION=12.04.3 LTS, Precise Pangolin ID=ubuntu ID_LIKE=debian PRETTY_NAME=Ubuntu precise (12.04.3 LTS) VERSION_ID=12.04 Tried to install Apache Cassandra version ReleaseVersion: 2.0.10 and also latest DSE version which is 4.5 and which corresponds to 2.0.8.39. Reporter: Jana Assignee: Yuki Morishita Labels: features Fix For: 2.0.11 Attachments: 8084-2.0.txt Neither of these snitches(GossipFilePropertySnitch and EC2MultiRegionSnitch ) used the PRIVATE IPS for communication between INTRA-DC nodes in my multi-region multi-dc cluster in cloud(on both AWS and GCE) when I ran nodetool repair -local. It works fine during regular reads. Here are the various cluster flavors I tried and failed- AWS + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. AWS + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. I am expecting with the above setup all of my nodes in a given DC all communicate via private ips since the cloud providers dont charge us for using the private ips and they charge for using public ips. But they can use PUBLIC IPs for INTER-DC communications which is working as expected. 
Here is a snippet from my log files when I ran the nodetool repair -local - Node responding to 'node running repair' INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/sessions INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/events Node running repair - INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for events from /54.172.118.222 Note: The IPs its communicating is all PUBLIC Ips and it should have used the PRIVATE IPs starting with 172.x.x.x YAML file values : The listen address is set to: PRIVATE IP The broadcast address is set to: PUBLIC IP The SEEDs address is set to: PUBLIC IPs from both DCs The SNITCHES tried: GPFS and EC2MultiRegionSnitch RACK-DC: Had prefer_local set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166908#comment-14166908 ] J.B. Langston commented on CASSANDRA-8084: -- I tested and it appears to work. Here is the cluster I am testing with: {code} Datacenter: DC1_EAST Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 54.165.222.3711.26 MB 1 25.0% dd449706-2059-4b65-ae98-0012d2cf8f67 rack1 UN 54.172.118.222 561.14 MB 1 25.0% 18cd7d0a-74ca-4835-a7ff-7ffaa92b35ef rack1 Datacenter: DC1_WEST Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 54.183.192.248 721.2 MB 1 25.0% c4dd37f1-d937-4876-8669-f0b01a3942db rack1 UN 54.215.139.161 909.26 MB 1 25.0% 16499349-8cef-4a62-a99c-ab145cb70921 rack1 I wasn't sure initially because the logs and `nodetool netstats` still show the broadcast address. You can see here that nodetool netstats, when run on 54.215.139.161, shows we are streaming from 54.183.192.248 (the broadcast address of the other node in the same DC): {code} Mode: NORMAL Repair dbc7ea40-5082-11e4-8190-c9fac3589773 /54.183.192.248 Receiving 9 files, 229856794 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-100-Data.db 58878176/58878176 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-106-Data.db 97856/97856 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-109-Data.db 69407704/69407704 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-108-Data.db 3203116/3203116 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-102-Data.db 12545306/12545306 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-103-Data.db 69407704/69407704 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-104-Data.db 1536228/1536228 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-105-Data.db 12589230/12589230 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-107-Data.db 2191474/2191474 bytes(100%) received from /54.183.192.248 Sending 5 files, 109645980 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-87-Data.db 14323672/14323672 bytes(100%) sent to /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-97-Data.db 20581730/20581730 bytes(100%) sent to /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-98-Data.db 3161694/3161694 bytes(100%) sent to /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-95-Data.db 69407704/69407704 bytes(100%) sent to /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-99-Data.db 2171180/2171180 bytes(100%) sent to /54.183.192.248 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Commandsn/a 01495191 Responses n/a 0 714928 {code} However, the output of `sudo netstat -anp | grep 7000 | sort -k5` shows that we are only connecting to the local node on its listen address (172.31.7.50): {code} tcp0 0 172.31.5.143:7000 
0.0.0.0:* LISTEN 17279/java tcp0 0 172.31.5.143:7000 172.31.5.143:34936 ESTABLISHED 17279/java tcp0 0 172.31.5.143:7000 172.31.5.143:34937 ESTABLISHED 17279/java tcp0 0 172.31.5.143:7000 172.31.5.143:34938 ESTABLISHED 17279/java tcp0 0 172.31.5.143:34936 172.31.5.143:7000 ESTABLISHED 17279/java tcp0 0 172.31.5.143:34937 172.31.5.143:7000 ESTABLISHED 17279/java tcp0 0 172.31.5.143:34938 172.31.5.143:7000 ESTABLISHED 17279/java tcp0 0 172.31.5.143:7000 172.31.7.50:52125 ESTABLISHED 17279/java tcp0 0 172.31.5.143:7000 172.31.7.50:52126 ESTABLISHED 17279/java tcp0 0 172.31.5.143:57502 172.31.7.50:7000
[jira] [Comment Edited] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166908#comment-14166908 ] J.B. Langston edited comment on CASSANDRA-8084 at 10/10/14 2:25 PM: I tested and it appears to work. Here is the cluster I am testing with:
{code}
Datacenter: DC1_EAST
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.165.222.37   11.26 MB   1       25.0%  dd449706-2059-4b65-ae98-0012d2cf8f67  rack1
UN  54.172.118.222  561.14 MB  1       25.0%  18cd7d0a-74ca-4835-a7ff-7ffaa92b35ef  rack1
Datacenter: DC1_WEST
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.183.192.248  721.2 MB   1       25.0%  c4dd37f1-d937-4876-8669-f0b01a3942db  rack1
UN  54.215.139.161  909.26 MB  1       25.0%  16499349-8cef-4a62-a99c-ab145cb70921  rack1
{code}
I wasn't sure initially because the logs and `nodetool netstats` still show the broadcast address. You can see here that nodetool netstats, when run on 54.215.139.161, shows we are streaming from 54.183.192.248 (the broadcast address of the other node in the same DC):
{code}
Mode: NORMAL
Repair dbc7ea40-5082-11e4-8190-c9fac3589773
    /54.183.192.248
        Receiving 9 files, 229856794 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-100-Data.db 58878176/58878176 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-106-Data.db 97856/97856 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-109-Data.db 69407704/69407704 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-108-Data.db 3203116/3203116 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-102-Data.db 12545306/12545306 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-103-Data.db 69407704/69407704 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-104-Data.db 1536228/1536228 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-105-Data.db 12589230/12589230 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-107-Data.db 2191474/2191474 bytes(100%) received from /54.183.192.248
        Sending 5 files, 109645980 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-87-Data.db 14323672/14323672 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-97-Data.db 20581730/20581730 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-98-Data.db 3161694/3161694 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-95-Data.db 69407704/69407704 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-99-Data.db 2171180/2171180 bytes(100%) sent to /54.183.192.248
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name  Active  Pending  Completed
Commands   n/a     0        1495191
Responses  n/a     0        714928
{code}
However, the output of `sudo netstat -anp | grep 7000 | sort -k5` shows that we are connecting to the other node in the local DC only on its listen address (172.31.7.50):
{code}
tcp 0 0 172.31.5.143:7000  0.0.0.0:*          LISTEN      17279/java
tcp 0 0 172.31.5.143:7000  172.31.5.143:34936 ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:7000  172.31.5.143:34937 ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:7000  172.31.5.143:34938 ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:34936 172.31.5.143:7000  ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:34937 172.31.5.143:7000  ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:34938 172.31.5.143:7000  ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:7000  172.31.7.50:52125  ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:7000  172.31.7.50:52126  ESTABLISHED 17279/java
tcp
{code}
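The behavior confirmed above, connecting over the listen address within a datacenter while still gossiping the broadcast address, is what the `prefer_local` snitch option requests. As a point of reference (the dc/rack values below are placeholders, not from this test cluster, and availability of the option depends on the Cassandra version), it lives in cassandra-rackdc.properties alongside the datacenter and rack names read by GossipingPropertyFileSnitch:
{code}
# cassandra-rackdc.properties (sketch; dc/rack values are placeholders)
dc=DC1_WEST
rack=rack1
# With prefer_local=true, connections to nodes in the same datacenter
# use listen_address instead of broadcast_address.
prefer_local=true
{code}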
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163680#comment-14163680 ] J.B. Langston commented on CASSANDRA-8084: -- Here is the AWS cluster used to reproduce this:
{code}
automaton@ip-172-31-0-237:~$ nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: aws_east
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.165.86.246   304.01 MB  256     26.8%  1042deb8-5395-42b1-adf4-2a373149b052  rack1
UN  54.209.121.225  302.82 MB  256     21.8%  7e7499c2-acfb-4eda-b786-7878907038b8  rack1
Datacenter: aws_west
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.183.246.79   79.01 MB   256     24.7%  9a4450a4-d00b-407c-8217-464ca5d3d74c  rack1
UN  54.183.249.149  319.14 MB  256     26.7%  cb6579d4-3eac-48c6-a8c0-ca30071a97e8  rack1
{code}
Here is the test case I ran to reproduce this:
1) Run cassandra-stress once to create Keyspace1 and Standard1 CF.
2) Alter keyspace with replication to all nodes:
{code}
ALTER KEYSPACE Keyspace1 WITH replication = { 'class': 'NetworkTopologyStrategy', 'aws_east': '2', 'aws_west': '2' };
{code}
3) Shut down one of the nodes in aws_west.
4) Run cassandra-stress on the other node in aws_west (just cassandra-stress with no options). Let it finish.
5) Start the node back up.
6) Run nodetool repair -local.
7) Repair and streaming messages in system.log will show that it is using the broadcast IP for nodes in the same DC.
You can also watch the connections being established over the broadcast IP with this command:
{code}
sudo netstat -anp | grep 7000 | sort -k5
{code}
This was conducted on DSE with GPFS. We should repeat with EC2MRS on DSE and with GPFS on Apache Cassandra/DSC. Here is the netstat output showing that it is establishing connections to the node in the same DC (54.183.249.149). This command is being run on 54.183.246.79, so it should have used the private 172 address to talk to 54.183.249.149 instead.
{code}
automaton@ip-172-31-0-237:~$ sudo netstat -anp | grep 7000 | sort -k5
tcp 0 0  172.31.0.237:7000  0.0.0.0:*           LISTEN      8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54148  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54149  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54150  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54148 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54149 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54150 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.4.163:56894  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.4.163:56895  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:55510 172.31.4.163:7000   ESTABLISHED 8959/java
tcp 0 35 172.31.0.237:55504 172.31.4.163:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  54.165.86.246:36101 ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:50600 54.165.86.246:7000  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:50606 54.165.86.246:7000  ESTABLISHED 8959/java
tcp 1 0  172.31.0.237:60588 54.183.249.149:7000 CLOSE_WAIT  8959/java
tcp 0 0  172.31.0.237:60587 54.183.249.149:7000 ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:60505 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60508 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60509 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60511 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60513 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60514 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60515 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60517 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60521 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60523 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60524 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60527 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60528 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60532 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60534 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60536 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60538 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60544 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60546 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60552 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60554 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60560 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60562 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60564 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60565 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60566 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60568 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60570 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0
{code}
[jira] [Comment Edited] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163680#comment-14163680 ] J.B. Langston edited comment on CASSANDRA-8084 at 10/8/14 4:17 PM: --- Here is the AWS cluster used to reproduce this:
{code}
automaton@ip-172-31-0-237:~$ nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: aws_east
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.165.86.246   304.01 MB  256     26.8%  1042deb8-5395-42b1-adf4-2a373149b052  rack1
UN  54.209.121.225  302.82 MB  256     21.8%  7e7499c2-acfb-4eda-b786-7878907038b8  rack1
Datacenter: aws_west
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.183.246.79   79.01 MB   256     24.7%  9a4450a4-d00b-407c-8217-464ca5d3d74c  rack1
UN  54.183.249.149  319.14 MB  256     26.7%  cb6579d4-3eac-48c6-a8c0-ca30071a97e8  rack1
{code}
Here is the test case I ran to reproduce this:
1) Run cassandra-stress once to create Keyspace1 and Standard1 CF.
2) Alter keyspace with replication to all nodes:
{code}
ALTER KEYSPACE Keyspace1 WITH replication = { 'class': 'NetworkTopologyStrategy', 'aws_east': '2', 'aws_west': '2' };
{code}
3) Shut down one of the nodes in aws_west.
4) Run cassandra-stress on the other node in aws_west (just cassandra-stress with no options). Let it finish.
5) Start the node back up.
6) Run nodetool repair -local.
7) Repair and streaming messages in system.log will show that it is using the broadcast IP for nodes in the same DC.
You can also watch the connections being established over the broadcast IP with this command:
{code}
sudo netstat -anp | grep 7000 | sort -k5
{code}
The original test was conducted on DSE. We also reproduced it on Apache Cassandra/DSC 2.0.10. Here is the netstat output showing that it is establishing connections to the node in the same DC (54.183.249.149). This command is being run on 54.183.246.79, so it should have used the private 172 address to talk to 54.183.249.149 instead.
{code}
automaton@ip-172-31-0-237:~$ sudo netstat -anp | grep 7000 | sort -k5
tcp 0 0  172.31.0.237:7000  0.0.0.0:*           LISTEN      8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54148  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54149  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54150  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54148 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54149 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54150 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.4.163:56894  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.4.163:56895  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:55510 172.31.4.163:7000   ESTABLISHED 8959/java
tcp 0 35 172.31.0.237:55504 172.31.4.163:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  54.165.86.246:36101 ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:50600 54.165.86.246:7000  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:50606 54.165.86.246:7000  ESTABLISHED 8959/java
tcp 1 0  172.31.0.237:60588 54.183.249.149:7000 CLOSE_WAIT  8959/java
tcp 0 0  172.31.0.237:60587 54.183.249.149:7000 ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:60505 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60508 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60509 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60511 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60513 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60514 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60515 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60517 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60521 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60523 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60524 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60527 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60528 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60532 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60534 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60536 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60538 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60544 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60546 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60552 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60554 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60560 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60562 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60564 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60565 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60566 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60568 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60570 54.183.249.149:7000
{code}
[jira] [Created] (CASSANDRA-7805) Performance regression in multi-get (in clause) due to automatic paging
J.B. Langston created CASSANDRA-7805: Summary: Performance regression in multi-get (in clause) due to automatic paging Key: CASSANDRA-7805 URL: https://issues.apache.org/jira/browse/CASSANDRA-7805 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor Comparative benchmarking of 1.2 vs. 2.0 shows a regression in multi-get (in clause) queries due to automatic paging. Take the following example:
{code}
select myId, col1, col2, col3 from myTable where col1 = 'xyz' and myId IN (id1, id2, ..., id100); // primary key is (myId, col1)
{code}
We were surprised to see that in 2.0, the above query was giving an order of magnitude worse performance than in 1.2. Digging in, I believe it is due to the issue described in the comment at the top of MultiPartitionPager.java (v2.0.9):
{quote}
Note that this is not easy to make efficient. Indeed, we need to page the first command fully before returning results from the next one, but if the result returned by each command is small (compared to pageSize), paging the commands one at a time under-performs compared to parallelizing.
{quote}
The perf regression is due to the new paging feature in 2.0. The server is executing the read for each id in the IN clause *sequentially* in order to implement the paging semantics. The wisdom of using multi-get like this has been debated in other forums, but the unfortunate thing from a user's point of view is that they may have a bunch of code working against 1.2, upgrade their cluster to 2.0, and all of a sudden see an order of magnitude or worse perf regression. That will be perceived as a problem. I think it would surprise anyone not familiar with the code that the separate reads for the IN clause are done sequentially rather than in parallel. As a workaround, disable paging in the Java driver by setting fetchSize to Integer.MAX_VALUE on your QueryOptions. -- This message was sent by Atlassian JIRA (v6.2#6252)
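The workaround mentioned in the report above could look like the following with the DataStax Java driver. This is a minimal sketch: the contact point, keyspace, and query are placeholders, and the fetch size can equally be set per statement via Statement.setFetchSize.
{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;

public class DisablePaging
{
    public static void main(String[] args)
    {
        // A fetch size of Integer.MAX_VALUE effectively disables automatic
        // paging, so the multi-get is not paged one partition at a time.
        Cluster cluster = Cluster.builder()
                                 .addContactPoint("127.0.0.1") // placeholder contact point
                                 .withQueryOptions(new QueryOptions().setFetchSize(Integer.MAX_VALUE))
                                 .build();
        Session session = cluster.connect("mykeyspace"); // placeholder keyspace
        session.execute("SELECT myId, col1, col2, col3 FROM myTable WHERE col1 = 'xyz' AND myId IN (1, 2, 3)");
        cluster.close();
    }
}
{code}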
[jira] [Created] (CASSANDRA-7767) Expose sizes of off-heap data structures via JMX and `nodetool info`
J.B. Langston created CASSANDRA-7767: Summary: Expose sizes of off-heap data structures via JMX and `nodetool info` Key: CASSANDRA-7767 URL: https://issues.apache.org/jira/browse/CASSANDRA-7767 Project: Cassandra Issue Type: New Feature Reporter: J.B. Langston It would be very helpful for troubleshooting memory consumption to know the individual sizes of off-heap data structures such as bloom filters, index summaries, compression metadata, etc. Can we expose this over JMX? Also, since `nodetool info` already shows the size of the heap, key cache, etc., it seems like a natural place to show this. -- This message was sent by Atlassian JIRA (v6.2#6252)
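As an illustration of what consuming such metrics could look like once exposed, here is a minimal JMX client sketch. The gauge name BloomFilterOffHeapMemoryUsed and the keyspace/table in the ObjectName are hypothetical placeholders for whatever metric names this feature would introduce, not an existing API at the time of this report.
{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class OffHeapSizes
{
    public static void main(String[] args) throws Exception
    {
        // Standard Cassandra JMX endpoint; host and port are placeholders.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Hypothetical per-table gauge; keyspace and table are placeholders.
            ObjectName name = new ObjectName(
                "org.apache.cassandra.metrics:type=ColumnFamily,keyspace=Keyspace1,scope=Standard1,name=BloomFilterOffHeapMemoryUsed");
            Object value = mbs.getAttribute(name, "Value");
            System.out.println("Bloom filter off-heap bytes: " + value);
        }
    }
}
{code}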
[jira] [Created] (CASSANDRA-7745) Background LCS compactions stall with pending compactions remaining
J.B. Langston created CASSANDRA-7745: Summary: Background LCS compactions stall with pending compactions remaining Key: CASSANDRA-7745 URL: https://issues.apache.org/jira/browse/CASSANDRA-7745 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston We've hit a scenario where background LCS compactions will stall. compactionstats output shows hundreds of pending compactions but none active. The thread dumps show no CompactionExecutor threads running, and no compaction activity is being logged to system.log. This seems to happen when there are no writes to the node. There are no flushes logged either, and when writes resume, compactions seem to resume as well, but still don't ever get to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7723) sstable2json (and possibly other command-line tools) hang if no write permission to the commitlogs
J.B. Langston created CASSANDRA-7723: Summary: sstable2json (and possibly other command-line tools) hang if no write permission to the commitlogs Key: CASSANDRA-7723 URL: https://issues.apache.org/jira/browse/CASSANDRA-7723 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston sstable2json (and potentially other command-line tools that call DatabaseDescriptor.loadSchemas) will hang if the user running them doesn't have write permission on the commit logs. loadSchemas calls Schema.updateVersion, which causes a mutation to the system tables, then it just spins forever trying to acquire a commit log segment. See this thread dump: https://gist.github.com/markcurtis1970/837e770d1cad5200943c. The tools should recognize this and present an understandable error message. -- This message was sent by Atlassian JIRA (v6.2#6252)
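For the understandable error message suggested above, a minimal sketch of an up-front check a tool could run before loading schemas is shown below. The directory would come from DatabaseDescriptor.getCommitLogLocation() in the 2.0-era code; the guard itself is illustrative, not the actual patch.
{code}
import java.io.File;

public class CommitLogWriteCheck
{
    // Illustrative guard a command-line tool could run before calling
    // DatabaseDescriptor.loadSchemas, instead of hanging on a commit log
    // segment it can never allocate.
    static void checkCommitLogWritable(String commitLogLocation)
    {
        File dir = new File(commitLogLocation);
        if (!dir.canWrite())
        {
            System.err.println("Cannot write to commit log directory " + dir
                               + "; run this tool as a user with write permission.");
            System.exit(1);
        }
    }
}
{code}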
[jira] [Updated] (CASSANDRA-7723) sstable2json (and possibly other command-line tools) hang if no write permission to the commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-7723: - Priority: Minor (was: Major) sstable2json (and possibly other command-line tools) hang if no write permission to the commitlogs -- Key: CASSANDRA-7723 URL: https://issues.apache.org/jira/browse/CASSANDRA-7723 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor sstable2json (and potentially other command-line tools that call DatabaseDescriptor.loadSchemas) will hang if the user running them doesn't have write permission on the commit logs. loadSchemas calls Schema.updateVersion, which causes a mutation to the system tables, then it just spins forever trying to acquire a commit log segment. See this thread dump: https://gist.github.com/markcurtis1970/837e770d1cad5200943c. The tools should recognize this and present an understandable error message. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7117) cqlsh should return a non-zero error code if a query fails
J.B. Langston created CASSANDRA-7117: Summary: cqlsh should return a non-zero error code if a query fails Key: CASSANDRA-7117 URL: https://issues.apache.org/jira/browse/CASSANDRA-7117 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Priority: Minor cqlsh should return a non-zero error code when the last query in a file or piped stdin fails. This is so that shell scripts can determine whether a cql script failed or succeeded. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7117) cqlsh should return a non-zero error code if a query fails
[ https://issues.apache.org/jira/browse/CASSANDRA-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-7117: - Description: cqlsh should return a non-zero error code when a query in a file or piped stdin fails. This is so that shell scripts can determine whether a cql script failed or succeeded. (was: cqlsh should return a non-zero error code when the last query in a file or piped stdin fails. This is so that shell scripts to determine if a cql script failed or succeeded.) cqlsh should return a non-zero error code if a query fails -- Key: CASSANDRA-7117 URL: https://issues.apache.org/jira/browse/CASSANDRA-7117 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Priority: Minor cqlsh should return a non-zero error code when a query in a file or piped stdin fails. This is so that shell scripts can determine whether a cql script failed or succeeded. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-5624) Memory leak in SerializingCache
[ https://issues.apache.org/jira/browse/CASSANDRA-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962308#comment-13962308 ] J.B. Langston commented on CASSANDRA-5624: -- Nobody on 1.2 has hit it. As far as I know, just the one occurrence. Memory leak in SerializingCache --- Key: CASSANDRA-5624 URL: https://issues.apache.org/jira/browse/CASSANDRA-5624 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Jonathan Ellis Assignee: Ryan McGuire A customer reported a memory leak when off-heap row cache is enabled. I gave them a patch against 1.1.9 to troubleshoot (https://github.com/jbellis/cassandra/commits/row-cache-finalizer). This confirms that row cache is responsible. Here is a sample of the log:
{noformat}
DEBUG [Finalizer] 2013-06-08 06:49:58,656 FreeableMemory.java (line 69) Unreachable memory still has nonzero refcount 1
DEBUG [Finalizer] 2013-06-08 06:49:58,656 FreeableMemory.java (line 71) Unreachable memory 140337996747792 has not been freed (will free now)
DEBUG [Finalizer] 2013-06-08 06:49:58,656 FreeableMemory.java (line 69) Unreachable memory still has nonzero refcount 1
DEBUG [Finalizer] 2013-06-08 06:49:58,656 FreeableMemory.java (line 71) Unreachable memory 140337989287984 has not been freed (will free now)
{noformat}
That is, memory is not being freed because we never got to zero references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6841) ConcurrentModificationException in commit-log-writer after local schema reset
[ https://issues.apache.org/jira/browse/CASSANDRA-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6841: - Fix Version/s: 1.2.17 ConcurrentModificationException in commit-log-writer after local schema reset - Key: CASSANDRA-6841 URL: https://issues.apache.org/jira/browse/CASSANDRA-6841 Project: Cassandra Issue Type: Bug Environment: Linux 3.2.0 (Debian Wheezy) Cassandra 2.0.6, Oracle JVM 1.7.0_51 Almost default cassandra.yaml (IPs and cluster name changed) This is the 2nd node in a 2-node ring. It has ~2500 keyspaces and very low traffic. (Only new keyspaces see reads and writes.) Reporter: Pas Assignee: Benedict Priority: Minor Fix For: 1.2.17, 2.0.7, 2.1 beta2
{code}
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,013 MigrationManager.java (line 329) Starting local schema reset...
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,016 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,016 Memtable.java (line 331) Writing Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,182 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-398-Data.db (145 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33159822)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,185 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,185 Memtable.java (line 331) Writing Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,357 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-399-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33159959)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,361 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,361 Memtable.java (line 331) Writing Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,516 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-400-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33160096)
INFO [CompactionExecutor:38] 2014-03-12 11:37:54,517 CompactionTask.java (line 115) Compacting [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-398-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-400-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-399-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-397-Data.db')]
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,519 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,519 Memtable.java (line 331) Writing Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,794 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-401-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33160233)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,799 MigrationManager.java (line 357) Local schema reset is complete.
INFO [CompactionExecutor:38] 2014-03-12 11:37:54,848 CompactionTask.java (line 275) Compacted 4 sstables to [/var/lib/cassandra/data/system/local/system-local-jb-402,]. 6,099 bytes to 5,821 (~95% of original) in 330ms = 0.016822MB/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
INFO [OptionalTasks:1] 2014-03-12 11:37:55,110 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-schema_columnfamilies@106276050(181506/509164 serialized/live bytes, 3276 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:55,110 Memtable.java (line 331) Writing Memtable-schema_columnfamilies@106276050(181506/509164 serialized/live bytes, 3276 ops)
INFO [OptionalTasks:1] 2014-03-12 11:37:55,110 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-schema_columns@252242773(185191/630698 serialized/live bytes, 3614 ops)
ERROR [COMMIT-LOG-WRITER]
[jira] [Reopened] (CASSANDRA-6841) ConcurrentModificationException in commit-log-writer after local schema reset
[ https://issues.apache.org/jira/browse/CASSANDRA-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston reopened CASSANDRA-6841: -- Reopening to get a backport for 1.2. ConcurrentModificationException in commit-log-writer after local schema reset - Key: CASSANDRA-6841 URL: https://issues.apache.org/jira/browse/CASSANDRA-6841 Project: Cassandra Issue Type: Bug Environment: Linux 3.2.0 (Debian Wheezy) Cassandra 2.0.6, Oracle JVM 1.7.0_51 Almost default cassandra.yaml (IPs and cluster name changed) This is the 2nd node in a 2-node ring. It has ~2500 keyspaces and very low traffic. (Only new keyspaces see reads and writes.) Reporter: Pas Assignee: Benedict Priority: Minor Fix For: 1.2.17, 2.0.7, 2.1 beta2
{code}
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,013 MigrationManager.java (line 329) Starting local schema reset...
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,016 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,016 Memtable.java (line 331) Writing Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,182 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-398-Data.db (145 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33159822)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,185 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,185 Memtable.java (line 331) Writing Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,357 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-399-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33159959)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,361 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,361 Memtable.java (line 331) Writing Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,516 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-400-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33160096)
INFO [CompactionExecutor:38] 2014-03-12 11:37:54,517 CompactionTask.java (line 115) Compacting [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-398-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-400-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-399-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-397-Data.db')]
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,519 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,519 Memtable.java (line 331) Writing Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,794 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-401-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33160233)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,799 MigrationManager.java (line 357) Local schema reset is complete.
INFO [CompactionExecutor:38] 2014-03-12 11:37:54,848 CompactionTask.java (line 275) Compacted 4 sstables to [/var/lib/cassandra/data/system/local/system-local-jb-402,]. 6,099 bytes to 5,821 (~95% of original) in 330ms = 0.016822MB/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
INFO [OptionalTasks:1] 2014-03-12 11:37:55,110 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-schema_columnfamilies@106276050(181506/509164 serialized/live bytes, 3276 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:55,110 Memtable.java (line 331) Writing Memtable-schema_columnfamilies@106276050(181506/509164 serialized/live bytes, 3276 ops)
INFO [OptionalTasks:1] 2014-03-12 11:37:55,110 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-schema_columns@252242773(185191/630698 serialized/live bytes, 3614 ops)
ERROR
[jira] [Created] (CASSANDRA-6960) Cassandra requires allow filtering
J.B. Langston created CASSANDRA-6960: Summary: Cassandra requires allow filtering Key: CASSANDRA-6960 URL: https://issues.apache.org/jira/browse/CASSANDRA-6960 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6960) Cassandra requires ALLOW FILTERING for a range scan
[ https://issues.apache.org/jira/browse/CASSANDRA-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6960: - Reproduced In: 2.0.5 Description: Given this table definition:
{code}
CREATE TABLE metric_log_a (
  destination_id text,
  rate_plan_id int,
  metric_name text,
  extraction_date 'org.apache.cassandra.db.marshal.TimestampType',
  metric_value text,
  PRIMARY KEY (destination_id, rate_plan_id, metric_name, extraction_date)
);
{code}
It seems that Cassandra should be able to perform the following query without ALLOW FILTERING:
{code}
select destination_id, rate_plan_id, metric_name, extraction_date, metric_value
from metric_log_a
where token(destination_id) > ? and token(destination_id) <= ?
and rate_plan_id=90
and metric_name='minutesOfUse'
and extraction_date >= '2014-03-05' and extraction_date <= '2014-03-05'
allow filtering;
{code}
However, it will refuse to run unless ALLOW FILTERING is specified. Summary: Cassandra requires ALLOW FILTERING for a range scan (was: Cassandra requires allow filtering) Cassandra requires ALLOW FILTERING for a range scan --- Key: CASSANDRA-6960 URL: https://issues.apache.org/jira/browse/CASSANDRA-6960 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Given this table definition:
{code}
CREATE TABLE metric_log_a (
  destination_id text,
  rate_plan_id int,
  metric_name text,
  extraction_date 'org.apache.cassandra.db.marshal.TimestampType',
  metric_value text,
  PRIMARY KEY (destination_id, rate_plan_id, metric_name, extraction_date)
);
{code}
It seems that Cassandra should be able to perform the following query without ALLOW FILTERING:
{code}
select destination_id, rate_plan_id, metric_name, extraction_date, metric_value
from metric_log_a
where token(destination_id) > ? and token(destination_id) <= ?
and rate_plan_id=90
and metric_name='minutesOfUse'
and extraction_date >= '2014-03-05' and extraction_date <= '2014-03-05'
allow filtering;
{code}
However, it will refuse to run unless ALLOW FILTERING is specified. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
J.B. Langston created CASSANDRA-6902: Summary: Make cqlsh prompt for a password if the user doesn't enter one Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Reporter: J.B. Langston Priority: Minor If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. This feature has been requested by a customer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
[ https://issues.apache.org/jira/browse/CASSANDRA-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6902: - Description: If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. (was: If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. This feature has been requested by a customer.) Make cqlsh prompt for a password if the user doesn't enter one -- Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Reporter: J.B. Langston Priority: Minor If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
[ https://issues.apache.org/jira/browse/CASSANDRA-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6902: - Attachment: trunk-6902.txt Make cqlsh prompt for a password if the user doesn't enter one -- Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Reporter: J.B. Langston Assignee: Mikhail Stepura Priority: Minor Attachments: trunk-6902.txt If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
[ https://issues.apache.org/jira/browse/CASSANDRA-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6902: - Attachment: trunk-6902.txt Make cqlsh prompt for a password if the user doesn't enter one -- Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: J.B. Langston Assignee: J.B. Langston Priority: Minor Fix For: 2.0.7 Attachments: trunk-6902.txt If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
[ https://issues.apache.org/jira/browse/CASSANDRA-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6902: - Attachment: (was: trunk-6902.txt) Make cqlsh prompt for a password if the user doesn't enter one -- Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: J.B. Langston Assignee: J.B. Langston Priority: Minor Fix For: 2.0.7 Attachments: trunk-6902.txt If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6449) Tools error out if they can't make ~/.cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885584#comment-13885584 ] J.B. Langston commented on CASSANDRA-6449: -- From a customer: The culprit is src/java/org/apache/cassandra/utils/FBUtilities.java:
{code}
File historyDir = new File(System.getProperty("user.home"), ".cassandra");
{code}
Setting an alternate HOME environment variable doesn't fix it. I've tried patching the nodetool wrapper script to provide -Duser.home at runtime, but when defining user.home I get runtime errors about missing libraries. It would be nice if the tool just honoured $HOME (or let you specify a commandline override without hacking the script). Tools error out if they can't make ~/.cassandra --- Key: CASSANDRA-6449 URL: https://issues.apache.org/jira/browse/CASSANDRA-6449 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jeremiah Jordan We shouldn't error out if we can't make the .cassandra folder for the new history stuff.
{noformat}
Exception in thread "main" FSWriteError in /usr/share/opscenter-agent/.cassandra
    at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:261)
    at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:627)
    at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1403)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1122)
Caused by: java.io.IOException: Failed to mkdirs /usr/share/opscenter-agent/.cassandra
    ... 4 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
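A minimal sketch of the fallback the customer is asking for, honouring $HOME when it is set before falling back to user.home (illustrative only, not the committed fix):
{code}
import java.io.File;

public class HistoryDir
{
    // Prefer the HOME environment variable when set, falling back to the
    // user.home system property, so users can redirect the history directory
    // without patching the wrapper scripts.
    static File historyDir()
    {
        String home = System.getenv("HOME");
        if (home == null || home.isEmpty())
            home = System.getProperty("user.home");
        return new File(home, ".cassandra");
    }
}
{code}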
[jira] [Commented] (CASSANDRA-6449) Tools error out if they can't make ~/.cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885608#comment-13885608 ] J.B. Langston commented on CASSANDRA-6449: -- This is the error that occurs when manually defining -Duser.home in the nodetool shell script:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/AbstractMultimap$WrappedSortedSet
    at com.google.common.collect.AbstractMultimap.wrapCollection(AbstractMultimap.java:374)
    at com.google.common.collect.AbstractMultimap.get(AbstractMultimap.java:363)
    at com.google.common.collect.AbstractSetMultimap.get(AbstractSetMultimap.java:59)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:65)
    at com.google.common.collect.TreeMultimap.get(TreeMultimap.java:74)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:35)
    at com.google.common.collect.Multimaps$UnmodifiableMultimap.get(Multimaps.java:563)
    at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:507)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2048)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2042)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:264)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:762)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1454)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:74)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1295)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1387)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:818)
    at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
{code}
I'm guessing maybe we use user.home elsewhere to set up the CLASSPATH. Tools error out if they can't make ~/.cassandra --- Key: CASSANDRA-6449 URL: https://issues.apache.org/jira/browse/CASSANDRA-6449 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jeremiah Jordan We shouldn't error out if we can't make the .cassandra folder for the new history stuff.
{noformat}
Exception in thread "main" FSWriteError in /usr/share/opscenter-agent/.cassandra
    at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:261)
    at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:627)
    at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1403)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1122)
Caused by: java.io.IOException: Failed to mkdirs /usr/share/opscenter-agent/.cassandra
    ... 4 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (CASSANDRA-6449) Tools error out if they can't make ~/.cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885608#comment-13885608 ] J.B. Langston edited comment on CASSANDRA-6449 at 1/29/14 6:18 PM: --- This is the error that occurs when manually defining -Duser.home in the nodetool shell script:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/AbstractMultimap$WrappedSortedSet
    at com.google.common.collect.AbstractMultimap.wrapCollection(AbstractMultimap.java:374)
    at com.google.common.collect.AbstractMultimap.get(AbstractMultimap.java:363)
    at com.google.common.collect.AbstractSetMultimap.get(AbstractSetMultimap.java:59)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:65)
    at com.google.common.collect.TreeMultimap.get(TreeMultimap.java:74)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:35)
    at com.google.common.collect.Multimaps$UnmodifiableMultimap.get(Multimaps.java:563)
    at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:507)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2048)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2042)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:264)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:762)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1454)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:74)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1295)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1387)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:818)
    at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
{code}
was (Author: jblangs...@datastax.com): This is the error that occurs when manually defining -Duser.home in the nodetool shell script:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/AbstractMultimap$WrappedSortedSet
    at com.google.common.collect.AbstractMultimap.wrapCollection(AbstractMultimap.java:374)
    at com.google.common.collect.AbstractMultimap.get(AbstractMultimap.java:363)
    at com.google.common.collect.AbstractSetMultimap.get(AbstractSetMultimap.java:59)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:65)
    at com.google.common.collect.TreeMultimap.get(TreeMultimap.java:74)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:35)
    at com.google.common.collect.Multimaps$UnmodifiableMultimap.get(Multimaps.java:563)
    at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:507)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2048)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2042)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
[jira] [Created] (CASSANDRA-6548) Order nodetool ring output by token when vnodes aren't in use
J.B. Langston created CASSANDRA-6548: Summary: Order nodetool ring output by token when vnodes aren't in use Key: CASSANDRA-6548 URL: https://issues.apache.org/jira/browse/CASSANDRA-6548 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston It is confusing to order the nodes by hostId in nodetool ring when vnodes aren't in use. This happens in 1.2 when providing a keyspace name:
{code}
Datacenter: DC1
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         42535295865117307932921825928971026432
xxx.xxx.xxx.48  RAC2  Up      Normal  324.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.42  RAC1  Up      Normal  284.39 GB  25.00%  0
xxx.xxx.xxx.44  RAC1  Up      Normal  931.07 GB  75.00%  127605887595351923798765477786913079296
xxx.xxx.xxx.46  RAC2  Up      Normal  881.93 GB  75.00%  42535295865117307932921825928971026432
Datacenter: DC2
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         148873535527910577765226390751398592512
xxx.xxx.xxx.19  RAC2  Up      Normal  568.22 GB  50.00%  63802943797675961899382738893456539648
xxx.xxx.xxx.17  RAC1  Up      Normal  621.58 GB  50.00%  106338239662793269832304564822427566080
xxx.xxx.xxx.15  RAC1  Up      Normal  566.99 GB  50.00%  21267647932558653966460912964485513216
xxx.xxx.xxx.21  RAC2  Up      Normal  619.41 GB  50.00%  148873535527910577765226390751398592512
{code}
Among other things, this makes it hard to spot rack imbalances. In the above output, the racks in DC1 are actually incorrectly ordered and those in DC2 are correctly ordered, but it's not obvious until you manually sort the nodes by token. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
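For what the requested ordering amounts to, here is a minimal sketch that sorts ring entries by token before printing. The types and the sample entries are illustrative only, taken from the DC1 output above, and are not the actual nodetool internals.
{code}
import java.math.BigInteger;
import java.util.SortedMap;
import java.util.TreeMap;

public class RingByToken
{
    public static void main(String[] args)
    {
        // TreeMap keeps keys in ascending order, so iterating prints the
        // ring in token order regardless of insertion (hostId) order.
        SortedMap<BigInteger, String> byToken = new TreeMap<>();
        byToken.put(new BigInteger("85070591730234615865843651857942052864"), "xxx.xxx.xxx.48 RAC2");
        byToken.put(new BigInteger("0"), "xxx.xxx.xxx.42 RAC1");
        byToken.put(new BigInteger("127605887595351923798765477786913079296"), "xxx.xxx.xxx.44 RAC1");
        byToken.put(new BigInteger("42535295865117307932921825928971026432"), "xxx.xxx.xxx.46 RAC2");
        byToken.forEach((token, node) -> System.out.println(node + "  " + token));
    }
}
{code}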
[jira] [Updated] (CASSANDRA-6548) Order nodetool ring output by token when vnodes aren't in use
[ https://issues.apache.org/jira/browse/CASSANDRA-6548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6548: - Description: It is confusing to order the nodes by hostId in nodetool ring when vnodes aren't in use. This happens in 1.2 when providing a keyspace name:
{code}
Datacenter: DC1
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         42535295865117307932921825928971026432
xxx.xxx.xxx.48  RAC2  Up      Normal  324.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.42  RAC1  Up      Normal  284.39 GB  25.00%  0
xxx.xxx.xxx.44  RAC1  Up      Normal  931.07 GB  75.00%  127605887595351923798765477786913079296
xxx.xxx.xxx.46  RAC2  Up      Normal  881.93 GB  75.00%  42535295865117307932921825928971026432
Datacenter: DC2
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         148873535527910577765226390751398592512
xxx.xxx.xxx.19  RAC2  Up      Normal  568.22 GB  50.00%  63802943797675961899382738893456539648
xxx.xxx.xxx.17  RAC1  Up      Normal  621.58 GB  50.00%  106338239662793269832304564822427566080
xxx.xxx.xxx.15  RAC1  Up      Normal  566.99 GB  50.00%  21267647932558653966460912964485513216
xxx.xxx.xxx.21  RAC2  Up      Normal  619.41 GB  50.00%  148873535527910577765226390751398592512
{code}
Among other things, this makes it hard to spot rack imbalances. In the above output, the racks in DC1 are actually incorrectly ordered and those in DC2 are correctly ordered, but it's not obvious until you manually sort the nodes by token. was: It is confusing to order the nodes by hostId in nodetool ring when vnodes aren't in use. This happens in 1.2 when providing a keyspace name:
{code}
Datacenter: DC1
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         42535295865117307932921825928971026432
xxx.xxx.xxx.48  RAC2  Up      Normal  324.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.42  RAC1  Up      Normal  284.39 GB  25.00%  0
xxx.xxx.xxx.44  RAC1  Up      Normal  931.07 GB  75.00%  127605887595351923798765477786913079296
xxx.xxx.xxx.46  RAC2  Up      Normal  881.93 GB  75.00%  42535295865117307932921825928971026432
Datacenter: DC2
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         148873535527910577765226390751398592512
xxx.xxx.xxx.19  RAC2  Up      Normal  568.22 GB  50.00%  63802943797675961899382738893456539648
xxx.xxx.xxx.17  RAC1  Up      Normal  621.58 GB  50.00%  106338239662793269832304564822427566080
xxx.xxx.xxx.15  RAC1  Up      Normal  566.99 GB  50.00%  21267647932558653966460912964485513216
xxx.xxx.xxx.21  RAC2  Up      Normal  619.41 GB  50.00%  148873535527910577765226390751398592512
{code}
Among other things, it makes it hard to spot rack imbalances. In the above output, the racks DC1 is actually incorrectly ordered and DC2 is correctly ordered, but it's not obvious until you manually sort the nodes by token. Order nodetool ring output by token when vnodes aren't in use - Key: CASSANDRA-6548 URL: https://issues.apache.org/jira/browse/CASSANDRA-6548 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston It is confusing to order the nodes by hostId in nodetool ring when vnodes aren't in use. This happens in 1.2 when providing a keyspace name:
{code}
Datacenter: DC1
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         42535295865117307932921825928971026432
xxx.xxx.xxx.48  RAC2  Up      Normal  324.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.42  RAC1  Up      Normal  284.39 GB  25.00%  0
xxx.xxx.xxx.44  RAC1  Up      Normal  931.07 GB  75.00%  127605887595351923798765477786913079296
xxx.xxx.xxx.46  RAC2  Up      Normal  881.93 GB  75.00%
[jira] [Created] (CASSANDRA-6262) Nodetool compact throws an error after importing data with sstableloader
J.B. Langston created CASSANDRA-6262: Summary: Nodetool compact throws an error after importing data with sstableloader Key: CASSANDRA-6262 URL: https://issues.apache.org/jira/browse/CASSANDRA-6262 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Exception when running nodetool compact: {code} Error occurred during compaction java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: index (2) must be less than size (2) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at org.apache.cassandra.db.compaction.CompactionManager.performMaximal(CompactionManager.java:331) at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1691) at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:2198) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Caused by: java.lang.IndexOutOfBoundsException: index 
(2) must be less than size (2) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:81) at org.apache.cassandra.db.marshal.CompositeType.getComparator(CompositeType.java:94) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:76) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31) at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:128) at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:119) at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:114) at
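The trace bottoms out in a Guava precondition: CompositeType.getComparator asks for the third component (index 2) of a column name, but the comparator was declared with only two component types. The failing check in isolation, with the surrounding CompositeType plumbing omitted:
{code}
import com.google.common.base.Preconditions;

public class IndexCheckDemo
{
    public static void main(String[] args)
    {
        int requestedComponent = 2; // third component of the column name being added
        int declaredComponents = 2; // CompositeType declared with only two component types
        // Throws java.lang.IndexOutOfBoundsException: index (2) must be less than size (2),
        // exactly the message in the stack trace above.
        Preconditions.checkElementIndex(requestedComponent, declaredComponents);
    }
}
{code}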
[jira] [Commented] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790844#comment-13790844 ] J.B. Langston commented on CASSANDRA-6097: -- Customer compiled Cassandra from git and ran the resulting nodetool against his DSE installation. He reported that the hang is still reproducible. I haven't tried to duplicate this myself yet. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Assignee: Yuki Morishita Priority: Minor Fix For: 1.2.11 Attachments: 6097-1.2.txt, dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-5911: - Attachment: 6528_140171_knwmuqxe9bjv5re_system.log Attached system.log showing commitlog replay. This was produced by running stress against a single-node cassandra cluster, then running drain and restarting. Commit logs are not removed after nodetool flush or nodetool drain -- Key: CASSANDRA-5911 URL: https://issues.apache.org/jira/browse/CASSANDRA-5911 Project: Cassandra Issue Type: Bug Components: Core Reporter: J.B. Langston Assignee: Vijay Priority: Minor Fix For: 2.0.2 Attachments: 6528_140171_knwmuqxe9bjv5re_system.log Commit logs are not removed after nodetool flush or nodetool drain. This can lead to unnecessary commit log replay during startup. I've reproduced this on Apache Cassandra 1.2.8. Usually this isn't much of an issue but on a Solr-indexed column family in DSE, each replayed mutation has to be reindexed which can make startup take a long time (on the order of 20-30 min). Reproduction follows:
{code}
jblangston:bin jblangston$ ./cassandra > /dev/null
jblangston:bin jblangston$ ../tools/bin/cassandra-stress -n 2000 > /dev/null
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ nodetool flush
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ nodetool drain
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ pkill java
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ ./cassandra -f | grep Replaying
INFO 10:03:42,915 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log
INFO 10:03:42,922 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log
INFO 10:03:43,912 Replaying
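For context, the behavior the ticket implies drain should have is roughly the following ordering. This is a hedged sketch against 1.2-era names (Table became Keyspace in 2.0), not the actual patch that shipped:
{code}
import org.apache.cassandra.db.ColumnFamilyStore;
import org.apache.cassandra.db.Table;
import org.apache.cassandra.db.commitlog.CommitLog;

// A sketch of the drain ordering the ticket implies; names follow the
// 1.2 codebase and this is illustrative, not the real fix.
public class DrainSketch
{
    public static void drainAndDiscardCommitLog() throws Exception
    {
        for (Table table : Table.all())
            for (ColumnFamilyStore cfs : table.getColumnFamilyStores())
                cfs.forceBlockingFlush(); // every mutation now lives in an sstable

        // After a full flush no segment is needed for replay, so shutting the
        // commit log down should be able to delete the segments instead of
        // leaving 576M behind for the next startup to replay.
        CommitLog.instance.shutdownBlocking();
    }
}
{code}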
[jira] [Updated] (CASSANDRA-4785) Secondary Index Sporadically Doesn't Return Rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-4785: - Attachment: repro.py entity_aliases.txt Reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. Note: Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} Secondary Index Sporadically Doesn't Return Rows Key: CASSANDRA-4785 URL: https://issues.apache.org/jira/browse/CASSANDRA-4785 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.5, 1.1.6 Environment: Ubuntu 10.04 Java 6 Sun Cassandra 1.1.5 upgraded from 1.1.2 - 1.1.3 - 1.1.5 Reporter: Arya Goudarzi Attachments: entity_aliases.txt, repro.py I have a ColumnFamily with caching = ALL. I have 2 secondary indexes on it. I have noticed if I query using the secondary index in the where clause, sometimes I get the results and sometimes I don't. Until 2 weeks ago, the caching option on this CF was set to NONE. So, I suspect something happened in secondary index caching scheme. Here are things I tried: 1. I rebuild indexes for that CF on all nodes; 2. I set the caching to KEYS_ONLY and rebuild the index again; 3. I set the caching to NONE and rebuild the index again; None of the above helped. I suppose the caching still exists as this behavior looks like cache mismatch. I did a bit of research, and found CASSANDRA-4197 that could be related. Please advise. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-4973) Secondary Index stops returning rows when caching=ALL
[ https://issues.apache.org/jira/browse/CASSANDRA-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786389#comment-13786389 ] J.B. Langston commented on CASSANDRA-4973: -- Reproducible test case attached to CASSANDRA-4785, of which this appears to be a duplicate. Secondary Index stops returning rows when caching=ALL - Key: CASSANDRA-4973 URL: https://issues.apache.org/jira/browse/CASSANDRA-4973 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.2, 1.1.6 Environment: Centos 6.3, Java 1.6.0_35, cass. 1.1.2 upgraded to 1.1.6 Reporter: Daniel Strawson Attachments: secondary_index_rowcache_restart_test.py I've been using cassandra on a project for a little while in development and have recently suddenly started having an issue where the secondary index stops working, this is happening on my new production system, we are not yet live. Things work ok one moment, then suddenly queries to the cf through the secondary index stop returning data. I've seen it happen on 3 CFs. I've tried: - various nodetool repair / scrub / rebuild_indexes options, none seem to make a difference. - Doing an 'update column family whatever with column_metadata=[]' then repeating with my correct column_metadata definition. This seems to fix the problem (temporarily) until it comes back. The last time it happened I had just restarted cassandra, so it could be that which is causing the issue. I've got the production system ok at the moment, I will try restarting a bit later when it's not being used and if I can get the issue to reoccur I will add more information. The problem first manifested itself in 1.1.2, so I upgraded to 1.1.6, this has not fixed it. Here is an example of the create column family I'm using for one of the CFs that's affected: create column family region with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'BytesType' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.1 and dclocal_read_repair_chance = 0.0 and gc_grace = 864000 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' and caching = 'KEYS_ONLY' and column_metadata = [ {column_name : 'label', validation_class : UTF8Type}, {column_name : 'countryCode', validation_class : UTF8Type, index_name : 'region_countryCode_idx', index_type : 0}, ] and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'}; I've noticed that CASSANDRA-4785 looks similar, in my case once the system has the problem, it doesn't go away until I fix it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (CASSANDRA-4785) Secondary Index Sporadically Doesn't Return Rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786381#comment-13786381 ] J.B. Langston edited comment on CASSANDRA-4785 at 10/4/13 5:38 PM: --- Reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. Note: Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} was (Author: jblangs...@datastax.com): Reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. 5) Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} Secondary Index Sporadically Doesn't Return Rows Key: CASSANDRA-4785 URL: https://issues.apache.org/jira/browse/CASSANDRA-4785 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.5, 1.1.6 Environment: Ubuntu 10.04 Java 6 Sun Cassandra 1.1.5 upgraded from 1.1.2 - 1.1.3 - 1.1.5 Reporter: Arya Goudarzi Attachments: entity_aliases.txt, repro.py I have a ColumnFamily with caching = ALL. I have 2 secondary indexes on it. I have noticed if I query using the secondary index in the where clause, sometimes I get the results and sometimes I don't. Until 2 weeks ago, the caching option on this CF was set to NONE. So, I suspect something happened in secondary index caching scheme. Here are things I tried: 1. I rebuild indexes for that CF on all nodes; 2. I set the caching to KEYS_ONLY and rebuild the index again; 3. I set the caching to NONE and rebuild the index again; None of the above helped. I suppose the caching still exists as this behavior looks like cache mismatch. 
I did a bit of research, and found CASSANDRA-4197 that could be related. Please advise. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (CASSANDRA-4785) Secondary Index Sporadically Doesn't Return Rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786381#comment-13786381 ] J.B. Langston edited comment on CASSANDRA-4785 at 10/4/13 5:41 PM: --- I have attached files for a reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. Note: Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} Reproduced in both 1.1.9 and 1.2.10. Customer is requesting a fix against 1.1.x. was (Author: jblangs...@datastax.com): Reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. Note: Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} Secondary Index Sporadically Doesn't Return Rows Key: CASSANDRA-4785 URL: https://issues.apache.org/jira/browse/CASSANDRA-4785 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.5, 1.1.6 Environment: Ubuntu 10.04 Java 6 Sun Cassandra 1.1.5 upgraded from 1.1.2 - 1.1.3 - 1.1.5 Reporter: Arya Goudarzi Attachments: entity_aliases.txt, repro.py I have a ColumnFamily with caching = ALL. I have 2 secondary indexes on it. I have noticed if I query using the secondary index in the where clause, sometimes I get the results and sometimes I don't. Until 2 weeks ago, the caching option on this CF was set to NONE. So, I suspect something happened in secondary index caching scheme. Here are things I tried: 1. I rebuild indexes for that CF on all nodes; 2. I set the caching to KEYS_ONLY and rebuild the index again; 3. I set the caching to NONE and rebuild the index again; None of the above helped. 
I suppose the caching still exists as this behavior looks like cache mismatch. I did a bit of research, and found CASSANDRA-4197 that could be related. Please advise. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6097: - Attachment: dse.stack nodetool.stack Stack traces for nodetool and cassandra attached. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial Attachments: dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783281#comment-13783281 ] J.B. Langston commented on CASSANDRA-6097: -- The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a list ConnectionNotificationListener and if a connection fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial Attachments: dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). 
I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
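A minimal sketch of the listener registration discussed in the comment above, using only the standard javax.management.remote API; wiring it to Cassandra's SimpleCondition mirrors the thread dump and is an assumption, not the actual nodetool code:
{code}
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.remote.JMXConnectionNotification;
import javax.management.remote.JMXConnector;

public class RepairConnectionWatcher implements NotificationListener
{
    private final org.apache.cassandra.utils.SimpleCondition condition;

    public RepairConnectionWatcher(JMXConnector jmxc,
                                   org.apache.cassandra.utils.SimpleCondition condition)
    {
        this.condition = condition;
        // Without this, a dropped connection leaves repairAndWait blocked forever.
        jmxc.addConnectionNotificationListener(this, null, null);
    }

    public void handleNotification(Notification n, Object handback)
    {
        String type = n.getType();
        if (JMXConnectionNotification.FAILED.equals(type)
            || JMXConnectionNotification.CLOSED.equals(type)
            || JMXConnectionNotification.NOTIFS_LOST.equals(type))
        {
            System.err.println("JMX connection lost; the status of the repair "
                               + "command can't be determined, please check the log.");
            condition.signalAll(); // wake nodetool instead of hanging
        }
    }
}
{code}
With something like this registered, a failed connection or lost notification wakes the waiting thread instead of leaving it parked on the condition indefinitely.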
[jira] [Comment Edited] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783281#comment-13783281 ] J.B. Langston edited comment on CASSANDRA-6097 at 10/1/13 8:06 PM: --- The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a listener ConnectionNotificationListener and if a notification fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. was (Author: jblangs...@datastax.com): The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a list ConnectionNotificationListener and if a connection fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial Attachments: dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. 
Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster.
[jira] [Comment Edited] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783281#comment-13783281 ] J.B. Langston edited comment on CASSANDRA-6097 at 10/1/13 8:07 PM: --- The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a ConnectionNotificationListener and if a notification fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. was (Author: jblangs...@datastax.com): The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a listener ConnectionNotificationListener and if a notification fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial Attachments: dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. 
Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps
[jira] [Commented] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782397#comment-13782397 ] J.B. Langston commented on CASSANDRA-6097: -- I think the ease with which this can be reproduced is dependent on the number of keyspaces. I started up a stock 3.1.3 AMI in hadoop mode so that DSE would create the cfs/HiveMetaStore/dse_system keyspaces and also created an additional keyspace using the customer's schema. Now I am able to reproduce the issue very readily by running nodetool repair -pr in a loop. I thought that it might have been something to do with having hadoop enabled, so I disabled it again, but I am still able to reproduce the issue. On the other hand, if I give repair a specific keyspace name, it takes much longer to reproduce, if at all. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 
1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (CASSANDRA-6110) Workaround for JMX random port selection
J.B. Langston created CASSANDRA-6110: Summary: Workaround for JMX random port selection Key: CASSANDRA-6110 URL: https://issues.apache.org/jira/browse/CASSANDRA-6110 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Priority: Minor Many people have been annoyed by the way that JMX selects a second port at random for the RMIServer, which makes it almost impossible to use JMX through a firewall. There is a [workaround|https://blogs.oracle.com/jmxetc/entry/connecting_through_firewall_using_jmx] using a custom java agent. Since jamm is already specified as the java agent for Cassandra, this would have to subclass or wrap the jamm MemoryMeter class. -- This message was sent by Atlassian JIRA (v6.1#6144)
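The linked workaround boils down to a premain agent that pins both the RMI registry and the RMI server to known ports. A hedged sketch (the port property name is invented for illustration):
{code}
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class FixedPortJmxAgent
{
    public static void premain(String agentArgs) throws Exception
    {
        // Hypothetical property name; pick whatever convention fits.
        int port = Integer.getInteger("cassandra.jmx.fixed.port", 7199);
        LocateRegistry.createRegistry(port);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // Reusing one port for both the registry and the RMI server means a
        // single firewall rule is sufficient.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi://localhost:" + port + "/jndi/rmi://localhost:" + port + "/jmxrmi");
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        server.start();
    }
}
{code}
Note that the JVM accepts multiple -javaagent arguments, so an agent of this shape could also run alongside jamm rather than having to subclass or wrap MemoryMeter.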
[jira] [Commented] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13781504#comment-13781504 ] J.B. Langston commented on CASSANDRA-6097: -- If I'm reading [this|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/NodeProbe.java#L1036-L1039] correctly, the condition that nodetool repair is waiting on won't get signaled if the status returned to the NotificationListener is SESSION_FAILED. Could that explain why it's hanging? nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
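If that reading is right, one shape of fix would be to signal the condition on SESSION_FAILED as well. A hypothetical sketch only; the field and enum names loosely follow the linked NodeProbe code and are not a verbatim patch:
{code}
// Hypothetical sketch of RepairRunner.handleNotification; cmd and condition
// are assumed fields of the runner, and Status is the server-side repair
// status enum the linked code checks against.
public void handleNotification(Notification notification, Object handback)
{
    if (!"repair".equals(notification.getType()))
        return;

    int[] status = (int[]) notification.getUserData(); // [command number, status ordinal]
    if (status[0] != cmd)
        return; // notification belongs to a different repair command

    System.out.println(notification.getMessage());
    // Waking the waiter on SESSION_FAILED as well as on completion would
    // stop nodetool from hanging on the condition when a session fails.
    if (status[1] == ActiveRepairService.Status.FINISHED.ordinal()
        || status[1] == ActiveRepairService.Status.SESSION_FAILED.ordinal())
        condition.signalAll();
}
{code}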
[jira] [Created] (CASSANDRA-6104) Add additional limits in cassandra.conf provided by Debian package
J.B. Langston created CASSANDRA-6104: Summary: Add additional limits in cassandra.conf provided by Debian package Key: CASSANDRA-6104 URL: https://issues.apache.org/jira/browse/CASSANDRA-6104 Project: Cassandra Issue Type: Bug Components: Packaging Reporter: J.B. Langston Priority: Trivial /etc/security/limits.d/cassandra.conf distributed with DSC deb/rpm packages should contain additional settings. We have found these limits to be necessary for some customers through various support tickets.
{code}
cassandra - memlock unlimited
cassandra - nofile 100000
cassandra - nproc 32768
cassandra - as unlimited
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
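One quick way to confirm the raised limits actually apply to the running JVM (rather than only to new login shells) is to ask the process itself, for example from code run inside it or exposed over JMX; this uses the HotSpot-specific com.sun.management extension:
{code}
import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class LimitCheck
{
    public static void main(String[] args)
    {
        UnixOperatingSystemMXBean os =
            (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        // Should report the nofile value from limits.d once the service is restarted.
        System.out.println("max file descriptors:  " + os.getMaxFileDescriptorCount());
        System.out.println("open file descriptors: " + os.getOpenFileDescriptorCount());
    }
}
{code}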
[jira] [Updated] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6097: - Description: nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. was: nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. 
Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent
[jira] [Created] (CASSANDRA-6097) nodetool repair randomly hangs.
J.B. Langston created CASSANDRA-6097: Summary: nodetool repair randomly hangs. Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; 3) Run an endless loop that runs nodetool repair repeatedly: while true; do nodetool repair -pr test; done 4) Wait until repair hangs. It may take hundreds or thousands of tries; the behavior is random. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-6047) Memory leak when using snapshot repairs
J.B. Langston created CASSANDRA-6047: Summary: Memory leak when using snapshot repairs Key: CASSANDRA-6047 URL: https://issues.apache.org/jira/browse/CASSANDRA-6047 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston
Running nodetool repair repeatedly with the -snapshot parameter results in a native memory leak: the JVM process takes up more and more physical memory until it is killed by the Linux OOM killer. The command used was as follows:
{code}nodetool repair keyspace -local -snapshot -pr -st start_token -et end_token{code}
Removing the -snapshot flag prevented the memory leak. The subrange repair necessitated multiple repairs, so it made the problem noticeable, but I believe the problem would be reproducible even if you ran repair repeatedly without specifying a start and end token.
Notes from [~yukim]: The probable cause is too many snapshots. Snapshot sstables are opened during validation, and the memory they use is freed when releaseReferences is called; but since snapshot sstables never get marked compacted, that memory is never freed. We only clean up mmap'd memory when an sstable is marked compacted: https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/io/sstable/SSTableReader.java#L974 Validation compaction never marks snapshots compacted.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
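A simplified sketch of the lifecycle [~yukim] describes, with illustrative names rather than the real SSTableReader code: because cleanup of mmap'd segments is gated on the compacted flag, a snapshot sstable that is never marked compacted never releases its native memory, even after its reference count drops to zero.
{code}
// Illustrative only; not the actual SSTableReader implementation.
public class MappedSSTable
{
    private int references = 1;
    private boolean markedCompacted = false;

    public void markCompacted() { markedCompacted = true; }

    public void releaseReferences()
    {
        // mmap'd memory is only unmapped when BOTH conditions hold...
        if (--references == 0 && markedCompacted)
            unmapSegments();
        // ...so a snapshot sstable, which validation never marks
        // compacted, keeps its native memory mapped forever: the leak.
    }

    private void unmapSegments() { /* munmap the data/index segments */ }
}
{code}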
[jira] [Updated] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
[ https://issues.apache.org/jira/browse/CASSANDRA-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-5958: - Description: When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} was: When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: Unable to find property 'some_property' on class: org.apache.cassandra.config.Config The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we catch this exception and wrap it in another exception that says something like this: Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra. Unable to find property errors from snakeyaml are confusing - Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
[ https://issues.apache.org/jira/browse/CASSANDRA-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-5958: - Description: When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Also, it might make sense to make this a warning instead of a fatal error, and just ignore the unwanted property. was: When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Unable to find property errors from snakeyaml are confusing - Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Also, it might make sense to make this a warning instead of a fatal error, and just ignore the unwanted property. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
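A rough sketch of the suggested wrapping, assuming snakeyaml raises a YAMLException carrying the "Unable to find property" text; the message matching and loader wiring here are illustrative, not how Cassandra actually loads its config.
{code}
import java.io.InputStream;
import org.apache.cassandra.config.Config;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.error.YAMLException;

public class ConfigLoader
{
    public static Config load(InputStream yamlStream)
    {
        try
        {
            return new Yaml().loadAs(yamlStream, Config.class);
        }
        catch (YAMLException e)
        {
            // Rephrase the confusing "Unable to find property" error in
            // terms the operator can act on, keeping the original cause.
            if (e.getMessage() != null && e.getMessage().contains("Unable to find property"))
                throw new YAMLException("Please remove the unrecognized property from your "
                                        + "cassandra.yaml; it is not recognized by this version "
                                        + "of Cassandra. (" + e.getMessage() + ")", e);
            throw e;
        }
    }
}
{code}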
[jira] [Commented] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
[ https://issues.apache.org/jira/browse/CASSANDRA-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754836#comment-13754836 ] J.B. Langston commented on CASSANDRA-5958: -- 1.2 and prior Unable to find property errors from snakeyaml are confusing - Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Also, it might make sense to make this a warning instead of a fatal error, and just ignore the unwanted property. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
[ https://issues.apache.org/jira/browse/CASSANDRA-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754928#comment-13754928 ] J.B. Langston commented on CASSANDRA-5958: -- I just tested with 2.0.0-rc2 and the message is the same as before. Unable to find property errors from snakeyaml are confusing - Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Also, it might make sense to make this a warning instead of a fatal error, and just ignore the unwanted property. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
J.B. Langston created CASSANDRA-5958: Summary: Unable to find property errors from snakeyaml are confusing Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: Unable to find property 'some_property' on class: org.apache.cassandra.config.Config The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we catch this exception and wrap it in another exception that says something like this: Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-5947) Sampling bug in metrics-core-2.0.3.jar used by Cassandra
J.B. Langston created CASSANDRA-5947: Summary: Sampling bug in metrics-core-2.0.3.jar used by Cassandra Key: CASSANDRA-5947 URL: https://issues.apache.org/jira/browse/CASSANDRA-5947 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston
There is a sampling bug in the version of the metrics library we're using in Cassandra. See https://github.com/codahale/metrics/issues/421. ExponentiallyDecayingSample is used by the Timer's histogram that is used in the stress tool, and according to [~brandon.williams] it is also used in a few other places like the dynamic snitch. The statistical theory involved in this bug goes over my head, so I'm not sure whether this bug would meaningfully affect its usage by Cassandra. One of the comments on the bug mentions that it affects slow sampling rates (10 samples/min was the example given). We're currently distributing metrics-core-2.0.3.jar, and according to the release notes, this bug is fixed in 2.1.3: http://metrics.codahale.com/about/release-notes/#v2-1-3-aug-06-2012
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
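For reference, a minimal harness that exercises the affected sample at roughly the slow rate mentioned in the upstream report. This assumes the metrics-core 2.x API (ExponentiallyDecayingSample with the library's documented default size and alpha); it only demonstrates the code path, not the statistical defect itself.
{code}
import com.yammer.metrics.stats.ExponentiallyDecayingSample;

public class SlowSampleDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        // 1028 samples with alpha 0.015 are the defaults metrics-core 2.x
        // uses for biased histograms (and therefore Timers).
        ExponentiallyDecayingSample sample = new ExponentiallyDecayingSample(1028, 0.015);

        // Feed it slowly -- about 10 samples/min, the regime the upstream
        // bug report says produces skewed results in 2.0.x.
        for (int i = 0; i < 20; i++)
        {
            sample.update(i);
            Thread.sleep(6000);
        }
        System.out.println("median: " + sample.getSnapshot().getMedian());
    }
}
{code}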
[jira] [Created] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
J.B. Langston created CASSANDRA-5911: Summary: Commit logs are not removed after nodetool flush or nodetool drain Key: CASSANDRA-5911 URL: https://issues.apache.org/jira/browse/CASSANDRA-5911 Project: Cassandra Issue Type: Bug Components: Core Reporter: J.B. Langston Priority: Minor
Commit logs are not removed after nodetool flush or nodetool drain. This can lead to unnecessary commit log replay during startup. I've reproduced this on Apache Cassandra 1.2.8. Usually this isn't much of an issue, but on a Solr-indexed column family in DSE, each replayed mutation has to be reindexed, which can make startup take a long time (on the order of 20-30 min). Reproduction follows:
{code}
jblangston:bin jblangston$ ./cassandra > /dev/null
jblangston:bin jblangston$ ../tools/bin/cassandra-stress -n 2000 > /dev/null
jblangston:bin jblangston$ du -h ../commitlog
576M	../commitlog
jblangston:bin jblangston$ nodetool flush
jblangston:bin jblangston$ du -h ../commitlog
576M	../commitlog
jblangston:bin jblangston$ nodetool drain
jblangston:bin jblangston$ du -h ../commitlog
576M	../commitlog
jblangston:bin jblangston$ pkill java
jblangston:bin jblangston$ du -h ../commitlog
576M	../commitlog
jblangston:bin jblangston$ ./cassandra -f | grep Replaying
INFO 10:03:42,915 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log
INFO 10:03:42,922 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-5900) Setting bloom filter fp chance to 1.0 causes ClassCastExceptions
J.B. Langston created CASSANDRA-5900: Summary: Setting bloom filter fp chance to 1.0 causes ClassCastExceptions Key: CASSANDRA-5900 URL: https://issues.apache.org/jira/browse/CASSANDRA-5900 Project: Cassandra Issue Type: Bug Components: Core Reporter: J.B. Langston
In 1.2, we introduced the ability to turn bloom filters off completely by setting the fp chance to 1.0. It looks like there is a bug with this, though. When it's set to 1.0, the following errors occur because AlwaysPresentFilter is not handled in the switch statement at https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/utils/FilterFactory.java#L91, and we default to Murmur3BloomFilter for an unknown type:
{code}
Exception in thread "main" java.lang.ClassCastException: org.apache.cassandra.utils.AlwaysPresentFilter cannot be cast to org.apache.cassandra.utils.Murmur3BloomFilter
    at org.apache.cassandra.utils.FilterFactory.serializedSize(FilterFactory.java:91)
    at org.apache.cassandra.io.sstable.SSTableReader.getBloomFilterSerializedSize(SSTableReader.java:531)
    at org.apache.cassandra.metrics.ColumnFamilyMetrics$15.value(ColumnFamilyMetrics.java:273)
    at org.apache.cassandra.metrics.ColumnFamilyMetrics$15.value(ColumnFamilyMetrics.java:268)
    at org.apache.cassandra.db.ColumnFamilyStore.getBloomFilterDiskSpaceUsed(ColumnFamilyStore.java:1825)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
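A sketch of the shape of a fix: treat AlwaysPresentFilter as a known, zero-size filter instead of letting it fall through to the Murmur3 default branch. This is a method fragment for illustration only; the type names approximate the 1.2 code, the helper is hypothetical, and this is not the committed patch.
{code}
// Fragment, not the actual FilterFactory.serializedSize implementation.
public static long serializedSize(Filter filter)
{
    if (filter instanceof AlwaysPresentFilter)
        return 0; // fp chance 1.0: no bloom filter is serialized at all
    if (filter instanceof Murmur3BloomFilter)
        return murmur3SerializedSize((Murmur3BloomFilter) filter); // existing path (hypothetical helper name)
    throw new IllegalArgumentException("Unknown filter type: " + filter.getClass().getName());
}
{code}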