[jira] [Updated] (CASSANDRA-14884) Move TWCS message "No compaction necessary for bucket size" to Trace level
[ https://issues.apache.org/jira/browse/CASSANDRA-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-14884:
--------------------------------------
     Assignee: J.B. Langston
   Attachment: CASSANDRA-14884.patch
       Status: Patch Available  (was: Open)

> Move TWCS message "No compaction necessary for bucket size" to Trace level
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14884
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14884
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: J.B. Langston
>            Assignee: J.B. Langston
>            Priority: Trivial
>         Attachments: CASSANDRA-14884.patch
>
> When using TWCS, this message sometimes spams the debug logs:
>
> DEBUG [CompactionExecutor:4993|https://datastax.jira.com/wiki/display/CompactionExecutor/4993] 2018-04-20 00:41:13,795 TimeWindowCompactionStrategy.java:304 - No compaction necessary for bucket size 1 , key 152176320, now 152418240
>
> A similar message is already at trace level for LCS, so this patch changes the message from TWCS to trace as well.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
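For context, the change amounts to switching the logger call from debug to trace. A minimal sketch of the pattern, assuming SLF4J (which Cassandra uses); the surrounding class and method are illustrative, not the actual TWCS code:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TimeWindowCompactionExample
{
    private static final Logger logger = LoggerFactory.getLogger(TimeWindowCompactionExample.class);

    void checkBucket(long bucketSize, long key, long now)
    {
        // Before: emitted for every bucket on every compaction check, spamming debug logs
        // logger.debug("No compaction necessary for bucket size {} , key {}, now {}", bucketSize, key, now);

        // After: only emitted when trace logging is explicitly enabled
        logger.trace("No compaction necessary for bucket size {} , key {}, now {}", bucketSize, key, now);
    }
}
{code}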
[jira] [Created] (CASSANDRA-14884) Move TWCS message "No compaction necessary for bucket size" to Trace level
J.B. Langston created CASSANDRA-14884:
------------------------------------------

             Summary: Move TWCS message "No compaction necessary for bucket size" to Trace level
                 Key: CASSANDRA-14884
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14884
             Project: Cassandra
          Issue Type: Improvement
          Components: Compaction
            Reporter: J.B. Langston


When using TWCS, this message sometimes spams the debug logs:

DEBUG [CompactionExecutor:4993|https://datastax.jira.com/wiki/display/CompactionExecutor/4993] 2018-04-20 00:41:13,795 TimeWindowCompactionStrategy.java:304 - No compaction necessary for bucket size 1 , key 152176320, now 152418240

A similar message is already at trace level for LCS, so this patch changes the message from TWCS to trace as well.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14522) sstableloader options assume the rpc/native interface is the same as the internode interface
[ https://issues.apache.org/jira/browse/CASSANDRA-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566897#comment-16566897 ]

J.B. Langston commented on CASSANDRA-14522:
-------------------------------------------

Yes, it does appear to be fixed in trunk.

> sstableloader options assume the rpc/native interface is the same as the internode interface
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14522
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14522
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy
>            Priority: Major
>              Labels: lhf
>         Attachments: CASSANDRA-14522.patch
>
> Currently, in the LoaderOptions for the BulkLoader, the user can give a list of initial host addresses. That list is used both for the initial connection to the cluster and for streaming the sstables. If you have two physical interfaces, one for rpc, the other for internode traffic, then the bulk loader won't currently work. It will throw an error such as:
> {quote}
> sstableloader -v -u cassadmin -pw xxx -d 10.133.210.101,10.133.210.102,10.133.210.103,10.133.210.104 /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl
> Established connection to initial hosts
> Opening sstables and calculating sections to stream
> Streaming relevant part of /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl/mc-1-big-Data.db /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl/mc-2-big-Data.db to [/10.133.210.101, /10.133.210.103, /10.133.210.102, /10.133.210.104]
> progress: total: 100% 0 MB/s(avg: 0 MB/s)ERROR 10:16:05,311 [Stream #9ed00130-6ff6-11e8-965c-93a78bf96e60] Streaming error occurred
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_101]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_101]
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_101]
>     at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:266) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:253) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212) [cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-all-4.0.54.Final.jar:4.0.54.Final]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
> ERROR 10:16:05,312 [Stream #9ed00130-6ff6-11e8-965c-93a78bf96e60] Streaming error occurred
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_101]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_101]
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_101]
>     at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:266) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:253) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212)
[jira] [Updated] (CASSANDRA-14522) sstableloader options assume the rpc/native interface is the same as the internode interface
[ https://issues.apache.org/jira/browse/CASSANDRA-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-14522:
--------------------------------------
   Attachment: CASSANDRA-14522.patch
       Status: Patch Available  (was: Open)

There is a simpler fix. The Host object being iterated over in that loop has methods to get the listen address directly. You just need to change endpoint.getAddress to endpoint.getBroadcastAddress. I have also attached a patch. The patch is against Cassandra 3.0 and should merge forward cleanly.

Note: there is also a Host.getListenAddress method which returns the local listen address, but we want to use the broadcast address in case sstableloader is run in a different DC that cannot communicate with a remote node over the local listen address.

I also noticed that you checked in the cassandra.yaml changes you were using to test this. That should be reverted.

> sstableloader options assume the rpc/native interface is the same as the internode interface
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14522
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14522
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy
>            Priority: Major
>              Labels: lhf
>         Attachments: CASSANDRA-14522.patch
>
> Currently, in the LoaderOptions for the BulkLoader, the user can give a list of initial host addresses. That list is used both for the initial connection to the cluster and for streaming the sstables. If you have two physical interfaces, one for rpc, the other for internode traffic, then the bulk loader won't currently work. It will throw an error such as:
> {quote}
> sstableloader -v -u cassadmin -pw xxx -d 10.133.210.101,10.133.210.102,10.133.210.103,10.133.210.104 /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl
> Established connection to initial hosts
> Opening sstables and calculating sections to stream
> Streaming relevant part of /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl/mc-1-big-Data.db /var/lib/cassandra/commitlog/backup_tmp/test_bkup/bkup_tbl/mc-2-big-Data.db to [/10.133.210.101, /10.133.210.103, /10.133.210.102, /10.133.210.104]
> progress: total: 100% 0 MB/s(avg: 0 MB/s)ERROR 10:16:05,311 [Stream #9ed00130-6ff6-11e8-965c-93a78bf96e60] Streaming error occurred
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_101]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_101]
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_101]
>     at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:266) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:253) ~[cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212) [cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [cassandra-all-3.0.15.2128.jar:3.0.15.2128]
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-all-4.0.54.Final.jar:4.0.54.Final]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
> ERROR 10:16:05,312 [Stream #9ed00130-6ff6-11e8-965c-93a78bf96e60] Streaming error occurred
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_101]
>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_101]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_101]
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_101]
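For reference, a minimal sketch of the one-line change described in the comment above. It assumes the DataStax Java driver's Host API; the surrounding loop and the hosts collection are illustrative, not the exact BulkLoader/LoaderOptions source:

{code}
// Hedged sketch of the fix, not the attached patch itself.
for (Host endpoint : cluster.getMetadata().getAllHosts())
{
    // Before: uses the rpc/native address, which internode streaming
    // cannot necessarily reach when the two interfaces differ:
    // hosts.add(endpoint.getAddress());

    // After: use the broadcast (internode) address so streaming connects
    // on the interface the nodes actually listen on. getBroadcastAddress
    // is preferred over getListenAddress so that sstableloader run from
    // another DC can still reach the node.
    hosts.add(endpoint.getBroadcastAddress());
}
{code}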
[jira] [Created] (CASSANDRA-12197) Integrate top threads command in nodetool
J.B. Langston created CASSANDRA-12197:
------------------------------------------

             Summary: Integrate top threads command in nodetool
                 Key: CASSANDRA-12197
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12197
             Project: Cassandra
          Issue Type: Improvement
            Reporter: J.B. Langston
            Priority: Minor


SJK (https://github.com/aragozin/jvm-tools) has a command called ttop that displays the top threads within the JVM, sorted either by CPU utilization or heap allocation rate. When diagnosing garbage collection or high CPU utilization, this is very helpful information. It would be great if users could get this directly with nodetool without having to download something else. SJK is Apache 2.0 licensed so it might be possible to leverage its code.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
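Until something like this lands in nodetool, SJK can be run standalone against a Cassandra process. A hedged example; flag names are per the SJK README at the time of writing, so verify against your SJK version:

{code}
# Show the top 20 threads in the Cassandra JVM ordered by CPU use,
# refreshing in place. <pid> is the Cassandra process id.
java -jar sjk.jar ttop -p <pid> -o CPU -n 20

# Order by heap allocation rate instead, useful when hunting GC pressure.
java -jar sjk.jar ttop -p <pid> -o ALLOC -n 20
{code}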
[jira] [Updated] (CASSANDRA-11939) Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
[ https://issues.apache.org/jira/browse/CASSANDRA-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-11939:
--------------------------------------
    Description: 
It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly aligned. The sstables column should be moved to the 3rd column to make way.

  was:
It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly comparable. The sstables column should be moved to the 3rd column to make way.


> Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11939
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11939
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: J.B. Langston
>            Priority: Minor
>
> It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
> {code}
> Percentile SSTables Write Latency Read
[jira] [Updated] (CASSANDRA-11939) Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
[ https://issues.apache.org/jira/browse/CASSANDRA-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-11939:
--------------------------------------
    Description: 
It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly comparable. The sstables column should be moved to the 3rd column to make way.

  was:
It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guesst the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly comparable. The sstables column should be moved to the 3rd column to make way.


> Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11939
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11939
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: J.B. Langston
>            Priority: Minor
>
> It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
> {code}
> Percentile SSTables Write Latency
[jira] [Created] (CASSANDRA-11939) Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
J.B. Langston created CASSANDRA-11939:
------------------------------------------

             Summary: Read and Write Latency columns are swapped in proxyhistograms vs cfhistograms
                 Key: CASSANDRA-11939
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11939
             Project: Cassandra
          Issue Type: Bug
          Components: Tools
            Reporter: J.B. Langston
            Priority: Minor


It’s triggering my OCD that read and write latency columns are swapped in proxyhistograms vs cfhistograms. I guess the argument against changing it now is that it could screw with some people's scripts or expectations, but it does make it hard to eyeball when you’re trying to compare local latencies vs coordinator latencies.
{code}
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             4.00             17.00            770.00              8239                 4
75%             5.00             24.00            924.00             17084                17
95%             5.00             35.00          61214.00             51012                24
98%             6.00             35.00         126934.00            105778                24
99%             6.00             72.00         152321.00            152321                35
Min             0.00              9.00             36.00                21                 0
Max             6.00             86.00         263210.00          20924300              1109

Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1331.00            535.00          11864.00
75%                 17084.00            642.00          20501.00
95%                219342.00           1331.00          20501.00
98%                315852.00           2759.00          20501.00
99%                379022.00           3311.00          20501.00
Min                   373.00             73.00           9888.00
Max                379022.00           9887.00          20501.00
{code}
Ideally read and write latencies should be in the same order and the first and second columns on both so they’re directly comparable. The sstables column should be moved to the 3rd column to make way.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-11664) Tab completion in cqlsh doesn't work for capitalized letters
J.B. Langston created CASSANDRA-11664:
------------------------------------------

             Summary: Tab completion in cqlsh doesn't work for capitalized letters
                 Key: CASSANDRA-11664
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11664
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston
            Priority: Minor


Tab completion in cqlsh doesn't work for capitalized letters, either in keyspace names or table names. Typing quotes and a corresponding capital letter should complete the table/keyspace name and the closing quote.

{code}
cqlsh> create keyspace "Test" WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> use "Tes
cqlsh> use tes
cqlsh> use Test;
InvalidRequest: code=2200 [Invalid query] message="Keyspace 'test' does not exist"
cqlsh> use "Test";
cqlsh:Test> drop keyspace "Test"
cqlsh:Test> create table "TestTable" (a text primary key, b text);
cqlsh:Test> select * from "TestTable";

 a | b
---+---

(0 rows)
cqlsh:Test> select * from "Test
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8969) Add indication in cassandra.yaml that rpc timeouts going too high will cause memory build up
[ https://issues.apache.org/jira/browse/CASSANDRA-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208691#comment-15208691 ]

J.B. Langston commented on CASSANDRA-8969:
------------------------------------------

I agree this could be a good warning to have. I've seen a lot of customers naively increase the timeout. Usually the timeouts are caused by I/O not keeping up with requests, but a lot of users won't take the time to figure that out. They just see their application timing out and they see something in cassandra.yaml called timeout, so they increase it without thinking of the cost. Now they have a GC death spiral and OOM to contend with in addition to the original problem.

> Add indication in cassandra.yaml that rpc timeouts going too high will cause memory build up
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8969
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8969
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: lhf
>             Fix For: 3.x
>
>         Attachments: 8969.txt
>
> It would be helpful to communicate that setting the rpc timeouts too high may cause memory problems on the server, as it can become overloaded and has to retain the in-flight requests in memory. I'll get this done but just adding the ticket as a placeholder for memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
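The attached 8969.txt presumably adds wording along these lines next to the timeout settings. A hedged sketch of the kind of comment being proposed, using the real cassandra.yaml option names and defaults but not the exact patch text:

{code}
# How long the coordinator should wait for read operations to complete.
# Caution: raising timeouts rarely fixes timeouts. While a request is
# waiting it is held in memory, so very high values let in-flight
# requests accumulate on an overloaded node, driving GC pressure and
# eventually OOM. Investigate the underlying slowness (often I/O)
# before increasing these.
read_request_timeout_in_ms: 5000
# How long the coordinator should wait for writes to complete.
write_request_timeout_in_ms: 2000
{code}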
[jira] [Comment Edited] (CASSANDRA-8969) Add indication in cassandra.yaml that rpc timeouts going too high will cause memory build up
[ https://issues.apache.org/jira/browse/CASSANDRA-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208691#comment-15208691 ]

J.B. Langston edited comment on CASSANDRA-8969 at 3/23/16 4:19 PM:
-------------------------------------------------------------------

I agree this could be a good warning to have. I've seen a lot of users naively increase the timeout. Usually the timeouts are caused by I/O not keeping up with requests, but a lot of users won't take the time to figure that out. They just see their application timing out and they see something in cassandra.yaml called timeout, so they increase it without thinking of the cost. Now they have a GC death spiral and OOM to contend with in addition to the original problem.

was (Author: jblangs...@datastax.com):
I agree this could be a good warning to have. I've seen a lot of customers naively increase the timeout. Usually the timeouts are caused by I/O not keeping up with requests, but a lot of users won't take the time to figure that out. They just see their application timing out and they see something in cassandra.yaml called timeout, so they increase it without thinking of the cost. Now they have a GC death spiral and OOM to contend with in addition to the original problem.

> Add indication in cassandra.yaml that rpc timeouts going too high will cause memory build up
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8969
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8969
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>            Priority: Minor
>              Labels: lhf
>             Fix For: 3.x
>
>         Attachments: 8969.txt
>
> It would be helpful to communicate that setting the rpc timeouts too high may cause memory problems on the server, as it can become overloaded and has to retain the in-flight requests in memory. I'll get this done but just adding the ticket as a placeholder for memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10593) Unintended interactions between commitlog archiving and commitlog recycling
[ https://issues.apache.org/jira/browse/CASSANDRA-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-10593:
--------------------------------------
    Description: 
Currently the comments in commitlog_archiving.properties suggest using either cp or ln for the archive_command. Using ln is problematic because commitlog recycling marks segments as recycled once the corresponding memtables are flushed, and Cassandra will no longer replay them. This means it's only possible to do PITR on records that were written since the last flush. Using cp works, and this is currently how OpsCenter does PITR; however, [~brandon.williams] has pointed out this could have some performance impact because of the additional I/O overhead of copying the commitlog segments.

Starting in 2.1, we can disable commit log recycling in cassandra.yaml, so I thought this would allow me to do PITR without the extra overhead of using cp. However, when I disable commitlog recycling and try to do a PITR, Cassandra blows up when trying to replay the restored commit logs:

{code}
ERROR 16:56:42 Exception encountered during startup
java.lang.IllegalStateException: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:207) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:116) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:352) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335) ~[dse-core-4.8.0.jar:4.8.0]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.DseModule.main(DseModule.java:75) [dse-core-4.8.0.jar:4.8.0]
java.lang.IllegalStateException: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:207)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:116)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:352)
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
    at com.datastax.bdp.DseModule.main(DseModule.java:75)
Exception encountered during startup: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
INFO  16:56:42 DSE shutting down...
INFO  16:56:42 All plugins are stopped.
ERROR 16:56:42 Exception in thread Thread[Thread-2,5,main]
java.lang.AssertionError: null
    at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1403) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:196) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon.preStop(DseDaemon.java:426) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon.safeStop(DseDaemon.java:436) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:676) ~[dse-core-4.8.0.jar:4.8.0]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_31]
{code}

For the sake of completeness, I also tested using cp for the archive_command with commitlog recycling disabled, and PITR works as expected, but this of course defeats the point.

It would be good to have some guidance on what is supported here. If ln isn't expected to work at all, it shouldn't be documented as an acceptable option for the archive_command in commitlog_archiving.properties. If it should work with commitlog recycling disabled, the bug causing the IllegalStateException needs to be fixed. It would also be good to do some testing and quantify the performance impact of enabling commitlog archiving using cp as the archive_command.

I realize there are several different issues described here, so maybe they should be separate JIRAs, but first I wanted to just clarify whether we want to support ln at all, and we can go from there.

  was:
Currently the comments in commitlog_archiving.properties suggest using either cp or ln for the archive_command. Using ln is problematic because commitlog recycling marks segments as recycled once the corresponding memtables are flushed and Cassandra will
[jira] [Created] (CASSANDRA-10593) Unintended interactions between commitlog archiving and commitlog recycling
J.B. Langston created CASSANDRA-10593:
------------------------------------------

             Summary: Unintended interactions between commitlog archiving and commitlog recycling
                 Key: CASSANDRA-10593
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10593
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston


Currently the comments in commitlog_archiving.properties suggest using either cp or ln for the archive_command. Using ln is problematic because commitlog recycling marks segments as recycled once the corresponding memtables are flushed, and Cassandra will no longer replay them. This means it's only possible to do PITR on records that were written since the last flush. Using cp works, and this is currently how OpsCenter does PITR; however, [~brandon.williams] has pointed out this could have some performance impact because of the additional I/O overhead of copying the commitlog segments.

Starting in 2.1, we can disable commit log recycling in cassandra.yaml, so I thought this would allow me to do PITR without the extra overhead of using cp. However, when I disable commitlog recycling and try to do a PITR, Cassandra blows up when trying to replay the restored commit logs:

{code}
ERROR 16:56:42 Exception encountered during startup
java.lang.IllegalStateException: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:207) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:116) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:352) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335) ~[dse-core-4.8.0.jar:4.8.0]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.DseModule.main(DseModule.java:75) [dse-core-4.8.0.jar:4.8.0]
java.lang.IllegalStateException: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
    at org.apache.cassandra.db.commitlog.CommitLogArchiver.maybeRestoreArchive(CommitLogArchiver.java:207)
    at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:116)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:352)
    at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:335)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
    at com.datastax.bdp.DseModule.main(DseModule.java:75)
Exception encountered during startup: Cannot safely construct descriptor for segment, as name and header descriptors do not match ((4,1445878452545) vs (4,1445876822565)): /opt/dse/backup/CommitLog-4-1445876822565.log
INFO  16:56:42 DSE shutting down...
INFO  16:56:42 All plugins are stopped.
ERROR 16:56:42 Exception in thread Thread[Thread-2,5,main]
java.lang.AssertionError: null
    at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1403) ~[cassandra-all-2.1.9.791.jar:2.1.9.791]
    at com.datastax.bdp.gms.DseState.setActiveStatus(DseState.java:196) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon.preStop(DseDaemon.java:426) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon.safeStop(DseDaemon.java:436) ~[dse-core-4.8.0.jar:4.8.0]
    at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:676) ~[dse-core-4.8.0.jar:4.8.0]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_31]
{code}

For the sake of completeness, I also tested using cp for the archive_command with commitlog recycling disabled, and PITR works as expected, but this of course defeats the point.

It would be good to have some guidance on what is supported here. If ln isn't expected to work at all, it shouldn't be documented as an acceptable option for the archive_command in commitlog_archiving.properties. If it should work with commitlog recycling disabled, the bug causing the IllegalStateException needs to be fixed. It would also be good to do some testing and quantify the performance impact of enabling commitlog archiving using cp as the archive_command.

I realize there are several different issues described here, so maybe they should be separate JIRAs, but first I wanted to just clarify whether we want to support ln at all, and we can go from there.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
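For readers unfamiliar with the file in question, the relevant knobs look roughly like this. A sketch only: the %path/%name and %from/%to substitutions are described in the file's own comments, the backup directory is arbitrary, and the cp-based command is the style the ticket says works:

{code}
# commitlog_archiving.properties (sketch)
# %path is the fully qualified path of the segment to archive;
# %name is the file name. Copying (rather than hard-linking with ln)
# preserves the segment contents even after Cassandra recycles the
# original, so PITR keeps working at an extra I/O cost:
archive_command=/bin/cp -f %path /backup/%name

# Restore settings used when replaying for point-in-time restore:
restore_command=/bin/cp -f %from %to
restore_directories=/backup
restore_point_in_time=2015:10:26 16:00:00
{code}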
[jira] [Commented] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709984#comment-14709984 ]

J.B. Langston commented on CASSANDRA-8720:
------------------------------------------

Specifically, what I would like to see: a command-line tool that will list partition keys of partitions over a specified number of cells and/or bytes, along with the size of each partition in cells and bytes. This can be an offline tool if it's easier to implement that way.


Provide tools for finding wide row/partition keys
-------------------------------------------------

                Key: CASSANDRA-8720
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8720
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston


Multiple users have requested some sort of tool to help identify wide row keys. They get into a situation where they know a wide row/partition has been inserted and it's causing problems for them, but they have no idea what the row key is in order to remove it. Maintaining the widest row key currently encountered and displaying it in cfstats would be one possible approach. Another would be an offline tool (possibly an enhancement to sstablekeys) to show the number of columns/bytes per key in each sstable. If a tool to aggregate the information at a CF level could be provided that would be a bonus, but it shouldn't be too hard to write a script wrapper to aggregate them if not.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
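To make the request concrete, a hypothetical invocation of the kind of tool being asked for. The tool name and flags are illustrative only, not an existing utility:

{code}
# Hypothetical: list partitions in an sstable larger than 100 MB or
# 100,000 cells, reporting key, size in bytes, and cell count.
sstablewidekeys --min-bytes 104857600 --min-cells 100000 \
    /var/lib/cassandra/data/ks/tbl/ks-tbl-ka-1-Data.db
{code}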
[jira] [Commented] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710030#comment-14710030 ]

J.B. Langston commented on CASSANDRA-8720:
------------------------------------------

Looks like we crossed over each other's comments. I think if this offline tool needs to go through the motions of compacting without actually writing out new files or deleting the old ones, then that would be fine. Of course it would require lots of I/O and people would need to be aware of that, but in some cases I think they'd be willing to accept that in order to identify large partitions.


Provide tools for finding wide row/partition keys
-------------------------------------------------

                Key: CASSANDRA-8720
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8720
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston


Multiple users have requested some sort of tool to help identify wide row keys. They get into a situation where they know a wide row/partition has been inserted and it's causing problems for them, but they have no idea what the row key is in order to remove it. Maintaining the widest row key currently encountered and displaying it in cfstats would be one possible approach. Another would be an offline tool (possibly an enhancement to sstablekeys) to show the number of columns/bytes per key in each sstable. If a tool to aggregate the information at a CF level could be provided that would be a bonus, but it shouldn't be too hard to write a script wrapper to aggregate them if not.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-9585) Make truncate table X an alias for truncate X
J.B. Langston created CASSANDRA-9585:
-----------------------------------------

             Summary: Make truncate table X an alias for truncate X
                 Key: CASSANDRA-9585
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9585
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston


CQL syntax is inconsistent: it's "drop table X" but "truncate X". It used to trip me up all the time until I wrapped my brain around this inconsistency, and it still triggers a tiny bout of OCD every time I type it. I realize it's too late to change it, but why not have both? "truncate table X" is also consistent with the syntax in SQL.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
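For clarity, the two spellings in question, with the proposed alias last (the table name is illustrative):

{code}
DROP TABLE users;       -- existing DDL that names the object type
TRUNCATE users;         -- current CQL truncate syntax
TRUNCATE TABLE users;   -- proposed alias, matching SQL
{code}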
[jira] [Updated] (CASSANDRA-9585) Make truncate table X an alias for truncate X
[ https://issues.apache.org/jira/browse/CASSANDRA-9585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-9585:
-------------------------------------
    Priority: Trivial  (was: Major)


Make truncate table X an alias for truncate X
---------------------------------------------

                Key: CASSANDRA-9585
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9585
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston
           Priority: Trivial


CQL syntax is inconsistent: it's "drop table X" but "truncate X". It used to trip me up all the time until I wrapped my brain around this inconsistency, and it still triggers a tiny bout of OCD every time I type it. I realize it's too late to change it, but why not have both? "truncate table X" is also consistent with the syntax in SQL.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-9325) cassandra-stress requires keystore but provides no way to configure it
J.B. Langston created CASSANDRA-9325:
-----------------------------------------

             Summary: cassandra-stress requires keystore but provides no way to configure it
                 Key: CASSANDRA-9325
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston


Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. These should be optional unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
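As a workaround until stress grows the options, you can create the dummy keystore it expects using the JDK's keytool. A hedged example; the alias and dname are arbitrary, and the location and password are the defaults the ticket describes:

{code}
# Run from the directory you launch cassandra-stress in, so that
# conf/.keystore exists where the tool looks for it.
mkdir -p conf
keytool -genkeypair -keyalg RSA -alias dummy \
        -keystore conf/.keystore -storepass cassandra -keypass cassandra \
        -dname "CN=stress, OU=none, O=none, C=US"
{code}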
[jira] [Updated] (CASSANDRA-9325) cassandra-stress requires keystore for SSL but provides no way to configure it
[ https://issues.apache.org/jira/browse/CASSANDRA-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-9325:
-------------------------------------
    Description: 
Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. Ideally, these should be optional and stress shouldn't require the keystore unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.

  was:
Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. These should be optional unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.


cassandra-stress requires keystore for SSL but provides no way to configure it
-------------------------------------------------------------------------------

                Key: CASSANDRA-9325
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston


Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. Ideally, these should be optional and stress shouldn't require the keystore unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9325) cassandra-stress requires keystore for SSL but provides no way to configure it
[ https://issues.apache.org/jira/browse/CASSANDRA-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

J.B. Langston updated CASSANDRA-9325:
-------------------------------------
    Summary: cassandra-stress requires keystore for SSL but provides no way to configure it  (was: cassandra-stress requires keystore but provides no way to configure it)


cassandra-stress requires keystore for SSL but provides no way to configure it
-------------------------------------------------------------------------------

                Key: CASSANDRA-9325
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9325
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston


Even though it shouldn't be required unless client certificate authentication is enabled, the stress tool is looking for a keystore in the default location of conf/.keystore with the default password of cassandra. There is no command line option to override these defaults, so you have to provide a keystore that satisfies the default. It looks for conf/.keystore in the working directory, so you need to create this in the directory you are running cassandra-stress from. It doesn't really matter what's in the keystore; it just needs to exist in the expected location and have a password of cassandra.

Since the keystore might be required if client certificate authentication is enabled, we need to add -transport parameters for keystore and keystore-password. These should be optional unless client certificate authentication is enabled on the server.

In case it wasn't apparent, this is for Cassandra 2.1 and later's stress tool. I actually had even more problems getting Cassandra 2.0's stress tool working with SSL and gave up on it. We probably don't need to fix 2.0; we can just document that it doesn't support SSL and recommend using 2.1 instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-9308) Decouple streaming from secondary index rebuild
[ https://issues.apache.org/jira/browse/CASSANDRA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529067#comment-14529067 ]

J.B. Langston edited comment on CASSANDRA-9308 at 5/5/15 7:03 PM:
------------------------------------------------------------------

3.0 is a long way off for a lot of production users though. And for the scenario Brandon brings up, I think logging a warning for the user to run repair may be preferable to forcing them to go through the whole bootstrap and reindex process again, which in some cases can take days.

was (Author: jblangs...@datastax.com):
3.0 is a long way off for a lot of production users though.


Decouple streaming from secondary index rebuild
-----------------------------------------------

                Key: CASSANDRA-9308
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9308
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston


Currently, streaming is not considered complete until any secondary indexes on the table being streamed have been rebuilt. If any source replicas go down after streaming completes, but before the secondary indexes have been rebuilt, it will cause the bootstrap to fail, requiring the user to go through the whole bootstrap process again. Ideally, the two should be decoupled so that once the streaming is complete, the new node can complete the secondary index rebuild and successfully bootstrap regardless of the status of the source replicas at that point.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9308) Decouple streaming from secondary index rebuild
[ https://issues.apache.org/jira/browse/CASSANDRA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529067#comment-14529067 ]

J.B. Langston commented on CASSANDRA-9308:
------------------------------------------

3.0 is a long way off for a lot of production users though.


Decouple streaming from secondary index rebuild
-----------------------------------------------

                Key: CASSANDRA-9308
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9308
            Project: Cassandra
         Issue Type: Bug
           Reporter: J.B. Langston


Currently, streaming is not considered complete until any secondary indexes on the table being streamed have been rebuilt. If any source replicas go down after streaming completes, but before the secondary indexes have been rebuilt, it will cause the bootstrap to fail, requiring the user to go through the whole bootstrap process again. Ideally, the two should be decoupled so that once the streaming is complete, the new node can complete the secondary index rebuild and successfully bootstrap regardless of the status of the source replicas at that point.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (CASSANDRA-9308) Decouple streaming from secondary index rebuild
J.B. Langston created CASSANDRA-9308:
-----------------------------------------

             Summary: Decouple streaming from secondary index rebuild
                 Key: CASSANDRA-9308
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9308
             Project: Cassandra
          Issue Type: Bug
            Reporter: J.B. Langston


Currently, streaming is not considered complete until any secondary indexes on the table being streamed have been rebuilt. If any source replicas go down after streaming completes, but before the secondary indexes have been rebuilt, it will cause the bootstrap to fail, requiring the user to go through the whole bootstrap process again. Ideally, the two should be decoupled so that once the streaming is complete, the new node can complete the secondary index rebuild and successfully bootstrap regardless of the status of the source replicas at that point.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320585#comment-14320585 ]

J.B. Langston commented on CASSANDRA-8730:
------------------------------------------

Looks like these changes are fairly isolated to two classes... would this be feasible to backport to 2.0?


Optimize UUIDType comparisons
-----------------------------

                Key: CASSANDRA-8730
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8730
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston
           Assignee: Benedict
            Fix For: 2.1.4


Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
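A CPU-bound byte-at-a-time comparison can often be replaced by comparing the two 64-bit halves of each UUID. A hedged sketch of that general idea, not the committed patch: UUIDType's real ordering also special-cases version-1 timestamps, which this sketch ignores, so it only matches the lexical byte order used for same-version, non-time-based UUIDs.

{code}
import java.nio.ByteBuffer;

public final class UuidWordCompare
{
    // Compare two 16-byte UUID buffers as two big-endian longs each.
    // Unsigned comparison of big-endian words gives the same result as
    // comparing the 16 bytes one by one, with far fewer branches.
    public static int compareUnsignedWords(ByteBuffer b1, ByteBuffer b2)
    {
        long m1 = b1.getLong(b1.position());      // most significant 8 bytes
        long m2 = b2.getLong(b2.position());
        if (m1 != m2)
            return Long.compareUnsigned(m1, m2);  // one branch replaces up to 8 byte compares

        long l1 = b1.getLong(b1.position() + 8);  // least significant 8 bytes
        long l2 = b2.getLong(b2.position() + 8);
        return Long.compareUnsigned(l1, l2);
    }
}
{code}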
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312474#comment-14312474 ]

J.B. Langston commented on CASSANDRA-8730:
------------------------------------------

12.88MB/sec from the latest code.


Optimize UUIDType comparisons
-----------------------------

                Key: CASSANDRA-8730
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8730
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston
           Assignee: Benedict
            Fix For: 2.1.4


Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312266#comment-14312266 ]

J.B. Langston commented on CASSANDRA-8730:
------------------------------------------

[~iamaleksey] This is the schema I am testing against. It uses uuid and timestamp (not timeuuid):

{code}
CREATE TABLE x (
    a bigint,
    b bigint,
    c timestamp,
    d uuid,
    e text,
    f text,
    g text,
    h float,
    PRIMARY KEY ((a, b), c, d)
) WITH CLUSTERING ORDER BY (c DESC, d DESC)
    AND bloom_filter_fp_chance=0.01
    AND caching='KEYS_ONLY'
    AND comment=''
    AND dclocal_read_repair_chance=0.10
    AND gc_grace_seconds=0
    AND index_interval=128
    AND read_repair_chance=0.00
    AND replicate_on_write='true'
    AND populate_io_cache_on_flush='false'
    AND default_time_to_live=0
    AND speculative_retry='99.0PERCENTILE'
    AND memtable_flush_period_in_ms=0
    AND compaction={'class': 'SizeTieredCompactionStrategy'}
    AND compression={'sstable_compression': 'LZ4Compressor'};
{code}


Optimize UUIDType comparisons
-----------------------------

                Key: CASSANDRA-8730
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8730
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston
           Assignee: Benedict
            Fix For: 2.1.4


Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309901#comment-14309901 ]

J.B. Langston edited comment on CASSANDRA-8730 at 2/6/15 8:59 PM:
------------------------------------------------------------------

Looks like a bit of improvement: 12.63MB/sec vs 10.19MB/sec. It also threw away more data this time; I guess some tombstones passed gc grace since I last tested. Therefore, I'm not sure how apples-to-apples the comparison is, so I'm going to try again while setting my clock back to the date when I ran it before.

Before:
{code}
INFO  15:19:05 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 9,183,829,489 bytes to 9,180,536,394 (~99% of original) in 901,172ms = 9.715395MB/s. 311,495 total partitions merged to 253,490. Partition merge counts were {1:195485, 2:58005, }
{code}

After:
{code}
INFO  20:47:24 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 8,152,562,772 bytes to 4,659,100,313 (~57% of original) in 615,577ms = 7.218048MB/s. 311,495 total partitions merged to 80,012. Partition merge counts were {1:195485, 2:58005, }
{code}

was (Author: jblangs...@datastax.com):
Looks like a bit of improvement: 12.63MB/sec vs 10.19MB/sec. It also threw away more data this time; I guess some tombstones passed gc grace since I last tested. Therefore, I'm not sure how apples-to-apples the comparison is, so I'm going to try again while setting my clock back to the date when I ran it before.

Before:
{code}
INFO  15:19:05 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 9,183,829,489 bytes to 9,180,536,394 (~99% of original) in 901,172ms = 9.715395MB/s. 311,495 total partitions merged to 253,490. Partition merge counts were {1:195485, 2:58005, }
{code}

After:
{code}
INFO  20:47:24 Completed flushing /Users/jblangston/repos/cassandra/bin/./../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-48-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1423254980101, position=758851)
INFO  20:47:24 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 8,152,562,772 bytes to 4,659,100,313 (~57% of original) in 615,577ms = 7.218048MB/s. 311,495 total partitions merged to 80,012. Partition merge counts were {1:195485, 2:58005, }
{code}


Optimize UUIDType comparisons
-----------------------------

                Key: CASSANDRA-8730
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8730
            Project: Cassandra
         Issue Type: Improvement
           Reporter: J.B. Langston
           Assignee: Benedict
            Fix For: 2.1.4


Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309901#comment-14309901 ] J.B. Langston commented on CASSANDRA-8730: -- Looks like a bit of improvement: 12.63MB/sec vs 10.19MB/sec. Looks like it threw away more data this time. I guess some tombstones passed gc grace since I last tested. Therefore, I'm not sure how apples-to-apples the comparison is, so I'm going to try again while setting my clock back to the date when I ran it before. Before: {code} INFO 15:19:05 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 9,183,829,489 bytes to 9,180,536,394 (~99% of original) in 901,172ms = 9.715395MB/s. 311,495 total partitions merged to 253,490. Partition merge counts were {1:195485, 2:58005, } {code} After: {code} INFO 20:47:24 Completed flushing /Users/jblangston/repos/cassandra/bin/./../data/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-48-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1423254980101, position=758851) INFO 20:47:24 Compacted 4 sstables to [./../data/data/ocean/tbl_metric_data_dyn-0f578640a59211e4a5a2ef9f87394ca6/ocean-tbl_metric_data_dyn-ka-144263,]. 8,152,562,772 bytes to 4,659,100,313 (~57% of original) in 615,577ms = 7.218048MB/s. 311,495 total partitions merged to 80,012. Partition merge counts were {1:195485, 2:58005, } {code} Optimize UUIDType comparisons - Key: CASSANDRA-8730 URL: https://issues.apache.org/jira/browse/CASSANDRA-8730 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Assignee: Benedict Fix For: 2.1.4 Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8730) Optimize UUIDType comparisons
[ https://issues.apache.org/jira/browse/CASSANDRA-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14309953#comment-14309953 ] J.B. Langston commented on CASSANDRA-8730: -- Hmm, setting my clock back didn't help. I still got the same results as before. I'm not sure why it did not compact away almost half the data before. Optimize UUIDType comparisons - Key: CASSANDRA-8730 URL: https://issues.apache.org/jira/browse/CASSANDRA-8730 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Assignee: Benedict Fix For: 2.1.4 Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8559) OOM caused by large tombstone warning.
[ https://issues.apache.org/jira/browse/CASSANDRA-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305113#comment-14305113 ] J.B. Langston commented on CASSANDRA-8559: -- I have seen another user hit this. In this case, the log message was triggered by a bad query and an even worse data model, but it's too easy for users new to Cassandra to stumble across this. If someone shoots themselves in the foot, we should try not to blow their whole leg off. So I'm +1 on having a limit to the amount of information we log. OOM caused by large tombstone warning. -- Key: CASSANDRA-8559 URL: https://issues.apache.org/jira/browse/CASSANDRA-8559 Project: Cassandra Issue Type: Bug Components: Core Environment: 2.0.11 / 2.1 Reporter: Dominic Letz Labels: tombstone Fix For: 2.0.13 Attachments: Selection_048.png, cassandra-2.0.11-8559.txt, stacktrace.log When running with a high amount of tombstones, the error message generation from CASSANDRA-6117 can lead to an out-of-memory situation with the default setting. Attached is a heap dump, viewed in VisualVM, showing how this construct created two 777MB strings to print the error message for a read query and then crashed OOM.
{code}
if (respectTombstoneThresholds() && columnCounter.ignored() > DatabaseDescriptor.getTombstoneWarnThreshold())
{
    StringBuilder sb = new StringBuilder();
    CellNameType type = container.metadata().comparator;
    for (ColumnSlice sl : slices)
    {
        assert sl != null;
        sb.append('[');
        sb.append(type.getString(sl.start));
        sb.append('-');
        sb.append(type.getString(sl.finish));
        sb.append(']');
    }
    logger.warn("Read {} live and {} tombstoned cells in {}.{} (see tombstone_warn_threshold). {} columns was requested, slices={}, delInfo={}",
                columnCounter.live(), columnCounter.ignored(), container.metadata().ksName, container.metadata().cfName, count, sb, container.deletionInfo());
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
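The limit being +1'd here could be as simple as capping the StringBuilder before the giant string is ever materialized. A minimal sketch of that idea, with an illustrative MAX_LOGGED_LENGTH constant and simplified types — this is not the committed 2.0.13 patch:
{code}
// Sketch: cap the logged slice description so a pathological query cannot
// allocate multi-hundred-megabyte strings just to emit a warning.
// MAX_LOGGED_LENGTH and the String-based slices are illustrative only.
import java.util.List;

public class TombstoneWarning
{
    private static final int MAX_LOGGED_LENGTH = 1024;

    static String describeSlices(List<String> slices)
    {
        StringBuilder sb = new StringBuilder();
        for (String sl : slices)
        {
            sb.append('[').append(sl).append(']');
            if (sb.length() > MAX_LOGGED_LENGTH)
            {
                sb.setLength(MAX_LOGGED_LENGTH);
                sb.append("... (truncated)");
                break; // stop building; the tail adds no diagnostic value
            }
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(describeSlices(List.of("a-b", "c-d")));
    }
}
{code}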
[jira] [Created] (CASSANDRA-8730) Optimize UUIDType comparisons
J.B. Langston created CASSANDRA-8730: Summary: Optimize UUIDType comparisons Key: CASSANDRA-8730 URL: https://issues.apache.org/jira/browse/CASSANDRA-8730 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Assignee: Benedict Compaction is slow on tables using compound keys containing UUIDs due to being CPU bound by key comparison. [~benedict] said he sees some easy optimizations that could be made for UUID comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
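For a sense of what such an optimization might look like: a 16-byte UUID compared lexicographically as unsigned bytes is equivalent to comparing its two big-endian halves as unsigned 64-bit longs, replacing up to 16 byte comparisons with at most two. A sketch of that fast path only (the real UUIDType also has to order version-1 UUIDs by timestamp, which this deliberately omits):
{code}
// Sketch: compare two serialized UUIDs as two unsigned longs instead of
// byte by byte. Illustrates the idea only; not the committed patch.
import java.nio.ByteBuffer;

public class UuidCompare
{
    static int compare(ByteBuffer b1, ByteBuffer b2)
    {
        // Unsigned comparison of big-endian longs == lexicographic unsigned
        // byte comparison, but touches 8 bytes at a time.
        long msb1 = b1.getLong(b1.position());
        long msb2 = b2.getLong(b2.position());
        if (msb1 != msb2)
            return Long.compareUnsigned(msb1, msb2);
        return Long.compareUnsigned(b1.getLong(b1.position() + 8),
                                    b2.getLong(b2.position() + 8));
    }

    public static void main(String[] args)
    {
        ByteBuffer a = ByteBuffer.allocate(16).putLong(0, 1L).putLong(8, 2L);
        ByteBuffer b = ByteBuffer.allocate(16).putLong(0, 1L).putLong(8, 3L);
        System.out.println(compare(a, b)); // negative: a sorts before b
    }
}
{code}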
[jira] [Created] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
J.B. Langston created CASSANDRA-8720: Summary: Provide tools for finding wide row/partition keys Key: CASSANDRA-8720 URL: https://issues.apache.org/jira/browse/CASSANDRA-8720 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Multiple users have requested some sort of tool to help identify wide row keys. They get into a situation where they know a wide row/partition has been inserted and it's causing problems for them but they have no idea what the row key is in order to remove it. Maintaining the widest row key currently encountered and displaying it in cfstats would be one possible approach. Another would be an offline tool (possibly an enhancement to sstablekeys) to show the number of columns/bytes per key in each sstable. If a tool to aggregate the information at a CF-level could be provided that would be a bonus, but it shouldn't be too hard to write a script wrapper to aggregate them if not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
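To make the "script wrapper" idea concrete: a sketch that aggregates hypothetical per-sstable key<TAB>bytes listings into CF-level totals and prints the widest partitions. The input format is assumed purely for illustration; sstablekeys does not emit it today:
{code}
// Sketch: merge per-sstable "key<TAB>bytes" listings (hypothetical format)
// read from stdin into per-key totals and report the ten widest partitions.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class WidestKeys
{
    public static void main(String[] args) throws Exception
    {
        Map<String, Long> totals = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in)))
        {
            String line;
            while ((line = in.readLine()) != null)
            {
                String[] parts = line.split("\t");
                if (parts.length == 2)
                    totals.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
            }
        }
        totals.entrySet().stream()
              .sorted((a, b) -> Long.compare(b.getValue(), a.getValue()))
              .limit(10)
              .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
{code}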
[jira] [Commented] (CASSANDRA-8720) Provide tools for finding wide row/partition keys
[ https://issues.apache.org/jira/browse/CASSANDRA-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14302211#comment-14302211 ] J.B. Langston commented on CASSANDRA-8720: -- Better than nothing, but logs can get rotated, deleted, etc. and it would good to have a way to get this information on demand without having to wait for a compaction to occur. Provide tools for finding wide row/partition keys - Key: CASSANDRA-8720 URL: https://issues.apache.org/jira/browse/CASSANDRA-8720 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Multiple users have requested some sort of tool to help identify wide row keys. They get into a situation where they know a wide row/partition has been inserted and it's causing problems for them but they have no idea what the row key is in order to remove it. Maintaining the widest row key currently encountered and displaying it in cfstats would be one possible approach. Another would be an offline tool (possibly an enhancement to sstablekeys) to show the number of columns/bytes per key in each sstable. If a tool to aggregate the information at a CF-level could be provided that would be a bonus, but it shouldn't be too hard to write a script wrapper to aggregate them if not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8606) sstablesplit does not remove original sstable
[ https://issues.apache.org/jira/browse/CASSANDRA-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14298760#comment-14298760 ] J.B. Langston commented on CASSANDRA-8606: -- This also affects offline sstableupgrade. I'd say this should be a higher priority since people could fill up their disks during an upgrade. sstablesplit does not remove original sstable - Key: CASSANDRA-8606 URL: https://issues.apache.org/jira/browse/CASSANDRA-8606 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Assignee: Marcus Eriksson Priority: Minor Fix For: 2.1.3 sstablesplit leaves the original file on disk, it should not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-8615) Create -D flag to disable speculative retry by default
J.B. Langston created CASSANDRA-8615: Summary: Create -D flag to disable speculative retry by default Key: CASSANDRA-8615 URL: https://issues.apache.org/jira/browse/CASSANDRA-8615 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Some clusters have shown increased latency with speculative retry enabled. Speculative retry is enabled by default when upgrading from 1.2 to 2.0, and for large clusters it can take a long time to complete a rolling upgrade, during which time speculative retry will be enabled. Therefore it would be helpful to have a -D flag that will disable it by default during an upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
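A minimal sketch of how such a flag could gate the default; the property name and method below are hypothetical illustrations, not an actual Cassandra option or API:
{code}
// Sketch: let -Dcassandra.disable_speculative_retry=true (hypothetical
// property) force the default back to NONE during a rolling upgrade.
public class SpeculativeRetryDefault
{
    static String defaultSpeculativeRetry()
    {
        if (Boolean.getBoolean("cassandra.disable_speculative_retry"))
            return "NONE";
        return "99.0PERCENTILE"; // the 2.0 default
    }

    public static void main(String[] args)
    {
        System.out.println(defaultSpeculativeRetry());
    }
}
{code}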
[jira] [Created] (CASSANDRA-8448) Comparison method violates its general contract in AbstractEndpointSnitch
J.B. Langston created CASSANDRA-8448: Summary: Comparison method violates its general contract in AbstractEndpointSnitch Key: CASSANDRA-8448 URL: https://issues.apache.org/jira/browse/CASSANDRA-8448 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Seen in both 1.2 and 2.0. The error is occurring here: https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/locator/AbstractEndpointSnitch.java#L49 ERROR [Thrift:9] 2014-12-04 20:12:28,732 CustomTThreadPoolServer.java (line 219) Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: Comparison method violates its general contract! at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2199) at com.google.common.cache.LocalCache.get(LocalCache.java:3932) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936) at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806) at org.apache.cassandra.service.ClientState.authorize(ClientState.java:352) at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:224) at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:218) at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:202) at org.apache.cassandra.thrift.CassandraServer.createMutationList(CassandraServer.java:822) at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:954) at com.datastax.bdp.server.DseServer.batch_mutate(DseServer.java:576) at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3922) at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3906) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:201) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract! 
at java.util.TimSort.mergeHi(TimSort.java:868) at java.util.TimSort.mergeAt(TimSort.java:485) at java.util.TimSort.mergeCollapse(TimSort.java:410) at java.util.TimSort.sort(TimSort.java:214) at java.util.TimSort.sort(TimSort.java:173) at java.util.Arrays.sort(Arrays.java:659) at java.util.Collections.sort(Collections.java:217) at org.apache.cassandra.locator.AbstractEndpointSnitch.sortByProximity(AbstractEndpointSnitch.java:49) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithScore(DynamicEndpointSnitch.java:157) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithBadness(DynamicEndpointSnitch.java:186) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximity(DynamicEndpointSnitch.java:151) at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1408) at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1402) at org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:148) at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1223) at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1165) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:255) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:225) at org.apache.cassandra.auth.Auth.selectUser(Auth.java:243) at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84) at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50) at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:69) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:338) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:335) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193) ...
[jira] [Updated] (CASSANDRA-8448) Comparison method violates its general contract in AbstractEndpointSnitch
[ https://issues.apache.org/jira/browse/CASSANDRA-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-8448: - Description: Seen in both 1.2 and 2.0. The error is occurring here: https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/locator/AbstractEndpointSnitch.java#L49 {code} ERROR [Thrift:9] 2014-12-04 20:12:28,732 CustomTThreadPoolServer.java (line 219) Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: Comparison method violates its general contract! at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2199) at com.google.common.cache.LocalCache.get(LocalCache.java:3932) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3936) at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4806) at org.apache.cassandra.service.ClientState.authorize(ClientState.java:352) at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:224) at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:218) at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:202) at org.apache.cassandra.thrift.CassandraServer.createMutationList(CassandraServer.java:822) at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:954) at com.datastax.bdp.server.DseServer.batch_mutate(DseServer.java:576) at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3922) at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3906) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:201) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract! 
at java.util.TimSort.mergeHi(TimSort.java:868) at java.util.TimSort.mergeAt(TimSort.java:485) at java.util.TimSort.mergeCollapse(TimSort.java:410) at java.util.TimSort.sort(TimSort.java:214) at java.util.TimSort.sort(TimSort.java:173) at java.util.Arrays.sort(Arrays.java:659) at java.util.Collections.sort(Collections.java:217) at org.apache.cassandra.locator.AbstractEndpointSnitch.sortByProximity(AbstractEndpointSnitch.java:49) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithScore(DynamicEndpointSnitch.java:157) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximityWithBadness(DynamicEndpointSnitch.java:186) at org.apache.cassandra.locator.DynamicEndpointSnitch.sortByProximity(DynamicEndpointSnitch.java:151) at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1408) at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1402) at org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:148) at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1223) at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1165) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:255) at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:225) at org.apache.cassandra.auth.Auth.selectUser(Auth.java:243) at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84) at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50) at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:69) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:338) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:335) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315) at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278) at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193) ... 18 more {code} Workaround: Setting -Djava.util.Arrays.useLegacyMergeSort=true causes the error to go away. was: Seen in both 1.2 and 2.0. The
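The classic cause of this TimSort failure is a comparator whose ordering changes while the sort is running — here the dynamic snitch scores are updated concurrently, so transitivity can be violated mid-sort, which is also why the legacy merge sort workaround masks it. A sketch of the standard remedy, snapshotting the scores before sorting; the names and types are illustrative, not Cassandra's actual classes:
{code}
// Sketch: sort endpoints against a one-time snapshot of their scores so
// every comparison during a single sort sees the same values.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StableProximitySort
{
    // live scores, mutated concurrently by the score updater elsewhere
    static final Map<String, Double> scores = new ConcurrentHashMap<>();

    static void sortByProximity(List<String> endpoints)
    {
        Map<String, Double> snapshot = new HashMap<>();
        for (String ep : endpoints)
            snapshot.put(ep, scores.getOrDefault(ep, 0.0));
        endpoints.sort(Comparator.comparingDouble(snapshot::get));
    }

    public static void main(String[] args)
    {
        scores.put("10.0.0.1", 0.3);
        scores.put("10.0.0.2", 0.1);
        List<String> eps = new ArrayList<>(List.of("10.0.0.1", "10.0.0.2"));
        sortByProximity(eps);
        System.out.println(eps); // [10.0.0.2, 10.0.0.1]
    }
}
{code}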
[jira] [Created] (CASSANDRA-8329) LeveledCompactionStrategy should split large files across data directories when compacting
J.B. Langston created CASSANDRA-8329: Summary: LeveledCompactionStrategy should split large files across data directories when compacting Key: CASSANDRA-8329 URL: https://issues.apache.org/jira/browse/CASSANDRA-8329 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Because we fall back to STCS for L0 when LCS gets behind, the sstables in L0 can get quite large during sustained periods of heavy writes. This can result in large imbalances between data volumes when using JBOD support. Eventually these large files get broken up as L0 sstables are moved up into higher levels; however, because LCS only chooses a single volume on which to write all of the sstables created during a single compaction, the imbalance is persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
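A sketch of the behavior being requested: choose a destination directory per output sstable (here, simply the volume with the most usable space) rather than once per compaction, so one large L0 compaction cannot pin all of its output to a single volume. Purely illustrative; this is not Cassandra's Directories API:
{code}
// Sketch: re-pick the data directory for each output sstable instead of
// choosing one directory for the whole compaction. Illustrative only.
import java.io.File;
import java.util.List;

public class OutputDirectoryPicker
{
    static File pickFor(List<File> dataDirs)
    {
        File best = dataDirs.get(0);
        for (File dir : dataDirs)
            if (dir.getUsableSpace() > best.getUsableSpace())
                best = dir;
        return best; // call once per output sstable, not once per compaction
    }

    public static void main(String[] args)
    {
        System.out.println(pickFor(List.of(new File("/tmp"), new File("/var/tmp"))));
    }
}
{code}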
[jira] [Commented] (CASSANDRA-7386) JBOD threshold to prevent unbalanced disk utilization
[ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208119#comment-14208119 ] J.B. Langston commented on CASSANDRA-7386: -- I've seen a lot of users hitting this issue lately, so the sooner we can get a patch the better. This also needs to be backported to 2.0 if at all possible. In several cases I've seen severe imbalances like the ones described, where some drives are completely full and others are at 10-20% utilization. JBOD threshold to prevent unbalanced disk utilization - Key: CASSANDRA-7386 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Chris Lohfink Assignee: Alan Boudreault Priority: Minor Fix For: 2.1.3 Attachments: 7386-v1.patch, 7386v2.diff, Mappe1.ods, mean-writevalue-7disks.png, patch_2_1_branch_proto.diff, sstable-count-second-run.png Currently, disks are picked first by number of current tasks, then by free space. This helps with performance but can lead to large differences in utilization in some (unlikely but possible) scenarios. I've seen 55% to 10% and heard reports of 90% to 10% on IRC, with both LCS and STCS (although my suspicion is that STCS makes it worse, since it's harder to keep balanced). I propose the algorithm change a little to have some maximum range of utilization beyond which it will pick by free space over load (acknowledging it can be slower). So if disk A is 30% full and disk B is 5% full, it will never pick A over B until they balance out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
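The proposed algorithm is easy to state in code: keep picking by pending tasks, but once utilization between disks diverges past some threshold, pick by free space until they converge. A sketch under that reading, with an assumed 20% threshold (the actual patch may differ):
{code}
// Sketch of the proposal: the least-loaded disk wins normally, but when
// the utilization spread exceeds a threshold, the emptiest disk wins.
import java.util.Comparator;
import java.util.List;

public class JbodPicker
{
    record Disk(String path, int pendingTasks, double utilization, long freeBytes) {}

    static final double MAX_UTILIZATION_SPREAD = 0.20; // assumed threshold

    static Disk pick(List<Disk> disks)
    {
        double min = disks.stream().mapToDouble(Disk::utilization).min().orElse(0);
        double max = disks.stream().mapToDouble(Disk::utilization).max().orElse(0);
        Comparator<Disk> byTasks = Comparator.comparingInt(Disk::pendingTasks);
        Comparator<Disk> bySpace = Comparator.comparingLong(Disk::freeBytes).reversed();
        return disks.stream()
                    .min(max - min > MAX_UTILIZATION_SPREAD ? bySpace : byTasks)
                    .orElseThrow();
    }

    public static void main(String[] args)
    {
        System.out.println(pick(List.of(
            new Disk("/data1", 1, 0.30, 100L),
            new Disk("/data2", 3, 0.05, 900L)))); // spread 25% -> /data2 wins
    }
}
{code}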
[jira] [Comment Edited] (CASSANDRA-7386) JBOD threshold to prevent unbalanced disk utilization
[ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14208119#comment-14208119 ] J.B. Langston edited comment on CASSANDRA-7386 at 11/12/14 3:31 PM: I've seen a lot of users hitting this issue lately, so the sooner we can get a patch the better. This also needs to be back ported to 2.0 if at all possible. In several cases I've seen severe imbalances like the ones described where there are some drives completely full and others at 10-20% utilization. Here are a couple of stack traces. It happens both during flushes and compactions. {code} ERROR [FlushWriter:6241] 2014-09-07 08:27:35,298 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:6241,5,main] FSWriteError in /data6/system/compactions_in_progress/system-compactions_in_progress-tmp-jb-8222-Index.db at org.apache.cassandra.io.util.SequentialWriter.flushData(SequentialWriter.java:267) at org.apache.cassandra.io.util.SequentialWriter.flushInternal(SequentialWriter.java:219) at org.apache.cassandra.io.util.SequentialWriter.syncInternal(SequentialWriter.java:191) at org.apache.cassandra.io.util.SequentialWriter.close(SequentialWriter.java:381) at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:481) at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212) at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301) at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:417) at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: No space left on device at java.io.RandomAccessFile.writeBytes0(Native Method) at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520) at java.io.RandomAccessFile.write(RandomAccessFile.java:550) at org.apache.cassandra.io.util.SequentialWriter.flushData(SequentialWriter.java:263) ... 
13 more ERROR [CompactionExecutor:9166] 2014-09-06 16:09:14,786 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:9166,1,main] FSWriteError in /data6/keyspace_1/data/keyspace_1-data-tmp-jb-13599-Filter.db at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475) at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212) at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301) at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209) at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60) at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: No space left on device at java.io.FileOutputStream.write(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:295) at java.io.DataOutputStream.writeInt(DataOutputStream.java:197) at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34) at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44) at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41) at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468) ... 13 more {code} was (Author: jblangs...@datastax.com): I've seen a lot of users hitting this issue lately, so the sooner we can get a patch the better. This also needs to be back ported to 2.0 if at all possible. In several cases
[jira] [Created] (CASSANDRA-8253) cassandra-stress 2.1 doesn't support LOCAL_ONE
J.B. Langston created CASSANDRA-8253: Summary: cassandra-stress 2.1 doesn't support LOCAL_ONE Key: CASSANDRA-8253 URL: https://issues.apache.org/jira/browse/CASSANDRA-8253 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Looks like a simple oversight in argument parsing: ➜ bin ./cassandra-stress write cl=LOCAL_ONE Invalid value LOCAL_ONE; must match pattern ONE|QUORUM|LOCAL_QUORUM|EACH_QUORUM|ALL|ANY Also, CASSANDRA-7077 argues that it should be using LOCAL_ONE by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
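The root cause is a hardcoded validation pattern that predates LOCAL_ONE. Deriving the accepted values from the consistency-level enum itself avoids this whole class of oversight; a sketch with a local stand-in enum (not the driver's actual class):
{code}
// Sketch: validate cl= against an enum instead of a hardcoded regex, so
// newly added consistency levels are accepted automatically.
import java.util.Arrays;

public class ClOption
{
    enum ConsistencyLevel { ONE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL, ANY, LOCAL_ONE }

    static ConsistencyLevel parse(String value)
    {
        try
        {
            return ConsistencyLevel.valueOf(value.toUpperCase());
        }
        catch (IllegalArgumentException e)
        {
            throw new IllegalArgumentException("Invalid value " + value
                + "; must be one of " + Arrays.toString(ConsistencyLevel.values()));
        }
    }

    public static void main(String[] args)
    {
        System.out.println(parse("LOCAL_ONE"));
    }
}
{code}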
[jira] [Updated] (CASSANDRA-8253) cassandra-stress 2.1 doesn't support LOCAL_ONE
[ https://issues.apache.org/jira/browse/CASSANDRA-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-8253: - Reproduced In: 2.1.1 cassandra-stress 2.1 doesn't support LOCAL_ONE -- Key: CASSANDRA-8253 URL: https://issues.apache.org/jira/browse/CASSANDRA-8253 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Looks like a simple oversight in argument parsing: ➜ bin ./cassandra-stress write cl=LOCAL_ONE Invalid value LOCAL_ONE; must match pattern ONE|QUORUM|LOCAL_QUORUM|EACH_QUORUM|ALL|ANY Also, CASSANDRA-7077 argues that it should be using LOCAL_ONE by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175588#comment-14175588 ] J.B. Langston commented on CASSANDRA-8084: -- I don't think sstableloader is working right. Here is the output for sstableloader itself: {code} automaton@ip-172-31-7-50:~/Keyspace1/Standard1$ sstableloader -d localhost `pwd` Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-320-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-326-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-325-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-283-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-267-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-211-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-301-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-316-Data.db to [/54.183.192.248, /54.215.139.161, /54.165.222.3, /54.172.118.222] Streaming session ID: ac5dd440-5645-11e4-a813-3d13c3d3c540 progress: [/54.172.118.222 8/8 (100%)] [/54.183.192.248 8/8 (100%)] [/54.165.222.3 8/8 (100%)] [/54.215.139.161 8/8 (100%)] [total: 100% - 2147483647MB/s (avg: 30MB/s) {code} Here is netstats on the node where it is running: {code} Responses n/a 0812 automaton@ip-172-31-7-50:~$ nodetool netstats Mode: NORMAL Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540 /172.31.7.50 (using /54.183.192.248) Receiving 8 files, 1059673728 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-10-Data.db 56468194/164372226 bytes(34%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-4-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-3-Data.db 50674396/50674396 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-5-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-7-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-6-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-9-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-8-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Commandsn/a 0 0 Responses n/a 0970 {code} Here's netstats on the other node in the same DC: {code} automaton@ip-172-31-40-169:~$ nodetool netstats Mode: NORMAL Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540 /172.31.7.50 (using /54.183.192.248) Receiving 8 files, 1059673728 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-239-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-245-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-246-Data.db 43078602/50674396 bytes(85%) received from 
/172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-240-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-241-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-243-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-242-Data.db 164372226/164372226 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-244-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Commandsn/a 0 249589 Responses
[jira] [Comment Edited] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175588#comment-14175588 ] J.B. Langston edited comment on CASSANDRA-8084 at 10/17/14 9:43 PM: I don't think sstableloader is working right. Here is the output for sstableloader itself: {code} automaton@ip-172-31-7-50:~/Keyspace1/Standard1$ sstableloader -d localhost `pwd` Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-320-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-326-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-325-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-283-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-267-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-211-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-301-Data.db /home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-316-Data.db to [/54.183.192.248, /54.215.139.161, /54.165.222.3, /54.172.118.222] Streaming session ID: ac5dd440-5645-11e4-a813-3d13c3d3c540 progress: [/54.172.118.222 8/8 (100%)] [/54.183.192.248 8/8 (100%)] [/54.165.222.3 8/8 (100%)] [/54.215.139.161 8/8 (100%)] [total: 100% - 2147483647MB/s (avg: 30MB/s) {code} Here is netstats on the node where it is running (54.183.192.248): {code} Responses n/a 0812 automaton@ip-172-31-7-50:~$ nodetool netstats Mode: NORMAL Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540 /172.31.7.50 (using /54.183.192.248) Receiving 8 files, 1059673728 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-10-Data.db 56468194/164372226 bytes(34%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-4-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-3-Data.db 50674396/50674396 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-5-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-7-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-6-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-9-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-8-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Commandsn/a 0 0 Responses n/a 0970 {code} Here's netstats on the other node in the same DC (54.215.139.161): {code} automaton@ip-172-31-40-169:~$ nodetool netstats Mode: NORMAL Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540 /172.31.7.50 (using /54.183.192.248) Receiving 8 files, 1059673728 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-239-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-245-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-246-Data.db 43078602/50674396 bytes(85%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-240-Data.db 27800/27800 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-241-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-243-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-242-Data.db 164372226/164372226 bytes(100%) received from /172.31.7.50 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-244-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173821#comment-14173821 ] J.B. Langston commented on CASSANDRA-8084: -- Test v3; nodetool netstats looks good as well as actual ports used via netstat -an. In the logs, I only see the internal IP mentioned in one place. Is this the INFO line you were talking about? {code} INFO [STREAM-INIT-/172.31.5.143:43953] 2014-10-16 14:36:11,292 StreamResultFuture.java (line 121) [Stream #c5fbdb90-5541-11e4-8eb3-c9fac3589773] Received streaming plan for Repair INFO [STREAM-INIT-/172.31.5.143:43994] 2014-10-16 14:38:16,120 StreamResultFuture.java (line 121) [Stream #10424ae0-5542-11e4-8eb3-c9fac3589773] Received streaming plan for Repair {code} GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair - Key: CASSANDRA-8084 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084 Project: Cassandra Issue Type: Bug Components: Config Environment: Tested this in GCE and AWS clusters. Created multi region and multi dc cluster once in GCE and once in AWS and ran into the same problem. DISTRIB_ID=Ubuntu DISTRIB_RELEASE=12.04 DISTRIB_CODENAME=precise DISTRIB_DESCRIPTION=Ubuntu 12.04.3 LTS NAME=Ubuntu VERSION=12.04.3 LTS, Precise Pangolin ID=ubuntu ID_LIKE=debian PRETTY_NAME=Ubuntu precise (12.04.3 LTS) VERSION_ID=12.04 Tried to install Apache Cassandra version ReleaseVersion: 2.0.10 and also latest DSE version which is 4.5 and which corresponds to 2.0.8.39. Reporter: Jana Assignee: Yuki Morishita Labels: features Fix For: 2.0.11 Attachments: 8084-2.0-v2.txt, 8084-2.0-v3.txt, 8084-2.0.txt Neither of these snitches(GossipFilePropertySnitch and EC2MultiRegionSnitch ) used the PRIVATE IPS for communication between INTRA-DC nodes in my multi-region multi-dc cluster in cloud(on both AWS and GCE) when I ran nodetool repair -local. It works fine during regular reads. Here are the various cluster flavors I tried and failed- AWS + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. AWS + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. I am expecting with the above setup all of my nodes in a given DC all communicate via private ips since the cloud providers dont charge us for using the private ips and they charge for using public ips. But they can use PUBLIC IPs for INTER-DC communications which is working as expected. 
Here is a snippet from my log files when I ran the nodetool repair -local - Node responding to 'node running repair' INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/sessions INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/events Node running repair - INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for events from /54.172.118.222 Note: The IPs its communicating is all PUBLIC Ips and it should have used the PRIVATE IPs starting with 172.x.x.x YAML file values : The listen address is set to: PRIVATE IP The broadcast address is set to: PUBLIC IP The SEEDs address is set to: PUBLIC IPs from both DCs The SNITCHES tried: GPFS and EC2MultiRegionSnitch RACK-DC: Had prefer_local set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173927#comment-14173927 ] J.B. Langston commented on CASSANDRA-8084: -- Confirmed the log messages: {code} INFO [StreamConnectionEstablisher:1] 2014-10-16 14:36:11,277 StreamSession.java (line 218) [Stream #c5fbdb90-5541-11e4-8eb3-c9fac3589773] Starting streaming to /54.183.192.248 through /172.31.7.50 INFO [StreamConnectionEstablisher:2] 2014-10-16 14:38:16,083 StreamSession.java (line 218) [Stream #10424ae0-5542-11e4-8eb3-c9fac3589773] Starting streaming to /54.183.192.248 through /172.31.7.50 INFO [StreamConnectionEstablisher:1] 2014-10-16 14:39:53,600 StreamSession.java (line 218) [Stream #4a9133f0-5542-11e4-8eb3-c9fac3589773] Starting streaming to /54.183.192.248 through /172.31.7.50 INFO [StreamConnectionEstablisher:2] 2014-10-16 14:40:50,476 StreamSession.java (line 218) [Stream #6c5b4200-5542-11e4-8eb3-c9fac3589773] Starting streaming to /54.183.192.248 through /172.31.7.50 {code} Everything looks like it's working as expected. I haven't tested sstableloader as suggested by [~jjordan]. GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair - Key: CASSANDRA-8084 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084 Project: Cassandra Issue Type: Bug Components: Config Environment: Tested this in GCE and AWS clusters. Created multi region and multi dc cluster once in GCE and once in AWS and ran into the same problem. DISTRIB_ID=Ubuntu DISTRIB_RELEASE=12.04 DISTRIB_CODENAME=precise DISTRIB_DESCRIPTION=Ubuntu 12.04.3 LTS NAME=Ubuntu VERSION=12.04.3 LTS, Precise Pangolin ID=ubuntu ID_LIKE=debian PRETTY_NAME=Ubuntu precise (12.04.3 LTS) VERSION_ID=12.04 Tried to install Apache Cassandra version ReleaseVersion: 2.0.10 and also latest DSE version which is 4.5 and which corresponds to 2.0.8.39. Reporter: Jana Assignee: Yuki Morishita Labels: features Fix For: 2.0.11 Attachments: 8084-2.0-v2.txt, 8084-2.0-v3.txt, 8084-2.0.txt Neither of these snitches(GossipFilePropertySnitch and EC2MultiRegionSnitch ) used the PRIVATE IPS for communication between INTRA-DC nodes in my multi-region multi-dc cluster in cloud(on both AWS and GCE) when I ran nodetool repair -local. It works fine during regular reads. Here are the various cluster flavors I tried and failed- AWS + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. AWS + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. I am expecting with the above setup all of my nodes in a given DC all communicate via private ips since the cloud providers dont charge us for using the private ips and they charge for using public ips. But they can use PUBLIC IPs for INTER-DC communications which is working as expected. 
Here is a snippet from my log files when I ran the nodetool repair -local - Node responding to 'node running repair' INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/sessions INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/events Node running repair - INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for events from /54.172.118.222 Note: The IPs its communicating is all PUBLIC Ips and it should have used the PRIVATE IPs starting with 172.x.x.x YAML file values : The listen address is set to: PRIVATE IP The broadcast address is set to: PUBLIC IP The SEEDs address is set to: PUBLIC IPs from both DCs The SNITCHES tried: GPFS and EC2MultiRegionSnitch RACK-DC: Had prefer_local set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14169555#comment-14169555 ] J.B. Langston commented on CASSANDRA-8084: -- I think it is most important to show the private IP in netstats, and my vote would be to show both the public and private IP in that case. On the logs, I can see that would be more work to fix and I don't necessarily think it needs to show the private IP everywhere, but maybe on the messages that specifically concern streaming, we could show both. GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair - Key: CASSANDRA-8084 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084 Project: Cassandra Issue Type: Bug Components: Config Environment: Tested this in GCE and AWS clusters. Created multi region and multi dc cluster once in GCE and once in AWS and ran into the same problem. DISTRIB_ID=Ubuntu DISTRIB_RELEASE=12.04 DISTRIB_CODENAME=precise DISTRIB_DESCRIPTION=Ubuntu 12.04.3 LTS NAME=Ubuntu VERSION=12.04.3 LTS, Precise Pangolin ID=ubuntu ID_LIKE=debian PRETTY_NAME=Ubuntu precise (12.04.3 LTS) VERSION_ID=12.04 Tried to install Apache Cassandra version ReleaseVersion: 2.0.10 and also latest DSE version which is 4.5 and which corresponds to 2.0.8.39. Reporter: Jana Assignee: Yuki Morishita Labels: features Fix For: 2.0.11 Attachments: 8084-2.0.txt Neither of these snitches(GossipFilePropertySnitch and EC2MultiRegionSnitch ) used the PRIVATE IPS for communication between INTRA-DC nodes in my multi-region multi-dc cluster in cloud(on both AWS and GCE) when I ran nodetool repair -local. It works fine during regular reads. Here are the various cluster flavors I tried and failed- AWS + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. AWS + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + GossipPropertyFileSnitch + (Prefer_local=true) in rackdc-properties file. GCE + Multi-REGION + Multi-DC + EC2MultiRegionSnitch + (Prefer_local=true) in rackdc-properties file. I am expecting with the above setup all of my nodes in a given DC all communicate via private ips since the cloud providers dont charge us for using the private ips and they charge for using public ips. But they can use PUBLIC IPs for INTER-DC communications which is working as expected. 
Here is a snippet from my log files when I ran the nodetool repair -local - Node responding to 'node running repair' INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/sessions INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree to /54.172.118.222 for system_traces/events Node running repair - INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for events from /54.172.118.222 Note: The IPs its communicating is all PUBLIC Ips and it should have used the PRIVATE IPs starting with 172.x.x.x YAML file values : The listen address is set to: PRIVATE IP The broadcast address is set to: PUBLIC IP The SEEDs address is set to: PUBLIC IPs from both DCs The SNITCHES tried: GPFS and EC2MultiRegionSnitch RACK-DC: Had prefer_local set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166908#comment-14166908 ] J.B. Langston commented on CASSANDRA-8084: -- I tested and it appears to work. Here is the cluster I am testing with: {code} Datacenter: DC1_EAST Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 54.165.222.3711.26 MB 1 25.0% dd449706-2059-4b65-ae98-0012d2cf8f67 rack1 UN 54.172.118.222 561.14 MB 1 25.0% 18cd7d0a-74ca-4835-a7ff-7ffaa92b35ef rack1 Datacenter: DC1_WEST Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 54.183.192.248 721.2 MB 1 25.0% c4dd37f1-d937-4876-8669-f0b01a3942db rack1 UN 54.215.139.161 909.26 MB 1 25.0% 16499349-8cef-4a62-a99c-ab145cb70921 rack1 I wasn't sure initially because the logs and `nodetool netstats` still show the broadcast address. You can see here that nodetool netstats, when run on 54.215.139.161, shows we are streaming from 54.183.192.248 (the broadcast address of the other node in the same DC): {code} Mode: NORMAL Repair dbc7ea40-5082-11e4-8190-c9fac3589773 /54.183.192.248 Receiving 9 files, 229856794 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-100-Data.db 58878176/58878176 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-106-Data.db 97856/97856 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-109-Data.db 69407704/69407704 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-108-Data.db 3203116/3203116 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-102-Data.db 12545306/12545306 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-103-Data.db 69407704/69407704 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-104-Data.db 1536228/1536228 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-105-Data.db 12589230/12589230 bytes(100%) received from /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-107-Data.db 2191474/2191474 bytes(100%) received from /54.183.192.248 Sending 5 files, 109645980 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-87-Data.db 14323672/14323672 bytes(100%) sent to /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-97-Data.db 20581730/20581730 bytes(100%) sent to /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-98-Data.db 3161694/3161694 bytes(100%) sent to /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-95-Data.db 69407704/69407704 bytes(100%) sent to /54.183.192.248 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-99-Data.db 2171180/2171180 bytes(100%) sent to /54.183.192.248 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool NameActive Pending Completed Commandsn/a 01495191 Responses n/a 0 714928 {code} However, the output of `sudo netstat -anp | grep 7000 | sort -k5` shows that we are only connecting to the local node on its listen address (172.31.7.50): {code} tcp0 0 172.31.5.143:7000 
0.0.0.0:* LISTEN 17279/java tcp0 0 172.31.5.143:7000 172.31.5.143:34936 ESTABLISHED 17279/java tcp0 0 172.31.5.143:7000 172.31.5.143:34937 ESTABLISHED 17279/java tcp0 0 172.31.5.143:7000 172.31.5.143:34938 ESTABLISHED 17279/java tcp0 0 172.31.5.143:34936 172.31.5.143:7000 ESTABLISHED 17279/java tcp0 0 172.31.5.143:34937 172.31.5.143:7000 ESTABLISHED 17279/java tcp0 0 172.31.5.143:34938 172.31.5.143:7000 ESTABLISHED 17279/java tcp0 0 172.31.5.143:7000 172.31.7.50:52125 ESTABLISHED 17279/java tcp0 0 172.31.5.143:7000 172.31.7.50:52126 ESTABLISHED 17279/java tcp0 0 172.31.5.143:57502 172.31.7.50:7000
[jira] [Comment Edited] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166908#comment-14166908 ] J.B. Langston edited comment on CASSANDRA-8084 at 10/10/14 2:25 PM: I tested and it appears to work. Here is the cluster I am testing with:
{code}
Datacenter: DC1_EAST
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.165.222.37   11.26 MB   1       25.0%  dd449706-2059-4b65-ae98-0012d2cf8f67  rack1
UN  54.172.118.222  561.14 MB  1       25.0%  18cd7d0a-74ca-4835-a7ff-7ffaa92b35ef  rack1
Datacenter: DC1_WEST
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.183.192.248  721.2 MB   1       25.0%  c4dd37f1-d937-4876-8669-f0b01a3942db  rack1
UN  54.215.139.161  909.26 MB  1       25.0%  16499349-8cef-4a62-a99c-ab145cb70921  rack1
{code}
I wasn't sure initially because the logs and `nodetool netstats` still show the broadcast address. You can see here that nodetool netstats, when run on 54.215.139.161, shows we are streaming from 54.183.192.248 (the broadcast address of the other node in the same DC):
{code}
Mode: NORMAL
Repair dbc7ea40-5082-11e4-8190-c9fac3589773
    /54.183.192.248
        Receiving 9 files, 229856794 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-100-Data.db 58878176/58878176 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-106-Data.db 97856/97856 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-109-Data.db 69407704/69407704 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-108-Data.db 3203116/3203116 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-102-Data.db 12545306/12545306 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-103-Data.db 69407704/69407704 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-104-Data.db 1536228/1536228 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-105-Data.db 12589230/12589230 bytes(100%) received from /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-107-Data.db 2191474/2191474 bytes(100%) received from /54.183.192.248
        Sending 5 files, 109645980 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-87-Data.db 14323672/14323672 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-97-Data.db 20581730/20581730 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-98-Data.db 3161694/3161694 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-95-Data.db 69407704/69407704 bytes(100%) sent to /54.183.192.248
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-99-Data.db 2171180/2171180 bytes(100%) sent to /54.183.192.248
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name  Active  Pending  Completed
Commands   n/a     0        1495191
Responses  n/a     0        714928
{code}
However, the output of `sudo netstat -anp | grep 7000 | sort -k5` shows that we are connecting to the other node in the local DC only on its listen address (172.31.7.50):
{code}
tcp 0 0 172.31.5.143:7000  0.0.0.0:*          LISTEN      17279/java
tcp 0 0 172.31.5.143:7000  172.31.5.143:34936 ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:7000  172.31.5.143:34937 ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:7000  172.31.5.143:34938 ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:34936 172.31.5.143:7000  ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:34937 172.31.5.143:7000  ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:34938 172.31.5.143:7000  ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:7000  172.31.7.50:52125  ESTABLISHED 17279/java
tcp 0 0 172.31.5.143:7000  172.31.7.50:52126  ESTABLISHED 17279/java
tcp
{code}
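The behavior confirmed above, connecting over the listen address within a datacenter while still gossiping the broadcast address, is what the `prefer_local` snitch option requests. As a point of reference (the dc/rack values below are placeholders, not from this test cluster, and availability of the option depends on the Cassandra version), it lives in cassandra-rackdc.properties alongside the datacenter and rack names read by GossipingPropertyFileSnitch:
{code}
# cassandra-rackdc.properties (sketch; dc/rack values are placeholders)
dc=DC1_WEST
rack=rack1
# With prefer_local=true, connections to nodes in the same datacenter
# use listen_address instead of broadcast_address.
prefer_local=true
{code}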
[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repai
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163680#comment-14163680 ] J.B. Langston commented on CASSANDRA-8084: -- Here is the AWS cluster used to reproduce this:
{code}
automaton@ip-172-31-0-237:~$ nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: aws_east
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.165.86.246   304.01 MB  256     26.8%  1042deb8-5395-42b1-adf4-2a373149b052  rack1
UN  54.209.121.225  302.82 MB  256     21.8%  7e7499c2-acfb-4eda-b786-7878907038b8  rack1
Datacenter: aws_west
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.183.246.79   79.01 MB   256     24.7%  9a4450a4-d00b-407c-8217-464ca5d3d74c  rack1
UN  54.183.249.149  319.14 MB  256     26.7%  cb6579d4-3eac-48c6-a8c0-ca30071a97e8  rack1
{code}
Here is the test case I ran to reproduce this:
1) Run cassandra-stress once to create Keyspace1 and Standard1 CF.
2) Alter keyspace with replication to all nodes:
{code}
ALTER KEYSPACE Keyspace1 WITH replication = { 'class': 'NetworkTopologyStrategy', 'aws_east': '2', 'aws_west': '2' };
{code}
3) Shut down one of the nodes in aws_west.
4) Run cassandra-stress on the other node in aws_west (just cassandra-stress with no options). Let it finish.
5) Start the node back up.
6) Run nodetool repair -local.
7) Repair and streaming messages in system.log will show that it is using the broadcast IP for nodes in the same DC.
You can also watch the connections being established over the broadcast IP with this command:
{code}
sudo netstat -anp | grep 7000 | sort -k5
{code}
This was conducted on DSE with GPFS. We should repeat with EC2MRS on DSE and with GPFS on Apache Cassandra/DSC. Here is the netstat output showing that it is establishing connections to the node in the same DC (54.183.249.149). This command is being run on 54.183.246.79, so it should have used the private 172 address to talk to 54.183.249.149 instead.
{code}
automaton@ip-172-31-0-237:~$ sudo netstat -anp | grep 7000 | sort -k5
tcp 0 0  172.31.0.237:7000  0.0.0.0:*           LISTEN      8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54148  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54149  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54150  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54148 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54149 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54150 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.4.163:56894  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.4.163:56895  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:55510 172.31.4.163:7000   ESTABLISHED 8959/java
tcp 0 35 172.31.0.237:55504 172.31.4.163:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  54.165.86.246:36101 ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:50600 54.165.86.246:7000  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:50606 54.165.86.246:7000  ESTABLISHED 8959/java
tcp 1 0  172.31.0.237:60588 54.183.249.149:7000 CLOSE_WAIT  8959/java
tcp 0 0  172.31.0.237:60587 54.183.249.149:7000 ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:60505 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60508 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60509 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60511 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60513 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60514 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60515 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60517 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60521 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60523 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60524 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60527 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60528 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60532 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60534 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60536 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60538 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60544 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60546 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60552 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60554 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60560 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60562 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60564 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60565 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60566 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60568 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60570 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0
{code}
[jira] [Comment Edited] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163680#comment-14163680 ] J.B. Langston edited comment on CASSANDRA-8084 at 10/8/14 4:17 PM: --- Here is the AWS cluster used to reproduce this:
{code}
automaton@ip-172-31-0-237:~$ nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: aws_east
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.165.86.246   304.01 MB  256     26.8%  1042deb8-5395-42b1-adf4-2a373149b052  rack1
UN  54.209.121.225  302.82 MB  256     21.8%  7e7499c2-acfb-4eda-b786-7878907038b8  rack1
Datacenter: aws_west
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.183.246.79   79.01 MB   256     24.7%  9a4450a4-d00b-407c-8217-464ca5d3d74c  rack1
UN  54.183.249.149  319.14 MB  256     26.7%  cb6579d4-3eac-48c6-a8c0-ca30071a97e8  rack1
{code}
Here is the test case I ran to reproduce this:
1) Run cassandra-stress once to create Keyspace1 and Standard1 CF.
2) Alter keyspace with replication to all nodes:
{code}
ALTER KEYSPACE Keyspace1 WITH replication = { 'class': 'NetworkTopologyStrategy', 'aws_east': '2', 'aws_west': '2' };
{code}
3) Shut down one of the nodes in aws_west.
4) Run cassandra-stress on the other node in aws_west (just cassandra-stress with no options). Let it finish.
5) Start the node back up.
6) Run nodetool repair -local.
7) Repair and streaming messages in system.log will show that it is using the broadcast IP for nodes in the same DC.
You can also watch the connections being established over the broadcast IP with this command:
{code}
sudo netstat -anp | grep 7000 | sort -k5
{code}
The original test was conducted on DSE. We also reproduced it on Apache Cassandra/DSC 2.0.10. Here is the netstat output showing that it is establishing connections to the node in the same DC (54.183.249.149). This command is being run on 54.183.246.79, so it should have used the private 172 address to talk to 54.183.249.149 instead.
{code}
automaton@ip-172-31-0-237:~$ sudo netstat -anp | grep 7000 | sort -k5
tcp 0 0  172.31.0.237:7000  0.0.0.0:*           LISTEN      8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54148  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54149  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.0.237:54150  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54148 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54149 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:54150 172.31.0.237:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.4.163:56894  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  172.31.4.163:56895  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:55510 172.31.4.163:7000   ESTABLISHED 8959/java
tcp 0 35 172.31.0.237:55504 172.31.4.163:7000   ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:7000  54.165.86.246:36101 ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:50600 54.165.86.246:7000  ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:50606 54.165.86.246:7000  ESTABLISHED 8959/java
tcp 1 0  172.31.0.237:60588 54.183.249.149:7000 CLOSE_WAIT  8959/java
tcp 0 0  172.31.0.237:60587 54.183.249.149:7000 ESTABLISHED 8959/java
tcp 0 0  172.31.0.237:60505 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60508 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60509 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60511 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60513 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60514 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60515 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60517 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60521 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60523 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60524 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60527 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60528 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60532 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60534 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60536 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60538 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60544 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60546 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60552 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60554 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60560 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60562 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60564 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60565 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60566 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60568 54.183.249.149:7000 TIME_WAIT   -
tcp 0 0  172.31.0.237:60570 54.183.249.149:7000
{code}
[jira] [Created] (CASSANDRA-7805) Performance regression in multi-get (in clause) due to automatic paging
J.B. Langston created CASSANDRA-7805: Summary: Performance regression in multi-get (in clause) due to automatic paging Key: CASSANDRA-7805 URL: https://issues.apache.org/jira/browse/CASSANDRA-7805 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor Comparative benchmarking of 1.2 vs. 2.0 shows a regression in multi-get (in clause) queries due to automatic paging. Take the following example:
{code}
select myId, col1, col2, col3 from myTable where col1 = 'xyz' and myId IN (id1, id2, ..., id100); // primary key is (myId, col1)
{code}
We were surprised to see that in 2.0, the above query was giving an order of magnitude worse performance than in 1.2. Digging in, I believe it is due to the issue described in the comment at the top of MultiPartitionPager.java (v2.0.9):
{quote}
Note that this is not easy to make efficient. Indeed, we need to page the first command fully before returning results from the next one, but if the result returned by each command is small (compared to pageSize), paging the commands one at a time under-performs compared to parallelizing.
{quote}
The perf regression is due to the new paging feature in 2.0. The server is executing the read for each id in the IN clause *sequentially* in order to implement the paging semantics. The wisdom of using multi-get like this has been debated in other forums, but the unfortunate thing from a user's point of view is that they may have a bunch of code working against 1.2, upgrade their cluster to 2.0, and all of a sudden see an order of magnitude or worse perf regression. That will be perceived as a problem. I think it would surprise anyone not familiar with the code that the separate reads for the IN clause are done sequentially rather than in parallel. As a workaround, disable paging in the Java driver by setting fetchSize to Integer.MAX_VALUE on your QueryOptions. -- This message was sent by Atlassian JIRA (v6.2#6252)
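The workaround mentioned in the report above could look like the following with the DataStax Java driver. This is a minimal sketch: the contact point, keyspace, and query are placeholders, and the fetch size can equally be set per statement via Statement.setFetchSize.
{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;

public class DisablePaging
{
    public static void main(String[] args)
    {
        // A fetch size of Integer.MAX_VALUE effectively disables automatic
        // paging, so the multi-get is not paged one partition at a time.
        Cluster cluster = Cluster.builder()
                                 .addContactPoint("127.0.0.1") // placeholder contact point
                                 .withQueryOptions(new QueryOptions().setFetchSize(Integer.MAX_VALUE))
                                 .build();
        Session session = cluster.connect("mykeyspace"); // placeholder keyspace
        session.execute("SELECT myId, col1, col2, col3 FROM myTable WHERE col1 = 'xyz' AND myId IN (1, 2, 3)");
        cluster.close();
    }
}
{code}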
[jira] [Created] (CASSANDRA-7767) Expose sizes of off-heap data structures via JMX and `nodetool info`
J.B. Langston created CASSANDRA-7767: Summary: Expose sizes of off-heap data structures via JMX and `nodetool info` Key: CASSANDRA-7767 URL: https://issues.apache.org/jira/browse/CASSANDRA-7767 Project: Cassandra Issue Type: New Feature Reporter: J.B. Langston It would be very helpful for troubleshooting memory consumption to know the individual sizes of off-heap data structures such as bloom filters, index summaries, compression metadata, etc. Can we expose this over JMX? Also, since `nodetool info` already shows the size of the heap, key cache, etc., it seems like a natural place to show this. -- This message was sent by Atlassian JIRA (v6.2#6252)
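As an illustration of what consuming such metrics could look like once exposed, here is a minimal JMX client sketch. The gauge name BloomFilterOffHeapMemoryUsed and the keyspace/table in the ObjectName are hypothetical placeholders for whatever metric names this feature would introduce, not an existing API at the time of this report.
{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class OffHeapSizes
{
    public static void main(String[] args) throws Exception
    {
        // Standard Cassandra JMX endpoint; host and port are placeholders.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Hypothetical per-table gauge; keyspace and table are placeholders.
            ObjectName name = new ObjectName(
                "org.apache.cassandra.metrics:type=ColumnFamily,keyspace=Keyspace1,scope=Standard1,name=BloomFilterOffHeapMemoryUsed");
            Object value = mbs.getAttribute(name, "Value");
            System.out.println("Bloom filter off-heap bytes: " + value);
        }
    }
}
{code}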
[jira] [Created] (CASSANDRA-7745) Background LCS compactions stall with pending compactions remaining
J.B. Langston created CASSANDRA-7745: Summary: Background LCS compactions stall with pending compactions remaining Key: CASSANDRA-7745 URL: https://issues.apache.org/jira/browse/CASSANDRA-7745 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston We've hit a scenario where background LCS compactions will stall. compactionstats output shows hundreds of pending compactions but none active. The thread dumps show no CompactionExecutor threads running, and no compaction activity is being logged to system.log. This seems to happen when there are no writes to the node. There are no flushes logged either, and when writes resume, compactions seem to resume as well, but still don't ever get to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7723) sstable2json (and possibly other command-line tools) hang if no write permission to the commitlogs
J.B. Langston created CASSANDRA-7723: Summary: sstable2json (and possibly other command-line tools) hang if no write permission to the commitlogs Key: CASSANDRA-7723 URL: https://issues.apache.org/jira/browse/CASSANDRA-7723 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston sstable2json (and potentially other command-line tools that call DatabaseDescriptor.loadSchemas) will hang if the user running them doesn't have write permission on the commit logs. loadSchemas calls Schema.updateVersion, which causes a mutation to the system tables, then it just spins forever trying to acquire a commit log segment. See this thread dump: https://gist.github.com/markcurtis1970/837e770d1cad5200943c. The tools should recognize this and present an understandable error message. -- This message was sent by Atlassian JIRA (v6.2#6252)
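For the understandable error message suggested above, a minimal sketch of an up-front check a tool could run before loading schemas is shown below. The directory would come from DatabaseDescriptor.getCommitLogLocation() in the 2.0-era code; the guard itself is illustrative, not the actual patch.
{code}
import java.io.File;

public class CommitLogWriteCheck
{
    // Illustrative guard a command-line tool could run before calling
    // DatabaseDescriptor.loadSchemas, instead of hanging on a commit log
    // segment it can never allocate.
    static void checkCommitLogWritable(String commitLogLocation)
    {
        File dir = new File(commitLogLocation);
        if (!dir.canWrite())
        {
            System.err.println("Cannot write to commit log directory " + dir
                               + "; run this tool as a user with write permission.");
            System.exit(1);
        }
    }
}
{code}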
[jira] [Updated] (CASSANDRA-7723) sstable2json (and possibly other command-line tools) hang if no write permission to the commitlogs
[ https://issues.apache.org/jira/browse/CASSANDRA-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-7723: - Priority: Minor (was: Major) sstable2json (and possibly other command-line tools) hang if no write permission to the commitlogs -- Key: CASSANDRA-7723 URL: https://issues.apache.org/jira/browse/CASSANDRA-7723 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor sstable2json (and potentially other command-line tools that call DatabaseDescriptor.loadSchemas) will hang if the user running them doesn't have write permission on the commit logs. loadSchemas calls Schema.updateVersion, which causes a mutation to the system tables, then it just spins forever trying to acquire a commit log segment. See this thread dump: https://gist.github.com/markcurtis1970/837e770d1cad5200943c. The tools should recognize this and present an understandable error message. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-7117) cqlsh should return a non-zero error code if a query fails
J.B. Langston created CASSANDRA-7117: Summary: cqlsh should return a non-zero error code if a query fails Key: CASSANDRA-7117 URL: https://issues.apache.org/jira/browse/CASSANDRA-7117 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Priority: Minor cqlsh should return a non-zero error code when the last query in a file or piped stdin fails. This is so that shell scripts can determine whether a cql script failed or succeeded. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-7117) cqlsh should return a non-zero error code if a query fails
[ https://issues.apache.org/jira/browse/CASSANDRA-7117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-7117: - Description: cqlsh should return a non-zero error code when a query in a file or piped stdin fails. This is so that shell scripts can determine whether a cql script failed or succeeded. (was: cqlsh should return a non-zero error code when the last query in a file or piped stdin fails. This is so that shell scripts to determine if a cql script failed or succeeded.) cqlsh should return a non-zero error code if a query fails -- Key: CASSANDRA-7117 URL: https://issues.apache.org/jira/browse/CASSANDRA-7117 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Priority: Minor cqlsh should return a non-zero error code when a query in a file or piped stdin fails. This is so that shell scripts can determine whether a cql script failed or succeeded. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-5624) Memory leak in SerializingCache
[ https://issues.apache.org/jira/browse/CASSANDRA-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13962308#comment-13962308 ] J.B. Langston commented on CASSANDRA-5624: -- Nobody on 1.2 has hit it. As far as I know, just the one occurrence. Memory leak in SerializingCache --- Key: CASSANDRA-5624 URL: https://issues.apache.org/jira/browse/CASSANDRA-5624 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Jonathan Ellis Assignee: Ryan McGuire A customer reported a memory leak when off-heap row cache is enabled. I gave them a patch against 1.1.9 to troubleshoot (https://github.com/jbellis/cassandra/commits/row-cache-finalizer). This confirms that row cache is responsible. Here is a sample of the log:
{noformat}
DEBUG [Finalizer] 2013-06-08 06:49:58,656 FreeableMemory.java (line 69) Unreachable memory still has nonzero refcount 1
DEBUG [Finalizer] 2013-06-08 06:49:58,656 FreeableMemory.java (line 71) Unreachable memory 140337996747792 has not been freed (will free now)
DEBUG [Finalizer] 2013-06-08 06:49:58,656 FreeableMemory.java (line 69) Unreachable memory still has nonzero refcount 1
DEBUG [Finalizer] 2013-06-08 06:49:58,656 FreeableMemory.java (line 71) Unreachable memory 140337989287984 has not been freed (will free now)
{noformat}
That is, memory is not being freed because we never got to zero references. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6841) ConcurrentModificationException in commit-log-writer after local schema reset
[ https://issues.apache.org/jira/browse/CASSANDRA-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6841: - Fix Version/s: 1.2.17 ConcurrentModificationException in commit-log-writer after local schema reset - Key: CASSANDRA-6841 URL: https://issues.apache.org/jira/browse/CASSANDRA-6841 Project: Cassandra Issue Type: Bug Environment: Linux 3.2.0 (Debian Wheezy) Cassandra 2.0.6, Oracle JVM 1.7.0_51 Almost default cassandra.yaml (IPs and cluster name changed) This is the 2nd node in a 2-node ring. It has ~2500 keyspaces and very low traffic. (Only new keyspaces see reads and writes.) Reporter: Pas Assignee: Benedict Priority: Minor Fix For: 1.2.17, 2.0.7, 2.1 beta2
{code}
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,013 MigrationManager.java (line 329) Starting local schema reset...
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,016 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,016 Memtable.java (line 331) Writing Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,182 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-398-Data.db (145 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33159822)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,185 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,185 Memtable.java (line 331) Writing Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,357 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-399-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33159959)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,361 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,361 Memtable.java (line 331) Writing Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,516 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-400-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33160096)
INFO [CompactionExecutor:38] 2014-03-12 11:37:54,517 CompactionTask.java (line 115) Compacting [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-398-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-400-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-399-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-397-Data.db')]
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,519 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,519 Memtable.java (line 331) Writing Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,794 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-401-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33160233)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,799 MigrationManager.java (line 357) Local schema reset is complete.
INFO [CompactionExecutor:38] 2014-03-12 11:37:54,848 CompactionTask.java (line 275) Compacted 4 sstables to [/var/lib/cassandra/data/system/local/system-local-jb-402,]. 6,099 bytes to 5,821 (~95% of original) in 330ms = 0.016822MB/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
INFO [OptionalTasks:1] 2014-03-12 11:37:55,110 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-schema_columnfamilies@106276050(181506/509164 serialized/live bytes, 3276 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:55,110 Memtable.java (line 331) Writing Memtable-schema_columnfamilies@106276050(181506/509164 serialized/live bytes, 3276 ops)
INFO [OptionalTasks:1] 2014-03-12 11:37:55,110 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-schema_columns@252242773(185191/630698 serialized/live bytes, 3614 ops)
ERROR [COMMIT-LOG-WRITER]
[jira] [Reopened] (CASSANDRA-6841) ConcurrentModificationException in commit-log-writer after local schema reset
[ https://issues.apache.org/jira/browse/CASSANDRA-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston reopened CASSANDRA-6841: -- Reopening to get a backport for 1.2. ConcurrentModificationException in commit-log-writer after local schema reset - Key: CASSANDRA-6841 URL: https://issues.apache.org/jira/browse/CASSANDRA-6841 Project: Cassandra Issue Type: Bug Environment: Linux 3.2.0 (Debian Wheezy) Cassandra 2.0.6, Oracle JVM 1.7.0_51 Almost default cassandra.yaml (IPs and cluster name changed) This is the 2nd node in a 2-node ring. It has ~2500 keyspaces and very low traffic. (Only new keyspaces see reads and writes.) Reporter: Pas Assignee: Benedict Priority: Minor Fix For: 1.2.17, 2.0.7, 2.1 beta2
{code}
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,013 MigrationManager.java (line 329) Starting local schema reset...
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,016 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,016 Memtable.java (line 331) Writing Memtable-local@394448776(114/1140 serialized/live bytes, 3 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,182 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-398-Data.db (145 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33159822)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,185 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,185 Memtable.java (line 331) Writing Memtable-local@1087210140(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,357 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-399-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33159959)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,361 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,361 Memtable.java (line 331) Writing Memtable-local@768887091(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,516 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-400-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33160096)
INFO [CompactionExecutor:38] 2014-03-12 11:37:54,517 CompactionTask.java (line 115) Compacting [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-398-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-400-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-399-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-jb-397-Data.db')]
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,519 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,519 Memtable.java (line 331) Writing Memtable-local@271993477(62/620 serialized/live bytes, 1 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:54,794 Memtable.java (line 371) Completed flushing /var/lib/cassandra/data/system/local/system-local-jb-401-Data.db (96 bytes) for commitlog position ReplayPosition(segmentId=1394620057452, position=33160233)
INFO [RMI TCP Connection(38)-192.168.36.171] 2014-03-12 11:37:54,799 MigrationManager.java (line 357) Local schema reset is complete.
INFO [CompactionExecutor:38] 2014-03-12 11:37:54,848 CompactionTask.java (line 275) Compacted 4 sstables to [/var/lib/cassandra/data/system/local/system-local-jb-402,]. 6,099 bytes to 5,821 (~95% of original) in 330ms = 0.016822MB/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
INFO [OptionalTasks:1] 2014-03-12 11:37:55,110 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-schema_columnfamilies@106276050(181506/509164 serialized/live bytes, 3276 ops)
INFO [FlushWriter:6] 2014-03-12 11:37:55,110 Memtable.java (line 331) Writing Memtable-schema_columnfamilies@106276050(181506/509164 serialized/live bytes, 3276 ops)
INFO [OptionalTasks:1] 2014-03-12 11:37:55,110 ColumnFamilyStore.java (line 785) Enqueuing flush of Memtable-schema_columns@252242773(185191/630698 serialized/live bytes, 3614 ops)
ERROR
[jira] [Created] (CASSANDRA-6960) Cassandra requires allow filtering
J.B. Langston created CASSANDRA-6960: Summary: Cassandra requires allow filtering Key: CASSANDRA-6960 URL: https://issues.apache.org/jira/browse/CASSANDRA-6960 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6960) Cassandra requires ALLOW FILTERING for a range scan
[ https://issues.apache.org/jira/browse/CASSANDRA-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6960: - Reproduced In: 2.0.5 Description: Given this table definition:
{code}
CREATE TABLE metric_log_a (
  destination_id text,
  rate_plan_id int,
  metric_name text,
  extraction_date 'org.apache.cassandra.db.marshal.TimestampType',
  metric_value text,
  PRIMARY KEY (destination_id, rate_plan_id, metric_name, extraction_date)
);
{code}
It seems that Cassandra should be able to perform the following query without ALLOW FILTERING:
{code}
select destination_id, rate_plan_id, metric_name, extraction_date, metric_value
from metric_log_a
where token(destination_id) > ? and token(destination_id) <= ?
and rate_plan_id=90
and metric_name='minutesOfUse'
and extraction_date >= '2014-03-05' and extraction_date <= '2014-03-05'
allow filtering;
{code}
However, it will refuse to run unless ALLOW FILTERING is specified. Summary: Cassandra requires ALLOW FILTERING for a range scan (was: Cassandra requires allow filtering) Cassandra requires ALLOW FILTERING for a range scan --- Key: CASSANDRA-6960 URL: https://issues.apache.org/jira/browse/CASSANDRA-6960 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Given this table definition:
{code}
CREATE TABLE metric_log_a (
  destination_id text,
  rate_plan_id int,
  metric_name text,
  extraction_date 'org.apache.cassandra.db.marshal.TimestampType',
  metric_value text,
  PRIMARY KEY (destination_id, rate_plan_id, metric_name, extraction_date)
);
{code}
It seems that Cassandra should be able to perform the following query without ALLOW FILTERING:
{code}
select destination_id, rate_plan_id, metric_name, extraction_date, metric_value
from metric_log_a
where token(destination_id) > ? and token(destination_id) <= ?
and rate_plan_id=90
and metric_name='minutesOfUse'
and extraction_date >= '2014-03-05' and extraction_date <= '2014-03-05'
allow filtering;
{code}
However, it will refuse to run unless ALLOW FILTERING is specified. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
J.B. Langston created CASSANDRA-6902: Summary: Make cqlsh prompt for a password if the user doesn't enter one Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Reporter: J.B. Langston Priority: Minor If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. This feature has been requested by a customer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
[ https://issues.apache.org/jira/browse/CASSANDRA-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6902: - Description: If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. (was: If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. This feature has been requested by a customer.) Make cqlsh prompt for a password if the user doesn't enter one -- Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Reporter: J.B. Langston Priority: Minor If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
[ https://issues.apache.org/jira/browse/CASSANDRA-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6902: - Attachment: trunk-6902.txt Make cqlsh prompt for a password if the user doesn't enter one -- Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Reporter: J.B. Langston Assignee: Mikhail Stepura Priority: Minor Attachments: trunk-6902.txt If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
[ https://issues.apache.org/jira/browse/CASSANDRA-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6902: - Attachment: trunk-6902.txt Make cqlsh prompt for a password if the user doesn't enter one -- Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: J.B. Langston Assignee: J.B. Langston Priority: Minor Fix For: 2.0.7 Attachments: trunk-6902.txt If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (CASSANDRA-6902) Make cqlsh prompt for a password if the user doesn't enter one
[ https://issues.apache.org/jira/browse/CASSANDRA-6902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6902: - Attachment: (was: trunk-6902.txt) Make cqlsh prompt for a password if the user doesn't enter one -- Key: CASSANDRA-6902 URL: https://issues.apache.org/jira/browse/CASSANDRA-6902 Project: Cassandra Issue Type: New Feature Components: Tools Reporter: J.B. Langston Assignee: J.B. Langston Priority: Minor Fix For: 2.0.7 Attachments: trunk-6902.txt If the user specifies -u username and leaves off -p password, cqlsh should prompt for a password without echoing it to the screen instead of throwing an exception, which it currently does. I know that you can put a username and password in the .cqlshrc file but if a user wants to log in with multiple accounts and not have the password visible on the screen, there's no way to currently do that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6449) Tools error out if they can't make ~/.cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885584#comment-13885584 ] J.B. Langston commented on CASSANDRA-6449: -- From a customer: The culprit is src/java/org/apache/cassandra/utils/FBUtilities.java:
{code}
File historyDir = new File(System.getProperty("user.home"), ".cassandra");
{code}
Setting an alternate HOME environment variable doesn't fix it. I've tried patching the nodetool wrapper script to provide -Duser.home at runtime, but when defining user.home I get runtime errors about missing libraries. It would be nice if the tool just honoured $HOME (or let you specify a commandline override without hacking the script). Tools error out if they can't make ~/.cassandra --- Key: CASSANDRA-6449 URL: https://issues.apache.org/jira/browse/CASSANDRA-6449 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jeremiah Jordan We shouldn't error out if we can't make the .cassandra folder for the new history stuff.
{noformat}
Exception in thread "main" FSWriteError in /usr/share/opscenter-agent/.cassandra
    at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:261)
    at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:627)
    at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1403)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1122)
Caused by: java.io.IOException: Failed to mkdirs /usr/share/opscenter-agent/.cassandra
    ... 4 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
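A minimal sketch of the fallback the customer is asking for, honouring $HOME when it is set before falling back to user.home (illustrative only, not the committed fix):
{code}
import java.io.File;

public class HistoryDir
{
    // Prefer the HOME environment variable when set, falling back to the
    // user.home system property, so users can redirect the history directory
    // without patching the wrapper scripts.
    static File historyDir()
    {
        String home = System.getenv("HOME");
        if (home == null || home.isEmpty())
            home = System.getProperty("user.home");
        return new File(home, ".cassandra");
    }
}
{code}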
[jira] [Commented] (CASSANDRA-6449) Tools error out if they can't make ~/.cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885608#comment-13885608 ] J.B. Langston commented on CASSANDRA-6449: -- This is the error that occurs when manually defining -Duser.home in the nodetool shell script:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/AbstractMultimap$WrappedSortedSet
    at com.google.common.collect.AbstractMultimap.wrapCollection(AbstractMultimap.java:374)
    at com.google.common.collect.AbstractMultimap.get(AbstractMultimap.java:363)
    at com.google.common.collect.AbstractSetMultimap.get(AbstractSetMultimap.java:59)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:65)
    at com.google.common.collect.TreeMultimap.get(TreeMultimap.java:74)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:35)
    at com.google.common.collect.Multimaps$UnmodifiableMultimap.get(Multimaps.java:563)
    at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:507)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2048)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2042)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:264)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:762)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1454)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:74)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1295)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1387)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:818)
    at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
{code}
I'm guessing maybe we use user.home elsewhere to set up the CLASSPATH. Tools error out if they can't make ~/.cassandra --- Key: CASSANDRA-6449 URL: https://issues.apache.org/jira/browse/CASSANDRA-6449 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Jeremiah Jordan We shouldn't error out if we can't make the .cassandra folder for the new history stuff.
{noformat}
Exception in thread "main" FSWriteError in /usr/share/opscenter-agent/.cassandra
    at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:261)
    at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:627)
    at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1403)
    at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1122)
Caused by: java.io.IOException: Failed to mkdirs /usr/share/opscenter-agent/.cassandra
    ... 4 more
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (CASSANDRA-6449) Tools error out if they can't make ~/.cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885608#comment-13885608 ] J.B. Langston edited comment on CASSANDRA-6449 at 1/29/14 6:18 PM: --- This is the error that occurs when manually defining -Duser.home in the nodetool shell script:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/AbstractMultimap$WrappedSortedSet
    at com.google.common.collect.AbstractMultimap.wrapCollection(AbstractMultimap.java:374)
    at com.google.common.collect.AbstractMultimap.get(AbstractMultimap.java:363)
    at com.google.common.collect.AbstractSetMultimap.get(AbstractSetMultimap.java:59)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:65)
    at com.google.common.collect.TreeMultimap.get(TreeMultimap.java:74)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:35)
    at com.google.common.collect.Multimaps$UnmodifiableMultimap.get(Multimaps.java:563)
    at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:507)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2048)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2042)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:264)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:762)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1454)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:74)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1295)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1387)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:818)
    at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
{code}
was (Author: jblangs...@datastax.com): This is the error that occurs when manually defining -Duser.home in the nodetool shell script:
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/AbstractMultimap$WrappedSortedSet
    at com.google.common.collect.AbstractMultimap.wrapCollection(AbstractMultimap.java:374)
    at com.google.common.collect.AbstractMultimap.get(AbstractMultimap.java:363)
    at com.google.common.collect.AbstractSetMultimap.get(AbstractSetMultimap.java:59)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:65)
    at com.google.common.collect.TreeMultimap.get(TreeMultimap.java:74)
    at com.google.common.collect.AbstractSortedSetMultimap.get(AbstractSortedSetMultimap.java:35)
    at com.google.common.collect.Multimaps$UnmodifiableMultimap.get(Multimaps.java:563)
    at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:507)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2048)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2042)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
[jira] [Created] (CASSANDRA-6548) Order nodetool ring output by token when vnodes aren't in use
J.B. Langston created CASSANDRA-6548: Summary: Order nodetool ring output by token when vnodes aren't in use Key: CASSANDRA-6548 URL: https://issues.apache.org/jira/browse/CASSANDRA-6548 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston It is confusing to order the nodes by hostId in nodetool ring when vnodes aren't in use. This happens in 1.2 when providing a keyspace name:
{code}
Datacenter: DC1
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         42535295865117307932921825928971026432
xxx.xxx.xxx.48  RAC2  Up      Normal  324.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.42  RAC1  Up      Normal  284.39 GB  25.00%  0
xxx.xxx.xxx.44  RAC1  Up      Normal  931.07 GB  75.00%  127605887595351923798765477786913079296
xxx.xxx.xxx.46  RAC2  Up      Normal  881.93 GB  75.00%  42535295865117307932921825928971026432
Datacenter: DC2
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         148873535527910577765226390751398592512
xxx.xxx.xxx.19  RAC2  Up      Normal  568.22 GB  50.00%  63802943797675961899382738893456539648
xxx.xxx.xxx.17  RAC1  Up      Normal  621.58 GB  50.00%  106338239662793269832304564822427566080
xxx.xxx.xxx.15  RAC1  Up      Normal  566.99 GB  50.00%  21267647932558653966460912964485513216
xxx.xxx.xxx.21  RAC2  Up      Normal  619.41 GB  50.00%  148873535527910577765226390751398592512
{code}
Among other things, this makes it hard to spot rack imbalances. In the above output, the racks in DC1 are actually incorrectly ordered and those in DC2 are correctly ordered, but it's not obvious until you manually sort the nodes by token. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
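For what the requested ordering amounts to, here is a minimal sketch that sorts ring entries by token before printing. The types and the sample entries are illustrative only, taken from the DC1 output above, and are not the actual nodetool internals.
{code}
import java.math.BigInteger;
import java.util.SortedMap;
import java.util.TreeMap;

public class RingByToken
{
    public static void main(String[] args)
    {
        // TreeMap keeps keys in ascending order, so iterating prints the
        // ring in token order regardless of insertion (hostId) order.
        SortedMap<BigInteger, String> byToken = new TreeMap<>();
        byToken.put(new BigInteger("85070591730234615865843651857942052864"), "xxx.xxx.xxx.48 RAC2");
        byToken.put(new BigInteger("0"), "xxx.xxx.xxx.42 RAC1");
        byToken.put(new BigInteger("127605887595351923798765477786913079296"), "xxx.xxx.xxx.44 RAC1");
        byToken.put(new BigInteger("42535295865117307932921825928971026432"), "xxx.xxx.xxx.46 RAC2");
        byToken.forEach((token, node) -> System.out.println(node + "  " + token));
    }
}
{code}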
[jira] [Updated] (CASSANDRA-6548) Order nodetool ring output by token when vnodes aren't in use
[ https://issues.apache.org/jira/browse/CASSANDRA-6548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6548: - Description: It is confusing to order the nodes by hostId in nodetool ring when vnodes aren't in use. This happens in 1.2 when providing a keyspace name:
{code}
Datacenter: DC1
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         42535295865117307932921825928971026432
xxx.xxx.xxx.48  RAC2  Up      Normal  324.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.42  RAC1  Up      Normal  284.39 GB  25.00%  0
xxx.xxx.xxx.44  RAC1  Up      Normal  931.07 GB  75.00%  127605887595351923798765477786913079296
xxx.xxx.xxx.46  RAC2  Up      Normal  881.93 GB  75.00%  42535295865117307932921825928971026432
Datacenter: DC2
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         148873535527910577765226390751398592512
xxx.xxx.xxx.19  RAC2  Up      Normal  568.22 GB  50.00%  63802943797675961899382738893456539648
xxx.xxx.xxx.17  RAC1  Up      Normal  621.58 GB  50.00%  106338239662793269832304564822427566080
xxx.xxx.xxx.15  RAC1  Up      Normal  566.99 GB  50.00%  21267647932558653966460912964485513216
xxx.xxx.xxx.21  RAC2  Up      Normal  619.41 GB  50.00%  148873535527910577765226390751398592512
{code}
Among other things, this makes it hard to spot rack imbalances. In the above output, the racks in DC1 are actually incorrectly ordered and those in DC2 are correctly ordered, but it's not obvious until you manually sort the nodes by token. was: It is confusing to order the nodes by hostId in nodetool ring when vnodes aren't in use. This happens in 1.2 when providing a keyspace name:
{code}
Datacenter: DC1
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         42535295865117307932921825928971026432
xxx.xxx.xxx.48  RAC2  Up      Normal  324.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.42  RAC1  Up      Normal  284.39 GB  25.00%  0
xxx.xxx.xxx.44  RAC1  Up      Normal  931.07 GB  75.00%  127605887595351923798765477786913079296
xxx.xxx.xxx.46  RAC2  Up      Normal  881.93 GB  75.00%  42535295865117307932921825928971026432
Datacenter: DC2
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         148873535527910577765226390751398592512
xxx.xxx.xxx.19  RAC2  Up      Normal  568.22 GB  50.00%  63802943797675961899382738893456539648
xxx.xxx.xxx.17  RAC1  Up      Normal  621.58 GB  50.00%  106338239662793269832304564822427566080
xxx.xxx.xxx.15  RAC1  Up      Normal  566.99 GB  50.00%  21267647932558653966460912964485513216
xxx.xxx.xxx.21  RAC2  Up      Normal  619.41 GB  50.00%  148873535527910577765226390751398592512
{code}
Among other things, it makes it hard to spot rack imbalances. In the above output, the racks DC1 is actually incorrectly ordered and DC2 is correctly ordered, but it's not obvious until you manually sort the nodes by token. Order nodetool ring output by token when vnodes aren't in use - Key: CASSANDRA-6548 URL: https://issues.apache.org/jira/browse/CASSANDRA-6548 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston It is confusing to order the nodes by hostId in nodetool ring when vnodes aren't in use. This happens in 1.2 when providing a keyspace name:
{code}
Datacenter: DC1
==========
Replicas: 2
Address         Rack  Status  State   Load       Owns    Token
                                                         42535295865117307932921825928971026432
xxx.xxx.xxx.48  RAC2  Up      Normal  324.26 GB  25.00%  85070591730234615865843651857942052864
xxx.xxx.xxx.42  RAC1  Up      Normal  284.39 GB  25.00%  0
xxx.xxx.xxx.44  RAC1  Up      Normal  931.07 GB  75.00%  127605887595351923798765477786913079296
xxx.xxx.xxx.46  RAC2  Up      Normal  881.93 GB  75.00%
[jira] [Created] (CASSANDRA-6262) Nodetool compact throws an error after importing data with sstableloader
J.B. Langston created CASSANDRA-6262: Summary: Nodetool compact throws an error after importing data with sstableloader Key: CASSANDRA-6262 URL: https://issues.apache.org/jira/browse/CASSANDRA-6262 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Exception when running nodetool compact: {code} Error occurred during compaction java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: index (2) must be less than size (2) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at org.apache.cassandra.db.compaction.CompactionManager.performMaximal(CompactionManager.java:331) at org.apache.cassandra.db.ColumnFamilyStore.forceMajorCompaction(ColumnFamilyStore.java:1691) at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:2198) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75) at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848) at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at sun.rmi.transport.Transport$1.run(Transport.java:174) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) Caused by: java.lang.IndexOutOfBoundsException: index 
(2) must be less than size (2) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:81) at org.apache.cassandra.db.marshal.CompositeType.getComparator(CompositeType.java:94) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:76) at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:31) at org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:128) at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:119) at org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:114) at
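The trace bottoms out in a Guava precondition: CompositeType.getComparator asks for the third component (index 2) of a column name, but the comparator was declared with only two component types. The failing check in isolation, with the surrounding CompositeType plumbing omitted:
{code}
import com.google.common.base.Preconditions;

public class IndexCheckDemo
{
    public static void main(String[] args)
    {
        int requestedComponent = 2; // third component of the column name being added
        int declaredComponents = 2; // CompositeType declared with only two component types
        // Throws java.lang.IndexOutOfBoundsException: index (2) must be less than size (2),
        // exactly the message in the stack trace above.
        Preconditions.checkElementIndex(requestedComponent, declaredComponents);
    }
}
{code}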
[jira] [Commented] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790844#comment-13790844 ] J.B. Langston commented on CASSANDRA-6097: -- Customer compiled Cassandra from git and ran the resulting nodetool against his DSE installation. He reported that the hang is still reproducible. I haven't tried to duplicate this myself yet. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Assignee: Yuki Morishita Priority: Minor Fix For: 1.2.11 Attachments: 6097-1.2.txt, dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
[ https://issues.apache.org/jira/browse/CASSANDRA-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-5911: - Attachment: 6528_140171_knwmuqxe9bjv5re_system.log Attached system.log showing commitlog replay. This was produced by running stress against a single-node cassandra cluster, then running drain and restarting. Commit logs are not removed after nodetool flush or nodetool drain -- Key: CASSANDRA-5911 URL: https://issues.apache.org/jira/browse/CASSANDRA-5911 Project: Cassandra Issue Type: Bug Components: Core Reporter: J.B. Langston Assignee: Vijay Priority: Minor Fix For: 2.0.2 Attachments: 6528_140171_knwmuqxe9bjv5re_system.log Commit logs are not removed after nodetool flush or nodetool drain. This can lead to unnecessary commit log replay during startup. I've reproduced this on Apache Cassandra 1.2.8. Usually this isn't much of an issue but on a Solr-indexed column family in DSE, each replayed mutation has to be reindexed which can make startup take a long time (on the order of 20-30 min). Reproduction follows:
{code}
jblangston:bin jblangston$ ./cassandra > /dev/null
jblangston:bin jblangston$ ../tools/bin/cassandra-stress -n 2000 > /dev/null
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ nodetool flush
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ nodetool drain
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ pkill java
jblangston:bin jblangston$ du -h ../commitlog
576M ../commitlog
jblangston:bin jblangston$ ./cassandra -f | grep Replaying
INFO 10:03:42,915 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log
INFO 10:03:42,922 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log
INFO 10:03:43,912 Replaying
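For context, the behavior the ticket implies drain should have is roughly the following ordering. This is a hedged sketch against 1.2-era names (Table became Keyspace in 2.0), not the actual patch that shipped:
{code}
import org.apache.cassandra.db.ColumnFamilyStore;
import org.apache.cassandra.db.Table;
import org.apache.cassandra.db.commitlog.CommitLog;

// A sketch of the drain ordering the ticket implies; names follow the
// 1.2 codebase and this is illustrative, not the real fix.
public class DrainSketch
{
    public static void drainAndDiscardCommitLog() throws Exception
    {
        for (Table table : Table.all())
            for (ColumnFamilyStore cfs : table.getColumnFamilyStores())
                cfs.forceBlockingFlush(); // every mutation now lives in an sstable

        // After a full flush no segment is needed for replay, so shutting the
        // commit log down should be able to delete the segments instead of
        // leaving 576M behind for the next startup to replay.
        CommitLog.instance.shutdownBlocking();
    }
}
{code}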
[jira] [Updated] (CASSANDRA-4785) Secondary Index Sporadically Doesn't Return Rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-4785: - Attachment: repro.py entity_aliases.txt Reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. Note: Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} Secondary Index Sporadically Doesn't Return Rows Key: CASSANDRA-4785 URL: https://issues.apache.org/jira/browse/CASSANDRA-4785 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.5, 1.1.6 Environment: Ubuntu 10.04 Java 6 Sun Cassandra 1.1.5 upgraded from 1.1.2 - 1.1.3 - 1.1.5 Reporter: Arya Goudarzi Attachments: entity_aliases.txt, repro.py I have a ColumnFamily with caching = ALL. I have 2 secondary indexes on it. I have noticed if I query using the secondary index in the where clause, sometimes I get the results and sometimes I don't. Until 2 weeks ago, the caching option on this CF was set to NONE. So, I suspect something happened in secondary index caching scheme. Here are things I tried: 1. I rebuild indexes for that CF on all nodes; 2. I set the caching to KEYS_ONLY and rebuild the index again; 3. I set the caching to NONE and rebuild the index again; None of the above helped. I suppose the caching still exists as this behavior looks like cache mismatch. I did a bit of research, and found CASSANDRA-4197 that could be related. Please advise. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-4973) Secondary Index stops returning rows when caching=ALL
[ https://issues.apache.org/jira/browse/CASSANDRA-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786389#comment-13786389 ] J.B. Langston commented on CASSANDRA-4973: -- Reproducible test case attached to CASSANDRA-4785, of which this appears to be a duplicate. Secondary Index stops returning rows when caching=ALL - Key: CASSANDRA-4973 URL: https://issues.apache.org/jira/browse/CASSANDRA-4973 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.2, 1.1.6 Environment: Centos 6.3, Java 1.6.0_35, cass. 1.1.2 upgraded to 1.1.6 Reporter: Daniel Strawson Attachments: secondary_index_rowcache_restart_test.py I've been using cassandra on a project for a little while in development and have recently suddenly started having an issue where the secondary index stops working, this is happening on my new production system, we are not yet live. Things work ok one moment, then suddenly queries to the cf through the secondary index stop returning data. I've seen it happen on 3 CFs. I've tried: - various nodetool repair / scrub / rebuild_indexes options, none seem to make a difference. - Doing an 'update column family whatever with column_metadata=[]' then repeating with my correct column_metadata definition. This seems to fix the problem (temporarily) until it comes back. The last time it happened I had just restarted cassandra, so it could be that which is causing the issue. I've got the production system ok at the moment, I will try restarting a bit later when it's not being used and if I can get the issue to reoccur I will add more information. The problem first manifested itself in 1.1.2, so I upgraded to 1.1.6, this has not fixed it. Here is an example of the create column family I'm using for one of the CFs that's affected: create column family region with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'BytesType' and key_validation_class = 'UTF8Type' and read_repair_chance = 0.1 and dclocal_read_repair_chance = 0.0 and gc_grace = 864000 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' and caching = 'KEYS_ONLY' and column_metadata = [ {column_name : 'label', validation_class : UTF8Type}, {column_name : 'countryCode', validation_class : UTF8Type, index_name : 'region_countryCode_idx', index_type : 0}, ] and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'}; I've noticed that CASSANDRA-4785 looks similar, in my case once the system has the problem, it doesn't go away until I fix it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (CASSANDRA-4785) Secondary Index Sporadically Doesn't Return Rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786381#comment-13786381 ] J.B. Langston edited comment on CASSANDRA-4785 at 10/4/13 5:38 PM: --- Reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. Note: Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} was (Author: jblangs...@datastax.com): Reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. 5) Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} Secondary Index Sporadically Doesn't Return Rows Key: CASSANDRA-4785 URL: https://issues.apache.org/jira/browse/CASSANDRA-4785 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.5, 1.1.6 Environment: Ubuntu 10.04 Java 6 Sun Cassandra 1.1.5 upgraded from 1.1.2 - 1.1.3 - 1.1.5 Reporter: Arya Goudarzi Attachments: entity_aliases.txt, repro.py I have a ColumnFamily with caching = ALL. I have 2 secondary indexes on it. I have noticed if I query using the secondary index in the where clause, sometimes I get the results and sometimes I don't. Until 2 weeks ago, the caching option on this CF was set to NONE. So, I suspect something happened in secondary index caching scheme. Here are things I tried: 1. I rebuild indexes for that CF on all nodes; 2. I set the caching to KEYS_ONLY and rebuild the index again; 3. I set the caching to NONE and rebuild the index again; None of the above helped. I suppose the caching still exists as this behavior looks like cache mismatch. 
I did a bit of research, and found CASSANDRA-4197 that could be related. Please advise. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (CASSANDRA-4785) Secondary Index Sporadically Doesn't Return Rows
[ https://issues.apache.org/jira/browse/CASSANDRA-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786381#comment-13786381 ] J.B. Langston edited comment on CASSANDRA-4785 at 10/4/13 5:41 PM: --- I have attached files for a reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. Note: Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} Reproduced in both 1.1.9 and 1.2.10. Customer is requesting a fix against 1.1.x. was (Author: jblangs...@datastax.com): Reproducible test case. Steps to reproduce: 1) Enable row cache in cassandra.yaml. I used 'row_cache_size_in_mb: 100'. 2) Create schema: 'cassandra-cli entity_aliases.txt' 3) Run reproducible test case (requires pycassa): 'python repro.py' Script inserts a row into Entity_Aliases table, then queries first by rowId and then by secondary index. Both queries should return the same row. Note: Sometimes the node needs to be flushed and restarted after the initial insert before the issue is reproducible. Expected result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) {code} Actual Result: {code} Getting by rowId ... OrderedDict([('alias', u'17SQ0W'), ('aliasType', 'TIP4GQ'), ('entityId', UUID('9202a758-c605-445d-a67f-30ec8dfebc59')), ('entityType', 'BBN27L')]) Querying with get_indexed_slice ... {code} Secondary Index Sporadically Doesn't Return Rows Key: CASSANDRA-4785 URL: https://issues.apache.org/jira/browse/CASSANDRA-4785 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.5, 1.1.6 Environment: Ubuntu 10.04 Java 6 Sun Cassandra 1.1.5 upgraded from 1.1.2 - 1.1.3 - 1.1.5 Reporter: Arya Goudarzi Attachments: entity_aliases.txt, repro.py I have a ColumnFamily with caching = ALL. I have 2 secondary indexes on it. I have noticed if I query using the secondary index in the where clause, sometimes I get the results and sometimes I don't. Until 2 weeks ago, the caching option on this CF was set to NONE. So, I suspect something happened in secondary index caching scheme. Here are things I tried: 1. I rebuild indexes for that CF on all nodes; 2. I set the caching to KEYS_ONLY and rebuild the index again; 3. I set the caching to NONE and rebuild the index again; None of the above helped. 
I suppose the caching still exists as this behavior looks like cache mismatch. I did a bit of research, and found CASSANDRA-4197 that could be related. Please advise. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6097: - Attachment: dse.stack nodetool.stack Stack traces for nodetool and cassandra attached. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial Attachments: dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783281#comment-13783281 ] J.B. Langston commented on CASSANDRA-6097: -- The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a list ConnectionNotificationListener and if a connection fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial Attachments: dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). 
I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
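A minimal sketch of the listener registration discussed in the comment above, using only the standard javax.management.remote API; wiring it to Cassandra's SimpleCondition mirrors the thread dump and is an assumption, not the actual nodetool code:
{code}
import javax.management.Notification;
import javax.management.NotificationListener;
import javax.management.remote.JMXConnectionNotification;
import javax.management.remote.JMXConnector;

public class RepairConnectionWatcher implements NotificationListener
{
    private final org.apache.cassandra.utils.SimpleCondition condition;

    public RepairConnectionWatcher(JMXConnector jmxc,
                                   org.apache.cassandra.utils.SimpleCondition condition)
    {
        this.condition = condition;
        // Without this, a dropped connection leaves repairAndWait blocked forever.
        jmxc.addConnectionNotificationListener(this, null, null);
    }

    public void handleNotification(Notification n, Object handback)
    {
        String type = n.getType();
        if (JMXConnectionNotification.FAILED.equals(type)
            || JMXConnectionNotification.CLOSED.equals(type)
            || JMXConnectionNotification.NOTIFS_LOST.equals(type))
        {
            System.err.println("JMX connection lost; the status of the repair "
                               + "command can't be determined, please check the log.");
            condition.signalAll(); // wake nodetool instead of hanging
        }
    }
}
{code}
With something like this registered, a failed connection or lost notification wakes the waiting thread instead of leaving it parked on the condition indefinitely.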
[jira] [Comment Edited] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783281#comment-13783281 ] J.B. Langston edited comment on CASSANDRA-6097 at 10/1/13 8:06 PM: --- The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a listener ConnectionNotificationListener and if a notification fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. was (Author: jblangs...@datastax.com): The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a list ConnectionNotificationListener and if a connection fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial Attachments: dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. 
Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster.
[jira] [Comment Edited] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783281#comment-13783281 ] J.B. Langston edited comment on CASSANDRA-6097 at 10/1/13 8:07 PM: --- The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a ConnectionNotificationListener and if a notification fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. was (Author: jblangs...@datastax.com): The JMX documentation [states|http://www.oracle.com/technetwork/java/javase/tech/best-practices-jsp-136021.html#mozTocId387765] that notifications are not guaranteed to always be delivered. The API only guarantees that a client either receives all notifications for which it is listening, or can discover that notifications may have been lost. A client can discover when notifications are lost by registering a listener using JMXConnector.addConnectionNotificationListener. It looks like nodetool isn't doing this last part. Seems like we should register a listener ConnectionNotificationListener and if a notification fails, signal the condition so that nodetool doesn't hang. Maybe have nodetool query for the status of the repair at that point via separate JMX call, or just print a warning that The status of the repair command can't be determined, please check the log. or something like that. I would disagree with prioritizing this as trivial. It's not critical but I have had many customers express frustration with the nodetool repair's proclivity for hanging. It makes automating repairs painful because they can't count on nodetool to ever return. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial Attachments: dse.stack, nodetool.stack nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. 
Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps
[jira] [Commented] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13782397#comment-13782397 ] J.B. Langston commented on CASSANDRA-6097: -- I think the ease with which this can be reproduced is dependent on the number of keyspaces. I started up a stock 3.1.3 AMI in hadoop mode so that DSE would create the cfs/HiveMetaStore/dse_system keyspaces and also created an additional keyspace using the customer's schema. Now I am able to reproduce the issue very readily by running nodetool repair -pr in a loop. I thought that it might have been something to do with having hadoop enabled, so I disabled it again, but I am still able to reproduce the issue. On the other hand, if I give repair a specific keyspace name, it takes much longer to reproduce, if at all. nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 
1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (CASSANDRA-6110) Workaround for JMX random port selection
J.B. Langston created CASSANDRA-6110: Summary: Workaround for JMX random port selection Key: CASSANDRA-6110 URL: https://issues.apache.org/jira/browse/CASSANDRA-6110 Project: Cassandra Issue Type: Improvement Reporter: J.B. Langston Priority: Minor Many people have been annoyed by the way that JMX selects a second port at random for the RMIServer, which makes it almost impossible to use JMX through a firewall. There is a [workaround|https://blogs.oracle.com/jmxetc/entry/connecting_through_firewall_using_jmx] using a custom java agent. Since jamm is already specified as the java agent for Cassandra, this would have to subclass or wrap the jamm MemoryMeter class. -- This message was sent by Atlassian JIRA (v6.1#6144)
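The linked workaround boils down to a premain agent that pins both the RMI registry and the RMI server to known ports. A hedged sketch (the port property name is invented for illustration):
{code}
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class FixedPortJmxAgent
{
    public static void premain(String agentArgs) throws Exception
    {
        // Hypothetical property name; pick whatever convention fits.
        int port = Integer.getInteger("cassandra.jmx.fixed.port", 7199);
        LocateRegistry.createRegistry(port);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // Reusing one port for both the registry and the RMI server means a
        // single firewall rule is sufficient.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi://localhost:" + port + "/jndi/rmi://localhost:" + port + "/jmxrmi");
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        server.start();
    }
}
{code}
Note that the JVM accepts multiple -javaagent arguments, so an agent of this shape could also run alongside jamm rather than having to subclass or wrap MemoryMeter.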
[jira] [Commented] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13781504#comment-13781504 ] J.B. Langston commented on CASSANDRA-6097: -- If I'm reading [this|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/NodeProbe.java#L1036-L1039] correctly, the condition that nodetool repair is waiting on won't get signaled if the status returned to the NotificationListener is SESSION_FAILED. Could that explain why it's hanging? nodetool repair randomly hangs. --- Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston Priority: Trivial nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. -- This message was sent by Atlassian JIRA (v6.1#6144)
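If that reading is right, one shape of fix would be to signal the condition on SESSION_FAILED as well. A hypothetical sketch only; the field and enum names loosely follow the linked NodeProbe code and are not a verbatim patch:
{code}
// Hypothetical sketch of RepairRunner.handleNotification; cmd and condition
// are assumed fields of the runner, and Status is the server-side repair
// status enum the linked code checks against.
public void handleNotification(Notification notification, Object handback)
{
    if (!"repair".equals(notification.getType()))
        return;

    int[] status = (int[]) notification.getUserData(); // [command number, status ordinal]
    if (status[0] != cmd)
        return; // notification belongs to a different repair command

    System.out.println(notification.getMessage());
    // Waking the waiter on SESSION_FAILED as well as on completion would
    // stop nodetool from hanging on the condition when a session fails.
    if (status[1] == ActiveRepairService.Status.FINISHED.ordinal()
        || status[1] == ActiveRepairService.Status.SESSION_FAILED.ordinal())
        condition.signalAll();
}
{code}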
[jira] [Created] (CASSANDRA-6104) Add additional limits in cassandra.conf provided by Debian package
J.B. Langston created CASSANDRA-6104: Summary: Add additional limits in cassandra.conf provided by Debian package Key: CASSANDRA-6104 URL: https://issues.apache.org/jira/browse/CASSANDRA-6104 Project: Cassandra Issue Type: Bug Components: Packaging Reporter: J.B. Langston Priority: Trivial /etc/security/limits.d/cassandra.conf distributed with DSC deb/rpm packages should contain additional settings. We have found these limits to be necessary for some customers through various support tickets.
{code}
cassandra - memlock unlimited
cassandra - nofile 100000
cassandra - nproc 32768
cassandra - as unlimited
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
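One quick way to confirm the raised limits actually apply to the running JVM (rather than only to new login shells) is to ask the process itself, for example from code run inside it or exposed over JMX; this uses the HotSpot-specific com.sun.management extension:
{code}
import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class LimitCheck
{
    public static void main(String[] args)
    {
        UnixOperatingSystemMXBean os =
            (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        // Should report the nofile value from limits.d once the service is restarted.
        System.out.println("max file descriptors:  " + os.getMaxFileDescriptorCount());
        System.out.println("open file descriptors: " + os.getOpenFileDescriptorCount());
    }
}
{code}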
[jira] [Updated] (CASSANDRA-6097) nodetool repair randomly hangs.
[ https://issues.apache.org/jira/browse/CASSANDRA-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-6097: - Description: nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace {code} create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; {code} 3) Run an endless loop that runs nodetool repair repeatedly: {code} while true; do nodetool repair -pr test; done {code} 4) Wait until repair hangs. It may take many tries; the behavior is random. was: nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. 
Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent
[jira] [Created] (CASSANDRA-6097) nodetool repair randomly hangs.
J.B. Langston created CASSANDRA-6097: Summary: nodetool repair randomly hangs. Key: CASSANDRA-6097 URL: https://issues.apache.org/jira/browse/CASSANDRA-6097 Project: Cassandra Issue Type: Bug Components: Core Environment: DataStax AMI Reporter: J.B. Langston nodetool repair randomly hangs. This is not the same issue where repair hangs if a stream is disrupted. This can be reproduced on a single-node cluster where no streaming takes place, so I think this may be a JMX connection or timeout issue. Thread dumps show that nodetool is waiting on a JMX response and there are no repair-related threads running in Cassandra. Nodetool main thread waiting for JMX response: {code} main prio=5 tid=7ffa4b001800 nid=0x10aedf000 in Object.wait() [10aede000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at java.lang.Object.wait(Object.java:485) at org.apache.cassandra.utils.SimpleCondition.await(SimpleCondition.java:34) - locked 7f90d62e8 (a org.apache.cassandra.utils.SimpleCondition) at org.apache.cassandra.tools.RepairRunner.repairAndWait(NodeProbe.java:976) at org.apache.cassandra.tools.NodeProbe.forceRepairAsync(NodeProbe.java:221) at org.apache.cassandra.tools.NodeCmd.optionalKSandCFs(NodeCmd.java:1444) at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1213) {code} When nodetool hangs, it does not print out the following message: Starting repair command #XX, repairing 1 ranges for keyspace XXX However, Cassandra logs that repair in system.log: 1380033480.95 INFO [Thread-154] 10:38:00,882 Starting repair command #X, repairing X ranges for keyspace XXX This suggests that the repair command was received by Cassandra but the connection then failed and nodetool didn't receive a response. Obviously, running repair on a single-node cluster is pointless but it's the easiest way to demonstrate this problem. The customer who reported this has also seen the issue on his real multi-node cluster. Steps to reproduce: Note: I reproduced this once on the official DataStax AMI with DSE 3.1.3 (Cassandra 1.2.6+patches). I was unable to reproduce on my Mac using the same version, and subsequent attempts to reproduce it on the AMI were unsuccessful. The customer says he is able to reliably reproduce on his Mac using DSE 3.1.3 and occasionally reproduce it on his real cluster. 1) Deploy an AMI using the DataStax AMI at https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 2) Create a test keyspace create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; 3) Run an endless loop that runs nodetool repair repeatedly: while true; do nodetool repair -pr test; done 4) Wait until repair hangs. It may take hundreds or thousands of tries; the behavior is random. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-6047) Memory leak when using snapshot repairs
J.B. Langston created CASSANDRA-6047: Summary: Memory leak when using snapshot repairs Key: CASSANDRA-6047 URL: https://issues.apache.org/jira/browse/CASSANDRA-6047 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston
Running nodetool repair repeatedly with the -snapshot parameter results in a native memory leak: the JVM process takes up more and more physical memory until it is killed by the Linux OOM killer. The command used was as follows:
{code}nodetool repair keyspace -local -snapshot -pr -st start_token -et end_token{code}
Removing the -snapshot flag prevented the memory leak. The subrange repair necessitated multiple repairs, so it made the problem noticeable, but I believe the problem would be reproducible even if you ran repair repeatedly without specifying a start and end token.
Notes from [~yukim]: The probable cause is too many snapshots. Snapshot sstables are opened during validation, and the memory they use is freed when releaseReferences is called; but since snapshot sstables never get marked compacted, that memory is never freed. We only clean up mmap'd memory when an sstable is marked compacted: https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/io/sstable/SSTableReader.java#L974 Validation compaction never marks snapshots compacted.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
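A simplified sketch of the lifecycle [~yukim] describes, with illustrative names rather than the real SSTableReader code: because cleanup of mmap'd segments is gated on the compacted flag, a snapshot sstable that is never marked compacted never releases its native memory, even after its reference count drops to zero.
{code}
// Illustrative only; not the actual SSTableReader implementation.
public class MappedSSTable
{
    private int references = 1;
    private boolean markedCompacted = false;

    public void markCompacted() { markedCompacted = true; }

    public void releaseReferences()
    {
        // mmap'd memory is only unmapped when BOTH conditions hold...
        if (--references == 0 && markedCompacted)
            unmapSegments();
        // ...so a snapshot sstable, which validation never marks
        // compacted, keeps its native memory mapped forever: the leak.
    }

    private void unmapSegments() { /* munmap the data/index segments */ }
}
{code}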
[jira] [Updated] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
[ https://issues.apache.org/jira/browse/CASSANDRA-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-5958: - Description: When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} was: When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: Unable to find property 'some_property' on class: org.apache.cassandra.config.Config The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we catch this exception and wrap it in another exception that says something like this: Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra. Unable to find property errors from snakeyaml are confusing - Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
[ https://issues.apache.org/jira/browse/CASSANDRA-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.B. Langston updated CASSANDRA-5958: - Description: When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Also, it might make sense to make this a warning instead of a fatal error, and just ignore the unwanted property. was: When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Unable to find property errors from snakeyaml are confusing - Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Also, it might make sense to make this a warning instead of a fatal error, and just ignore the unwanted property. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
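A rough sketch of the suggested wrapping, assuming snakeyaml raises a YAMLException carrying the "Unable to find property" text; the message matching and loader wiring here are illustrative, not how Cassandra actually loads its config.
{code}
import java.io.InputStream;
import org.apache.cassandra.config.Config;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.error.YAMLException;

public class ConfigLoader
{
    public static Config load(InputStream yamlStream)
    {
        try
        {
            return new Yaml().loadAs(yamlStream, Config.class);
        }
        catch (YAMLException e)
        {
            // Rephrase the confusing "Unable to find property" error in
            // terms the operator can act on, keeping the original cause.
            if (e.getMessage() != null && e.getMessage().contains("Unable to find property"))
                throw new YAMLException("Please remove the unrecognized property from your "
                                        + "cassandra.yaml; it is not recognized by this version "
                                        + "of Cassandra. (" + e.getMessage() + ")", e);
            throw e;
        }
    }
}
{code}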
[jira] [Commented] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
[ https://issues.apache.org/jira/browse/CASSANDRA-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754836#comment-13754836 ] J.B. Langston commented on CASSANDRA-5958: -- 1.2 and prior Unable to find property errors from snakeyaml are confusing - Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Also, it might make sense to make this a warning instead of a fatal error, and just ignore the unwanted property. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
[ https://issues.apache.org/jira/browse/CASSANDRA-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754928#comment-13754928 ] J.B. Langston commented on CASSANDRA-5958: -- I just tested with 2.0.0-rc2 and the message is the same as before. Unable to find property errors from snakeyaml are confusing - Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: {code}Unable to find property 'some_property' on class: org.apache.cassandra.config.Config{code} The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we should catch this exception and wrap it in another exception that says something like this: {code}Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra.{code} Also, it might make sense to make this a warning instead of a fatal error, and just ignore the unwanted property. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-5958) Unable to find property errors from snakeyaml are confusing
J.B. Langston created CASSANDRA-5958: Summary: Unable to find property errors from snakeyaml are confusing Key: CASSANDRA-5958 URL: https://issues.apache.org/jira/browse/CASSANDRA-5958 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston Priority: Minor When an unexpected property is present in cassandra.yaml (e.g. after upgrading), snakeyaml outputs the following message: Unable to find property 'some_property' on class: org.apache.cassandra.config.Config The error message is kind of counterintuitive because at first glance it seems to suggest the property is missing from the yaml file, when in fact the error is caused by the *presence* of an unrecognized property. I know if you read it carefully it says it can't find the property on the class, but this has confused more than one user. I think we catch this exception and wrap it in another exception that says something like this: Please remove 'some_property' from your cassandra.yaml. It is not recognized by this version of Cassandra. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-5947) Sampling bug in metrics-core-2.0.3.jar used by Cassandra
J.B. Langston created CASSANDRA-5947: Summary: Sampling bug in metrics-core-2.0.3.jar used by Cassandra Key: CASSANDRA-5947 URL: https://issues.apache.org/jira/browse/CASSANDRA-5947 Project: Cassandra Issue Type: Bug Reporter: J.B. Langston
There is a sampling bug in the version of the metrics library we're using in Cassandra. See https://github.com/codahale/metrics/issues/421. ExponentiallyDecayingSample is used by the Timer's histogram that is used in the stress tool, and according to [~brandon.williams] it is also used in a few other places like the dynamic snitch. The statistical theory involved in this bug goes over my head, so I'm not sure whether this bug would meaningfully affect its usage by Cassandra. One of the comments on the bug mentions that it affects slow sampling rates (10 samples/min was the example given). We're currently distributing metrics-core-2.0.3.jar, and according to the release notes, this bug is fixed in 2.1.3: http://metrics.codahale.com/about/release-notes/#v2-1-3-aug-06-2012
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
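For reference, a minimal harness that exercises the affected sample at roughly the slow rate mentioned in the upstream report. This assumes the metrics-core 2.x API (ExponentiallyDecayingSample with the library's documented default size and alpha); it only demonstrates the code path, not the statistical defect itself.
{code}
import com.yammer.metrics.stats.ExponentiallyDecayingSample;

public class SlowSampleDemo
{
    public static void main(String[] args) throws InterruptedException
    {
        // 1028 samples with alpha 0.015 are the defaults metrics-core 2.x
        // uses for biased histograms (and therefore Timers).
        ExponentiallyDecayingSample sample = new ExponentiallyDecayingSample(1028, 0.015);

        // Feed it slowly -- about 10 samples/min, the regime the upstream
        // bug report says produces skewed results in 2.0.x.
        for (int i = 0; i < 20; i++)
        {
            sample.update(i);
            Thread.sleep(6000);
        }
        System.out.println("median: " + sample.getSnapshot().getMedian());
    }
}
{code}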
[jira] [Created] (CASSANDRA-5911) Commit logs are not removed after nodetool flush or nodetool drain
J.B. Langston created CASSANDRA-5911: Summary: Commit logs are not removed after nodetool flush or nodetool drain Key: CASSANDRA-5911 URL: https://issues.apache.org/jira/browse/CASSANDRA-5911 Project: Cassandra Issue Type: Bug Components: Core Reporter: J.B. Langston Priority: Minor
Commit logs are not removed after nodetool flush or nodetool drain. This can lead to unnecessary commit log replay during startup. I've reproduced this on Apache Cassandra 1.2.8. Usually this isn't much of an issue, but on a Solr-indexed column family in DSE, each replayed mutation has to be reindexed, which can make startup take a long time (on the order of 20-30 min). Reproduction follows:
{code}
jblangston:bin jblangston$ ./cassandra > /dev/null
jblangston:bin jblangston$ ../tools/bin/cassandra-stress -n 2000 > /dev/null
jblangston:bin jblangston$ du -h ../commitlog
576M	../commitlog
jblangston:bin jblangston$ nodetool flush
jblangston:bin jblangston$ du -h ../commitlog
576M	../commitlog
jblangston:bin jblangston$ nodetool drain
jblangston:bin jblangston$ du -h ../commitlog
576M	../commitlog
jblangston:bin jblangston$ pkill java
jblangston:bin jblangston$ du -h ../commitlog
576M	../commitlog
jblangston:bin jblangston$ ./cassandra -f | grep Replaying
INFO 10:03:42,915 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log, /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log
INFO 10:03:42,922 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566761.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566762.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566763.log
INFO 10:03:43,907 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566764.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566765.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566766.log
INFO 10:03:43,908 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566767.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566768.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566769.log
INFO 10:03:43,909 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566770.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566771.log
INFO 10:03:43,910 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566772.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566773.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566774.log
INFO 10:03:43,911 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566775.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566776.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566777.log
INFO 10:03:43,912 Replaying /opt/apache-cassandra-1.2.8/commitlog/CommitLog-2-1377096566778.log
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-5900) Setting bloom filter fp chance to 1.0 causes ClassCastExceptions
J.B. Langston created CASSANDRA-5900: Summary: Setting bloom filter fp chance to 1.0 causes ClassCastExceptions Key: CASSANDRA-5900 URL: https://issues.apache.org/jira/browse/CASSANDRA-5900 Project: Cassandra Issue Type: Bug Components: Core Reporter: J.B. Langston
In 1.2, we introduced the ability to turn bloom filters off completely by setting the fp chance to 1.0. It looks like there is a bug with this, though. When it's set to 1.0, the following errors occur because AlwaysPresentFilter is not handled in the switch statement at https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/utils/FilterFactory.java#L91, and we default to Murmur3BloomFilter for an unknown type:
{code}
Exception in thread "main" java.lang.ClassCastException: org.apache.cassandra.utils.AlwaysPresentFilter cannot be cast to org.apache.cassandra.utils.Murmur3BloomFilter
    at org.apache.cassandra.utils.FilterFactory.serializedSize(FilterFactory.java:91)
    at org.apache.cassandra.io.sstable.SSTableReader.getBloomFilterSerializedSize(SSTableReader.java:531)
    at org.apache.cassandra.metrics.ColumnFamilyMetrics$15.value(ColumnFamilyMetrics.java:273)
    at org.apache.cassandra.metrics.ColumnFamilyMetrics$15.value(ColumnFamilyMetrics.java:268)
    at org.apache.cassandra.db.ColumnFamilyStore.getBloomFilterDiskSpaceUsed(ColumnFamilyStore.java:1825)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
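A sketch of the shape of a fix: treat AlwaysPresentFilter as a known, zero-size filter instead of letting it fall through to the Murmur3 default branch. This is a method fragment for illustration only; the type names approximate the 1.2 code, the helper is hypothetical, and this is not the committed patch.
{code}
// Fragment, not the actual FilterFactory.serializedSize implementation.
public static long serializedSize(Filter filter)
{
    if (filter instanceof AlwaysPresentFilter)
        return 0; // fp chance 1.0: no bloom filter is serialized at all
    if (filter instanceof Murmur3BloomFilter)
        return murmur3SerializedSize((Murmur3BloomFilter) filter); // existing path (hypothetical helper name)
    throw new IllegalArgumentException("Unknown filter type: " + filter.getClass().getName());
}
{code}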