[jira] [Commented] (CASSANDRA-8430) Updating a row that has a TTL produce unexpected results

2014-12-10 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240804#comment-14240804
 ] 

Sylvain Lebresne commented on CASSANDRA-8430:
-

bq. Now at first the row has 'abc', 2, 'whatever', then after the update it has 
'abc', 0, 'whatever'.

If that was the case, it would be a bug: setting foo to {{null}} is not 
equivalent to setting it to {{0}}. But if you use, say, the java driver and do a 
{{getInt()}} to fetch the value of {{foo}}, that method will return {{0}} for a 
{{null}} value because it returns unboxed values (there's an {{isNull()}} method 
to check whether it's actually {{null}}). So make sure this is not just what you're 
running into.
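
A minimal sketch of that distinction with the java driver (the table and column 
names come from the attached reproduction script; the calls are plain driver API):

{code}
Row row = session.execute("SELECT foo FROM ks.tbl WHERE pk = 1").one();
if (row.isNull("foo"))
    System.out.println("foo is null");                // what the UPDATE left behind
else
    System.out.println("foo = " + row.getInt("foo")); // getInt() alone reports 0 for null
{code}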

bq. It seems there's a difference between insert and update

There is one. An insert sets the primary key columns in their own right, while an 
update doesn't. This means that after an insert a row will continue to exist 
even if you remove all non-PK columns, while after an update it won't. Which is 
exactly what you're observing.
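
A quick CQL illustration of that difference, reusing the ks.tbl table from the 
attached script (the comments describe the expected results):

{code}
INSERT INTO ks.tbl (pk, foo, bar) VALUES (2, 1, 'test');
DELETE foo, bar FROM ks.tbl WHERE pk = 2;
SELECT * FROM ks.tbl WHERE pk = 2;  -- still returns (2, null, null): the insert wrote a row marker

UPDATE ks.tbl SET foo = 1, bar = 'test' WHERE pk = 3;
DELETE foo, bar FROM ks.tbl WHERE pk = 3;
SELECT * FROM ks.tbl WHERE pk = 3;  -- returns nothing: an update writes no row marker
{code}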

Now, none of this is a Cassandra bug, so if you have further questions about these 
behaviors, would you mind moving the conversation to the user mailing list (if 
only because the answers might benefit more people there)?

 Updating a row that has a TTL produce unexpected results
 

 Key: CASSANDRA-8430
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8430
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
  Labels: cassandra, ttl
 Fix For: 2.0.11, 2.1.2, 3.0

 Attachments: test.sh


 Reported on stackoverflow: 
 http://stackoverflow.com/questions/27280407/cassandra-ttl-gets-set-to-0-on-primary-key-if-no-ttl-is-specified-on-an-update?newreg=19e8c6757c62474985fef7c3037e8c08
 I can reproduce the issue with 2.0, 2.1 and trunk. I've attached a small 
 script to reproduce the issue with CCM, and here is its output:
 {code}
 aboudreault@kovarro:~/dev/cstar/so27280407$ ./test.sh 
 Current cluster is now: local
 Insert data with a 5 sec TTL
 INSERT INTO ks.tbl (pk, foo, bar) values (1, 1, 'test') using TTL 5;
  pk | bar  | foo
 ----+------+-----
   1 | test |   1
 (1 rows)
 Update data with no TTL
 UPDATE ks.tbl set bar='change' where pk=1;
 sleep 6 sec
 BUG: Row should be deleted now, but isn't. and foo column has been deleted???
 pk | bar    | foo
 ----+--------+------
   1 | change | null
 (1 rows)
 Insert data with a 5 sec TTL
 INSERT INTO ks.tbl (pk, foo, bar) values (1, 1, 'test') using TTL 5;
  pk | bar  | foo
 ----+------+-----
   1 | test |   1
 (1 rows)
 Update data with a higher (10) TTL
 UPDATE ks.tbl USING TTL 10 set bar='change' where pk=1;
 sleep 6 sec
 BUG: foo column has been deleted?
 pk | bar    | foo
 ----+--------+------
   1 | change | null
 (1 rows)
 sleep 5 sec
 Data is deleted now after the second TTL set during the update. Is this a bug 
 or the expected behavior?
 (0 rows)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8390) The process cannot access the file because it is being used by another process

2014-12-10 Thread Alexander Radzin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240844#comment-14240844
 ] 

Alexander Radzin commented on CASSANDRA-8390:
-

I have the same issue with Windows 8. Here is the DiskAccessMode line that I 
found in Cassandra's system.log:

{noformat}
INFO  [main] 2014-12-09 16:07:25,985 DatabaseDescriptor.java:203 - 
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
{noformat}

I have several lines like this. 

Here are the conditions under which the problem reproduces. 
Our application creates a new keyspace every day. Each keyspace contains about 60 
tables. The issue happens relatively seldom in production, but it happens all 
the time in the testing environment, because each test case creates the keyspace 
again. I guess the problem is not specifically in creating the keyspace and 
tables, because sometimes it also happens when trying to run {{truncate}}. 

Cassandra is running with default settings. The client code looks like the 
following:

{noformat}
Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
Session session = cluster.connect();
String year = "2013";
for (int i = 1; i <= 12; i++) {
    String yearMonth = year + i;
    for (String template : cql.split("\\n")) {
        String query = String.format(template, yearMonth);
        System.out.println(query);
        session.execute(query);
    }
}
{noformat}
Where {{cql}} contains  {{create keyspace}} and a lot of {{create table}} 
statements. 

An interesting fact is that the problem _does not appear_ when using asynchronous calls:

{noformat}
Collection<ResultSetFuture> futures = new ArrayList<>();

Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
Session session = cluster.connect();
String year = "2013";
for (int i = 1; i <= 1200; i++) {
    String yearMonth = year + i;
    for (String template : cql.split("\\n")) {
        String query = String.format(template, yearMonth);
        System.out.println(query);
        ResultSetFuture future = session.executeAsync(query);
        futures.add(future);
    }
}

Futures.successfulAsList(futures);
{noformat}

Although this can serve as a temporary workaround, which I will try to use, the 
problem itself is IMHO extremely critical. 

Full source code can be found 
[here|https://gist.github.com/alexradzin/9223fc16e95318e017ec].
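
One note on the asynchronous variant above: {{Futures.successfulAsList(futures)}} 
only builds the combined future, it does not wait for it. A minimal sketch that 
actually blocks until all the async statements have finished (plain Guava API, 
assuming only the code above):

{noformat}
ListenableFuture<List<ResultSet>> all = Futures.successfulAsList(futures);
all.get(); // blocks until every executeAsync() has completed or failed
           // (may throw InterruptedException / ExecutionException)
{noformat}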


 The process cannot access the file because it is being used by another process
 --

 Key: CASSANDRA-8390
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8390
 Project: Cassandra
  Issue Type: Bug
Reporter: Ilya Komolkin
Assignee: Joshua McKenzie
 Fix For: 2.1.3


 21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - 
 Exception in thread Thread[NonPeriodicTasks:1,5,main]
 org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94)
  ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_71]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
  ~[na:1.7.0_71]
 

[jira] [Comment Edited] (CASSANDRA-8390) The process cannot access the file because it is being used by another process

2014-12-10 Thread Alexander Radzin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240844#comment-14240844
 ] 

Alexander Radzin edited comment on CASSANDRA-8390 at 12/10/14 9:26 AM:
---

I have the same issue with Windows 8. Here is the DiskAccessMode line that I 
found in Cassandra's system.log:

{noformat}
INFO  [main] 2014-12-09 16:07:25,985 DatabaseDescriptor.java:203 - 
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
{noformat}

I have several lines like this. 

Important: when this happens the client gets {{NoHostAvailableException}} and stops 
working, which requires a restart of Cassandra.

{noformat}
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried 
for query failed (no host was tried)
at 
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at 
com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259)
at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175)
at 
com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at 
com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
at 
com.clarisite.clingine.dataaccesslayer.cassandra.CQLTest1.cqlSync(CQLTest1.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All 
host(s) tried for query failed (no host was tried)
at 
com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102)
at 
com.datastax.driver.core.SessionManager.execute(SessionManager.java:461)
at 
com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:497)
at 
com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:87)
... 34 more
{noformat}


Here are the conditions under which the problem reproduces. 
Our application creates a new keyspace every day. Each keyspace contains about 60 
tables. The issue happens relatively seldom in production, but it happens all 
the time in the testing environment, because each test case creates the keyspace 
again. I guess the problem is not specifically in creating the keyspace and 
tables, because sometimes it also happens when trying to run {{truncate}}. 

Cassandra is running with default settings. The client code looks like the 
following:

{noformat}
Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
Session session = 

[jira] [Updated] (CASSANDRA-7947) Change error message when RR times out

2014-12-10 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-7947:
---
Attachment: 7947.txt

Attaching a simple patch which correctly sets the CL on the thrown exception. 
It looks like this is only an issue when read requests following a digest mismatch 
time out; if we time out waiting on acks for the subsequent mutations, the 
correct CL is communicated.

 Change error message when RR times out
 --

 Key: CASSANDRA-7947
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7947
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Sam Tunnicliffe
Priority: Minor
 Fix For: 2.0.12

 Attachments: 7947.txt


 When a quorum request detects a checksum mismatch, it then reads the data to 
 repair the mismatch by issuing a request at CL.ALL to the same endpoints 
 (SP.fetchRows). If this request in turn times out, this delivers a TOE to the 
 client with a misleading message that mentions CL.ALL, possibly causing them 
 to think the request has gone cross-DC when it has not; it was just slow and 
 timed out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8390) The process cannot access the file because it is being used by another process

2014-12-10 Thread Alexander Radzin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240844#comment-14240844
 ] 

Alexander Radzin edited comment on CASSANDRA-8390 at 12/10/14 10:29 AM:


I have the same issue with Windows 8. Here is the DiskAccessMode line that I 
found in Cassandra's system.log:

{noformat}
INFO  [main] 2014-12-09 16:07:25,985 DatabaseDescriptor.java:203 - 
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
{noformat}

I have several lines like this. 

Important: when this happens the client gets {{NoHostAvailableException}} and stops 
working, which requires a restart of Cassandra.

{noformat}
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried 
for query failed (no host was tried)
at 
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at 
com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259)
at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175)
at 
com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at 
com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
at 
com.clarisite.clingine.dataaccesslayer.cassandra.CQLTest1.cqlSync(CQLTest1.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All 
host(s) tried for query failed (no host was tried)
at 
com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102)
at 
com.datastax.driver.core.SessionManager.execute(SessionManager.java:461)
at 
com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:497)
at 
com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:87)
... 34 more
{noformat}


Here are the conditions under which the problem reproduces. 
Our application creates a new keyspace every day. Each keyspace contains about 60 
tables. The issue happens relatively seldom in production, but it happens all 
the time in the testing environment, because each test case creates the keyspace 
again. I guess the problem is not specifically in creating the keyspace and 
tables, because sometimes it also happens when trying to run {{truncate}}. 

Cassandra is running with default settings. The client code looks like the 
following:

{noformat}
Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
Session session = 

[jira] [Created] (CASSANDRA-8453) Ability to override TTL on different data-centers, plus one-way replication

2014-12-10 Thread Jacques-Henri Berthemet (JIRA)
Jacques-Henri Berthemet created CASSANDRA-8453:
--

 Summary: Ability to override TTL on different data-centers, plus 
one-way replication
 Key: CASSANDRA-8453
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8453
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Jacques-Henri Berthemet


Here is my scenario:
I want to have one datacenter specialized for operations DCO and another for 
historical/audit DCH. Replication will be used between DCO and DCH.

When TTL expires on DCO and data is deleted I'd like the data on DCH to be kept 
for other purposes. Ideally a different TTL could be set in DCH.

I guess this also implies that replication should be done only in the DCO => DCH 
direction so that data is not re-created. But that's secondary; DCH data is not 
meant to be modified.

Is this kind of feature feasible for future versions of Cassandra? If not, 
would you have some pointers to modify Cassandra in order to achieve this 
functionality?

Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7947) Change error message when RR times out

2014-12-10 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-7947:
-
Reviewer: Aleksey Yeschenko

 Change error message when RR times out
 --

 Key: CASSANDRA-7947
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7947
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Sam Tunnicliffe
Priority: Minor
 Fix For: 2.0.12

 Attachments: 7947.txt


 When a quorum request detects a checksum mismatch, it then reads the data to 
 repair the mismatch by issuing a request at CL.ALL to the same endpoints 
 (SP.fetchRows). If this request in turn times out, this delivers a TOE to the 
 client with a misleading message that mentions CL.ALL, possibly causing them 
 to think the request has gone cross-DC when it has not; it was just slow and 
 timed out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8453) Ability to override TTL on different data-centers, plus one-way replication

2014-12-10 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240973#comment-14240973
 ] 

Aleksey Yeschenko commented on CASSANDRA-8453:
--

Not feasible, sorry. It goes against core Cassandra principles.

You could create two separate keyspaces for this data, and write to both: with a 
TTL to one of them, without a TTL to the other one. Maybe have a replication factor 
of 0 in one of the DCs. That's as close as you are going to get, I'm afraid.
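
A rough CQL sketch of that layout (keyspace and table names are illustrative, not 
from the ticket; DCO/DCH are the data centers described above):

{code}
-- operational copy: TTL'ed writes, replicated only to DCO
CREATE KEYSPACE ops WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DCO': 3};

-- audit copy: no TTL, replicated only to DCH
CREATE KEYSPACE audit WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DCH': 3};

-- the application then writes every record twice:
INSERT INTO ops.data (pk, val) VALUES (1, 'x') USING TTL 86400;
INSERT INTO audit.data (pk, val) VALUES (1, 'x');
{code}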

 Ability to override TTL on different data-centers, plus one-way replication
 ---

 Key: CASSANDRA-8453
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8453
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Jacques-Henri Berthemet

 Here is my scenario:
 I want to have one datacenter specialized for operations DCO and another 
 for historical/audit DCH. Replication will be used between DCO and DCH.
 When TTL expires on DCO and data is deleted I'd like the data on DCH to be 
 kept for other purposes. Ideally a different TTL could be set in DCH.
 I guess this also implies that replication should be done only in the DCO => DCH 
 direction so that data is not re-created. But that's secondary; DCH data is 
 not meant to be modified.
 Is this kind of feature feasible for future versions of Cassandra? If not, 
 would you have some pointers to modify Cassandra in order to achieve this 
 functionality?
 Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again

2014-12-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240975#comment-14240975
 ] 

Benedict commented on CASSANDRA-8449:
-

Unless we explicitly force all queries to yield a timeout response even if they 
have successfully terminated after the timeout, and we enforce this constraint 
_after_ copying the data to the output buffers (netty and thrift), this is 
guaranteed to return junk data to a user somewhere, sometime. So I am -1 on 
this approach.

 Allow zero-copy reads again
 ---

 Key: CASSANDRA-8449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8449
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 3.0


 We disabled zero-copy reads in CASSANDRA-3179 due to in-flight reads 
 accessing a ByteBuffer when the data was unmapped by compaction.  Currently 
 this code path is only used for uncompressed reads.
 The actual bytes are in fact copied to the client output buffers for both 
 netty and thrift before being sent over the wire, so the only real issue is 
 the time it takes to process the read internally.
 This patch adds a slow network read test and changes the tidy() method to 
 actually delete an sstable once the readTimeout has elapsed, giving plenty of 
 time to serialize the read.
 Removing this copy causes significantly less GC on the read path and improves 
 the tail latencies:
 http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688f&metric=gc_count&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=109.34&ymin=0&ymax=5.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again

2014-12-10 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240995#comment-14240995
 ] 

Aleksey Yeschenko commented on CASSANDRA-8449:
--

We will, but only once we have CASSANDRA-7392.

 Allow zero-copy reads again
 ---

 Key: CASSANDRA-8449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8449
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 3.0


 We disabled zero-copy reads in CASSANDRA-3179 due to in-flight reads 
 accessing a ByteBuffer when the data was unmapped by compaction.  Currently 
 this code path is only used for uncompressed reads.
 The actual bytes are in fact copied to the client output buffers for both 
 netty and thrift before being sent over the wire, so the only real issue is 
 the time it takes to process the read internally.
 This patch adds a slow network read test and changes the tidy() method to 
 actually delete an sstable once the readTimeout has elapsed, giving plenty of 
 time to serialize the read.
 Removing this copy causes significantly less GC on the read path and improves 
 the tail latencies:
 http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688f&metric=gc_count&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=109.34&ymin=0&ymax=5.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again

2014-12-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241004#comment-14241004
 ] 

Benedict commented on CASSANDRA-8449:
-

That depends on how it is implemented. I will go out on a limb and predict it 
will offer no such guarantee, as there will always be a potential race 
condition (easily triggered by e.g. lengthy GC pauses) unless the constraint is 
enforced _after_ performing the copy to the transport buffers, which is a 
very specific condition that I don't think is being considered for 
CASSANDRA-7392.

 Allow zero-copy reads again
 ---

 Key: CASSANDRA-8449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8449
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 3.0


 We disabled zero-copy reads in CASSANDRA-3179 due to in-flight reads 
 accessing a ByteBuffer when the data was unmapped by compaction.  Currently 
 this code path is only used for uncompressed reads.
 The actual bytes are in fact copied to the client output buffers for both 
 netty and thrift before being sent over the wire, so the only real issue is 
 the time it takes to process the read internally.
 This patch adds a slow network read test and changes the tidy() method to 
 actually delete an sstable once the readTimeout has elapsed, giving plenty of 
 time to serialize the read.
 Removing this copy causes significantly less GC on the read path and improves 
 the tail latencies:
 http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688f&metric=gc_count&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=109.34&ymin=0&ymax=5.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again

2014-12-10 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241005#comment-14241005
 ] 

Aleksey Yeschenko commented on CASSANDRA-8449:
--

Fair enough.

 Allow zero-copy reads again
 ---

 Key: CASSANDRA-8449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8449
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 3.0


 We disabled zero-copy reads in CASSANDRA-3179 due to in-flight reads 
 accessing a ByteBuffer when the data was unmapped by compaction.  Currently 
 this code path is only used for uncompressed reads.
 The actual bytes are in fact copied to the client output buffers for both 
 netty and thrift before being sent over the wire, so the only real issue is 
 the time it takes to process the read internally.
 This patch adds a slow network read test and changes the tidy() method to 
 actually delete an sstable once the readTimeout has elapsed, giving plenty of 
 time to serialize the read.
 Removing this copy causes significantly less GC on the read path and improves 
 the tail latencies:
 http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688f&metric=gc_count&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=109.34&ymin=0&ymax=5.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread jonathan lacefield (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jonathan lacefield updated CASSANDRA-8447:
--
Description: 
Behavior - If autocompaction is enabled, nodes will become unresponsive due to 
a full Old Gen heap which is not cleared during CMS GC.

Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
enabled on 1 node.  Executed different Cassandra stress loads, using write only 
operations.  Monitored visualvm and jconsole for heap pressure.  Captured 
iostat and dstat for most tests.  Captured heap dump from 50 thread load.  
Hints were disabled for testing on all nodes to alleviate GC noise due to hints 
backing up.

Data load test through Cassandra stress -  /usr/bin/cassandra-stress  write 
n=19 -rate threads=<different threads tested> -schema 
replication\(factor=3\) keyspace=Keyspace1 -node <all nodes listed>

Data load thread count and results:
* 1 thread - Still running but looks like the node can sustain this load 
(approx 500 writes per second per node)
* 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS measured 
in the 60 second range (approx 2k writes per second per node)
* 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
measured in the 60 second range
* 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
measured in the 60 second range  (approx 10k writes per second per node)
* 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
measured in the 60 second range  (approx 20k writes per second per node)
* 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
measured in the 60 second range  (approx 25k writes per second per node)

Note - the observed behavior was the same for all tests except for the single 
threaded test.  The single threaded test does not appear to show this behavior.

Tested different GC and Linux OS settings with a focus on the 50 and 200 thread 
loads.  

JVM settings tested:
#  default, out of the box, env-sh settings
#  10 G Max | 1 G New - default env-sh settings
#  10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
#   20 G Max | 10 G New 
   JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
   JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
   JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
   JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
   JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
   JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
   JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
   JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
   JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
   JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
   JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
   JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
   JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
   JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
   JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
   JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
   JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
   JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
# 20 G Max | 1 G New 
   JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
   JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
   JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
   JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
   JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
   JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
   JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
   JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
   JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
   JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
   JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
   JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
   JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
   JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
   JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
   JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
   JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
   JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

Linux OS settings tested:
# Disabled Transparent Huge Pages
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Enabled Huge Pages
echo 215 > /proc/sys/kernel/shmmax (over 20GB for heap)
echo 1536 > /proc/sys/vm/nr_hugepages (20GB/2MB page size)
# Disabled NUMA
numa=off in /etc/grub.conf
# Verified all settings documented here were implemented
  
http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html

Attachments:
#  .yaml
#  fio output - results.tar.gz
#  50 thread heap dump - will update new heap dump soon
#  100 thread - visual vm anonymous screenshot - visualvm_screenshot
#  dstat screen shot of with compaction - Node_with_compaction.png
#  dstat screen shot of without compaction -- Node_without_compaction.png
#  gcinspector messages from system.log
# gc.log output 

[jira] [Commented] (CASSANDRA-8429) Stress on trunk fails mixed workload on missing keys

2014-12-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241076#comment-14241076
 ] 

Marcus Eriksson commented on CASSANDRA-8429:


Branch for this here: 
https://github.com/krummas/cassandra/commit/b23b7b3c2e5a800fefd86b0427dcffe3d1c7efb1

The approach is to close the finished files (but keep them as .tmp), make tmplinks 
from those closed files, and open sstablereaders over the tmplink files. Then, 
when we actually finish, we rename the tmp files to final files and open 
readers over those.
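
A minimal sketch of that file-level dance using plain java.nio (file names are 
illustrative, not the actual SSTableRewriter internals):

{code}
Path tmp     = Paths.get("ks-cf-tmp-ka-1-Data.db");     // finished writer output, kept as tmp
Path tmplink = Paths.get("ks-cf-tmplink-ka-1-Data.db"); // hard link that readers can open early
Path fin     = Paths.get("ks-cf-ka-1-Data.db");         // final name, used once we finish

Files.createLink(tmplink, tmp);                         // open sstablereaders over this link
// ... reads are served from the tmplink while the rewrite continues ...
Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);   // on finish: rename tmp to final
Files.delete(tmplink);                                  // drop the link once readers switch over
{code}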

 Stress on trunk fails mixed workload on missing keys
 

 Key: CASSANDRA-8429
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8429
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 14.04
Reporter: Ariel Weisberg
Assignee: Marcus Eriksson
 Attachments: cluster.conf, run_stress.sh


 Starts as part of merge commit 25be46497a8df46f05ffa102bc645bfd684ea48a
 Stress will say that a key wasn't validated because it isn't returned even 
 though it's loaded. The key will eventually appear and can be queried using 
 cqlsh.
 Reproduce with
 #!/bin/sh
 ROWCOUNT=1000
 SCHEMA='-col n=fixed(1) -schema 
 compaction(strategy=LeveledCompactionStrategy) compression=LZ4Compressor'
 ./cassandra-stress write n=$ROWCOUNT -node xh61 -pop seq=1..$ROWCOUNT no-wrap 
 -rate threads=25 $SCHEMA
 ./cassandra-stress mixed ratio(read=2) n=1 -node xh61 -pop 
 dist=extreme(1..$ROWCOUNT,0.6) -rate threads=25 $SCHEMA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8429) Stress on trunk fails mixed workload on missing keys

2014-12-10 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-8429:
---
 Reviewer: Benedict
Since Version: 2.1.2

could you review [~benedict]?

 Stress on trunk fails mixed workload on missing keys
 

 Key: CASSANDRA-8429
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8429
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 14.04
Reporter: Ariel Weisberg
Assignee: Marcus Eriksson
 Attachments: cluster.conf, run_stress.sh


 Starts as part of merge commit 25be46497a8df46f05ffa102bc645bfd684ea48a
 Stress will say that a key wasn't validated because it isn't returned even 
 though it's loaded. The key will eventually appear and can be queried using 
 cqlsh.
 Reproduce with
 #!/bin/sh
 ROWCOUNT=1000
 SCHEMA='-col n=fixed(1) -schema 
 compaction(strategy=LeveledCompactionStrategy) compression=LZ4Compressor'
 ./cassandra-stress write n=$ROWCOUNT -node xh61 -pop seq=1..$ROWCOUNT no-wrap 
 -rate threads=25 $SCHEMA
 ./cassandra-stress mixed ratio(read=2) n=1 -node xh61 -pop 
 dist=extreme(1..$ROWCOUNT,0.6) -rate threads=25 $SCHEMA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8452) Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows

2014-12-10 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241078#comment-14241078
 ] 

Joshua McKenzie commented on CASSANDRA-8452:


Check out v2 on CASSANDRA-6993 as well as Benedict's comment about it.  We 
should probably change isPosixCompliant to compute and store the boolean at 
static init time and just reference that, rather than doing a strcmp every 
time we want to check it.
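
A minimal sketch of the suggested pattern (names are illustrative, not the actual 
FBUtilities code):

{code}
public class FBUtilities
{
    // compare the OS name once at static init instead of on every call
    private static final boolean IS_WINDOWS =
        System.getProperty("os.name", "").startsWith("Windows");

    public static boolean isWindows()
    {
        return IS_WINDOWS;
    }
}
{code}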

 Add missing systems to FBUtilities.isUnix, add FBUtilities.isWindows
 

 Key: CASSANDRA-8452
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8452
 Project: Cassandra
  Issue Type: Bug
Reporter: Blake Eggleston
Assignee: Blake Eggleston
Priority: Minor
 Fix For: 2.1.3

 Attachments: CASSANDRA-8452.patch


 The isUnix method leaves out a few unix systems, which, after the changes in 
 CASSANDRA-8136, causes some unexpected behavior during shutdown. It would 
 also be clearer if FBUtilities had an isWindows method for branching into 
 Windows specific logic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8390) The process cannot access the file because it is being used by another process

2014-12-10 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241081#comment-14241081
 ] 

Joshua McKenzie commented on CASSANDRA-8390:


[~alexander_radzin]: have you tried {{disk_access_mode: standard}} in your test 
environment to see if it resolves this issue? (see CASSANDRA-6993)
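
For reference, that would be a one-line addition to cassandra.yaml (the option is 
not in the default file, but it is read at startup, so the node needs a restart 
after adding it):

{noformat}
disk_access_mode: standard
{noformat}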

 The process cannot access the file because it is being used by another process
 --

 Key: CASSANDRA-8390
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8390
 Project: Cassandra
  Issue Type: Bug
Reporter: Ilya Komolkin
Assignee: Joshua McKenzie
 Fix For: 2.1.3


 21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - 
 Exception in thread Thread[NonPeriodicTasks:1,5,main]
 org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94)
  ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_71]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_71]
 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
 Caused by: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
  ~[na:1.7.0_71]
 at 
 sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
  ~[na:1.7.0_71]
 at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 ... 11 common frames omitted



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


cassandra git commit: Remove tmplink files for offline compactions

2014-12-10 Thread marcuse
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 d69728f8a -> 29259cb22


Remove tmplink files for offline compactions

Patch by marcuse; reviewed by jmckenzie for CASSANDRA-8321


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/29259cb2
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/29259cb2
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/29259cb2

Branch: refs/heads/cassandra-2.1
Commit: 29259cb22c2ba02d5c2beba6c6512173f8b5b3f9
Parents: d69728f
Author: Marcus Eriksson marc...@apache.org
Authored: Tue Nov 25 11:12:20 2014 +0100
Committer: Marcus Eriksson marc...@apache.org
Committed: Wed Dec 10 14:46:44 2014 +0100

--
 CHANGES.txt |  1 +
 .../cassandra/io/sstable/SSTableRewriter.java   | 31 +--
 .../io/sstable/SSTableRewriterTest.java | 91 +++-
 3 files changed, 79 insertions(+), 44 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/29259cb2/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 3545afc..2e74a15 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.3
+ * Remove tmplink files for offline compactions (CASSANDRA-8321)
  * Reduce maxHintsInProgress (CASSANDRA-8415)
  * BTree updates may call provided update function twice (CASSANDRA-8018)
  * Release sstable references after anticompaction (CASSANDRA-8386)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/29259cb2/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
--
diff --git a/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java 
b/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
index d187e9d..f9d2fe4 100644
--- a/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
@@ -190,9 +190,15 @@ public class SSTableRewriter
 
         for (Pair<SSTableWriter, SSTableReader> w : finishedWriters)
         {
-            // we should close the bloom filter if we have not opened an sstable reader from this
-            // writer (it will get closed when we release the sstable reference below):
+            // we should close the bloom filter if we have not opened an sstable reader from this
+            // writer (it will get closed when we release the sstable reference below):
             w.left.abort(w.right == null);
+            if (isOffline && w.right != null)
+            {
+                // the pairs get removed from finishedWriters when they are closedAndOpened in finish(), the ones left need to be removed here:
+                w.right.markObsolete();
+                w.right.releaseReference();
+            }
         }
 
         // also remove already completed SSTables
@@ -344,7 +350,15 @@
                 finished.add(newReader);
 
                 if (w.right != null)
+                {
                     w.right.sharesBfWith(newReader);
+                    if (isOffline)
+                    {
+                        // remove the tmplink files if we are offline - no one is using them
+                        w.right.markObsolete();
+                        w.right.releaseReference();
+                    }
+                }
                 // w.right is the tmplink-reader we added when switching writer, replace with the real sstable.
                 toReplace.add(Pair.create(w.right, newReader));
             }
@@ -356,11 +370,10 @@
             it.remove();
         }
 
-        for (Pair<SSTableReader, SSTableReader> replace : toReplace)
-            replaceEarlyOpenedFile(replace.left, replace.right);
-
         if (!isOffline)
         {
+            for (Pair<SSTableReader, SSTableReader> replace : toReplace)
+                replaceEarlyOpenedFile(replace.left, replace.right);
             dataTracker.unmarkCompacting(finished);
         }
         return finished;
@@ -382,8 +395,16 @@
         {
             SSTableReader newReader = w.left.closeAndOpenReader(maxAge);
             finished.add(newReader);
+
             if (w.right != null)
+            {
                 w.right.sharesBfWith(newReader);
+                if (isOffline)
+                {
+                    w.right.markObsolete();
+                    w.right.releaseReference();
+                }
+            }
             // w.right is the tmplink-reader we added when switching writer, replace with the real sstable.
             toReplace.add(Pair.create(w.right, newReader));
         }


[1/2] cassandra git commit: Remove tmplink files for offline compactions

2014-12-10 Thread marcuse
Repository: cassandra
Updated Branches:
  refs/heads/trunk 2240455f0 -> c64ac4188


Remove tmplink files for offline compactions

Patch by marcuse; reviewed by jmckenzie for CASSANDRA-8321


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/29259cb2
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/29259cb2
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/29259cb2

Branch: refs/heads/trunk
Commit: 29259cb22c2ba02d5c2beba6c6512173f8b5b3f9
Parents: d69728f
Author: Marcus Eriksson marc...@apache.org
Authored: Tue Nov 25 11:12:20 2014 +0100
Committer: Marcus Eriksson marc...@apache.org
Committed: Wed Dec 10 14:46:44 2014 +0100

--
 CHANGES.txt |  1 +
 .../cassandra/io/sstable/SSTableRewriter.java   | 31 +--
 .../io/sstable/SSTableRewriterTest.java | 91 +++-
 3 files changed, 79 insertions(+), 44 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/29259cb2/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 3545afc..2e74a15 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.3
+ * Remove tmplink files for offline compactions (CASSANDRA-8321)
  * Reduce maxHintsInProgress (CASSANDRA-8415)
  * BTree updates may call provided update function twice (CASSANDRA-8018)
  * Release sstable references after anticompaction (CASSANDRA-8386)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/29259cb2/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
--
diff --git a/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java 
b/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
index d187e9d..f9d2fe4 100644
--- a/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
@@ -190,9 +190,15 @@ public class SSTableRewriter
 
         for (Pair<SSTableWriter, SSTableReader> w : finishedWriters)
         {
-            // we should close the bloom filter if we have not opened an sstable reader from this
-            // writer (it will get closed when we release the sstable reference below):
+            // we should close the bloom filter if we have not opened an sstable reader from this
+            // writer (it will get closed when we release the sstable reference below):
             w.left.abort(w.right == null);
+            if (isOffline && w.right != null)
+            {
+                // the pairs get removed from finishedWriters when they are closedAndOpened in finish(), the ones left need to be removed here:
+                w.right.markObsolete();
+                w.right.releaseReference();
+            }
         }
 
         // also remove already completed SSTables
@@ -344,7 +350,15 @@
                 finished.add(newReader);
 
                 if (w.right != null)
+                {
                     w.right.sharesBfWith(newReader);
+                    if (isOffline)
+                    {
+                        // remove the tmplink files if we are offline - no one is using them
+                        w.right.markObsolete();
+                        w.right.releaseReference();
+                    }
+                }
                 // w.right is the tmplink-reader we added when switching writer, replace with the real sstable.
                 toReplace.add(Pair.create(w.right, newReader));
             }
@@ -356,11 +370,10 @@
             it.remove();
         }
 
-        for (Pair<SSTableReader, SSTableReader> replace : toReplace)
-            replaceEarlyOpenedFile(replace.left, replace.right);
-
         if (!isOffline)
         {
+            for (Pair<SSTableReader, SSTableReader> replace : toReplace)
+                replaceEarlyOpenedFile(replace.left, replace.right);
             dataTracker.unmarkCompacting(finished);
         }
         return finished;
@@ -382,8 +395,16 @@
         {
             SSTableReader newReader = w.left.closeAndOpenReader(maxAge);
             finished.add(newReader);
+
             if (w.right != null)
+            {
                 w.right.sharesBfWith(newReader);
+                if (isOffline)
+                {
+                    w.right.markObsolete();
+                    w.right.releaseReference();
+                }
+            }
             // w.right is the tmplink-reader we added when switching writer, replace with the real sstable.
             toReplace.add(Pair.create(w.right, newReader));
         }


[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk

2014-12-10 Thread marcuse
Merge branch 'cassandra-2.1' into trunk

Conflicts:
test/unit/org/apache/cassandra/io/sstable/SSTableRewriterTest.java


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c64ac418
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c64ac418
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c64ac418

Branch: refs/heads/trunk
Commit: c64ac41884f328d0868baee31dbb7a6f685f22f8
Parents: 2240455 29259cb
Author: Marcus Eriksson marc...@apache.org
Authored: Wed Dec 10 14:51:34 2014 +0100
Committer: Marcus Eriksson marc...@apache.org
Committed: Wed Dec 10 14:51:34 2014 +0100

--
 CHANGES.txt |  1 +
 .../cassandra/io/sstable/SSTableRewriter.java   | 31 +--
 .../io/sstable/SSTableRewriterTest.java | 91 +++-
 3 files changed, 79 insertions(+), 44 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/c64ac418/CHANGES.txt
--
diff --cc CHANGES.txt
index 1e1ec89,2e74a15..0029843
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,44 -1,5 +1,45 @@@
 +3.0
 + * Fix NPE in SelectStatement with empty IN values (CASSANDRA-8419)
 + * Refactor SelectStatement, return IN results in natural order instead
 +   of IN value list order (CASSANDRA-7981)
 + * Support UDTs, tuples, and collections in user-defined
 +   functions (CASSANDRA-7563)
 + * Fix aggregate fn results on empty selection, result column name,
 +   and cqlsh parsing (CASSANDRA-8229)
 + * Mark sstables as repaired after full repair (CASSANDRA-7586)
 + * Extend Descriptor to include a format value and refactor reader/writer 
apis (CASSANDRA-7443)
 + * Integrate JMH for microbenchmarks (CASSANDRA-8151)
 + * Keep sstable levels when bootstrapping (CASSANDRA-7460)
 + * Add Sigar library and perform basic OS settings check on startup 
(CASSANDRA-7838)
 + * Support for aggregation functions (CASSANDRA-4914)
 + * Remove cassandra-cli (CASSANDRA-7920)
 + * Accept dollar quoted strings in CQL (CASSANDRA-7769)
 + * Make assassinate a first class command (CASSANDRA-7935)
 + * Support IN clause on any clustering column (CASSANDRA-4762)
 + * Improve compaction logging (CASSANDRA-7818)
 + * Remove YamlFileNetworkTopologySnitch (CASSANDRA-7917)
 + * Do anticompaction in groups (CASSANDRA-6851)
 + * Support pure user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 
7781, 7929,
 +   7924, 7812, 8063, 7813)
 + * Permit configurable timestamps with cassandra-stress (CASSANDRA-7416)
 + * Move sstable RandomAccessReader to nio2, which allows using the
 +   FILE_SHARE_DELETE flag on Windows (CASSANDRA-4050)
 + * Remove CQL2 (CASSANDRA-5918)
 + * Add Thrift get_multi_slice call (CASSANDRA-6757)
 + * Optimize fetching multiple cells by name (CASSANDRA-6933)
 + * Allow compilation in java 8 (CASSANDRA-7028)
 + * Make incremental repair default (CASSANDRA-7250)
 + * Enable code coverage thru JaCoCo (CASSANDRA-7226)
 + * Switch external naming of 'column families' to 'tables' (CASSANDRA-4369) 
 + * Shorten SSTable path (CASSANDRA-6962)
 + * Use unsafe mutations for most unit tests (CASSANDRA-6969)
 + * Fix race condition during calculation of pending ranges (CASSANDRA-7390)
 + * Fail on very large batch sizes (CASSANDRA-8011)
 + * Improve concurrency of repair (CASSANDRA-6455, 8208)
 +
 +
  2.1.3
+  * Remove tmplink files for offline compactions (CASSANDRA-8321)
   * Reduce maxHintsInProgress (CASSANDRA-8415)
   * BTree updates may call provided update function twice (CASSANDRA-8018)
   * Release sstable references after anticompaction (CASSANDRA-8386)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/c64ac418/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/c64ac418/test/unit/org/apache/cassandra/io/sstable/SSTableRewriterTest.java
--
diff --cc test/unit/org/apache/cassandra/io/sstable/SSTableRewriterTest.java
index 11030f6,c0a017e..5eae831
--- a/test/unit/org/apache/cassandra/io/sstable/SSTableRewriterTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/SSTableRewriterTest.java
@@@ -44,10 -43,8 +44,11 @@@ import org.apache.cassandra.db.compacti
  import org.apache.cassandra.db.compaction.ICompactionScanner;
  import org.apache.cassandra.db.compaction.LazilyCompactedRow;
  import org.apache.cassandra.db.compaction.OperationType;
 +import org.apache.cassandra.exceptions.ConfigurationException;
 +import org.apache.cassandra.io.sstable.format.SSTableReader;
 +import org.apache.cassandra.io.sstable.format.SSTableWriter;
 +import org.apache.cassandra.locator.SimpleStrategy;
+ import 

[jira] [Updated] (CASSANDRA-8417) Default base_time_seconds in DTCS is almost always too large

2014-12-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Björn Hegerfors updated CASSANDRA-8417:
---
Attachment: cassandra-trunk-CASSANDRA-8417-basetime60.txt

Sorry about the delayed response. I went with 1 minute. Indeed, there's not much 
of a downside to a too-small baseTime; a too-big baseTime is much more 
damaging for performance. It's very much analogous to min_sstable_size in STCS, 
which would also be bad to set too high, while a very low value ought not to 
hurt performance much compared to the ideal value (whatever that is). It will 
leave very small SSTables scattered around for a while longer, which means there 
might be more SSTables on disk. But these new SSTables are likely to be in the 
disk cache anyway, right? So I'd go with 60 seconds as a better-safe-than-sorry 
measure.
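
For anyone who wants this behavior before the default changes, the option can 
already be set per table; a sketch (the table name is illustrative, 
{{base_time_seconds}} is the DTCS option in question):

{code}
ALTER TABLE ks.events WITH compaction =
    {'class': 'DateTieredCompactionStrategy', 'base_time_seconds': '60'};
{code}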

 Default base_time_seconds in DTCS is almost always too large
 

 Key: CASSANDRA-8417
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8417
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Björn Hegerfors
 Fix For: 2.0.12, 2.1.3

 Attachments: cassandra-trunk-CASSANDRA-8417-basetime60.txt


 One hour is a very long time to compact all new inserts together with any 
 reasonable volume at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8453) Ability to override TTL on different data-centers, plus one-way replication

2014-12-10 Thread Jacques-Henri Berthemet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241103#comment-14241103
 ] 

Jacques-Henri Berthemet commented on CASSANDRA-8453:


What if I write a custom AbstractReplicationStrategy (extending 
NetworkTopologyStrategy) that would reset the TTL info on writes received 
from a non-local DC?

 Ability to override TTL on different data-centers, plus one-way replication
 ---

 Key: CASSANDRA-8453
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8453
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Jacques-Henri Berthemet

 Here is my scenario:
 I want to have one datacenter specialized for operations DCO and another 
 for historical/audit DCH. Replication will be used between DCO and DCH.
 When TTL expires on DCO and data is deleted I'd like the data on DCH to be 
 kept for other purposes. Ideally a different TTL could be set in DCH.
 I guess this also implies that replication should be done only in DCO -> DCH 
 direction so that data is not re-created. But that's secondary, DCH data is 
 not meant to be modified.
 Is this kind of feature feasible for future versions of Cassandra? If not, 
 would you have some pointers to modify Cassandra in order to achieve this 
 functionality?
 Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8453) Ability to override TTL on different data-centers, plus one-way replication

2014-12-10 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241108#comment-14241108
 ] 

Aleksey Yeschenko commented on CASSANDRA-8453:
--

That is not something a replication strategy can do.

 Ability to override TTL on different data-centers, plus one-way replication
 ---

 Key: CASSANDRA-8453
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8453
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Jacques-Henri Berthemet

 Here is my scenario:
 I want to have one datacenter specialized for operations DCO and another 
 for historical/audit DCH. Replication will be used between DCO and DCH.
 When TTL expires on DCO and data is deleted I'd like the data on DCH to be 
 kept for other purposes. Ideally a different TTL could be set in DCH.
 I guess this also implies that replication should be done only in DCO -> DCH 
 direction so that data is not re-created. But that's secondary, DCH data is 
 not meant to be modified.
 Is this kind of feature feasible for future versions of Cassandra? If not, 
 would you have some pointers to modify Cassandra in order to achieve this 
 functionality?
 Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8356) Slice query on a super column family with counters doesn't get all the data

2014-12-10 Thread Philo Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241116#comment-14241116
 ] 

Philo Yang commented on CASSANDRA-8356:
---

Hi, in fact I have the same trouble with missing data on slice queries. My 
cluster is 2.1.1, with a regular table (created via CQL3, no counters). My table 
is like this:

{noformat}
CREATE TABLE word (
user text,
word text,
alter_time bigint,
(some other column...)
PRIMARY KEY (user, word)
) WITH CLUSTERING ORDER BY (word ASC)
{noformat}

Suppose a row has 26 columns whose column names are a, b, ..., z.

Usually I query this table like this:
{noformat}
select * from word where user = 'userid';
{noformat}
and it should return 26 rows in CQL.

However, for some rows (most rows in this table don't lose data) it returns 
only part of the columns; some rows (rows in CQL, columns in Cassandra) will 
not be returned even at consistency level ALL. Which rows are missing stays 
fixed across repeated queries. For example, say 'b' is a row that is missing. 
If I query like
{noformat}
select * from word where user = 'userid';
or
select * from word where user = 'userid' and word > 'a';
or
select * from word where user = 'userid' and word >= 'b';
or
select * from word where user = 'userid' and word >= 'b' order by word desc;
or
select * from word where user = 'userid' and word < 'z';
{noformat}
'b' is always missing.

But if I query like:
{noformat}
select * from word where user = 'userid' and word = 'b';
or
select * from word where user = 'userid' and word <= 'b';
or
select * from word where user = 'userid' and word <= 'b' order by word desc;
{noformat}

it will show up in the result set.




 

 Slice query on a super column family with counters doesn't get all the data
 ---

 Key: CASSANDRA-8356
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8356
 Project: Cassandra
  Issue Type: Bug
Reporter: Nicolas Lalevée
Assignee: Aleksey Yeschenko
 Fix For: 2.0.12


 We've finally been able to upgrade our cluster to 2.0.11, after 
 CASSANDRA-7188 was fixed.
 But now slice queries on a super column family with counters don't return 
 all the expected data. We first thought, because of all the trouble we had, 
 that we had lost data, but there is a way to actually get the data, so nothing 
 is lost; it is just that Cassandra seems to incorrectly skip it.
 See the following CQL log:
 {noformat}
 cqlsh:Theme desc table theme_view;
 CREATE TABLE theme_view (
   key bigint,
   column1 varint,
   column2 text,
   value counter,
   PRIMARY KEY ((key), column1, column2)
 ) WITH COMPACT STORAGE AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=1.00 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'SnappyCompressor'};
 cqlsh:Theme select * from theme_view where key = 99421 limit 10;
  key   | column1 | column2| value
 ---+-++---
  99421 | -12 | 2011-03-25 |59
  99421 | -12 | 2011-03-26 | 5
  99421 | -12 | 2011-03-27 | 2
  99421 | -12 | 2011-03-28 |40
  99421 | -12 | 2011-03-29 |14
  99421 | -12 | 2011-03-30 |17
  99421 | -12 | 2011-03-31 | 5
  99421 | -12 | 2011-04-01 |37
  99421 | -12 | 2011-04-02 | 7
  99421 | -12 | 2011-04-03 | 4
 (10 rows)
 cqlsh:Theme select * from theme_view where key = 99421 and column1 = -12 
 limit 10;
  key   | column1 | column2| value
 ---+-++---
  99421 | -12 | 2011-03-25 |59
  99421 | -12 | 2014-05-06 |15
  99421 | -12 | 2014-06-06 | 7
  99421 | -12 | 2014-06-10 |22
  99421 | -12 | 2014-06-11 |34
  99421 | -12 | 2014-06-12 |35
  99421 | -12 | 2014-06-13 |26
  99421 | -12 | 2014-06-14 |16
  99421 | -12 | 2014-06-15 |24
  99421 | -12 | 2014-06-16 |25
 (10 rows)
 {noformat}
 As you can see, the second query should return data from 2012, but it does not. 
 Via Thrift, we have the exact same bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8453) Ability to override TTL on different data-centers, plus one-way replication

2014-12-10 Thread Jacques-Henri Berthemet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241118#comment-14241118
 ] 

Jacques-Henri Berthemet commented on CASSANDRA-8453:


Do you know which class receives/sends the replication messages from other DCs?

 Ability to override TTL on different data-centers, plus one-way replication
 ---

 Key: CASSANDRA-8453
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8453
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Jacques-Henri Berthemet

 Here is my scenario:
 I want to have one datacenter specialized for operations DCO and another 
 for historical/audit DCH. Replication will be used between DCO and DCH.
 When TTL expires on DCO and data is deleted I'd like the data on DCH to be 
 kept for other purposes. Ideally a different TTL could be set in DCH.
 I guess this also implies that replication should be done only in DCO -> DCH 
 direction so that data is not re-created. But that's secondary, DCH data is 
 not meant to be modified.
 Is this kind of feature feasible for future versions of Cassandra? If not, 
 would you have some pointers to modify Cassandra in order to achieve this 
 functionality?
 Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair

2014-12-10 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241147#comment-14241147
 ] 

Alan Boudreault commented on CASSANDRA-8316:


[~krummas] [~yukim] With some test runs, I cannot see the high CPU utilization 
issue anymore. However, I still see the error message. I've also noticed an 
important change in behavior with and without the patch.

WITHOUT the patch: I can re-run the incremental repairs. I might get the error 
message again on the node that initially failed, but things end up OK after the 
endpoints that initially failed are repaired.

WITH the patch: I cannot run incremental repairs anymore, even after a 
restart. This is what I get trying to run the repairs on my node:

{code}
aboudreault@kovarro:~/dev/cstar/8316$ ccm node1 nodetool -- repair -par -inc
[2014-12-10 09:00:42,767] Starting repair command #1, repairing 3 ranges for 
keyspace r1 (parallelism=PARALLEL, full=false)
[2014-12-10 09:00:48,045] Repair session ee2a78c0-8074-11e4-9b59-bbfe19a8e904 
for range (4611686018427387904,6917529027641081856] finished
[2014-12-10 09:00:48,046] Repair session ef77e050-8074-11e4-9b59-bbfe19a8e904 
for range (2305843009213693952,4611686018427387904] finished
[2014-12-10 09:00:48,048] Repair session f06107d0-8074-11e4-9b59-bbfe19a8e904 
for range (6917529027641081856,-9223372036854775808] finished
[2014-12-10 09:00:48,078] Repair command #1 finished
[2014-12-10 09:00:48,088] Nothing to repair for keyspace 'system'
[2014-12-10 09:00:48,104] Starting repair command #2, repairing 2 ranges for 
keyspace system_traces (parallelism=PARALLEL, full=false)
[2014-12-10 09:00:58,916] Repair failed with error Did not get positive replies 
from all endpoints. List of failed endpoint(s): [127.0.0.2]
aboudreault@kovarro:~/dev/cstar/8316$ ccm node2 nodetool -- repair -par -inc
[2014-12-10 09:01:07,233] Starting repair command #1, repairing 3 ranges for 
keyspace r1 (parallelism=PARALLEL, full=false)
[2014-12-10 09:01:07,239] Repair failed with error Already repairing 
SSTableReader(path='/home/aboudreault/.ccm/local/node2/data/r1/Standard1-c38dd6f0807111e494d8bbfe19a8e904/r1-Standard1-ka-5-Data.db'),
 can not continue.
[2014-12-10 09:01:07,247] Nothing to repair for keyspace 'system'
[2014-12-10 09:01:07,252] Starting repair command #2, repairing 2 ranges for 
keyspace system_traces (parallelism=PARALLEL, full=false)
[2014-12-10 09:01:07,254] Repair failed with error null
{code}

Does this help?


  Did not get positive replies from all endpoints error on incremental repair
 --

 Key: CASSANDRA-8316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.1.2
Reporter: Loic Lambiel
Assignee: Marcus Eriksson
 Fix For: 2.1.3

 Attachments: 0001-patch.patch, 
 CassandraDaemon-2014-11-25-2.snapshot.tar.gz, test.sh


 Hi,
 I've got an issue with incremental repairs on our production 15-node 2.1.2 
 cluster (new cluster, not yet loaded, RF=3).
 After having successfully performed an incremental repair (-par -inc) on 3 
 nodes, I started receiving "Repair failed with error Did not get positive 
 replies from all endpoints." from nodetool on all remaining nodes:
 [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges 
 for keyspace  (seq=false, full=false)
 [2014-11-14 09:12:47,919] Repair failed with error Did not get positive 
 replies from all endpoints.
 All the nodes are up and running, and the local system log shows that the 
 repair commands got started and that's it.
 I've also noticed that soon after the repair, several nodes started having 
 more CPU load indefinitely without any particular reason (no tasks / queries, 
 nothing in the logs). I then restarted C* on these nodes and retried the 
 repair on several nodes, which were successful until facing the issue again.
 I tried to repro on our 3-node preproduction cluster, without success.
 It looks like I'm not the only one having this issue: 
 http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html
 Any idea?
 Thanks
 Loic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again

2014-12-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241151#comment-14241151
 ] 

Ariel Weisberg commented on CASSANDRA-8449:
---

It's not just junk data, right? If the file is unmapped, it would probably 
segfault.

I think you need to reference count the file.
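
As a rough illustration of that suggestion, here is a minimal JDK-only sketch 
(an illustrative stand-in, not Cassandra code; unmap stands for whatever 
actually cleans the mapping): the sstable holds one reference, each in-flight 
read briefly holds another, and whoever drops the last one performs the unmap:

{code}
import java.util.concurrent.atomic.AtomicInteger;

// One guard per mapped file. The file starts with one reference held by the
// sstable itself; each in-flight read acquires another for its duration.
// Whoever releases the last reference runs the unmap, so no read can ever
// touch an unmapped buffer.
final class MappedFileRef
{
    private final AtomicInteger refs = new AtomicInteger(1); // the "live" reference
    private final Runnable unmap;                            // hypothetical cleanup hook

    MappedFileRef(Runnable unmap) { this.unmap = unmap; }

    boolean tryRef()
    {
        while (true)
        {
            int r = refs.get();
            if (r == 0)
                return false;                 // already freed; caller must re-resolve
            if (refs.compareAndSet(r, r + 1))
                return true;
        }
    }

    void unref()
    {
        if (refs.decrementAndGet() == 0)
            unmap.run();                      // last holder cleans up
    }
}
{code}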

 Allow zero-copy reads again
 ---

 Key: CASSANDRA-8449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8449
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 3.0


 We disabled zero-copy reads in CASSANDRA-3179 due to in flight reads 
 accessing a ByteBuffer when the data was unmapped by compaction.  Currently 
 this code path is only used for uncompressed reads.
 The actual bytes are in fact copied to the client output buffers for both 
 netty and thrift before being sent over the wire, so the only issue really is 
 the time it takes to process the read internally.  
 This patch adds a slow network read test and changes the tidy() method to 
 actually delete a sstable once the readTimeout has elapsed giving plenty of 
 time to serialize the read.
 Removing this copy causes significantly less GC on the read path and improves 
 the tail latencies:
 http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688fmetric=gc_countoperation=2_readsmoothing=1show_aggregates=truexmin=0xmax=109.34ymin=0ymax=5.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair

2014-12-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241153#comment-14241153
 ] 

Marcus Eriksson commented on CASSANDRA-8316:


yep, will have a look, seems that we don't clear out the repair session on this 
failure mode

  Did not get positive replies from all endpoints error on incremental repair
 --

 Key: CASSANDRA-8316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.1.2
Reporter: Loic Lambiel
Assignee: Marcus Eriksson
 Fix For: 2.1.3

 Attachments: 0001-patch.patch, 
 CassandraDaemon-2014-11-25-2.snapshot.tar.gz, test.sh


 Hi,
 I've got an issue with incremental repairs on our production 15-node 2.1.2 
 cluster (new cluster, not yet loaded, RF=3).
 After having successfully performed an incremental repair (-par -inc) on 3 
 nodes, I started receiving "Repair failed with error Did not get positive 
 replies from all endpoints." from nodetool on all remaining nodes:
 [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges 
 for keyspace  (seq=false, full=false)
 [2014-11-14 09:12:47,919] Repair failed with error Did not get positive 
 replies from all endpoints.
 All the nodes are up and running, and the local system log shows that the 
 repair commands got started and that's it.
 I've also noticed that soon after the repair, several nodes started having 
 more CPU load indefinitely without any particular reason (no tasks / queries, 
 nothing in the logs). I then restarted C* on these nodes and retried the 
 repair on several nodes, which were successful until facing the issue again.
 I tried to repro on our 3-node preproduction cluster, without success.
 It looks like I'm not the only one having this issue: 
 http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html
 Any idea?
 Thanks
 Loic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair

2014-12-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241153#comment-14241153
 ] 

Marcus Eriksson edited comment on CASSANDRA-8316 at 12/10/14 2:42 PM:
--

yep, will have a look, seems that we don't clear out the parent repair session 
on this failure mode


was (Author: krummas):
yep, will have a look, seems that we don't clear out the repair session on this 
failure mode

  Did not get positive replies from all endpoints error on incremental repair
 --

 Key: CASSANDRA-8316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.1.2
Reporter: Loic Lambiel
Assignee: Marcus Eriksson
 Fix For: 2.1.3

 Attachments: 0001-patch.patch, 
 CassandraDaemon-2014-11-25-2.snapshot.tar.gz, test.sh


 Hi,
 I've got an issue with incremental repairs on our production 15-node 2.1.2 
 cluster (new cluster, not yet loaded, RF=3).
 After having successfully performed an incremental repair (-par -inc) on 3 
 nodes, I started receiving "Repair failed with error Did not get positive 
 replies from all endpoints." from nodetool on all remaining nodes:
 [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges 
 for keyspace  (seq=false, full=false)
 [2014-11-14 09:12:47,919] Repair failed with error Did not get positive 
 replies from all endpoints.
 All the nodes are up and running, and the local system log shows that the 
 repair commands got started and that's it.
 I've also noticed that soon after the repair, several nodes started having 
 more CPU load indefinitely without any particular reason (no tasks / queries, 
 nothing in the logs). I then restarted C* on these nodes and retried the 
 repair on several nodes, which were successful until facing the issue again.
 I tried to repro on our 3-node preproduction cluster, without success.
 It looks like I'm not the only one having this issue: 
 http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html
 Any idea?
 Thanks
 Loic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again

2014-12-10 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241173#comment-14241173
 ] 

T Jake Luciani commented on CASSANDRA-8449:
---

Hmm yeah, if there was a pause before serializing the message and compaction 
was running, you would still segfault.  I was initially going to try using a 
reference queue and weak references to track when the byte buffer becomes 
unreachable, but any .duplicate() would break that model.  [~benedict] perhaps 
we can use CASSANDRA-7705 to manage this?

 

 Allow zero-copy reads again
 ---

 Key: CASSANDRA-8449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8449
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 3.0


 We disabled zero-copy reads in CASSANDRA-3179 due to in flight reads 
 accessing a ByteBuffer when the data was unmapped by compaction.  Currently 
 this code path is only used for uncompressed reads.
 The actual bytes are in fact copied to the client output buffers for both 
 netty and thrift before being sent over the wire, so the only issue really is 
 the time it takes to process the read internally.  
 This patch adds a slow network read test and changes the tidy() method to 
 actually delete a sstable once the readTimeout has elapsed giving plenty of 
 time to serialize the read.
 Removing this copy causes significantly less GC on the read path and improves 
 the tail latencies:
 http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688fmetric=gc_countoperation=2_readsmoothing=1show_aggregates=truexmin=0xmax=109.34ymin=0ymax=5.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again

2014-12-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241215#comment-14241215
 ] 

Benedict commented on CASSANDRA-8449:
-

CASSANDRA-7705 is really designed for situations where we know there won't be 
loads in-flight; I'd prefer not to reintroduce excessive long-lifetime 
reference counting onto the read critical path (we don't ref count sstable 
readers anymore, since CASSANDRA-6919).

All we're doing here is delaying when we unmap the file until a time it is 
known to be unused, so we could create a global OpOrder that guards against 
this: all requests that hit the node are guarded by the OpOrder for their 
entire duration, and only once _all_ requests that started prior to our 
_thinking_ the data is free have finished do we actually free it. Typically I 
would not want to use this approach for guarding operations that could take 
arbitrarily long, but really all we're sacrificing is virtual address space, so 
being delayed more than you expect (even excessively) should not noticeably 
impact system performance, as the OS can choose to drop those pages on the 
floor, keeping only the mapping overhead.
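
For intuition only, the barrier idea can be sketched with a plain JDK 
read-write lock (an assumption of this sketch; Cassandra's OpOrder is a 
different, more scalable construct, precisely because a shared lock like the 
one below is the kind of per-request overhead you want to keep off the read 
path):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Reads hold the shared side of the guard for their whole duration; the unmap
// takes the exclusive side, so it proceeds only once earlier reads have drained.
final class UnmapBarrier
{
    private final ReentrantReadWriteLock guard = new ReentrantReadWriteLock();

    <T> T read(Supplier<T> body)
    {
        guard.readLock().lock();
        try { return body.get(); }             // safe to touch the mapping here
        finally { guard.readLock().unlock(); }
    }

    void free(Runnable unmap)
    {
        guard.writeLock().lock();              // blocks until in-flight reads finish
        try { unmap.run(); }
        finally { guard.writeLock().unlock(); }
    }
}
{code}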

 Allow zero-copy reads again
 ---

 Key: CASSANDRA-8449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8449
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 3.0


 We disabled zero-copy reads in CASSANDRA-3179 due to in flight reads 
 accessing a ByteBuffer when the data was unmapped by compaction.  Currently 
 this code path is only used for uncompressed reads.
 The actual bytes are in fact copied to the client output buffers for both 
 netty and thrift before being sent over the wire, so the only issue really is 
 the time it takes to process the read internally.  
 This patch adds a slow network read test and changes the tidy() method to 
 actually delete a sstable once the readTimeout has elapsed giving plenty of 
 time to serialize the read.
 Removing this copy causes significantly less GC on the read path and improves 
 the tail latencies:
 http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688fmetric=gc_countoperation=2_readsmoothing=1show_aggregates=truexmin=0xmax=109.34ymin=0ymax=5.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8453) Ability to override TTL on different data-centers, plus one-way replication

2014-12-10 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241216#comment-14241216
 ] 

Robert Stupp commented on CASSANDRA-8453:
-

I don't know the exact class name. But I'd strongly recommend not changing 
that behavior in the code. It can and will damage data in the whole cluster, 
since all partitions must be the same on all nodes - that's (in simple words) 
the core principle Aleksey mentioned.

 Ability to override TTL on different data-centers, plus one-way replication
 ---

 Key: CASSANDRA-8453
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8453
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Jacques-Henri Berthemet

 Here is my scenario:
 I want to have one datacenter specialized for operations DCO and another 
 for historical/audit DCH. Replication will be used between DCO and DCH.
 When TTL expires on DCO and data is deleted I'd like the data on DCH to be 
 kept for other purposes. Ideally a different TTL could be set in DCH.
 I guess this also implies that replication should be done only in DCO -> DCH 
 direction so that data is not re-created. But that's secondary, DCH data is 
 not meant to be modified.
 Is this kind of feature feasible for future versions of Cassandra? If not, 
 would you have some pointers to modify Cassandra in order to achieve this 
 functionality?
 Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8390) The process cannot access the file because it is being used by another process

2014-12-10 Thread Alexander Radzin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241255#comment-14241255
 ] 

Alexander Radzin commented on CASSANDRA-8390:
-

Joshua McKenzie, I tried this with the same result. I validated that the value 
was indeed used by Cassandra by changing it to something illegal, which threw 
an exception. 

 The process cannot access the file because it is being used by another process
 --

 Key: CASSANDRA-8390
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8390
 Project: Cassandra
  Issue Type: Bug
Reporter: Ilya Komolkin
Assignee: Joshua McKenzie
 Fix For: 2.1.3


 21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - 
 Exception in thread Thread[NonPeriodicTasks:1,5,main]
 org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94)
  ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_71]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_71]
 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
 Caused by: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
  ~[na:1.7.0_71]
 at 
 sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
  ~[na:1.7.0_71]
 at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 ... 11 common frames omitted



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-8453) Ability to override TTL on different data-centers, plus one-way replication

2014-12-10 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko resolved CASSANDRA-8453.
--
Resolution: Not a Problem

Right.

If you somehow manage to do it your way, then on reads the digest would 
mismatch (TTL is part of the digest calculation) and read repair would sync the 
data. The same goes for explicit repair.

Either you go with writing the same data to several separate keyspaces - with a 
TTL to one, without a TTL to another, and only have the non-TTL keyspace in one 
DC - or, well, there is no other way.
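
For illustration, the dual-keyspace workaround could look like the sketch below 
(names and schema are made up: ks_ops would be the TTL'd keyspace replicated to 
the operational DC, ks_audit the non-TTL keyspace kept only in the audit DC; 
the calls are the standard Java driver API):

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Sketch: write the same row twice, once with a TTL and once without, into two
// keyspaces whose replication settings place them in different datacenters.
public class DualWrite
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            session.execute("INSERT INTO ks_ops.events (id, payload) VALUES (1, 'x') USING TTL 86400");
            session.execute("INSERT INTO ks_audit.events (id, payload) VALUES (1, 'x')"); // no TTL
        }
    }
}
{code}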

 Ability to override TTL on different data-centers, plus one-way replication
 ---

 Key: CASSANDRA-8453
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8453
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Jacques-Henri Berthemet

 Here is my scenario:
 I want to have one datacenter specialized for operations DCO and another 
 for historical/audit DCH. Replication will be used between DCO and DCH.
 When TTL expires on DCO and data is deleted I'd like the data on DCH to be 
 kept for other purposes. Ideally a different TTL could be set in DCH.
 I guess this also implies that replication should be done only in DCO -> DCH 
 direction so that data is not re-created. But that's secondary, DCH data is 
 not meant to be modified.
 Is this kind of feature feasible for future versions of Cassandra? If not, 
 would you have some pointers to modify Cassandra in order to achieve this 
 functionality?
 Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8390) The process cannot access the file because it is being used by another process

2014-12-10 Thread Alexander Radzin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241255#comment-14241255
 ] 

Alexander Radzin edited comment on CASSANDRA-8390 at 12/10/14 4:06 PM:
---

Joshua McKenzie, I tried this with the same result. I validated that the 
parameter was indeed used by Cassandra by changing its value to something 
illegal, which threw an exception. 


was (Author: alexander_radzin):
Joshua McKenzie, I tried this with the same result. I validated that the value 
was indeed used by cassandra by changing value to something illegal. This threw 
exception. 

 The process cannot access the file because it is being used by another process
 --

 Key: CASSANDRA-8390
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8390
 Project: Cassandra
  Issue Type: Bug
Reporter: Ilya Komolkin
Assignee: Joshua McKenzie
 Fix For: 2.1.3


 21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - 
 Exception in thread Thread[NonPeriodicTasks:1,5,main]
 org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94)
  ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_71]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_71]
 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
 Caused by: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
  ~[na:1.7.0_71]
 at 
 sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
  ~[na:1.7.0_71]
 at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 ... 11 common frames omitted



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting

2014-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241301#comment-14241301
 ] 

Björn Hegerfors commented on CASSANDRA-8371:


[~jshook] I don't understand what you're saying about ideal scheduling. There 
might be some confusion here, as Marcus's blog post about DTCS draws a 
simplified picture of how DTCS works. In his picture, the rightmost vertical 
line represents now. And while now certainly moves forward, the other 
vertical lines, denoting window borders, do not actually move with it. That's 
where his description is wrong (I just told him about it). Rather, these 
windows borders are perfectly static, and the passage of time instead unveils 
new time windows. The newest window (which now actually lies _inside_ of) is 
always base_time_seconds in size. Then windows are merged with each other at 
certain points in time. This is an instantaneous thing. Specifically, 
min_threshold windows of the same size are merged into one window at exactly 
the moment when yet another window of that same size is created. Say that 
min_threshold=4 and base_time_seconds=60 (1 minute). Let's say that the last 4 
windows are all 1-minute windows (they certainly don't have to be, there can be 
anywhere between 1 and 4 same-sized windows). At the turn of the next minute, a 
new 1-minute window is created, and the previous ones are from that moment 
considered to be one 4-minute window (there is not moment when there are 5 
1-minute windows).

The windows are the ideal SSTable placements for DTCS. The idea is that every 
window only contains one SSTable, that spant the while time window. In 
practice, this is also very nearly what happens, except that the compaction 
triggered by windows merging is not instantaneous. There are some quirks that 
let more than one SSTable live in one time window. CASSANDRA-8360 wants to 
address that. CASSANDRA-8361 takes it one step further.

It's true that repairs can data in old windows in at later points. Read repairs 
don't mix too well with DTCS for that reason, but anti-entropy repair costs so 
much that an extra compaction at the end makes little difference. I think 
incremental repair should mix nicely with DTCS, but I don't know much about it.

Sorry if you already knew all of this, but in that case, what is you definition 
of perfect scheduling?

 DateTieredCompactionStrategy is always compacting 
 --

 Key: CASSANDRA-8371
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: mck
Assignee: Björn Hegerfors
  Labels: compaction, performance
 Attachments: java_gc_counts_rate-month.png, 
 read-latency-recommenders-adview.png, read-latency.png, 
 sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png


 Running 2.0.11 and having switched a table to 
 [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that 
 disk IO and gc count increase, along with the number of reads happening in 
 the compaction hump of cfhistograms.
 Data, and generally performance, looks good, but compactions are always 
 happening, and pending compactions are building up.
 The schema for this is 
 {code}CREATE TABLE search (
   loginid text,
   searchid timeuuid,
   description text,
   searchkey text,
   searchurl text,
   PRIMARY KEY ((loginid), searchid)
 );{code}
 We're sitting on about 82G (per replica) across 6 nodes in 4 DCs.
 CQL executed against this keyspace, and traffic patterns, can be seen in 
 slides 7+8 of https://prezi.com/b9-aj6p2esft/
 Attached are sstables-per-read and read-latency graphs from cfhistograms, and 
 screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44), 
 to DTCS (week ~46).
 These screenshots are also found in the prezi on slides 9-11.
 [~pmcfadin], [~Bj0rn], 
 Can this be a consequence of occasional deleted rows, as is described under 
 (3) in the description of CASSANDRA-6602 ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting

2014-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241301#comment-14241301
 ] 

Björn Hegerfors edited comment on CASSANDRA-8371 at 12/10/14 4:13 PM:
--

[~jshook] I don't understand what you're saying about ideal scheduling. There 
might be some confusion here, as Marcus's blog post about DTCS draws a 
simplified picture of how DTCS works. In his picture, the rightmost vertical 
line represents now. And while now certainly moves forward, the other 
vertical lines, denoting window borders, should not actually move with it. 
That's where his description is wrong (I just told him about it). Rather, these 
windows borders are perfectly static, and the passage of time instead unveils 
new time windows. The newest window (which now actually lies _inside_ of) is 
always base_time_seconds in size. Then windows are merged with each other at 
certain points in time. This is an instantaneous thing. Specifically, 
min_threshold windows of the same size are merged into one window at exactly 
the moment when yet another window of that same size is created. Say that 
min_threshold=4 and base_time_seconds=60 (1 minute). Let's say that the last 4 
windows are all 1-minute windows (they certainly don't have to be, there can be 
anywhere between 1 and 4 same-sized windows). At the turn of the next minute, a 
new 1-minute window is created, and the previous ones are from that moment 
considered to be one 4-minute window (there is not moment when there are 5 
1-minute windows).

The windows are the ideal SSTable placements for DTCS. The idea is that every 
window only contains one SSTable, that spans the whole time window. In 
practice, this is also very nearly what happens, except that the compaction 
triggered by windows merging is not instantaneous. There are some quirks that 
let more than one SSTable live in one time window. CASSANDRA-8360 wants to 
address that. CASSANDRA-8361 takes it one step further.

It's true that repairs can data in old windows in at later points. Read repairs 
don't mix too well with DTCS for that reason, but anti-entropy repair costs so 
much that an extra compaction at the end makes little difference. I think 
incremental repair should mix nicely with DTCS, but I don't know much about it.

Sorry if you already knew all of this, but in that case, what is you definition 
of perfect scheduling?


was (Author: bj0rn):
[~jshook] I don't understand what you're saying about ideal scheduling. There 
might be some confusion here, as Marcus's blog post about DTCS draws a 
simplified picture of how DTCS works. In his picture, the rightmost vertical 
line represents now. And while now certainly moves forward, the other 
vertical lines, denoting window borders, do not actually move with it. That's 
where his description is wrong (I just told him about it). Rather, these 
windows borders are perfectly static, and the passage of time instead unveils 
new time windows. The newest window (which now actually lies _inside_ of) is 
always base_time_seconds in size. Then windows are merged with each other at 
certain points in time. This is an instantaneous thing. Specifically, 
min_threshold windows of the same size are merged into one window at exactly 
the moment when yet another window of that same size is created. Say that 
min_threshold=4 and base_time_seconds=60 (1 minute). Let's say that the last 4 
windows are all 1-minute windows (they certainly don't have to be, there can be 
anywhere between 1 and 4 same-sized windows). At the turn of the next minute, a 
new 1-minute window is created, and the previous ones are from that moment 
considered to be one 4-minute window (there is not moment when there are 5 
1-minute windows).

The windows are the ideal SSTable placements for DTCS. The idea is that every 
window only contains one SSTable, that spant the while time window. In 
practice, this is also very nearly what happens, except that the compaction 
triggered by windows merging is not instantaneous. There are some quirks that 
let more than one SSTable live in one time window. CASSANDRA-8360 wants to 
address that. CASSANDRA-8361 takes it one step further.

It's true that repairs can data in old windows in at later points. Read repairs 
don't mix too well with DTCS for that reason, but anti-entropy repair costs so 
much that an extra compaction at the end makes little difference. I think 
incremental repair should mix nicely with DTCS, but I don't know much about it.

Sorry if you already knew all of this, but in that case, what is you definition 
of perfect scheduling?

 DateTieredCompactionStrategy is always compacting 
 --

 Key: CASSANDRA-8371
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
 Project: Cassandra
  Issue Type: Bug
  

[jira] [Comment Edited] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting

2014-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241301#comment-14241301
 ] 

Björn Hegerfors edited comment on CASSANDRA-8371 at 12/10/14 4:14 PM:
--

[~jshook] I don't understand what you're saying about ideal scheduling. There 
might be some confusion here, as Marcus's blog post about DTCS draws a 
simplified picture of how DTCS works. In his picture, the rightmost vertical 
line represents now. And while now certainly moves forward, the other 
vertical lines, denoting window borders, should not actually move with it. 
That's where his description is wrong (I just told him about it). Rather, these 
windows borders are perfectly static, and the passage of time instead unveils 
new time windows. The newest window (which now actually lies _inside_ of) is 
always base_time_seconds in size. Then windows are merged with each other at 
certain points in time. This is an instantaneous thing. Specifically, 
min_threshold windows of the same size are merged into one window at exactly 
the moment when yet another window of that same size is created. Say that 
min_threshold=4 and base_time_seconds=60 (1 minute). Let's say that the last 4 
windows are all 1-minute windows (they certainly don't have to be, there can be 
anywhere between 1 and 4 same-sized windows). At the turn of the next minute, a 
new 1-minute window is created, and the previous ones are from that moment 
considered to be one 4-minute window (there is not moment when there are 5 
1-minute windows).

The windows are the ideal SSTable placements for DTCS. The idea is that every 
window only contains one SSTable, that spans the whole time window. In 
practice, this is also very nearly what happens, except that the compaction 
triggered by windows merging is not instantaneous. There are some quirks that 
let more than one SSTable live in one time window. CASSANDRA-8360 wants to 
address that. CASSANDRA-8361 takes it one step further.

It's true that repairs can put data in old windows in at later points. Read 
repairs don't mix too well with DTCS for that reason, but anti-entropy repair 
costs so much that an extra compaction at the end makes little difference. I 
think incremental repair should mix nicely with DTCS, but I don't know much 
about it.

Sorry if you already knew all of this, but in that case, what is you definition 
of perfect scheduling?


was (Author: bj0rn):
[~jshook] I don't understand what you're saying about ideal scheduling. There 
might be some confusion here, as Marcus's blog post about DTCS draws a 
simplified picture of how DTCS works. In his picture, the rightmost vertical 
line represents now. And while now certainly moves forward, the other 
vertical lines, denoting window borders, should not actually move with it. 
That's where his description is wrong (I just told him about it). Rather, these 
windows borders are perfectly static, and the passage of time instead unveils 
new time windows. The newest window (which now actually lies _inside_ of) is 
always base_time_seconds in size. Then windows are merged with each other at 
certain points in time. This is an instantaneous thing. Specifically, 
min_threshold windows of the same size are merged into one window at exactly 
the moment when yet another window of that same size is created. Say that 
min_threshold=4 and base_time_seconds=60 (1 minute). Let's say that the last 4 
windows are all 1-minute windows (they certainly don't have to be, there can be 
anywhere between 1 and 4 same-sized windows). At the turn of the next minute, a 
new 1-minute window is created, and the previous ones are from that moment 
considered to be one 4-minute window (there is not moment when there are 5 
1-minute windows).

The windows are the ideal SSTable placements for DTCS. The idea is that every 
window only contains one SSTable, that spans the whole time window. In 
practice, this is also very nearly what happens, except that the compaction 
triggered by windows merging is not instantaneous. There are some quirks that 
let more than one SSTable live in one time window. CASSANDRA-8360 wants to 
address that. CASSANDRA-8361 takes it one step further.

It's true that repairs can data in old windows in at later points. Read repairs 
don't mix too well with DTCS for that reason, but anti-entropy repair costs so 
much that an extra compaction at the end makes little difference. I think 
incremental repair should mix nicely with DTCS, but I don't know much about it.

Sorry if you already knew all of this, but in that case, what is you definition 
of perfect scheduling?

 DateTieredCompactionStrategy is always compacting 
 --

 Key: CASSANDRA-8371
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
 Project: Cassandra
  Issue 

[jira] [Comment Edited] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting

2014-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241301#comment-14241301
 ] 

Björn Hegerfors edited comment on CASSANDRA-8371 at 12/10/14 4:14 PM:
--

[~jshook] I don't understand what you're saying about ideal scheduling. There 
might be some confusion here, as Marcus's blog post about DTCS draws a 
simplified picture of how DTCS works. In his picture, the rightmost vertical 
line represents now. And while now certainly moves forward, the other 
vertical lines, denoting window borders, should not actually move with it. 
That's where his description is wrong (I just told him about it). Rather, these 
window borders are perfectly static, and the passage of time instead unveils 
new time windows. The newest window (which now actually lies _inside_ of) is 
always base_time_seconds in size. Then windows are merged with each other at 
certain points in time. This is an instantaneous thing. Specifically, 
min_threshold windows of the same size are merged into one window at exactly 
the moment when yet another window of that same size is created. Say that 
min_threshold=4 and base_time_seconds=60 (1 minute). Let's say that the last 4 
windows are all 1-minute windows (they certainly don't have to be, there can be 
anywhere between 1 and 4 same-sized windows). At the turn of the next minute, a 
new 1-minute window is created, and the previous ones are from that moment 
considered to be one 4-minute window (there is no moment when there are 5 
1-minute windows).

The windows are the ideal SSTable placements for DTCS. The idea is that every 
window only contains one SSTable that spans the whole time window. In 
practice, this is also very nearly what happens, except that the compaction 
triggered by windows merging is not instantaneous. There are some quirks that 
let more than one SSTable live in one time window. CASSANDRA-8360 wants to 
address that. CASSANDRA-8361 takes it one step further.

It's true that repairs can put data in old windows at later points. Read 
repairs don't mix too well with DTCS for that reason, but anti-entropy repair 
costs so much that an extra compaction at the end makes little difference. I 
think incremental repair should mix nicely with DTCS, but I don't know much 
about it.

Sorry if you already knew all of this, but in that case, what is your 
definition of ideal scheduling?


was (Author: bj0rn):
[~jshook] I don't understand what you're saying about ideal scheduling. There 
might be some confusion here, as Marcus's blog post about DTCS draws a 
simplified picture of how DTCS works. In his picture, the rightmost vertical 
line represents now. And while now certainly moves forward, the other 
vertical lines, denoting window borders, should not actually move with it. 
That's where his description is wrong (I just told him about it). Rather, these 
windows borders are perfectly static, and the passage of time instead unveils 
new time windows. The newest window (which now actually lies _inside_ of) is 
always base_time_seconds in size. Then windows are merged with each other at 
certain points in time. This is an instantaneous thing. Specifically, 
min_threshold windows of the same size are merged into one window at exactly 
the moment when yet another window of that same size is created. Say that 
min_threshold=4 and base_time_seconds=60 (1 minute). Let's say that the last 4 
windows are all 1-minute windows (they certainly don't have to be, there can be 
anywhere between 1 and 4 same-sized windows). At the turn of the next minute, a 
new 1-minute window is created, and the previous ones are from that moment 
considered to be one 4-minute window (there is not moment when there are 5 
1-minute windows).

The windows are the ideal SSTable placements for DTCS. The idea is that every 
window only contains one SSTable, that spans the whole time window. In 
practice, this is also very nearly what happens, except that the compaction 
triggered by windows merging is not instantaneous. There are some quirks that 
let more than one SSTable live in one time window. CASSANDRA-8360 wants to 
address that. CASSANDRA-8361 takes it one step further.

It's true that repairs can put data in old windows in at later points. Read 
repairs don't mix too well with DTCS for that reason, but anti-entropy repair 
costs so much that an extra compaction at the end makes little difference. I 
think incremental repair should mix nicely with DTCS, but I don't know much 
about it.

Sorry if you already knew all of this, but in that case, what is you definition 
of perfect scheduling?

 DateTieredCompactionStrategy is always compacting 
 --

 Key: CASSANDRA-8371
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
 Project: Cassandra
  Issue 

[jira] [Updated] (CASSANDRA-8308) Windows: Commitlog access violations on unit tests

2014-12-10 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-8308:
---
Attachment: 8308_v2.txt

v2 attached.  Good catch on truncate on nio - I misread the javadoc on that and 
also assumed they were going for functional parity with RAF.setLength.

I couldn't find an analogue to RAF.setLength in nio; rather than creating a 
single-byte ByteBuffer, seeking to DD.getCommitLogSegmentSize(), writing that 
byte, and seeking back, I went ahead and just used RAF.setLength to set our 
size and then used the FileChannel API to map it later, as it seems less prone 
to error and opening a CLS isn't critical path.  If there's a cleaner or more 
idiomatic way to do that in nio I'm all for it, but I couldn't track it down.

I also added another call to CommitLog.instance.resetUnsafe in SchemaLoader 
before we attempt to delete directories, as it was failing to delete the 
memory-mapped files.  Not sure why it worked in v1, but it definitely needs it 
now.

Lastly - while I 100% agree the OS determination needs to be tightened up (see 
CASSANDRA-8452), I'm not sure how that's related to this patch, as none of the 
changes reference it.
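
For readers following along, the setLength-then-map pattern described above 
amounts to something like this minimal sketch (plain Java; the file name and 
size constant are made-up stand-ins for the real segment path and 
DD.getCommitLogSegmentSize()):

{code}
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class PreSizedSegment
{
    public static void main(String[] args) throws Exception
    {
        int segmentSize = 32 * 1024 * 1024;            // stand-in for the segment size
        File file = new File("CommitLog-sketch.log");  // made-up segment path
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw"))
        {
            raf.setLength(segmentSize);                // pre-size in one call, no seek dance
            MappedByteBuffer buffer = raf.getChannel()
                                         .map(FileChannel.MapMode.READ_WRITE, 0, segmentSize);
            buffer.putLong(0, 42L);                    // writes go through the mapping
        }
    }
}
{code}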

 Windows: Commitlog access violations on unit tests
 --

 Key: CASSANDRA-8308
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8308
 Project: Cassandra
  Issue Type: Bug
Reporter: Joshua McKenzie
Assignee: Joshua McKenzie
Priority: Minor
  Labels: Windows
 Fix For: 3.0

 Attachments: 8308_v1.txt, 8308_v2.txt


 We have four unit tests failing on trunk on Windows, all with 
 FileSystemException's related to the SchemaLoader:
 {noformat}
 [junit] Test 
 org.apache.cassandra.db.compaction.DateTieredCompactionStrategyTest FAILED
 [junit] Test org.apache.cassandra.cql3.ThriftCompatibilityTest FAILED
 [junit] Test org.apache.cassandra.io.sstable.SSTableRewriterTest FAILED
 [junit] Test org.apache.cassandra.repair.LocalSyncTaskTest FAILED
 {noformat}
 Example error:
 {noformat}
 [junit] Caused by: java.nio.file.FileSystemException: 
 build\test\cassandra\commitlog;0\CommitLog-5-1415908745965.log: The process 
 cannot access the file because it is being used by another process.
 [junit]
 [junit] at 
 sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
 [junit] at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
 [junit] at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
 [junit] at 
 sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
 [junit] at 
 sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
 [junit] at java.nio.file.Files.delete(Files.java:1079)
 [junit] at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:125)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7032) Improve vnode allocation

2014-12-10 Thread Branimir Lambov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Branimir Lambov updated CASSANDRA-7032:
---
Attachment: TestVNodeAllocation.java

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java


 It's been known for a little while that random vnode allocation causes 
 hotspots of ownership. It should be possible to improve dramatically on this 
 with deterministic allocation. I have quickly thrown together a simple greedy 
 algorithm that allocates vnodes efficiently, and will repair hotspots in a 
 randomly allocated cluster gradually as more nodes are added, and also 
 ensures that token ranges are fairly evenly spread between nodes (somewhat 
 tunably so). The allocation still permits slight discrepancies in ownership, 
 but it is bound by the inverse of the size of the cluster (as opposed to 
 random allocation, which strangely gets worse as the cluster size increases). 
 I'm sure there is a decent dynamic programming solution to this that would be 
 even better.
 If on joining the ring a new node were to CAS a shared table where a 
 canonical allocation of token ranges lives after running this (or a similar) 
 algorithm, we could then get guaranteed bounds on the ownership distribution 
 in a cluster. This will also help for CASSANDRA-6696.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-10 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241332#comment-14241332
 ] 

Branimir Lambov commented on CASSANDRA-7032:


Ignoring replication for the time being (more on that below), and looking at 
the best thing we can do when we have an existing setup and are trying to add 
a node, I came up with the following approach.

We can only assign new vnodes, which means that we can only _take away_ load 
from other nodes, never add to it. On the one hand this means that 
underutilized nodes are hopeless until the cluster grows enough for their share 
to become normal. On the other it means that the best thing to do (aiming for 
the smallest overutilization, i.e. max deviation from mean) is to take the 
highest-load nodes and spread their load evenly between them and the new node.

Adding a new node gives us _vn_ (one per vnode) new tokens to issue, i.e. we can 
decrease the load in at most _vn_ other nodes. We can pick the _vn_ 
highest-load ones, but some of them may already have a lower load than the 
target spread; we thus select the largest _n <= vn_ highest-load nodes such 
that the spread load _t_, which is their combined load divided by _n+1_, is 
lower than the load of each individual selected node. We can then choose how to 
assign _vn_ tokens splitting some of the ranges in these _n_ nodes to reduce 
the load of each node to _t_. This should also leave the new node with a load of _t_.

The attached code implements a simple version of this, which improves 
overutilization very quickly with every new node; a typical simulation looks 
like:
{code}
Random generation of 1000 nodes with 256 tokens each
Size 1000   max 1.24 min 0.80   No replication
Adding 1 node(s) using NoReplicationTokenDistributor
Size 1001   max 1.11 min 0.80   No replication
Adding 9 node(s) using NoReplicationTokenDistributor
Size 1010   max 1.05 min 0.81   No replication
Adding 30 node(s) using NoReplicationTokenDistributor
Size 1040   max 1.02 min 0.83   No replication
Adding 210 node(s) using NoReplicationTokenDistributor
Size 1250   max 1.00 min 1.00   No replication
{code}
It also constructs clusters from empty pretty well.

However, when replication is present the load distribution of this allocation 
does not look good (the added node tends to take much more than it should; one 
reason for this is that it becomes a replica of the token ranges it splits), 
which is not unexpected. I am now trying to see how exactly taking replication 
into account affects the reasoning above. We can still only remove load, but 
the way splitting affects the loads is not that clear any more.

As far as I can see the following simplification of Cassandra's replication 
strategies should suffice for handling the current and planned variations:
* we have units made up of a number of vnodes whose load we want to be able to 
balance (currently unit==node, but in the future the unit could be smaller (a 
disk or core))
* units are bunched up in racks (if racks are not defined, a node is implicitly 
a rack for its units)
* replicas of data must be placed on the closest higher vnodes that belong to 
different racks
* the replication strategy specifies the number of replicas and the set of 
units belonging to each rack

Datacentres are irrelevant as replication is specified within each dc, i.e. we 
can isolate the vnode allocation to the individual dc. If disk/core-level 
allocation is in place, the node boundaries within a rack can be ignored as 
well. Is there anything I'm missing?
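
To make sure I have the replica-placement rule right, a minimal sketch 
(hypothetical types) of walking the ring and picking replicas from distinct racks:

{code}
import java.util.*;

class RackAwarePlacement
{
    static class Unit { String rack; }

    // Starting at a vnode, walk the ring towards higher tokens and take the
    // first units that land on racks not yet used, until rf replicas are found.
    static List<Unit> replicasFor(int tokenIndex, List<Unit> ring, int rf)
    {
        List<Unit> replicas = new ArrayList<>();
        Set<String> racksUsed = new HashSet<>();
        for (int i = 0; i < ring.size() && replicas.size() < rf; i++)
        {
            Unit unit = ring.get((tokenIndex + i) % ring.size()); // next higher vnode, wrapping
            if (racksUsed.add(unit.rack)) // first vnode seen on this rack
                replicas.add(unit);
        }
        return replicas;
    }
}
{code}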


[~benedict]: I believe you prefer to split the disk/core workload inside the 
node by assigning a token range (e.g. the vnodes that intersect with a range 
corresponding to _1/n_ of the token ring are to be handled by that disk/core). 
I prefer to just choose _1/n_ of the vnodes, because it lets me balance them 
directly. Do you have any objections to this?


 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java


 It's been known for a little while that random vnode allocation causes 
 hotspots of ownership. It should be possible to improve dramatically on this 
 with deterministic allocation. I have quickly thrown together a simple greedy 
 algorithm that allocates vnodes efficiently, and will repair hotspots in a 
 randomly allocated cluster gradually as more nodes are added, and also 
 ensures that token ranges are fairly evenly spread between nodes (somewhat 
 tunably so). The 

[jira] [Commented] (CASSANDRA-8453) Ability to override TTL on different data-centers, plus one-way replication

2014-12-10 Thread Jacques-Henri Berthemet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241336#comment-14241336
 ] 

Jacques-Henri Berthemet commented on CASSANDRA-8453:


OK I understand. Thank you both for those detailed explanations.

Regards,
JH

 Ability to override TTL on different data-centers, plus one-way replication
 ---

 Key: CASSANDRA-8453
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8453
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Jacques-Henri Berthemet

 Here is my scenario:
 I want to have one datacenter specialized for operations (DCO) and another 
 for historical/audit (DCH). Replication will be used between DCO and DCH.
 When TTL expires on DCO and data is deleted, I'd like the data on DCH to be 
 kept for other purposes. Ideally a different TTL could be set in DCH.
 I guess this also implies that replication should be done only in the DCO => DCH 
 direction so that data is not re-created. But that's secondary, DCH data is 
 not meant to be modified.
 Is this kind of feature feasible for future versions of Cassandra? If not, 
 would you have some pointers to modify Cassandra in order to achieve this 
 functionality?
 Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6993) Windows: remove mmap'ed I/O for index files and force standard file access

2014-12-10 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241342#comment-14241342
 ] 

Joshua McKenzie commented on CASSANDRA-6993:


Very good point.  Looking through the code-base, every place where we're using 
isUnix really seems to mean 'isn't Windows', so I'd be comfortable with that 
distinction for now, with the ability to make it more complex/powerful in the 
future if necessary.  Also, right now we're skipping early re-open on files 
based on that FBUtilities.isUnix check (see CASSANDRA-7365), so I'd prefer to 
get this modification in before 2.1.3 so we can get more coverage / usage of 
the early re-open logic, at least on OSX-based dev machines.
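
In code terms, something as simple as the following would cover it (a sketch, 
not the final patch):

{code}
// Sketch: treat everything that isn't Windows as "unix-like" for the
// mmap / early re-open decisions, instead of whitelisting unix variants.
public static boolean isWindows()
{
    return System.getProperty("os.name").toLowerCase().startsWith("windows");
}
{code}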

 Windows: remove mmap'ed I/O for index files and force standard file access
 --

 Key: CASSANDRA-6993
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6993
 Project: Cassandra
  Issue Type: Improvement
Reporter: Joshua McKenzie
Assignee: Joshua McKenzie
Priority: Minor
  Labels: Windows
 Fix For: 3.0, 2.1.3

 Attachments: 6993_2.1_v1.txt, 6993_v1.txt, 6993_v2.txt


 Memory-mapped I/O on Windows causes issues with hard-links; we're unable to 
 delete hard-links to open files with memory-mapped segments even using nio.  
 We'll need to push for close to performance parity between mmap'ed I/O and 
 buffered going forward as the buffered / compressed path offers other 
 benefits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7032) Improve vnode allocation

2014-12-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241349#comment-14241349
 ] 

Benedict commented on CASSANDRA-7032:
-

If you mean that for V vnode tokens in ascending order [0..V), and e.g. D disks, 
the disks would own one of the token lists in the set 
{ [dV/D..(d+1)V/D) : 0 <= d < D }, and you guarantee that the owned range of 
each list is balanced with the other lists, this seems pretty analogous to the 
approach I was describing and perfectly reasonable.
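
In code form, that range-based assignment would be roughly (hypothetical sketch):

{code}
class DiskAssignment
{
    // For V vnode tokens in ascending order and D disks, disk d owns the
    // contiguous token list [d*V/D .. (d+1)*V/D).
    static int diskForVnode(int vnodeIndex, int V, int D)
    {
        return (int) ((long) vnodeIndex * D / V);
    }
    // (Branimir's alternative - picking 1/D of the vnodes directly - would
    // instead be something like: vnodeIndex % D)
}
{code}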

The main goal is only that once a range or set of vnode tokens has been 
assigned to a given resource (disk, cpu, node, rack, whatever) that resource 
never needs to reassign its tokens.

 Improve vnode allocation
 

 Key: CASSANDRA-7032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Branimir Lambov
  Labels: performance, vnodes
 Fix For: 3.0

 Attachments: TestVNodeAllocation.java, TestVNodeAllocation.java, 
 TestVNodeAllocation.java


 It's been known for a little while that random vnode allocation causes 
 hotspots of ownership. It should be possible to improve dramatically on this 
 with deterministic allocation. I have quickly thrown together a simple greedy 
 algorithm that allocates vnodes efficiently, and will repair hotspots in a 
 randomly allocated cluster gradually as more nodes are added, and also 
 ensures that token ranges are fairly evenly spread between nodes (somewhat 
 tunably so). The allocation still permits slight discrepancies in ownership, 
 but it is bound by the inverse of the size of the cluster (as opposed to 
 random allocation, which strangely gets worse as the cluster size increases). 
 I'm sure there is a decent dynamic programming solution to this that would be 
 even better.
 If on joining the ring a new node were to CAS a shared table where a 
 canonical allocation of token ranges lives after running this (or a similar) 
 algorithm, we could then get guaranteed bounds on the ownership distribution 
 in a cluster. This will also help for CASSANDRA-6696.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6060) Remove internal use of Strings for ks/cf names

2014-12-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241364#comment-14241364
 ] 

Ariel Weisberg commented on CASSANDRA-6060:
---

I am still digging but I am not sure there is much value here.

For prepared statements between client and server there are no ks/cf names.

Here is the breakdown for a minimum size mutation inside the cluster

Size of Ethernet frame - 24 Bytes
Size of IPv4 Header (without any options) - 20 bytes
Size of TCP Header (without any options) - 20 Bytes

4-bytes protocol magic
4-bytes version
4-bytes timestamp
4-bytes verb
4-bytes parameter count
4-bytes payload length prefix
No keyspace name in current versions
2-byte key length
key say 10 bytes
4-byte mutation count

1-byte boolean
16-byte cf id
4-byte count of columns

Per column
2-byte column name length prefix
column name say 8 bytes
1-byte serialization flags
8-byte timestamp
4-byte length prefix
column value say 8 bytes

Total is 158 bytes. Saving 12 bytes on the CF uuid would be 7.5%.

For single-CF mutations this is not a win. Loading data points 16 bytes at a 
time isn't going to perform well anyway, so people might look into batching at 
that point.

The UUID is not repeated for each cell, so it is a one-time cost for 
workloads that modify multiple cells per CF. The one case where the 12 bytes 
become significant is single-cell updates to multiple CFs in one mutation; 
there the 12-byte overhead converges on 23%.
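
Spelling out that convergence from the numbers above:

{noformat}
per-CF cost of a single-cell update = 1 (boolean) + 16 (cf id) + 4 (column count)
                                    + 31 (one column: 2+8+1+8+4+8) = 52 bytes
saving from an int id instead of a uuid = 16 - 4 = 12 bytes
12 / 52 ~= 23%
{noformat}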

I am going to look at the read path next, but I kind of expect to find 
something similar: a read is going to have key overhead, and possibly overhead 
for all the other query parameters, which should match the simple single-cell 
mutation case.

 Remove internal use of Strings for ks/cf names
 --

 Key: CASSANDRA-6060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6060
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 We toss a lot of Strings around internally, including across the network.  
 Once a request has been Prepared, we ought to be able to encode these as int 
 ids.
 Unfortuntely, we moved from int to uuid in CASSANDRA-3794, which was a 
 reasonable move at the time, but a uuid is a lot bigger than an int.  Now 
 that we have CAS we can allow concurrent schema updates while still using 
 sequential int IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8406) Add option to set max_sstable_age in seconds in DTCS

2014-12-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241369#comment-14241369
 ] 

Björn Hegerfors commented on CASSANDRA-8406:


I proposed another approach in my last comment on CASSANDRA-8340 (which, by the 
way, is very tightly coupled to this ticket). The idea is to specify a "max 
window re-merge" instead of a "max sstable age". That option would mean, very 
nearly, "how many times do you want each value to be rewritten?". The good 
thing about that option in this context is that it scales relative to window 
size. If small time windows are used (low baseTime), then a small 
max_window_exponent will indeed lead to a max SSTable age far lower than a day. 
Consider min_threshold=4 and base_time_seconds=60. Then max_window_exponent=3 
would create windows all the way up to 64 minutes, and stop after that. With 
max_window_exponent=10, the largest windows will be ~2 years (actually ~1.995 
years, coincidentally).
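
For concreteness, assuming windows grow by a factor of min_threshold per 
exponent step as described above:

{noformat}
window(k) = base_time_seconds * min_threshold^k
with base_time_seconds=60 and min_threshold=4:
  max_window_exponent=3:  60s * 4^3  = 3,840s      = 64 minutes
  max_window_exponent=10: 60s * 4^10 = 62,914,560s ~= 1.995 years
{noformat}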

I can implement this. It would not be difficult. But what do you think? Is this 
option too confusing? Is it a bad thing that changing base_time_seconds also 
changes the max SSTable age (linearly)? And that min_threshold does the same 
(polynomially)? It's just that the number of recompactions is what this is all 
about anyway. So why not be explicit about it?

On a second note, would it make sense to have some other behavior than "no more 
compactions ever" after SSTables get too old? For instance, how about a flag 
that makes DTCS create infinitely many same-size windows preceding the max 
window size? So in my first example, infinite 64-minute windows would be 
produced. In the event of a repair or out-of-order write, a window many days 
old may be touched, and a compaction would trigger in that window. I'm not 
suggesting this as a default, but maybe it's useful for something?

 Add option to set max_sstable_age in seconds in DTCS
 

 Key: CASSANDRA-8406
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8406
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
 Fix For: 2.0.12

 Attachments: 0001-patch.patch


 Using days as the unit for max_sstable_age in DTCS might be too much, add 
 option to set it in seconds



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4139) Add varint encoding to Messaging service

2014-12-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241388#comment-14241388
 ] 

Ariel Weisberg commented on CASSANDRA-4139:
---

I think variable-length integer encoding could be a big space saving in several 
contexts, but there is an argument against varints.

If you want to do zero-deserialization/zero-copy, varints will fight you 
because you can't random-access fields by offset.

What you can do instead is use generic compression. Counter-intuitive, but 
think of the two use cases: "I care about bandwidth, therefore I need 
compression anyway for non-integer fields", or "I don't care about bandwidth, 
so why not maximize performance".

Where this becomes important is in handling large messages where you don't want 
to parse all of it, because you are forwarding or may not consume the entire 
contents. If you have varints and want to be lazy it gets tricky.
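
For reference, a protobuf-style writer is all the encoding amounts to - a 
minimal sketch, not Cassandra's actual serializer:

{code}
import java.io.DataOutput;
import java.io.IOException;

final class VarInt
{
    // Encode an unsigned long in 1-10 bytes, 7 bits at a time,
    // setting the high bit on every byte except the last.
    static void writeUnsignedVarLong(long value, DataOutput out) throws IOException
    {
        while ((value & ~0x7FL) != 0)
        {
            out.writeByte((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.writeByte((int) value);
    }
}
{code}

The length of each field now depends on its value, which is exactly why 
offset-based random access stops working.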

I am up for trying it out and measuring.

 Add varint encoding to Messaging service
 

 Key: CASSANDRA-4139
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4139
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: Vijay
Assignee: Ariel Weisberg
 Fix For: 3.0

 Attachments: 0001-CASSANDRA-4139-v1.patch, 
 0001-CASSANDRA-4139-v2.patch, 0001-CASSANDRA-4139-v4.patch, 
 0002-add-bytes-written-metric.patch, 4139-Test.rtf, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-4139-v3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8390) The process cannot access the file because it is being used by another process

2014-12-10 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241392#comment-14241392
 ] 

Joshua McKenzie commented on CASSANDRA-8390:


Thanks for the info Alexander - I'll try and reproduce locally w/the gist you 
linked.  Also - thanks for the reproduction!  Those are *very* helpful in cases 
like this.

 The process cannot access the file because it is being used by another process
 --

 Key: CASSANDRA-8390
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8390
 Project: Cassandra
  Issue Type: Bug
Reporter: Ilya Komolkin
Assignee: Joshua McKenzie
 Fix For: 2.1.3


 21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - 
 Exception in thread Thread[NonPeriodicTasks:1,5,main]
 org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94)
  ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_71]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_71]
 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
 Caused by: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
  ~[na:1.7.0_71]
 at 
 sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
  ~[na:1.7.0_71]
 at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 ... 11 common frames omitted



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8454) Convert cql_tests from cassandra-dtest to CQLTester unit tests

2014-12-10 Thread Philip Thompson (JIRA)
Philip Thompson created CASSANDRA-8454:
--

 Summary: Convert cql_tests from cassandra-dtest to CQLTester unit 
tests
 Key: CASSANDRA-8454
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8454
 Project: Cassandra
  Issue Type: Test
  Components: Tests
Reporter: Philip Thompson
Assignee: Philip Thompson
 Fix For: 3.0, 2.1.3


See the discussion at [this 
mail|http://mail-archives.apache.org/mod_mbox/cassandra-dev/201405.mbox/%3ccaaam9sva7vmxj5sbyyb6aorltu6sssg3rifo42+hedafrxx...@mail.gmail.com%3E#archives].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8437) Track digest mismatch ratio

2014-12-10 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-8437:
--
Fix Version/s: 2.1.3

 Track digest mismatch ratio
 ---

 Key: CASSANDRA-8437
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8437
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Benjamin Lerer
Priority: Minor
 Fix For: 2.1.3


 I don't believe we track how often read results in a digest mismatch but we 
 should since that could directly impact read performance in practice.
 Once we have that data, it might be that some workloads (write heavy most 
 likely) ends up with enough mismatches that going to the data read is more 
 efficient in practice. What we do about it it step 2 however, but getting the 
 data is easy enough.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread jonathan lacefield (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jonathan lacefield updated CASSANDRA-8447:
--
Description: 
Behavior - If autocompaction is enabled, nodes will become unresponsive due to 
a full Old Gen heap which is not cleared during CMS GC.

Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
enabled on 1 node.  Executed different Cassandra stress loads, using write only 
operations.  Monitored visualvm and jconsole for heap pressure.  Captured 
iostat and dstat for most tests.  Captured heap dump from 50 thread load.  
Hints were disabled for testing on all nodes to alleviate GC noise due to hints 
backing up.

Data load test through Cassandra stress -  /usr/bin/cassandra-stress  write 
n=19 -rate threads=<different threads tested> -schema  
replication\(factor=3\)  keyspace=Keyspace1 -node <all nodes listed>

Data load thread count and results:
* 1 thread - Still running but looks like the node can sustain this load 
(approx 500 writes per second per node)
* 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS measured 
in the 60 second range (approx 2k writes per second per node)
* 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
measured in the 60 second range
* 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
measured in the 60 second range  (approx 10k writes per second per node)
* 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
measured in the 60 second range  (approx 20k writes per second per node)
* 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
measured in the 60 second range  (approx 25k writes per second per node)

Note - the observed behavior was the same for all tests except for the single 
threaded test.  The single threaded test does not appear to show this behavior.

Tested different GC and Linux OS settings with a focus on the 50 and 200 thread 
loads.  

JVM settings tested:
#  default, out of the box, env-sh settings
#  10 G Max | 1 G New - default env-sh settings
#  10 G Max | 1 G New - default env-sh settings
#* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
#   20 G Max | 10 G New 
   JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
   JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
   JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
   JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
   JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
   JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
   JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
   JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
   JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
   JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
   JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
   JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
   JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
   JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
   JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
   JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
   JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
   JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
# 20 G Max | 1 G New 
   JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
   JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
   JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
   JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
   JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
   JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
   JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
   JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
   JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
   JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
   JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
   JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
   JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
   JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
   JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
   JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
   JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
   JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

Linux OS settings tested:
# Disabled Transparent Huge Pages
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Enabled Huge Pages
echo 215 > /proc/sys/kernel/shmmax (over 20GB for heap)
echo 1536 > /proc/sys/vm/nr_hugepages (20GB/2MB page size)
# Disabled NUMA
numa=off in /etc/grub.conf
# Verified all settings documented here were implemented
  
http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html

Attachments:
#  .yaml
#  fio output - results.tar.gz
#  50 thread heap dump - 
https://drive.google.com/a/datastax.com/file/d/0B4Imdpu2YrEbMGpCZW5ta2liQ2c/view?usp=sharing
#  100 thread - visual vm anonymous screenshot - visualvm_screenshot
#  dstat screen shot of with compaction - Node_with_compaction.png
#  dstat screen shot of without compaction -- 

[jira] [Commented] (CASSANDRA-7873) Replace AbstractRowResolver.replies with collection with tailored properties

2014-12-10 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241435#comment-14241435
 ] 

Philip Thompson commented on CASSANDRA-7873:


[~slebresne], this did fix the dtests. Thank you.

 Replace AbstractRowResolver.replies with collection with tailored properties
 

 Key: CASSANDRA-7873
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7873
 Project: Cassandra
  Issue Type: Bug
 Environment: OSX and Ubuntu 14.04
Reporter: Philip Thompson
Assignee: Benedict
  Labels: qa-resolved
 Fix For: 3.0

 Attachments: 7873.21.txt, 7873.trunk.txt, 7873.txt, 7873_fixup.txt


 The dtest auth_test.py:TestAuth.system_auth_ks_is_alterable_test is failing 
 on trunk only with the following stack trace:
 {code}
 Unexpected error in node1 node log:
 ERROR [Thrift:1] 2014-09-03 15:48:08,389 CustomTThreadPoolServer.java:219 - 
 Error occurred during processing of message.
 java.util.ConcurrentModificationException: null
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) 
 ~[na:1.7.0_65]
   at java.util.ArrayList$Itr.next(ArrayList.java:831) ~[na:1.7.0_65]
   at 
 org.apache.cassandra.service.RowDigestResolver.resolve(RowDigestResolver.java:71)
  ~[main/:na]
   at 
 org.apache.cassandra.service.RowDigestResolver.resolve(RowDigestResolver.java:28)
  ~[main/:na]
   at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:110) 
 ~[main/:na]
   at 
 org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:144)
  ~[main/:na]
   at 
 org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1228) 
 ~[main/:na]
   at 
 org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1154) 
 ~[main/:na]
   at 
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:256)
  ~[main/:na]
   at 
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:212)
  ~[main/:na]
   at org.apache.cassandra.auth.Auth.selectUser(Auth.java:257) ~[main/:na]
   at org.apache.cassandra.auth.Auth.isExistingUser(Auth.java:76) 
 ~[main/:na]
   at org.apache.cassandra.service.ClientState.login(ClientState.java:178) 
 ~[main/:na]
   at 
 org.apache.cassandra.thrift.CassandraServer.login(CassandraServer.java:1486) 
 ~[main/:na]
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3579)
  ~[thrift/:na]
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3563)
  ~[thrift/:na]
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
 ~[libthrift-0.9.1.jar:0.9.1]
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
 ~[libthrift-0.9.1.jar:0.9.1]
   at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:201)
  ~[main/:na]
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_65]
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_65]
   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
 {code}
 That exception is thrown when the following query is sent:
 {code}
 SELECT strategy_options
   FROM system.schema_keyspaces
   WHERE keyspace_name = 'system_auth'
 {code}
 The test alters the RF of the system_auth keyspace, then shuts down and 
 restarts the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7873) Replace AbstractRowResolver.replies with collection with tailored properties

2014-12-10 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-7873:
---
Labels: qa-resolved  (was: )

 Replace AbstractRowResolver.replies with collection with tailored properties
 

 Key: CASSANDRA-7873
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7873
 Project: Cassandra
  Issue Type: Bug
 Environment: OSX and Ubuntu 14.04
Reporter: Philip Thompson
Assignee: Benedict
  Labels: qa-resolved
 Fix For: 3.0

 Attachments: 7873.21.txt, 7873.trunk.txt, 7873.txt, 7873_fixup.txt


 The dtest auth_test.py:TestAuth.system_auth_ks_is_alterable_test is failing 
 on trunk only with the following stack trace:
 {code}
 Unexpected error in node1 node log:
 ERROR [Thrift:1] 2014-09-03 15:48:08,389 CustomTThreadPoolServer.java:219 - 
 Error occurred during processing of message.
 java.util.ConcurrentModificationException: null
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) 
 ~[na:1.7.0_65]
   at java.util.ArrayList$Itr.next(ArrayList.java:831) ~[na:1.7.0_65]
   at 
 org.apache.cassandra.service.RowDigestResolver.resolve(RowDigestResolver.java:71)
  ~[main/:na]
   at 
 org.apache.cassandra.service.RowDigestResolver.resolve(RowDigestResolver.java:28)
  ~[main/:na]
   at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:110) 
 ~[main/:na]
   at 
 org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:144)
  ~[main/:na]
   at 
 org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1228) 
 ~[main/:na]
   at 
 org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1154) 
 ~[main/:na]
   at 
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:256)
  ~[main/:na]
   at 
 org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:212)
  ~[main/:na]
   at org.apache.cassandra.auth.Auth.selectUser(Auth.java:257) ~[main/:na]
   at org.apache.cassandra.auth.Auth.isExistingUser(Auth.java:76) 
 ~[main/:na]
   at org.apache.cassandra.service.ClientState.login(ClientState.java:178) 
 ~[main/:na]
   at 
 org.apache.cassandra.thrift.CassandraServer.login(CassandraServer.java:1486) 
 ~[main/:na]
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3579)
  ~[thrift/:na]
   at 
 org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3563)
  ~[thrift/:na]
   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
 ~[libthrift-0.9.1.jar:0.9.1]
   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
 ~[libthrift-0.9.1.jar:0.9.1]
   at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:201)
  ~[main/:na]
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_65]
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_65]
   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
 {code}
 That exception is thrown when the following query is sent:
 {code}
 SELECT strategy_options
   FROM system.schema_keyspaces
   WHERE keyspace_name = 'system_auth'
 {code}
 The test alters the RF of the system_auth keyspace, then shuts down and 
 restarts the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241452#comment-14241452
 ] 

Jonathan Ellis commented on CASSANDRA-8447:
---

It looks like the heap is full of memtable data.  Is it trying to flush and not 
able to keep up?  Or is it not recognizing that it needs to flush?

/cc [~benedict]

 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, results.tar.gz, 
 visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress -  /usr/bin/cassandra-stress  write 
 n=19 -rate threads=different threads tested -schema  
 replication\(factor=3\)  keyspace=Keyspace1 -node all nodes listed
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50
 #   20 G Max | 10 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
 # 20 G Max | 1 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS 

[jira] [Commented] (CASSANDRA-4139) Add varint encoding to Messaging service

2014-12-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241456#comment-14241456
 ] 

Benedict commented on CASSANDRA-4139:
-

We aren't bandwidth constrained for any workloads I'm aware of, so what are we 
hoping to achieve here? 

We already apply compression to the stream, so this will likely only help 
bandwidth consumption for individual small payloads where compression cannot be 
expected to yield much. In such scenarios bandwidth is especially unlikely to 
be a constraint.


 Add varint encoding to Messaging service
 

 Key: CASSANDRA-4139
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4139
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: Vijay
Assignee: Ariel Weisberg
 Fix For: 3.0

 Attachments: 0001-CASSANDRA-4139-v1.patch, 
 0001-CASSANDRA-4139-v2.patch, 0001-CASSANDRA-4139-v4.patch, 
 0002-add-bytes-written-metric.patch, 4139-Test.rtf, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-4139-v3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241467#comment-14241467
 ] 

Jonathan Ellis commented on CASSANDRA-8447:
---

{noformat}
 INFO [OptionalTasks:1] 2014-12-03 16:33:22,382 MeteredFlusher.java (line 58) 
flushing high-traffic column family CFS(Keyspace='Keyspace1', 
ColumnFamily='Standard1') (estimated 180613400 bytes)
 INFO [OptionalTasks:1] 2014-12-03 16:33:22,383 ColumnFamilyStore.java (line 
794) Enqueuing flush of Memtable-Standard1@1920408967(18066400/180664000 
serialized/live bytes, 410600 ops)
{noformat}

Looks like it's using a liveRatio of 1.0 which is almost certainly broken.  
Need to enable debug logging on Memtable.

 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, results.tar.gz, 
 visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress -  /usr/bin/cassandra-stress  write 
 n=19 -rate threads=different threads tested -schema  
 replication\(factor=3\)  keyspace=Keyspace1 -node all nodes listed
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50
 #   20 G Max | 10 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
 # 20 G Max | 1 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS 

[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread jonathan lacefield (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241472#comment-14241472
 ] 

jonathan lacefield commented on CASSANDRA-8447:
---

1) Flushing operations appear to be fine: no backed-up flush writers, and disk 
I/O looks acceptable as well.
FlushWriter   0   0   2722   0   0
2) Will enable debug logging on Memtable and update the ticket.

 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, results.tar.gz, 
 visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress -  /usr/bin/cassandra-stress  write 
 n=19 -rate threads=different threads tested -schema  
 replication\(factor=3\)  keyspace=Keyspace1 -node all nodes listed
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50
 #   20 G Max | 10 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
 # 20 G Max | 1 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12

[jira] [Commented] (CASSANDRA-8316) Did not get positive replies from all endpoints error on incremental repair

2014-12-10 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241502#comment-14241502
 ] 

Alan Boudreault commented on CASSANDRA-8316:


Just pointing this out in case this ticket is related: CASSANDRA-8291

  Did not get positive replies from all endpoints error on incremental repair
 --

 Key: CASSANDRA-8316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8316
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.1.2
Reporter: Loic Lambiel
Assignee: Marcus Eriksson
 Fix For: 2.1.3

 Attachments: 0001-patch.patch, 
 CassandraDaemon-2014-11-25-2.snapshot.tar.gz, test.sh


 Hi,
 I've got an issue with incremental repairs on our production 15 nodes 2.1.2 
 (new cluster, not yet loaded, RF=3)
 After having successfully performed an incremental repair (-par -inc) on 3 
 nodes, I started receiving Repair failed with error Did not get positive 
 replies from all endpoints. from nodetool on all remaining nodes :
 [2014-11-14 09:12:36,488] Starting repair command #3, repairing 108 ranges 
 for keyspace  (seq=false, full=false)
 [2014-11-14 09:12:47,919] Repair failed with error Did not get positive 
 replies from all endpoints.
 All the nodes are up and running and the local system log shows that the 
 repair commands got started and that's it.
 I've also noticed that soon after the repair, several nodes started having 
 more cpu load indefinitely without any particular reason (no tasks / queries, 
 nothing in the logs). I then restarted C* on these nodes and retried the 
 repair on several nodes, which were successful until facing the issue again.
 I tried to repro on our 3 nodes preproduction cluster without success
 It looks like I'm not the only one having this issue: 
 http://www.mail-archive.com/user%40cassandra.apache.org/msg39145.html
 Any idea?
 Thanks
 Loic



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8248) Possible memory leak

2014-12-10 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-8248:
---
Attachment: 8248_v1.txt

Attaching a patch that releases references to SSTR during 
SSTableWriter.openEarly if there's trouble.  See discussion on CASSANDRA-8061 - 
this came up during inspection and could lead to leaks.
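
i.e. roughly the following shape (a hypothetical sketch, not the actual diff; 
finishSetup is a stand-in for the remaining openEarly work):

{code}
// Hypothetical shape of the fix: if setup after acquiring the reference
// throws, give the reference back instead of leaking the reader.
static SSTableReader openEarlySafely(SSTableReader sstr)
{
    sstr.acquireReference();
    try
    {
        finishSetup(sstr);   // hypothetical stand-in for the remaining openEarly work
        return sstr;
    }
    catch (RuntimeException | Error e)
    {
        sstr.releaseReference();
        throw e;
    }
}
{code}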

 Possible memory leak 
 -

 Key: CASSANDRA-8248
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8248
 Project: Cassandra
  Issue Type: Bug
Reporter: Alexander Sterligov
Assignee: Shawn Kumar
 Attachments: 8248_v1.txt, thread_dump


 Sometimes during repair cassandra starts to consume more memory than expected.
 Total amount of data on node is about 20GB.
 Size of the data directory is 66GB because of snapshots.
 Top reports: 
 {noformat}
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 15724 loadbase  20   0  493g  55g  44g S   28 44.2   4043:24 java
 {noformat}
 At the /proc/15724/maps there are a lot of deleted file maps
 {quote}
 7f63a6102000-7f63a6332000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6332000-7f63a6562000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6562000-7f63a6792000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6792000-7f63a69c2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a69c2000-7f63a6bf2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6bf2000-7f63a6e22000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6e22000-7f63a7052000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7052000-7f63a7282000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7282000-7f63a74b2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a74b2000-7f63a76e2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a76e2000-7f63a7912000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7912000-7f63a7b42000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7b42000-7f63a7d72000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7d72000-7f63a7fa2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7fa2000-7f63a81d2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a81d2000-7f63a8402000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a8402000-7f63a8622000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a8622000-7f63a8842000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
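The listing above is the classic signature of memory-mapped sstable segments whose files have been unlinked while the JVM still holds the mappings: the virtual address space (and the underlying disk blocks) are only reclaimed once the MappedByteBuffer is garbage collected or explicitly unmapped. A minimal sketch of the mechanism, assuming Linux semantics and a hypothetical file name:

{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DeletedMappingDemo
{
    public static void main(String[] args) throws Exception
    {
        Path path = Paths.get("demo-Index.db"); // hypothetical file name
        Files.write(path, new byte[1 << 20]);

        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
             FileChannel channel = raf.getChannel())
        {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

            // On Linux the unlink succeeds even though the file is mapped...
            Files.delete(path);

            // ...but the mapping survives: /proc/<pid>/maps now lists the file
            // with a "(deleted)" suffix until the MappedByteBuffer is collected.
            System.out.println("still readable after delete: " + map.get(0));
        }
    }
}
{code}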
 

[jira] [Reopened] (CASSANDRA-8248) Possible memory leak

2014-12-10 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie reopened CASSANDRA-8248:

  Assignee: Joshua McKenzie  (was: Shawn Kumar)

 Possible memory leak 
 -

 Key: CASSANDRA-8248
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8248
 Project: Cassandra
  Issue Type: Bug
Reporter: Alexander Sterligov
Assignee: Joshua McKenzie
 Attachments: 8248_v1.txt, thread_dump


 Sometimes during repair Cassandra starts to consume more memory than expected.
 Total amount of data on the node is about 20GB.
 Size of the data directory is 66GB because of snapshots.
 Top reports: 
 {noformat}
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 15724 loadbase  20   0  493g  55g  44g S   28 44.2   4043:24 java
 {noformat}
 In /proc/15724/maps there are a lot of mappings of deleted files:
 {quote}
 7f63a6102000-7f63a6332000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6332000-7f63a6562000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6562000-7f63a6792000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6792000-7f63a69c2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a69c2000-7f63a6bf2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6bf2000-7f63a6e22000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6e22000-7f63a7052000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7052000-7f63a7282000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7282000-7f63a74b2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a74b2000-7f63a76e2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a76e2000-7f63a7912000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7912000-7f63a7b42000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7b42000-7f63a7d72000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7d72000-7f63a7fa2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7fa2000-7f63a81d2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a81d2000-7f63a8402000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a8402000-7f63a8622000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a8622000-7f63a8842000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a8842000-7f63a8a62000 r--s  08:21 9442763
 

[jira] [Updated] (CASSANDRA-8248) Possible memory leak

2014-12-10 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-8248:
---
Reviewer: Benedict

 Possible memory leak 
 -

 Key: CASSANDRA-8248
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8248
 Project: Cassandra
  Issue Type: Bug
Reporter: Alexander Sterligov
Assignee: Joshua McKenzie
 Attachments: 8248_v1.txt, thread_dump


 Sometimes during repair Cassandra starts to consume more memory than expected.
 Total amount of data on the node is about 20GB.
 Size of the data directory is 66GB because of snapshots.
 Top reports: 
 {noformat}
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 15724 loadbase  20   0  493g  55g  44g S   28 44.2   4043:24 java
 {noformat}
 In /proc/15724/maps there are a lot of mappings of deleted files:
 {quote}
 7f63a6102000-7f63a6332000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6332000-7f63a6562000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6562000-7f63a6792000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6792000-7f63a69c2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a69c2000-7f63a6bf2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6bf2000-7f63a6e22000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a6e22000-7f63a7052000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7052000-7f63a7282000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7282000-7f63a74b2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a74b2000-7f63a76e2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a76e2000-7f63a7912000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7912000-7f63a7b42000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7b42000-7f63a7d72000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7d72000-7f63a7fa2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a7fa2000-7f63a81d2000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a81d2000-7f63a8402000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a8402000-7f63a8622000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a8622000-7f63a8842000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  (deleted)
 7f63a8842000-7f63a8a62000 r--s  08:21 9442763
 /ssd/cassandra/data/iss/feedback_history-d32bc7e048c011e49b989bc3e8a5a440/iss-feedback_history-tmplink-ka-328671-Index.db
  

[jira] [Commented] (CASSANDRA-8430) Updating a row that has a TTL produce unexpected results

2014-12-10 Thread Andrew Garrett (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241517#comment-14241517
 ] 

Andrew Garrett commented on CASSANDRA-8430:
---

[~slebresne] Thanks for the insight. I'm using DataStax DevCenter for this, 
which, to my understanding, is a GUI around their Java driver, so that explains 
the 0-vs-null behavior. It also explains the insert-vs-update behavior. Thanks, 
I'm satisfied.
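
For anyone running into the same confusion, a minimal sketch with the DataStax Java driver (2.x-era API assumed; ks.tbl as in the reproduction script) showing how to tell a null column apart from 0:

{code}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class NullVsZero
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        Row row = session.execute("SELECT foo FROM ks.tbl WHERE pk = 1").one();

        // getInt() returns the unboxed default (0) for a null cell,
        // so check isNull() first to distinguish the two cases.
        if (row.isNull("foo"))
            System.out.println("foo is null");
        else
            System.out.println("foo = " + row.getInt("foo"));

        cluster.close();
    }
}
{code}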

 Updating a row that has a TTL produce unexpected results
 

 Key: CASSANDRA-8430
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8430
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
  Labels: cassandra, ttl
 Fix For: 2.0.11, 2.1.2, 3.0

 Attachments: test.sh


 Reported on stackoverflow: 
 http://stackoverflow.com/questions/27280407/cassandra-ttl-gets-set-to-0-on-primary-key-if-no-ttl-is-specified-on-an-update?newreg=19e8c6757c62474985fef7c3037e8c08
 I can reproduce the issue with 2.0, 2.1 and trunk. I've attached a small 
 script to reproduce the issue with CCM, and here is its output:
 {code}
 aboudreault@kovarro:~/dev/cstar/so27280407$ ./test.sh 
 Current cluster is now: local
 Insert data with a 5 sec TTL
 INSERT INTO ks.tbl (pk, foo, bar) values (1, 1, 'test') using TTL 5;
  pk | bar  | foo
 +--+-
   1 | test |   1
 (1 rows)
 Update data with no TTL
 UPDATE ks.tbl set bar='change' where pk=1;
 sleep 6 sec
 BUG: Row should be deleted now, but isn't. and foo column has been deleted???
  pk | bar| foo
 ++--
   1 | change | null
 (1 rows)
 Insert data with a 5 sec TTL
 INSERT INTO ks.tbl (pk, foo, bar) values (1, 1, 'test') using TTL 5;
  pk | bar  | foo
 +--+-
   1 | test |   1
 (1 rows)
 Update data with a higher (10) TTL
 UPDATE ks.tbl USING TTL 10 set bar='change' where pk=1;
 sleep 6 sec
 BUG: foo column has been deleted?
  pk | bar| foo
 ++--
   1 | change | null
 (1 rows)
 sleep 5 sec
 Data is deleted now after the second TTL set during the update. Is this a bug 
 or the expected behavior?
 (0 rows)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread jonathan lacefield (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jonathan lacefield updated CASSANDRA-8447:
--
Attachment: memtable_debug

memtable information from system.log

 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, 
 results.tar.gz, visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=<different threads tested> -schema replication(factor=3) keyspace=Keyspace1 -node <all nodes listed>
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
 #   20 G Max | 10 G New 
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
 # 20 G Max | 1 G New 
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS=$JVM_OPTS 

[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread jonathan lacefield (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241583#comment-14241583
 ] 

jonathan lacefield commented on CASSANDRA-8447:
---

attached memtable-specific log information captured via cassandra.db debug 
logging. here's an excerpt of the MemoryMeter liveRatio output:
DEBUG [MemoryMeter:1] 2014-12-10 10:58:47,932 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
8.85183949414442 (just-counted was 7.703678988288838).  calculation took 9828ms 
for 518760 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:58:54,991 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
8.993666918209787 (just-counted was 7.987333836419574).  calculation took 
6766ms for 344480 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:59:04,165 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
8.820729561952485 (just-counted was 7.641459123904968).  calculation took 
8765ms for 501265 cells
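
For reference, the numbers in these excerpts are consistent with the 2.0-era liveRatio update rule, sketched below (each new memtable starts from a conservative default ratio of 10.0; higher measurements are believed immediately, lower ones only decay the estimate by averaging):

{code}
// A sketch of the update rule the log lines above are consistent with,
// not a verbatim copy of the Cassandra source.
static double updateLiveRatio(double current, double justCounted)
{
    // guessing low risks OOM, so higher estimates win immediately
    // and lower ones are averaged with the old value
    return justCounted >= current ? justCounted : (current + justCounted) / 2.0;
}

// Check against the first excerpt: updateLiveRatio(10.0, 7.703678988288838)
// == 8.851839494144419, matching the logged liveRatio of 8.85183949414442.
{code}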

 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, 
 results.tar.gz, visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=<different threads tested> -schema replication(factor=3) keyspace=Keyspace1 -node <all nodes listed>
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
 #   20 G Max | 10 G New 
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
 # 20 G Max | 1 G New 

[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread jonathan lacefield (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241608#comment-14241608
 ] 

jonathan lacefield commented on CASSANDRA-8447:
---

Collecting MemoryMeter debug statements. The durations look a bit long 
-- this is during a healthy period:
DEBUG [MemoryMeter:1] 2014-12-10 10:41:41,355 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
8.924682938349301 (just-counted was 7.849365876698601).  calculation took 
8306ms for 421490 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:41:42,763 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
5.99974107609572 (just-counted was 1.9994821521914399).  calculation took 
1170ms for 79935 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:41:53,384 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
7.777244370394949 (just-counted was 7.777244370394949).  calculation took 
10491ms for 566260 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:41:56,842 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
6.347211243843944 (just-counted was 2.6944224876878886).  calculation took 
3119ms for 195905 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:42:06,136 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
8.878754207137401 (just-counted was 7.757508414274801).  calculation took 
9022ms for 507230 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:42:11,883 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
6.877887628540337 (just-counted was 3.7557752570806753).  calculation took 
5076ms for 270195 cells



 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, 
 results.tar.gz, visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=<different threads tested> -schema replication(factor=3) keyspace=Keyspace1 -node <all nodes listed>
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
 #   20 G Max | 10 G New 
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS=$JVM_OPTS 

[jira] [Issue Comment Deleted] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread jonathan lacefield (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jonathan lacefield updated CASSANDRA-8447:
--
Comment: was deleted

(was: Collecting MemoryMeter debug statements. The durations look a bit long 
-- this is during a healthy period:
DEBUG [MemoryMeter:1] 2014-12-10 10:41:41,355 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
8.924682938349301 (just-counted was 7.849365876698601).  calculation took 
8306ms for 421490 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:41:42,763 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
5.99974107609572 (just-counted was 1.9994821521914399).  calculation took 
1170ms for 79935 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:41:53,384 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
7.777244370394949 (just-counted was 7.777244370394949).  calculation took 
10491ms for 566260 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:41:56,842 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
6.347211243843944 (just-counted was 2.6944224876878886).  calculation took 
3119ms for 195905 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:42:06,136 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
8.878754207137401 (just-counted was 7.757508414274801).  calculation took 
9022ms for 507230 cells
DEBUG [MemoryMeter:1] 2014-12-10 10:42:11,883 Memtable.java (line 473) 
CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 
6.877887628540337 (just-counted was 3.7557752570806753).  calculation took 
5076ms for 270195 cells

)

 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, 
 results.tar.gz, visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress - /usr/bin/cassandra-stress write n=19 -rate threads=<different threads tested> -schema replication(factor=3) keyspace=Keyspace1 -node <all nodes listed>
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
 #   20 G Max | 10 G New 
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"

[jira] [Created] (CASSANDRA-8455) IndexOutOfBoundsException when building SyntaxError message snippet

2014-12-10 Thread Tyler Hobbs (JIRA)
Tyler Hobbs created CASSANDRA-8455:
--

 Summary: IndexOutOfBoundsException when building SyntaxError 
message snippet
 Key: CASSANDRA-8455
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8455
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Tyler Hobbs
Assignee: Benjamin Lerer
Priority: Minor
 Fix For: 2.1.3


It looks like some syntax errors can result in an IndexOutOfBoundsException 
when the error message snippet is being built:

{noformat}
cqlsh> create table foo (a int primary key, b int;
<ErrorMessage code=2000 [Syntax error in CQL query] message="Failed parsing 
statement: [create table foo (a int primary key, b int;] reason: 
ArrayIndexOutOfBoundsException -1">
{noformat}

There isn't any error or stacktrace in the server logs.  It would be good to 
fix that as well.
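
Purely to illustrate the failure mode (a hypothetical sketch, not the Cassandra code or the eventual fix): when the parser has no offending token to point at, token offsets can come back as -1, and unclamped substring arithmetic over the query text throws exactly this kind of out-of-bounds exception.

{code}
// Hypothetical sketch of defensive snippet building.
static String snippet(String query, int tokenStart, int tokenEnd)
{
    final int context = 20; // characters of context around the bad token

    if (tokenStart < 0 || tokenEnd < 0)
        return query; // no usable token position: fall back to the whole statement

    int from = Math.max(0, tokenStart - context);
    int to = Math.min(query.length(), tokenEnd + context);
    return query.substring(from, to);
}
{code}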



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4139) Add varint encoding to Messaging service

2014-12-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241644#comment-14241644
 ] 

Ariel Weisberg commented on CASSANDRA-4139:
---

Is bandwidth a constraint for WAN replication? In practice, is the default for 
messaging to have compression on? What are people doing in the wild?

I could imagine varint encoding being a win for Cells where the names and 
values are integers and queries are bulk loading or selecting ranges. At the 
storage level it seems like the kind of thing that could beat general purpose 
compression if you know what data type you are dealing with and have a lot of 0 
padded values.

I have heard talk about using a column store and run-length encoding approach 
for storage, which makes it seem like varint encoding wouldn't be the tool of 
choice for storage either.

The code changes don't look bad. It's mostly swapping types for streams and 
changes to calculating serialized size so that it is aware of the impact of 
variable length encoded integers. It could save bandwidth, but it could also be 
slower since you spend more cycles calculating serialized size and 
encoding/decoding integers. If you end up using compression in bandwidth 
sensitive scenarios you may not win much.

Not varint encoding the data going in/out of the database means you only save 
real space proportionally when you have small operations going in/out. The flip 
side is that you can't do that many small ops anyways so you aren't bandwidth 
constrained.
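
For reference, a minimal sketch of the kind of variable-length integer coding under discussion (an unsigned LEB128-style varint; the exact wire format of the patch may differ), including the serialized-size computation that accounts for the extra cycles mentioned above:

{code}
// Unsigned varint: 7 payload bits per byte, high bit means "more bytes follow".
static int computeVIntSize(long value)
{
    int size = 1;
    while ((value & ~0x7FL) != 0)
    {
        value >>>= 7;
        size++;
    }
    return size;
}

static int writeVInt(long value, byte[] out, int offset)
{
    while ((value & ~0x7FL) != 0)
    {
        out[offset++] = (byte) ((value & 0x7F) | 0x80);
        value >>>= 7;
    }
    out[offset++] = (byte) value;
    return offset; // position just past the last byte written
}

// Small values are where the win is: computeVIntSize(1) == 1 versus 8 bytes
// for a fixed-width long. Signed values are typically zig-zag mapped first:
// (n << 1) ^ (n >> 63).
{code}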

 Add varint encoding to Messaging service
 

 Key: CASSANDRA-4139
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4139
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: Vijay
Assignee: Ariel Weisberg
 Fix For: 3.0

 Attachments: 0001-CASSANDRA-4139-v1.patch, 
 0001-CASSANDRA-4139-v2.patch, 0001-CASSANDRA-4139-v4.patch, 
 0002-add-bytes-written-metric.patch, 4139-Test.rtf, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-4139-v3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-4139) Add varint encoding to Messaging service

2014-12-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241644#comment-14241644
 ] 

Ariel Weisberg edited comment on CASSANDRA-4139 at 12/10/14 8:19 PM:
-

Is bandwidth a constraint for WAN replication? In practice, is the default for 
messaging to have compression on? What are people doing in the wild?

I could imagine varint encoding being a win for Cells where the names and 
values are integers and queries are bulk loading or selecting ranges. At the 
storage level it seems like the kind of thing that could beat general purpose 
compression if you know what data type you are dealing with and have a lot of 0 
padded values.

I have heard talk about using a column store and run-length encoding approach 
for storage, which makes it seem like varint encoding wouldn't be the tool of 
choice for storage either.

The code changes don't look bad. It's mostly swapping types for streams and 
changes to calculating serialized size so that it is aware of the impact of 
variable length encoded integers. It could save bandwidth, but it could also be 
slower since you spend more cycles calculating serialized size and 
encoding/decoding integers. If you end up using compression in bandwidth 
sensitive scenarios you may not win much.

Not varint encoding the data going in/out of the database means you only save 
real space proportionally when you have small operations going in/out. The flip 
side is that you can't do that many small ops anyways so you aren't bandwidth 
constrained.


was (Author: aweisberg):
Is bandwidth a constraint for WAN replication? In practice is the default for 
messaging to have compression on? What are people doing in the wild?

I could imagine varint encoding being a win for Cells where the names and 
values are integers and queries are bulk loading or selecting ranges. At the 
storage level it seems like the kind of thing that could beat general purpose 
compression if you know what data type you are dealing with and have a lot of 0 
padded values.

I have heard talk about using a column store and run length encoding approach 
for storage which makes it seem like varint encoding would be the tool of 
choice for storage either.

The code changes don't look bad. It's mostly swapping types for streams and 
changes to calculating serialized size so that it is aware of the impact of 
variable length encoded integers. It could save bandwidth, but it could also be 
slower since you spend more cycles calculating serialized size and 
encoding/decoding integers. If you end up using compression in bandwidth 
sensitive scenarios you may not win much.

Not varint encoding the data going in/out of the database means you only save 
real space proportionally when you have small operations going in/out. The flip 
side is that you can't do that many small ops anyways so you aren't bandwidth 
constrained.

 Add varint encoding to Messaging service
 

 Key: CASSANDRA-4139
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4139
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: Vijay
Assignee: Ariel Weisberg
 Fix For: 3.0

 Attachments: 0001-CASSANDRA-4139-v1.patch, 
 0001-CASSANDRA-4139-v2.patch, 0001-CASSANDRA-4139-v4.patch, 
 0002-add-bytes-written-metric.patch, 4139-Test.rtf, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-4139-v3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8390) The process cannot access the file because it is being used by another process

2014-12-10 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241665#comment-14241665
 ] 

Joshua McKenzie commented on CASSANDRA-8390:


[~alexander_radzin]: how many runs does it take with the attached cqlSync test 
to reproduce?  Also - are you running 2.1.1 or 2.1.2?

 I'm thus far unable to reproduce on either win7 or win8.1 with cqlSync, with 
or without memory-mapped index files.

Also - is there an antivirus client installed in this test environment?  We've 
seen issues w/file access violations on Windows in the past due to that as well.
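
For context on why memory-mapped index files are the usual suspect here: on Windows a file with a live mapping cannot be deleted, and the delete fails with exactly this "being used by another process" error, whereas the same call succeeds on Linux. A minimal sketch of the semantics (hypothetical file name):

{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WindowsMappedDeleteDemo
{
    public static void main(String[] args) throws Exception
    {
        Path path = Paths.get("demo-Index.db"); // hypothetical file name
        Files.write(path, new byte[4096]);

        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
             FileChannel channel = raf.getChannel())
        {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println(map.get(0)); // keep the mapping in use

            // On Windows this throws java.nio.file.FileSystemException
            // ("The process cannot access the file because it is being used
            // by another process") for as long as the mapping is alive.
            Files.delete(path);
        }
    }
}
{code}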

 The process cannot access the file because it is being used by another process
 --

 Key: CASSANDRA-8390
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8390
 Project: Cassandra
  Issue Type: Bug
Reporter: Ilya Komolkin
Assignee: Joshua McKenzie
 Fix For: 2.1.3


 21:46:27.810 [NonPeriodicTasks:1] ERROR o.a.c.service.CassandraDaemon - 
 Exception in thread Thread[NonPeriodicTasks:1,5,main]
 org.apache.cassandra.io.FSWriteError: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:135) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:121) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:113) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableDeletingTask.run(SSTableDeletingTask.java:94)
  ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.io.sstable.SSTableReader$6.run(SSTableReader.java:664) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_71]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_71]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_71]
 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
 Caused by: java.nio.file.FileSystemException: 
 E:\Upsource_12391\data\cassandra\data\kernel\filechangehistory_t-a277b560764611e48c8e4915424c75fe\kernel-filechangehistory_t-ka-33-Index.db:
  The process cannot access the file because it is being used by another 
 process.
  
 at 
 sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102) 
 ~[na:1.7.0_71]
 at 
 sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
  ~[na:1.7.0_71]
 at 
 sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
  ~[na:1.7.0_71]
 at java.nio.file.Files.delete(Files.java:1079) ~[na:1.7.0_71]
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:131) 
 ~[cassandra-all-2.1.1.jar:2.1.1]
 ... 11 common frames omitted



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-7061) High accuracy, low overhead local read/write tracing

2014-12-10 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg reassigned CASSANDRA-7061:
-

Assignee: Ariel Weisberg

 High accuracy, low overhead local read/write tracing
 

 Key: CASSANDRA-7061
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7061
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Ariel Weisberg
 Fix For: 3.0


 External profilers are pretty inadequate for getting accurate information at 
 the granularity we're working at: tracing is too high overhead, so measures 
 something completely different, and sampling suffers from bias of attribution 
 due to the way the stack traces are retrieved. Hyperthreading can make this 
 even worse.
 I propose to introduce an extremely low overhead tracing feature that must be 
 enabled with a system property that will trace operations within the node 
 only, so that we can perform various accurate low level analyses of 
 performance. This information will include threading info, so that we can 
 trace hand off delays and actual active time spent processing an operation. 
 With the property disabled there will be no increased burden of tracing, 
 however I hope to keep the total trace burden to less than one microsecond, 
 and any single trace command to a few tens of nanos.
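
A minimal sketch of the property-gated pattern described (property name hypothetical): with a static final flag the JIT compiles the disabled path down to a no-op, which is how the "no increased burden with the property disabled" goal is typically met.

{code}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class LocalTrace
{
    // Hypothetical property name; resolved once at class load so the JIT
    // can eliminate the trace() body entirely when tracing is disabled.
    private static final boolean ENABLED = Boolean.getBoolean("cassandra.tracelocal");

    private static final Queue<String> RECORDS = new ConcurrentLinkedQueue<String>();

    static void trace(String event)
    {
        if (!ENABLED)
            return;
        // Record a nanosecond timestamp plus the thread name so hand-off
        // delays between stages can be reconstructed afterwards.
        RECORDS.add(System.nanoTime() + " " + Thread.currentThread().getName() + " " + event);
    }
}
{code}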



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8456) Some valid index queries can be considered as invalid

2014-12-10 Thread Benjamin Lerer (JIRA)
Benjamin Lerer created CASSANDRA-8456:
-

 Summary: Some valid index queries can be considered as invalid
 Key: CASSANDRA-8456
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8456
 Project: Cassandra
  Issue Type: Bug
Reporter: Benjamin Lerer
Assignee: Benjamin Lerer


Some secondary index queries are rejected or need ALLOW FILTERING but should 
not. It seems that in certain cases {{SelectStatement}} uses index filtering 
for clustering column restrictions when it should be using clustering column 
slices.

The following unit tests can be used to reproduce the problem in 3.0
{code}
@Test
public void testMultipleClusteringWithIndex() throws Throwable
{
    createTable("CREATE TABLE %s (a int, b int, c int, d int, e int, PRIMARY KEY (a, b, c, d))");
    createIndex("CREATE INDEX ON %s (b)");
    createIndex("CREATE INDEX ON %s (e)");

    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 0, 0, 0, 0);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 0, 1, 0, 1);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 0, 1, 1, 2);

    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 1, 0, 0, 0);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 1, 1, 0, 1);
    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 1, 1, 1, 2);

    execute("INSERT INTO %s (a, b, c, d, e) VALUES (?, ?, ?, ?, ?)", 0, 2, 0, 0, 0);

    assertRows(execute("SELECT * FROM %s WHERE (b, c) = (?, ?)", 1, 1),
               row(0, 1, 1, 0, 1),
               row(0, 1, 1, 1, 2));
}

@Test
public void testMultiplePartitionKeyAndMultiClusteringWithIndex() throws Throwable
{
    createTable("CREATE TABLE %s (a int, b int, c int, d int, e int, f int, PRIMARY KEY ((a, b), c, d, e))");
    createIndex("CREATE INDEX ON %s (c)");
    createIndex("CREATE INDEX ON %s (f)");

    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 0, 0, 0, 0);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 0, 1, 0, 1);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 0, 1, 1, 2);

    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 1, 0, 0, 3);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 1, 1, 0, 4);
    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 1, 1, 1, 5);

    execute("INSERT INTO %s (a, b, c, d, e, f) VALUES (?, ?, ?, ?, ?, ?)", 0, 0, 2, 0, 0, 6);

    assertRows(execute("SELECT * FROM %s WHERE a = ? AND (c) IN ((?), (?)) AND f = ?", 0, 1, 2, 5),
               row(0, 0, 1, 1, 1, 5));

    assertRows(execute("SELECT * FROM %s WHERE a = ? AND (c, d) IN ((?, ?)) AND f = ?", 0, 1, 1, 5),
               row(0, 0, 1, 1, 1, 5));

    assertRows(execute("SELECT * FROM %s WHERE a = ? AND (c) = (?) AND f = ?", 0, 1, 5),
               row(0, 0, 1, 1, 1, 5));
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8449) Allow zero-copy reads again

2014-12-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241737#comment-14241737
 ] 

Jonathan Ellis commented on CASSANDRA-8449:
---

Right, even after 7392 a timeout approach is dangerous.  

bq. Typically I would not want to use this approach for guarding operations 
that could take arbitrarily long, but really all we're sacrificing is virtual 
address space

Can you spell that out for me?  Isn't the existing use of OpOrder technically 
arbitrarily long due to GC for instance?
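
To spell out the trade-off under discussion: the timeout approach blocks nothing and merely postpones the unmap/delete, so a straggling reader costs only some extra virtual address space (and disk space) for the duration of the read timeout. A rough sketch of that shape, not the actual patch:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class DelayedTidy
{
    private static final ScheduledExecutorService CLEANER =
            Executors.newSingleThreadScheduledExecutor();

    // Instead of deleting an sstable's files the moment the last reference is
    // released, wait out the read timeout: any read that started before this
    // point has either finished or timed out by then, so no in-flight read
    // can still be touching the mapped region when it is unmapped.
    static void scheduleDelete(Runnable deleteFiles, long readTimeoutMillis)
    {
        CLEANER.schedule(deleteFiles, readTimeoutMillis, TimeUnit.MILLISECONDS);
    }
}
{code}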

 Allow zero-copy reads again
 ---

 Key: CASSANDRA-8449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8449
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: T Jake Luciani
Priority: Minor
  Labels: performance
 Fix For: 3.0


 We disabled zero-copy reads in CASSANDRA-3179 due to in flight reads 
 accessing a ByteBuffer when the data was unmapped by compaction.  Currently 
 this code path is only used for uncompressed reads.
 The actual bytes are in fact copied to the client output buffers for both 
 netty and thrift before being sent over the wire, so the only issue really is 
 the time it takes to process the read internally.  
 This patch adds a slow network read test and changes the tidy() method to 
 actually delete a sstable once the readTimeout has elapsed giving plenty of 
 time to serialize the read.
 Removing this copy causes significantly less GC on the read path and improves 
 the tail latencies:
 http://cstar.datastax.com/graph?stats=c0c8ce16-7fea-11e4-959d-42010af0688f&metric=gc_count&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=109.34&ymin=0&ymax=5.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[4/6] cassandra git commit: merge from 2.0

2014-12-10 Thread jbellis
merge from 2.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/27c67ad8
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/27c67ad8
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/27c67ad8

Branch: refs/heads/trunk
Commit: 27c67ad851651cb49c9d1cae7d478b831e372aaf
Parents: 29259cb 5784309
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Dec 10 15:19:48 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Dec 10 15:19:48 2014 -0600

--
 CHANGES.txt| 1 +
 .../db/compaction/DateTieredCompactionStrategyOptions.java | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/27c67ad8/CHANGES.txt
--
diff --cc CHANGES.txt
index 2e74a15,385af01..25e0f47
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,24 -1,5 +1,25 @@@
 -2.0.12:
 +2.1.3
 + * Remove tmplink files for offline compactions (CASSANDRA-8321)
 + * Reduce maxHintsInProgress (CASSANDRA-8415)
 + * BTree updates may call provided update function twice (CASSANDRA-8018)
 + * Release sstable references after anticompaction (CASSANDRA-8386)
 + * Handle abort() in SSTableRewriter properly (CASSANDRA-8320)
 + * Fix high size calculations for prepared statements (CASSANDRA-8231)
 + * Centralize shared executors (CASSANDRA-8055)
 + * Fix filtering for CONTAINS (KEY) relations on frozen collection
 +   clustering columns when the query is restricted to a single
 +   partition (CASSANDRA-8203)
 + * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
 + * Add more log info if readMeter is null (CASSANDRA-8238)
 + * add check of the system wall clock time at startup (CASSANDRA-8305)
 + * Support for frozen collections (CASSANDRA-7859)
 + * Fix overflow on histogram computation (CASSANDRA-8028)
 + * Have paxos reuse the timestamp generation of normal queries 
(CASSANDRA-7801)
 + * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
 + * Improve JBOD disk utilization (CASSANDRA-7386)
 + * Log failed host when preparing incremental repair (CASSANDRA-8228)
 +Merged from 2.0:
+  * Default DTCS base_time_seconds changed to 60 (CASSANDRA-8417)
   * Refuse Paxos operation with more than one pending endpoint (CASSANDRA-8346)
   * Throw correct exception when trying to bind a keyspace or table
 name (CASSANDRA-6952)



[5/6] cassandra git commit: merge from 2.0

2014-12-10 Thread jbellis
merge from 2.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/27c67ad8
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/27c67ad8
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/27c67ad8

Branch: refs/heads/cassandra-2.1
Commit: 27c67ad851651cb49c9d1cae7d478b831e372aaf
Parents: 29259cb 5784309
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Dec 10 15:19:48 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Dec 10 15:19:48 2014 -0600

--
 CHANGES.txt| 1 +
 .../db/compaction/DateTieredCompactionStrategyOptions.java | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/27c67ad8/CHANGES.txt
--
diff --cc CHANGES.txt
index 2e74a15,385af01..25e0f47
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,24 -1,5 +1,25 @@@
 -2.0.12:
 +2.1.3
 + * Remove tmplink files for offline compactions (CASSANDRA-8321)
 + * Reduce maxHintsInProgress (CASSANDRA-8415)
 + * BTree updates may call provided update function twice (CASSANDRA-8018)
 + * Release sstable references after anticompaction (CASSANDRA-8386)
 + * Handle abort() in SSTableRewriter properly (CASSANDRA-8320)
 + * Fix high size calculations for prepared statements (CASSANDRA-8231)
 + * Centralize shared executors (CASSANDRA-8055)
 + * Fix filtering for CONTAINS (KEY) relations on frozen collection
 +   clustering columns when the query is restricted to a single
 +   partition (CASSANDRA-8203)
 + * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
 + * Add more log info if readMeter is null (CASSANDRA-8238)
 + * add check of the system wall clock time at startup (CASSANDRA-8305)
 + * Support for frozen collections (CASSANDRA-7859)
 + * Fix overflow on histogram computation (CASSANDRA-8028)
 + * Have paxos reuse the timestamp generation of normal queries 
(CASSANDRA-7801)
 + * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
 + * Improve JBOD disk utilization (CASSANDRA-7386)
 + * Log failed host when preparing incremental repair (CASSANDRA-8228)
 +Merged from 2.0:
+  * Default DTCS base_time_seconds changed to 60 (CASSANDRA-8417)
   * Refuse Paxos operation with more than one pending endpoint (CASSANDRA-8346)
   * Throw correct exception when trying to bind a keyspace or table
 name (CASSANDRA-6952)



[1/6] cassandra git commit: Default DTCS base_time_seconds changed to 60 patch by Björn Hegerfors; reviewed by jbellis for CASSANDRA-8417

2014-12-10 Thread jbellis
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 77df5578a -> 578430952
  refs/heads/cassandra-2.1 29259cb22 -> 27c67ad85
  refs/heads/trunk c64ac4188 -> 6ce8b3fcb


Default DTCS base_time_seconds changed to 60
patch by Björn Hegerfors; reviewed by jbellis for CASSANDRA-8417


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/57843095
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/57843095
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/57843095

Branch: refs/heads/cassandra-2.0
Commit: 578430952789bbc2dc7d9b17f4f4b41495d0757f
Parents: 77df557
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Dec 10 15:19:11 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Dec 10 15:19:11 2014 -0600

--
 CHANGES.txt| 1 +
 .../db/compaction/DateTieredCompactionStrategyOptions.java | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/57843095/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 3c651ff..385af01 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * Default DTCS base_time_seconds changed to 60 (CASSANDRA-8417)
  * Refuse Paxos operation with more than one pending endpoint (CASSANDRA-8346)
  * Throw correct exception when trying to bind a keyspace or table
name (CASSANDRA-6952)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/57843095/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
--
diff --git 
a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
 
b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
index 9fed3e0..ddc8dc7 100644
--- 
a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
+++ 
b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
@@ -26,7 +26,7 @@ public final class DateTieredCompactionStrategyOptions
 {
 protected static final TimeUnit DEFAULT_TIMESTAMP_RESOLUTION = TimeUnit.MICROSECONDS;
 protected static final long DEFAULT_MAX_SSTABLE_AGE_DAYS = 365;
-protected static final long DEFAULT_BASE_TIME_SECONDS = 60 * 60;
+protected static final long DEFAULT_BASE_TIME_SECONDS = 60;
 protected static final String TIMESTAMP_RESOLUTION_KEY = "timestamp_resolution";
 protected static final String MAX_SSTABLE_AGE_KEY = "max_sstable_age_days";
 protected static final String BASE_TIME_KEY = "base_time_seconds";



[6/6] cassandra git commit: Merge branch 'cassandra-2.1' into trunk

2014-12-10 Thread jbellis
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6ce8b3fc
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6ce8b3fc
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6ce8b3fc

Branch: refs/heads/trunk
Commit: 6ce8b3fcbbd5f6638ee635fbc395541afdb5eef8
Parents: c64ac41 27c67ad
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Dec 10 15:19:54 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Dec 10 15:19:54 2014 -0600

--
 CHANGES.txt| 1 +
 .../db/compaction/DateTieredCompactionStrategyOptions.java | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/6ce8b3fc/CHANGES.txt
--



[2/6] cassandra git commit: Default DTCS base_time_seconds changed to 60 patch by Björn Hegerfors; reviewed by jbellis for CASSANDRA-8417

2014-12-10 Thread jbellis
Default DTCS base_time_seconds changed to 60
patch by Björn Hegerfors; reviewed by jbellis for CASSANDRA-8417


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/57843095
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/57843095
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/57843095

Branch: refs/heads/cassandra-2.1
Commit: 578430952789bbc2dc7d9b17f4f4b41495d0757f
Parents: 77df557
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Dec 10 15:19:11 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Dec 10 15:19:11 2014 -0600

--
 CHANGES.txt| 1 +
 .../db/compaction/DateTieredCompactionStrategyOptions.java | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/57843095/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 3c651ff..385af01 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * Default DTCS base_time_seconds changed to 60 (CASSANDRA-8417)
  * Refuse Paxos operation with more than one pending endpoint (CASSANDRA-8346)
  * Throw correct exception when trying to bind a keyspace or table
name (CASSANDRA-6952)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/57843095/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
--
diff --git 
a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
 
b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
index 9fed3e0..ddc8dc7 100644
--- 
a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
+++ 
b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
@@ -26,7 +26,7 @@ public final class DateTieredCompactionStrategyOptions
 {
 protected static final TimeUnit DEFAULT_TIMESTAMP_RESOLUTION = TimeUnit.MICROSECONDS;
 protected static final long DEFAULT_MAX_SSTABLE_AGE_DAYS = 365;
-protected static final long DEFAULT_BASE_TIME_SECONDS = 60 * 60;
+protected static final long DEFAULT_BASE_TIME_SECONDS = 60;
 protected static final String TIMESTAMP_RESOLUTION_KEY = "timestamp_resolution";
 protected static final String MAX_SSTABLE_AGE_KEY = "max_sstable_age_days";
 protected static final String BASE_TIME_KEY = "base_time_seconds";



[3/6] cassandra git commit: Default DTCS base_time_seconds changed to 60 patch by Björn Hegerfors; reviewed by jbellis for CASSANDRA-8417

2014-12-10 Thread jbellis
Default DTCS base_time_seconds changed to 60
patch by Björn Hegerfors; reviewed by jbellis for CASSANDRA-8417


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/57843095
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/57843095
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/57843095

Branch: refs/heads/trunk
Commit: 578430952789bbc2dc7d9b17f4f4b41495d0757f
Parents: 77df557
Author: Jonathan Ellis jbel...@apache.org
Authored: Wed Dec 10 15:19:11 2014 -0600
Committer: Jonathan Ellis jbel...@apache.org
Committed: Wed Dec 10 15:19:11 2014 -0600

--
 CHANGES.txt| 1 +
 .../db/compaction/DateTieredCompactionStrategyOptions.java | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/57843095/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 3c651ff..385af01 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * Default DTCS base_time_seconds changed to 60 (CASSANDRA-8417)
  * Refuse Paxos operation with more than one pending endpoint (CASSANDRA-8346)
  * Throw correct exception when trying to bind a keyspace or table
name (CASSANDRA-6952)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/57843095/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
--
diff --git 
a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
 
b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
index 9fed3e0..ddc8dc7 100644
--- 
a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
+++ 
b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
@@ -26,7 +26,7 @@ public final class DateTieredCompactionStrategyOptions
 {
 protected static final TimeUnit DEFAULT_TIMESTAMP_RESOLUTION = TimeUnit.MICROSECONDS;
 protected static final long DEFAULT_MAX_SSTABLE_AGE_DAYS = 365;
-protected static final long DEFAULT_BASE_TIME_SECONDS = 60 * 60;
+protected static final long DEFAULT_BASE_TIME_SECONDS = 60;
 protected static final String TIMESTAMP_RESOLUTION_KEY = "timestamp_resolution";
 protected static final String MAX_SSTABLE_AGE_KEY = "max_sstable_age_days";
 protected static final String BASE_TIME_KEY = "base_time_seconds";



[jira] [Resolved] (CASSANDRA-6060) Remove internal use of Strings for ks/cf names

2014-12-10 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-6060.
---
   Resolution: Won't Fix
Fix Version/s: (was: 3.0)

All right, let's wontfix this.

 Remove internal use of Strings for ks/cf names
 --

 Key: CASSANDRA-6060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6060
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance

 We toss a lot of Strings around internally, including across the network.  
 Once a request has been Prepared, we ought to be able to encode these as int 
 ids.
 Unfortunately, we moved from int to uuid in CASSANDRA-3794, which was a 
 reasonable move at the time, but a uuid is a lot bigger than an int.  Now 
 that we have CAS we can allow concurrent schema updates while still using 
 sequential int IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8457) nio MessagingService

2014-12-10 Thread Jonathan Ellis (JIRA)
Jonathan Ellis created CASSANDRA-8457:
-

 Summary: nio MessagingService
 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
 Fix For: 3.0


Thread-per-peer (actually two each, incoming and outbound) is a big contributor 
to context switching, especially for larger clusters.  Let's look at switching 
to nio, possibly via Netty.
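
As a rough illustration of the direction (a minimal sketch only, not a proposed implementation), a single selector thread can multiplex all inbound peer connections instead of dedicating a thread to each; port 7000 below stands in for the storage port:

{code}
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Minimal sketch: one selector thread servicing every inbound peer socket.
public class NioSketch
{
    public static void main(String[] args) throws Exception
    {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(7000)); // stands in for the storage port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocate(4096);
        while (true)
        {
            selector.select(); // blocks until at least one channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext())
            {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable())
                {
                    SocketChannel peer = server.accept();
                    peer.configureBlocking(false);
                    peer.register(selector, SelectionKey.OP_READ);
                }
                else if (key.isReadable())
                {
                    buf.clear();
                    if (((SocketChannel) key.channel()).read(buf) < 0)
                        key.channel().close(); // peer hung up
                    // otherwise: hand the bytes off to the verb handler here
                }
            }
        }
    }
}
{code}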



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-6060) Remove internal use of Strings for ks/cf names

2014-12-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241364#comment-14241364
 ] 

Ariel Weisberg edited comment on CASSANDRA-6060 at 12/10/14 9:30 PM:
-

I am still digging but I am not sure there is much value here.

For prepared statements between client and server there are no ks/cf names.

Here is the breakdown for a minimum size mutation inside the cluster

Size of Ethernet frame - 24 Bytes
Size of IPv4 Header (without any options) - 20 bytes
Size of TCP Header (without any options) - 20 Bytes

4-bytes protocol magic
4-bytes version
4-bytes timestamp
4-bytes verb
4-bytes parameter count
4-bytes payload length prefix
No keyspace name in current versions
2-byte key length
key say 10 bytes
4-byte mutation count

1-byte boolean
16-byte cf id
4-byte count of columns

Per column
2-byte column name length prefix
column name say 8 bytes
1-byte serialization flags
8-byte timestamp
4-byte length prefix
column value say 8 bytes

Total is 158 bytes. Saving 12 bytes on the CF uuid would be 7.5%.

For single CF mutations this is not a win. Loading data points 16 bytes at a
time isn't going to work so hot anyway, so people might look into batching at
that point.

The UUID is not repeated for each cell, so it is a one-time cost for workloads
that modify multiple cells per CF. The one case where the 12 bytes become
significant is single-cell updates to multiple CFs in one mutation. There the
12-byte overhead converges on 23%.

I am going to look at the read path next, but I kind of expect to find
something similar. A read is going to have key overhead and possibly overhead
for all the other query parameters that should match the simple single-cell
mutation case.
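
A quick back-of-the-envelope check of the arithmetic above (a sketch only; field names are illustrative and the exact total depends on how the headers are counted):

{code}
// Back-of-the-envelope check of the sizes listed above (illustrative only).
public class MutationSizeSketch
{
    public static void main(String[] args)
    {
        int transport = 24 + 20 + 20;            // Ethernet + IPv4 + TCP headers
        int msgHeader = 4 + 4 + 4 + 4 + 4 + 4;   // magic, version, timestamp, verb, params, payload length
        int key       = 2 + 10;                  // length prefix + ~10-byte key
        int perCf     = 4 + 1 + 16 + 4;          // mutation count, boolean, cf id, column count
        int perColumn = 2 + 8 + 1 + 8 + 4 + 8;   // name prefix, ~8-byte name, flags, timestamp, length, ~8-byte value
        int total = transport + msgHeader + key + perCf + perColumn;
        System.out.printf("total ~%d bytes; a 16->4 byte cf id saves %.1f%%%n",
                          total, 100.0 * 12 / total);
        System.out.printf("per extra single-cell CF the overhead is %.0f%%%n",
                          100.0 * 12 / (1 + 16 + 4 + perColumn));
    }
}
{code}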



 Remove internal use of Strings for ks/cf names
 --

 Key: CASSANDRA-6060
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6060
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance

 We toss a lot of Strings around internally, including across the network.  
 Once a request has been Prepared, we ought to be able to encode these as int 
 ids.
 Unfortunately, we moved from int to uuid in CASSANDRA-3794, which was a 
 reasonable move at the time, but a uuid is a lot bigger than an int.  Now 
 that we have CAS we can allow concurrent schema updates while still using 
 sequential int IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8457) nio MessagingService

2014-12-10 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-8457:
--
Labels: performance  (was: )

 nio MessagingService
 

 Key: CASSANDRA-8457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Thread-per-peer (actually two each, incoming and outbound) is a big 
 contributor to context switching, especially for larger clusters.  Let's look 
 at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8418) Queries that require allow filtering are working without it

2014-12-10 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241781#comment-14241781
 ] 

Benjamin Lerer commented on CASSANDRA-8418:
---

My apologies: I was wrong and the dtest was right. The queries do not need 
{{ALLOW FILTERING}} because, as {{time1}} is a clustering column, the secondary 
index code can use a {{SliceQueryFilter}} instead of doing post-filtering 
on the results returned by the index.
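
For contrast, an illustrative query where the slice optimization cannot apply, because {{time2}} is restricted without {{time1}}, so the index results must be post-filtered:

{code}
SELECT blog_id, content FROM blogs WHERE time2 > 0 AND author='foo' ALLOW FILTERING;
{code}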

 Queries that require allow filtering are working without it
 ---

 Key: CASSANDRA-8418
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8418
 Project: Cassandra
  Issue Type: Bug
Reporter: Philip Thompson
Assignee: Benjamin Lerer
Priority: Minor
 Fix For: 2.0.12, 2.1.3


 The trunk dtest {{cql_tests.py:TestCQL.composite_index_with_pk_test}} has 
 begun failing after the changes to CASSANDRA-7981. 
 With the schema {code}CREATE TABLE blogs (
 blog_id int,
 time1 int,
 time2 int,
 author text,
 content text,
 PRIMARY KEY (blog_id, time1, time2)){code}
 and {code}CREATE INDEX ON blogs(author){code}, then the query
 {code}SELECT blog_id, content FROM blogs WHERE time1 > 0 AND 
 author='foo'{code} now requires ALLOW FILTERING, but did not before the 
 refactor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241785#comment-14241785
 ] 

Jonathan Ellis commented on CASSANDRA-8447:
---

Is that still reporting serialized/live size as the same on flush?

 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, 
 results.tar.gz, visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress -  /usr/bin/cassandra-stress  write 
 n=19 -rate threads=different threads tested -schema  
 replication\(factor=3\)  keyspace=Keyspace1 -node all nodes listed
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50
 #   20 G Max | 10 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
 # 20 G Max | 1 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS 

[jira] [Updated] (CASSANDRA-8418) Queries that require allow filtering are working without it

2014-12-10 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-8418:
--
Attachment: CASSANDRA-8418.txt

The patch fixes the trunk code and the unit tests.

 Queries that require allow filtering are working without it
 ---

 Key: CASSANDRA-8418
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8418
 Project: Cassandra
  Issue Type: Bug
Reporter: Philip Thompson
Assignee: Benjamin Lerer
Priority: Minor
 Fix For: 2.0.12, 2.1.3

 Attachments: CASSANDRA-8418.txt


 The trunk dtest {{cql_tests.py:TestCQL.composite_index_with_pk_test}} has 
 begun failing after the changes to CASSANDRA-7981. 
 With the schema {code}CREATE TABLE blogs (
 blog_id int,
 time1 int,
 time2 int,
 author text,
 content text,
 PRIMARY KEY (blog_id, time1, time2)){code}
 and {code}CREATE INDEX ON blogs(author){code}, then the query
 {code}SELECT blog_id, content FROM blogs WHERE time1 > 0 AND 
 author='foo'{code} now requires ALLOW FILTERING, but did not before the 
 refactor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4139) Add varint encoding to Messaging service

2014-12-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241790#comment-14241790
 ] 

Jonathan Ellis commented on CASSANDRA-4139:
---

bq. Is bandwidth a constraint for WAN replication? In practice is the default 
for messaging to have compression on?

Often, yes.  internode_compression has defaulted to {{all}} for a while now.  
Most people probably leave it at that; the rest change it to {{dc}}.
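
For reference, the knob in question lives in cassandra.yaml (the three documented values are shown in the comments):

{noformat}
# internode_compression controls whether traffic between nodes is compressed.
#   all  - compress all traffic (the default)
#   dc   - compress traffic between datacenters only
#   none - don't compress anything
internode_compression: all
{noformat}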

 Add varint encoding to Messaging service
 

 Key: CASSANDRA-4139
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4139
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: Vijay
Assignee: Ariel Weisberg
 Fix For: 3.0

 Attachments: 0001-CASSANDRA-4139-v1.patch, 
 0001-CASSANDRA-4139-v2.patch, 0001-CASSANDRA-4139-v4.patch, 
 0002-add-bytes-written-metric.patch, 4139-Test.rtf, 
 ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-4139-v3.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread jonathan lacefield (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241902#comment-14241902
 ] 

jonathan lacefield commented on CASSANDRA-8447:
---

they are still very close, yes.
INFO [OptionalTasks:1] 2014-12-10 11:11:56,736 ColumnFamilyStore.java (line 
794) Enqueuing flush of Memtable-Standard1@467482876(21395220/213952200 
serialized/live bytes, 486255 ops)
 INFO [OptionalTasks:1] 2014-12-10 11:11:58,746 ColumnFamilyStore.java (line 
794) Enqueuing flush of Memtable-Standard1@550824252(20002840/200028400 
serialized/live bytes, 454610 ops)
 INFO [OptionalTasks:1] 2014-12-10 11:12:00,765 ColumnFamilyStore.java (line 
794) Enqueuing flush of Memtable-Standard1@1776946438(19270460/192704600 
serialized/live bytes, 437965 ops)
 INFO [OptionalTasks:1] 2014-12-10 11:12:02,777 ColumnFamilyStore.java (line 
794) Enqueuing flush of Memtable-Standard1@2007866469(20061800/200618000 
serialized/live bytes, 455950 ops)
 INFO [OptionalTasks:1] 2014-12-10 11:12:04,946 ColumnFamilyStore.java (line 
794) Enqueuing flush of Memtable-Standard1@458183382(19050680/190506800 
serialized/live bytes, 432970 ops)
 INFO [OptionalTasks:1] 2014-12-10 11:12:06,961 ColumnFamilyStore.java (line 
794) Enqueuing flush of Memtable-Standard1@2027660149(23800920/238009200 
serialized/live bytes, 540930 ops)
 INFO [OptionalTasks:1] 2014-12-10 11:12:09,237 ColumnFamilyStore.java (line 
794) Enqueuing flush of Memtable-Standard1@841856891(21873060/218730600 
serialized/live bytes, 497115 ops)

 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, 
 results.tar.gz, visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress -  /usr/bin/cassandra-stress  write 
 n=19 -rate threads=different threads tested -schema  
 replication\(factor=3\)  keyspace=Keyspace1 -node all nodes listed
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50
 #   20 G Max | 10 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS 

[jira] [Commented] (CASSANDRA-8390) The process cannot access the file because it is being used by another process

2014-12-10 Thread Alexander Radzin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241928#comment-14241928
 ] 

Alexander Radzin commented on CASSANDRA-8390:
-

I have Cassandra 2.1.2 on Windows 8.1. Typically it takes about 1.5 iterations of 
the main loop to reproduce the problem with cqlSync. As a Windows 8 user I have 
Windows Defender, and it is turned on.

I have just run the test again and it passed. Then I changed the year from 2013 
to 2014 and ran the test again. It failed when it arrived at month 11.

{noformat}
CREATE TABLE measure_201411.index_bcon_page_load_aggregation (partition ascii, 
attr ascii, value varchar, time timeuuid, bloom blob, PRIMARY KEY (partition, 
attr, value, time)) WITH compaction = { 'class' : 
'SizeTieredCompactionStrategy', 'min_threshold' : 40, 'max_threshold' : 45 } 
AND gc_grace_seconds = 0 AND memtable_flush_period_in_ms = 30;
CREATE TABLE measure_201411.bcon_page_event_aggregation (partition ascii, time 
timeuuid, data blob, PRIMARY KEY (partition, time)) WITH compaction = { 'class' 
: 'SizeTieredCompactionStrategy', 'min_threshold' : 40, 'max_threshold' : 45 } 
AND gc_grace_seconds = 0 AND memtable_flush_period_in_ms = 30;

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried 
for query failed (tried: localhost/127.0.0.1:9042 
(com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Error 
writing: Closed channel))
at 
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
at 
com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:259)
at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:175)
at 
com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at 
com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
at 
com.clarisite.clingine.dataaccesslayer.cassandra.CQLTest.cqlSync(CQLTest.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
com.intellij.junit4.JUnit4TestRunnerUtil$IgnoreIgnoredTestJUnit4ClassRunner.runChild(JUnit4TestRunnerUtil.java:269)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:202)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:121)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All 
host(s) tried for query failed (tried: localhost/127.0.0.1:9042 
(com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Error 
writing: Closed channel))
at 
com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:102)
at 
com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:176)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   

[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242160#comment-14242160
 ] 

Jonathan Ellis commented on CASSANDRA-8447:
---

... realized that the live bytes have an extra zero.  So it's actually 10.0 
liveRatio, which is what it defaults to when it hasn't been computed on a fresh 
memtable.
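
Taking the first flush line quoted earlier as a worked example:

{noformat}
213952200 live / 21395220 serialized = 10.0 (the default liveRatio)
{noformat}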

 Nodes stuck in CMS GC cycle with very little traffic when compaction is 
 enabled
 ---

 Key: CASSANDRA-8447
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cluster size - 4 nodes
 Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
 (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
 OS - RHEL 6.5
 jvm - oracle 1.7.0_71
 Cassandra version 2.0.11
Reporter: jonathan lacefield
 Attachments: Node_with_compaction.png, Node_without_compaction.png, 
 cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, 
 results.tar.gz, visualvm_screenshot


 Behavior - If autocompaction is enabled, nodes will become unresponsive due 
 to a full Old Gen heap which is not cleared during CMS GC.
 Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
 enabled on 1 node.  Executed different Cassandra stress loads, using write 
 only operations.  Monitored visualvm and jconsole for heap pressure.  
 Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
 load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
 to hints backing up.
 Data load test through Cassandra stress -  /usr/bin/cassandra-stress  write 
 n=19 -rate threads=different threads tested -schema  
 replication\(factor=3\)  keyspace=Keyspace1 -node all nodes listed
 Data load thread count and results:
 * 1 thread - Still running but looks like the node can sustain this load 
 (approx 500 writes per second per node)
 * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range (approx 2k writes per second per node)
 * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range
 * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 10k writes per second per node)
 * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 20k writes per second per node)
 * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
 measured in the 60 second range  (approx 25k writes per second per node)
 Note - the observed behavior was the same for all tests except for the single 
 threaded test.  The single threaded test does not appear to show this 
 behavior.
 Tested different GC and Linux OS settings with a focus on the 50 and 200 
 thread loads.  
 JVM settings tested:
 #  default, out of the box, env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #  10 G Max | 1 G New - default env-sh settings
 #* JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50
 #   20 G Max | 10 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions
JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity
JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs
JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
 # 20 G Max | 1 G New 
JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=8
JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75
JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB
JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark
JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6
JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=3
JVM_OPTS=$JVM_OPTS -XX:ParallelGCThreads=12
JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=12
JVM_OPTS=$JVM_OPTS 

[jira] [Commented] (CASSANDRA-8447) Nodes stuck in CMS GC cycle with very little traffic when compaction is enabled

2014-12-10 Thread Philo Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14242183#comment-14242183
 ] 

Philo Yang commented on CASSANDRA-8447:
---

I have the same trouble: a full GC cannot reduce the size of the old gen. Days 
ago I posted this problem to the mailing list; people thought it would be solved 
by tuning the GC settings, but that didn't work for me. After seeing this issue 
I think it may be a bug. I can offer some information from my cluster and hope 
it helps you find the bug, if it is one. Of course, the cause of my trouble may 
be different from [~jlacefie]'s.

The unresponsiveness only appears on some nodes; in other words, some nodes 
become unresponsive several times a day while the other nodes never do. When 
there is no trouble, the load on all nodes is the same, so I don't think the 
unresponsiveness is caused by heavy load.

Before a node becomes unresponsive, it is easy to see via jstat that the old 
gen is still very large after a CMS GC (usually it is less than 1GB after a 
full GC, but when the trouble comes it stays above 4GB after CMS GC). And there 
may be a compaction that is stuck for many minutes at 99% or even 99.99%, like 
this:
pending tasks: 1
   compaction type   keyspace   table   completed   total       unit    progress
        Compaction   keyspace   table   354680703   354710642   bytes     99.99%
But I'm not sure the trouble always comes with a stuck compaction, because I 
haven't followed every unresponsive episode.
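
For reference, the observations above can be reproduced with the standard JDK tools ({{<pid>}} is a placeholder):

{noformat}
jstat -gcold <pid> 10000     # old gen capacity/usage, sampled every 10s
jmap -histo:live <pid>       # histogram of live objects (forces a full GC)
{noformat}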

I used jmap to print the objects held in the heap; I don't know if it is 
helpful to you:
 num     #instances      #bytes  class name
----------------------------------------------
   1:     11899268  3402792016  [B
   2:     23734819  1139271312  java.nio.HeapByteBuffer
   3:     11140273   306165600  [Ljava.nio.ByteBuffer;
   4:      9484838   227636112  org.apache.cassandra.db.composites.CompoundComposite
   5:      8220604   197294496  org.apache.cassandra.db.composites.BoundedComposite
   6:        27187    69131928  [J
   7:      1673344    53547008  org.apache.cassandra.db.composites.CompoundSparseCellName
   8:      1540101    49283232  org.apache.cassandra.db.BufferCell
   9:         3158    45471360  [Lorg.apache.cassandra.db.composites.Composite;
  10:         2527    27865040  [I
  11:       251797    20236456  [Ljava.lang.Object;
  12:       417752    12899976  [C
  13:       263201    10528040  org.apache.cassandra.db.BufferExpiringCell
  14:       322324    10314368  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
  15:       322237    10311584  com.googlecode.concurrentlinkedhashmap.ConcurrentHashMapV8$Node
  16:       417331    10015944  java.lang.String
  17:        86368     8891280  [Lorg.apache.cassandra.db.Cell;
  18:       349917     8398008  org.apache.cassandra.cql3.ColumnIdentifier
  19:       204161     8166440  java.util.TreeMap$Entry
  20:       322324     7735776  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
  21:       322324     7735776  org.apache.cassandra.cache.KeyCacheKey
  22:       317274     7614576  java.lang.Double
  23:       317154     7611696  org.apache.cassandra.db.RowIndexEntry
  24:       314642     7551408  java.util.concurrent.ConcurrentSkipListMap$Node
  25:        52560     7316584  constMethodKlass
  26:       292136     7011264  java.lang.Long
  27:        52560     6740064  methodKlass
  28:         5290     5937512  constantPoolKlass
  29:       160281     3846744  org.apache.cassandra.db.BufferDecoratedKey
  30:       155777     3738648  java.util.concurrent.ConcurrentSkipListMap$Index
  31:         5290     3642232  instanceKlassKlass
  32:       150284     3606816  org.apache.cassandra.db.AtomicBTreeColumns
  33:       150261     3606264  org.apache.cassandra.db.AtomicBTreeColumns$Holder
  34:        87861     3514440  org.apache.cassandra.db.ArrayBackedSortedColumns
  35:        87768     3510720  org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask
  36:         4634     3372096  constantPoolCacheKlass
  37:        84262     3370480  java.util.Collections$SingletonMap
  38:         6243     2778728  methodDataKlass
  39:       173490     2775840  org.apache.cassandra.dht.LongToken
  40:        82000     2624000  java.util.RegularEnumSet
  41:        81981     2623392  org.apache.cassandra.net.MessageIn
  42:        81980     2623360  org.apache.cassandra.net.MessageDeliveryTask
  43:       102511     2460264  java.util.concurrent.ConcurrentLinkedQueue$Node
  44:        94901     2277624  org.apache.cassandra.db.DeletionInfo
  45:        93837     2252088  java.util.concurrent.Executors$RunnableAdapter
  46:140525