[jira] [Updated] (CASSANDRA-12859) Column-level permissions

2016-10-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-12859:
--
Description: 
h4. Here is a draft of: 
Cassandra Proposal - Column-level permissions.docx (attached)

h4. Quoting the 'Overview' section:

The purpose of this proposal is to add column-level (field-level) permissions 
to Cassandra. I intend to start implementing this feature in a fork soon, and 
to submit a pull request once it is ready.
h4. Motivation
Cassandra already supports permissions on keyspace and table (column family) 
level. Sources:
* http://www.datastax.com/dev/blog/role-based-access-control-in-cassandra
* https://cassandra.apache.org/doc/latest/cql/security.html#data-control

At IBM, we have big data analytics use cases where column-level access 
permissions are also a requirement. All major industry RDBMS products support 
this level of permission control, and regulators expect it from data-based 
systems.
h4. Main day-one requirements
# Extend CQL (Cassandra Query Language) to be able to optionally specify a list 
of individual columns, in the {{GRANT}} statement. The relevant permission 
types are: {{MODIFY}} (for {{UPDATE}} and {{INSERT}}) and {{SELECT}}.
# Persist the optional column list in the appropriate system table, 
{{system_auth.role_permissions}}.
# Enforce the column access restrictions during execution. Details:
#* Should fit with the existing permission propagation down a role chain.
#* Proposed message format when a user’s roles give access to the queried table 
but not to all of the selected, inserted, or updated columns:
  "User %s has no %s permission on column %s of table %s"
#* The error will report only the first column that fails the check. 
Nice to have: list all inaccessible columns.
#* Error code is the same as for table access denial: 2100.
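To illustrate requirements 1 and 3, here is a rough sketch of what the proposed syntax and behavior might look like. This is not final grammar; the keyspace, table, column, and role names are invented for illustration:

{code:sql}
-- Proposed (not yet implemented): scope a grant to specific columns
GRANT SELECT (name, email) ON ks.users TO analyst_role;
GRANT MODIFY (email) ON ks.users TO support_role;

-- A statement touching a column outside the granted set would then fail,
-- along the lines of (using the message format proposed above):
-- SELECT name, ssn FROM ks.users;
-- Unauthorized: User analyst has no SELECT permission on column ssn of table ks.users
{code}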

h4. Additional day-one requirements
# Reflect the column-level permissions in statements of type 
{{LIST ALL PERMISSIONS OF someuser;}}
# Performance should not degrade in any significant way.
# Backwards compatibility
#* Permission enforcement for DBs created before the upgrade should continue to 
work with the same behavior after upgrading to a version that allows 
column-level permissions.
#* Previous CQL syntax will remain valid, and have the same effect as before.
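For requirement 4, the existing {{LIST ALL PERMISSIONS}} statement would need to surface the column scope. A hypothetical sketch of the extended output (column layout and names are illustrative only, not a committed format):

{code:sql}
LIST ALL PERMISSIONS OF someuser;

-- role     | resource          | permission | columns (proposed addition)
-- analyst  | <table ks.users>  | SELECT     | name, email
-- analyst  | <table ks.users>  | MODIFY     | email
{code}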

h4. Documentation
* 
https://cassandra.apache.org/doc/latest/cql/security.html#grammar-token-permission
* Feedback request: any others?


[jira] [Updated] (CASSANDRA-12859) Column-level permissions

2016-10-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-12859:
--
Attachment: Cassandra Proposal - Column-level permissions.docx

> Column-level permissions
> 
>
> Key: CASSANDRA-12859
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12859
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core, CQL
>Reporter: Boris Melamed
> Attachments: Cassandra Proposal - Column-level permissions.docx
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Here is a draft of: 
> Cassandra Proposal - Column-level permissions.docx
> https://ibm.box.com/s/ithyzt0bhlcfb49dl5x6us0c887p1ovw
> Quoting the 'Overview' section:
> The purpose of this proposal is to add column-level (field-level) permissions 
> to Cassandra. It is my intent to soon start implementing this feature in a 
> fork, and to submit a pull request once it’s ready.
> Motivation
> Cassandra already supports permissions on keyspace and table (column family) 
> level. Sources:
> - http://www.datastax.com/dev/blog/role-based-access-control-in-cassandra
> - https://cassandra.apache.org/doc/latest/cql/security.html#data-control
> At IBM, we have use cases in the area of big data analytics where 
> column-level access permissions are also a requirement. All industry RDBMS 
> products are supporting this level of permission control, and regulators are 
> expecting it from all data-based systems.
> Main day-one requirements
> 1. Extend CQL (Cassandra Query Language) to be able to optionally specify 
> a list of individual columns in the GRANT statement. The relevant permission 
> types are: MODIFY (for UPDATE and INSERT) and SELECT.
> 2. Persist the optional column list in the appropriate system table, 
> ‘system_auth.role_permissions’.
> 3. Enforce the column access restrictions during execution. Details:
>   a.  Should fit with the existing permission propagation down a role chain.
>   b.  Proposed message format when a user’s roles give access to the queried 
> table but not to all of the selected, inserted, or updated columns:
>   "User %s has no %s permission on column %s of table %s"
>   c.  The error will report only the first checked column. 
> Nice to have: list all inaccessible columns.
>   d.  The error code is the same as for table access denial: 2100.
> Additional day-one requirements
> 4. Reflect the column-level permissions in statements of type 
> LIST ALL PERMISSIONS OF someuser;
> 5. Performance should not degrade in any significant way.
> 6. Backwards compatibility:
>   a.  Permission enforcement for DBs created before the upgrade should 
> continue to work with the same behavior after upgrading to a version that 
> allows column-level permissions.
>   b.  Previous CQL syntax will remain valid, and have the same effect as 
> before.
> 7. Documentation:
>   - https://cassandra.apache.org/doc/latest/cql/security.html#grammar-token-permission
>   - Feedback request: any others?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap after long GC pause

2016-07-06 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364183#comment-15364183
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

[~pauloricardomg] can we change the resolution to something other than 
"duplicate" to avoid confusion? I still run across people who hit this issue, 
and it seems it was resolved with an upgrade to 2.1.11.

> OOM on bootstrap after long GC pause
> 
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: GCpath.txt, heap_dump.png, system.log.10-05, 
> thread_dump.log, threads.txt
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-30 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219151#comment-15219151
 ] 

Robbie Strickland commented on CASSANDRA-9666:
--

Let me just add that I would be happy to revisit DTCS and revise my opinion if 
someone can point me to a user who's running the new version at scale and is 
pleased with the results. If, as [~jbellis] suggested, it's possible to default 
DTCS such that it behaves like TWCS (and demonstrate that it does), that seems 
like a reasonable approach.  But there's definitely some PTSD for those of us 
who experienced tremendous pain, downtime, and data loss as a result of DTCS.  
By contrast, TWCS has proven to be a simple and effective alternative for 
me and many others.  In my mind, that puts the burden of proof on DTCS 
rather than vice versa.

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever-increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressively compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc.), such that an 
> operator can expect all data written within a block of that size to be 
> compacted together (that is, if your unit is hours and size is 6, you will 
> create roughly 4 sstables per day, each containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At present, DTCS’s first window is compacted using an unusual 
> selection criterion, which prefers files with earlier timestamps but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window’s data will be 
> compacted with the well-tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for: 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
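The window arithmetic described above (a unit-times-size bucket, e.g. 6-hour windows yielding roughly 4 sstables per day) can be sketched as follows. This is an illustrative approximation, not the actual TimeWindowCompactionStrategy code; the class and method names are invented for the example:

```java
import java.util.concurrent.TimeUnit;

// Hedged sketch of TWCS-style window bucketing. All sstables whose max
// timestamp floors to the same lower bound belong to one window and are
// candidates to be compacted together.
public class WindowBoundsSketch {

    // Lower bound of the time window containing timestampMillis, for a
    // window defined by (unit, windowSize), e.g. (HOURS, 6).
    static long lowerWindowBound(long timestampMillis, TimeUnit unit, int windowSize) {
        long windowMillis = unit.toMillis(windowSize);
        // Integer division floors to the start of the current window.
        return (timestampMillis / windowMillis) * windowMillis;
    }

    public static void main(String[] args) {
        // Sept 2015 in epoch millis; with unit=HOURS and size=6 this lands
        // in a 6-hour bucket starting at 1442988000000.
        long ts = 1_443_000_000_000L;
        System.out.println(lowerWindowBound(ts, TimeUnit.HOURS, 6)); // 1442988000000
    }
}
```

A window's lower bound is stable: feeding the bound back in returns the same bound, which is what lets old sstables streamed in later (bootstrap, repair, sstableloader) join their original bucket.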

[jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-30 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218615#comment-15218615
 ] 

Robbie Strickland commented on CASSANDRA-9666:
--

+1 to both [~kmatthias]'s and [~intjonathan]'s comments.  Both perfectly 
capture our experience, and are the reason why I am a very vocal proponent of 
TWCS.  DTCS nearly killed us with its never-ending compaction scheme, and the 
symptoms can take time to materialize, which can fool load tests and lead 
unsuspecting users into believing that it's working when it's really a ticking 
time bomb.  I am writing my next edition of Cassandra High Availability right 
now, and I fully intend to strongly recommend against DTCS in favor of TWCS 
whether it's part of the codebase or not.


[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-25 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212527#comment-15212527
 ] 

Robbie Strickland edited comment on CASSANDRA-9666 at 3/25/16 10:46 PM:


We run TWCS at a sustained 2M writes/sec on just shy of 30TB that rolls through 
the cluster every few days. It does a great job keeping up after 6ish months of 
heavy pounding.






[jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-25 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212527#comment-15212527
 ] 

Robbie Strickland commented on CASSANDRA-9666:
--

We run TWCS at a sustained 2M writes/sec on just shy of 30TB that rolls
through the cluster every few days. It does a great job keeping up after
6ish months of heavy pounding.








[jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

2016-01-21 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110728#comment-15110728
 ] 

Robbie Strickland commented on CASSANDRA-9666:
--

Has anyone done a comparison of TWCS with similarly configured DTCS?  We also 
have had tremendous improvements in performance and reliability moving from 
(old) DTCS to TWCS, so I'd be very interested in seeing results from such a 
test.


[jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

2016-01-21 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1582#comment-1582
 ] 

Robbie Strickland commented on CASSANDRA-9666:
--

I could argue that TWCS is what most people want and expect, and the "weird 
tiering" and extra switches in DTCS are neutral at best and harmful at worst.  
Maybe the question should be, does DTCS provide anything extra that real users 
want?  And is the complexity worth it?  Perhaps, like BOP and supercolumns, it 
should be included for backward compatibility but discouraged in favor of TWCS. 
 Again like BOP or supercolumns, just because something can be done doesn't 
mean it should.

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressive compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, 
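The per-window bucketing behavior described in the quoted TWCS write-up can be sketched in a few lines. This is an illustrative model only, not the actual TimeWindowCompactionStrategy code; the sstable names and timestamps are invented for the example:

```python
from collections import defaultdict

def window_start(ts_ms, window_ms):
    # Floor a timestamp to the start of its time window.
    return ts_ms - (ts_ms % window_ms)

def bucket_by_window(sstables, window_ms):
    # Group sstables into time windows keyed on each sstable's maxTimestamp.
    # `sstables` is a list of (name, max_timestamp_ms) pairs; the real
    # strategy reads maxTimestamp from sstable metadata.
    buckets = defaultdict(list)
    for name, max_ts in sstables:
        buckets[window_start(max_ts, window_ms)].append(name)
    return dict(buckets)

DAY_MS = 24 * 60 * 60 * 1000
tables = [("a-1", DAY_MS + 5), ("a-2", DAY_MS + 9), ("b-1", 3 * DAY_MS + 2)]
# Old and new data land in separate buckets, so compacting one window
# never pulls in sstables from a neighboring window.
print(bucket_by_window(tables, DAY_MS))
```

Streamed sstables with old timestamps simply fall into an existing old window, which matches the bootstrap/repair/sstableloader behavior described above.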

[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap after long GC pause

2015-10-26 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974716#comment-14974716
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

As a workaround I was able to simply restart the node with {{auto_bootstrap}} 
set to false, which allowed it to join successfully.  There clearly appear to 
be multiple issues here, as the behavior differs between 2.1.7 and 2.1.11 with 
an otherwise identical setup.

> OOM on bootstrap after long GC pause
> 
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: GCpath.txt, heap_dump.png, system.log.10-05, 
> thread_dump.log, threads.txt
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.
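For scale, the numbers in the quoted GCInspector line work out to a roughly 26-minute old-generation collection that reclaimed only ~135 MB (about 0.3%) of a ~46 GiB old gen, i.e. nearly everything on the heap was still live. A quick check of the arithmetic:

```python
pause_ms = 1_586_126                             # "G1 Old Generation GC in 1586126ms"
before, after = 49_213_756_976, 49_072_277_176   # "49213756976 -> 49072277176"

print(round(pause_ms / 60_000, 1))    # pause length in minutes -> 26.4
freed = before - after
print(freed)                          # bytes reclaimed -> 141479800
print(round(100 * freed / before, 2)) # percent of old gen freed -> 0.29
```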



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap after long GC pause

2015-10-22 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969215#comment-14969215
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

Also, for reference, tpstats shows nothing in the queues:

{noformat}
ubuntu@eventcass4x087:~$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0       85431226         0                 0
ReadStage                         0         0              0         0                 0
RequestResponseStage              0         0             48         0                 0
ReadRepairStage                   0         0              0         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
HintedHandoff                     0         0             29         0                 0
GossipStage                       0         0         565556         0                 0
CacheCleanupExecutor              0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
CompactionExecutor                0         0          12774         0                 0
ValidationExecutor                0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
PendingRangeCalculator            0         0              3         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0           7157         0                 0
MemtablePostFlush                 0         0          10083         0                 0
MemtableReclaimMemory             0         0           9340         0                 0

Message type   Dropped
READ 0
RANGE_SLICE  0
_TRACE   0
MUTATION 0
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE 0
PAGED_RANGE  0
READ_REPAIR  0
{noformat}



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap after long GC pause

2015-10-22 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969125#comment-14969125
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

I decided to try upgrading to 2.1.11 to see if the issue was resolved by 
CASSANDRA-9681.  The node has been joining for over 24 hours, even though it 
appears to have finished streaming after about 6 hours:

{noformat}
ubuntu@eventcass4x087:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 7047c510-7732-11e5-a7e7-63f53bbd2778
Receiving 171 files, 95313491312 bytes total. Already received 171 
files, 95313491312 bytes total
Receiving 165 files, 78860134041 bytes total. Already received 165 
files, 78860134041 bytes total
Receiving 158 files, 77709354374 bytes total. Already received 158 
files, 77709354374 bytes total
Receiving 184 files, 106710570690 bytes total. Already received 184 
files, 106710570690 bytes total
Receiving 136 files, 35699286217 bytes total. Already received 136 
files, 35699286217 bytes total
Receiving 169 files, 53498180215 bytes total. Already received 169 
files, 53498180215 bytes total
Receiving 197 files, 129020987979 bytes total. Already received 197 
files, 129020987979 bytes total
Receiving 196 files, 113904035360 bytes total. Already received 196 
files, 113904035360 bytes total
Receiving 172 files, 47685647028 bytes total. Already received 172 
files, 47685647028 bytes total
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         1              0
Responses                       n/a         0       83743675
{noformat}

It doesn't appear to still be building indexes either:

{noformat}
ubuntu@eventcass4x087:~$ nodetool compactionstats
pending tasks: 2
   compaction type                keyspace      table   completed       total    unit   progress
        Compaction   prod_analytics_events   wuevents   163704673   201033961   bytes     81.43%
Active compaction remaining time : n/a
{noformat}

So I'm not sure why it's still joining.  Any thoughts?
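One way to sanity-check that streaming really has finished is to compare "Already received" against the totals in each session line. A minimal sketch over two of the lines above; the regex assumes the 2.1-era {{nodetool netstats}} wording:

```python
import re

netstats = """\
Receiving 171 files, 95313491312 bytes total. Already received 171 files, 95313491312 bytes total
Receiving 165 files, 78860134041 bytes total. Already received 165 files, 78860134041 bytes total
"""

pattern = re.compile(
    r"Receiving (\d+) files, (\d+) bytes total\. "
    r"Already received (\d+) files, (\d+) bytes total")

# True when every stream reports received == total for both files and bytes.
streams_done = all(
    total_f == recv_f and total_b == recv_b
    for total_f, total_b, recv_f, recv_b in
    (map(int, m.groups()) for m in pattern.finditer(netstats)))
print(streams_done)  # -> True
```

With all streams complete, a node stuck in JOINING points at a post-streaming phase (e.g. secondary index builds or pending compactions) rather than the transfers themselves.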



[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap after long GC pause

2015-10-22 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969125#comment-14969125
 ] 

Robbie Strickland edited comment on CASSANDRA-10449 at 10/22/15 1:08 PM:
-

I decided to try upgrading to 2.1.11 to see if the issue was resolved by 
CASSANDRA-9681.  The node has been joining for over 24 hours, even though it 
appears to have finished streaming after about 6 hours:

{noformat}
ubuntu@eventcass4x087:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 7047c510-7732-11e5-a7e7-63f53bbd2778
Receiving 171 files, 95313491312 bytes total. Already received 171 
files, 95313491312 bytes total
Receiving 165 files, 78860134041 bytes total. Already received 165 
files, 78860134041 bytes total
Receiving 158 files, 77709354374 bytes total. Already received 158 
files, 77709354374 bytes total
Receiving 184 files, 106710570690 bytes total. Already received 184 
files, 106710570690 bytes total
Receiving 136 files, 35699286217 bytes total. Already received 136 
files, 35699286217 bytes total
Receiving 169 files, 53498180215 bytes total. Already received 169 
files, 53498180215 bytes total
Receiving 197 files, 129020987979 bytes total. Already received 197 
files, 129020987979 bytes total
Receiving 196 files, 113904035360 bytes total. Already received 196 
files, 113904035360 bytes total
Receiving 172 files, 47685647028 bytes total. Already received 172 
files, 47685647028 bytes total
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         1              0
Responses                       n/a         0       83743675
{noformat}

It doesn't appear to still be building indexes either:

{noformat}
ubuntu@eventcass4x087:~$ nodetool compactionstats
pending tasks: 2
   compaction type                keyspace      table   completed       total    unit   progress
        Compaction   prod_analytics_events   wuevents   163704673   201033961   bytes     81.43%
Active compaction remaining time : n/a
{noformat}

So I'm not sure why it's still joining.  Any thoughts?


was (Author: rstrickland):
I decided to try upgrading to 2.1.11 to see if the issue was resolved by 
CASSANDRA-9681.  The node has been joining for over 24 hours, even though it 
appears to have finished streaming after about 6 hours:

{{noformat}}
ubuntu@eventcass4x087:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 7047c510-7732-11e5-a7e7-63f53bbd2778
Receiving 171 files, 95313491312 bytes total. Already received 171 
files, 95313491312 bytes total
Receiving 165 files, 78860134041 bytes total. Already received 165 
files, 78860134041 bytes total
Receiving 158 files, 77709354374 bytes total. Already received 158 
files, 77709354374 bytes total
Receiving 184 files, 106710570690 bytes total. Already received 184 
files, 106710570690 bytes total
Receiving 136 files, 35699286217 bytes total. Already received 136 
files, 35699286217 bytes total
Receiving 169 files, 53498180215 bytes total. Already received 169 
files, 53498180215 bytes total
Receiving 197 files, 129020987979 bytes total. Already received 197 
files, 129020987979 bytes total
Receiving 196 files, 113904035360 bytes total. Already received 196 
files, 113904035360 bytes total
Receiving 172 files, 47685647028 bytes total. Already received 172 
files, 47685647028 bytes total
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         1              0
Responses                       n/a         0       83743675
{{noformat}}

It doesn't appear to still be building indexes either:

{{noformat}}
ubuntu@eventcass4x087:~$ nodetool compactionstats
pending tasks: 2
   compaction type                keyspace      table   completed       total    unit   progress
        Compaction   prod_analytics_events   wuevents   163704673   201033961   bytes     81.43%
Active compaction remaining time : n/a
{{noformat}}

So I'm not sure why it's still joining.  Any thoughts?


[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap after long GC pause

2015-10-22 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946624#comment-14946624
 ] 

Robbie Strickland edited comment on CASSANDRA-10449 at 10/22/15 1:10 PM:
-

I increased max heap to 96GB and tried again.  Now {{nodetool netstats}} shows 
that streaming progress has ground to a halt:

9pm:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
Receiving 139 files, 36548040412 bytes total. Already received 139 
files, 36548040412 bytes total
Receiving 171 files, 6431853 bytes total. Already received 171 
files, 6431853 bytes total
Receiving 147 files, 78458709168 bytes total. Already received 79 
files, 55003961646 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db
 955162267/4105438496 bytes(23%) received from idx:0/x.x.x.x
Receiving 141 files, 36700837768 bytes total. Already received 141 
files, 36700837768 bytes total
Receiving 176 files, 79676288976 bytes total. Already received 98 
files, 55932809644 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db
 174070078/7326235809 bytes(2%) received from idx:0/x.x.x.x
Receiving 170 files, 85920995638 bytes total. Already received 94 
files, 54985226700 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db
 4875660361/22821083384 bytes(21%) received from idx:0/x.x.x.x
Receiving 174 files, 87064163973 bytes total. Already received 91 
files, 53930233899 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db
 17064156850/25823860172 bytes(66%) received from idx:0/x.x.x.x
Receiving 164 files, 46351636573 bytes total. Already received 164 
files, 46351636573 bytes total
Receiving 158 files, 62899520151 bytes total. Already received 158 
files, 62899520151 bytes total
Receiving 164 files, 48771232182 bytes total. Already received 164 
files, 48771232182 bytes total
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a        19             56
Responses                       n/a         0       35515795
{noformat}

6am:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
Receiving 139 files, 36548040412 bytes total. Already received 139 
files, 36548040412 bytes total
Receiving 171 files, 6431853 bytes total. Already received 171 
files, 6431853 bytes total
Receiving 147 files, 78458709168 bytes total. Already received 79 
files, 55003961646 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db
 955162267/4105438496 bytes(23%) received from idx:0/x.x.x.x
Receiving 141 files, 36700837768 bytes total. Already received 141 
files, 36700837768 bytes total
Receiving 176 files, 79676288976 bytes total. Already received 98 
files, 55932809644 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db
 174070078/7326235809 bytes(2%) received from idx:0/x.x.x.x
Receiving 170 files, 85920995638 bytes total. Already received 94 
files, 54985226700 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db
 4875660361/22821083384 bytes(21%) received from idx:0/x.x.x.x
Receiving 174 files, 87064163973 bytes total. Already received 91 
files, 53930233899 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db
 17064156850/25823860172 bytes(66%) received from idx:0/x.x.x.x
Receiving 164 files, 46351636573 bytes total. Already received 164 
files, 46351636573 bytes total
Receiving 158 files, 62899520151 bytes total. Already received 158 
files, 62899520151 bytes total
Receiving 164 files, 48771232182 bytes total. Already received 164 
files, 48771232182 bytes total
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands 

[jira] [Updated] (CASSANDRA-10449) OOM on bootstrap after long GC pause

2015-10-16 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-10449:
--
Attachment: heap_dump.png

I've attached a screen shot of the heap dump.

> OOM on bootstrap after long GC pause
> 
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: heap_dump.png, system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap after long GC pause

2015-10-16 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961343#comment-14961343
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

Yes, sorry I was working on getting it to S3.  You can get it 
[here|https://s3.amazonaws.com/twc-analytics-public/java_1445001330.hprof].



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap after long GC pause

2015-10-16 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961661#comment-14961661
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

Yes, 16GB.



[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057
 ] 

Robbie Strickland edited comment on CASSANDRA-10449 at 10/15/15 3:24 PM:
-

I discovered that an index on one of the tables has a wide row, and I'm 
wondering if that could be the root of the issue:

Example:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.


was (Author: rstrickland):
I discovered that an index on one of the tables has a wide row, and I'm 
assuming that to be the root of the issue:

Example:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057
 ] 

Robbie Strickland edited comment on CASSANDRA-10449 at 10/15/15 3:25 PM:
-

I discovered that an index on one of the tables has a wide row, and I'm 
wondering if that could be the root of the issue:

Example from one node:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.


was (Author: rstrickland):
I discovered that an index on one of the tables has a wide row, and I'm 
wondering if that could be the root of the issue:

Example:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959057#comment-14959057
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

I discovered that an index on one of the tables has a wide row, and I'm 
assuming that to be the root of the issue:

Example:
{noformat}
Compacted partition minimum bytes: 125
Compacted partition maximum bytes: 10299432635
Compacted partition mean bytes: 253692309
{noformat}

This seems like a problem in general for indexes, where the original data model 
may be well distributed but the index may have unpredictable distribution.
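To put the quoted {{cfstats}} numbers in perspective: the largest compacted partition in that index is close to 10 GiB and roughly 40x the mean, exactly the kind of outlier that can blow up memory during streaming and compaction. The arithmetic:

```python
min_bytes = 125
max_bytes = 10_299_432_635   # "Compacted partition maximum bytes"
mean_bytes = 253_692_309     # "Compacted partition mean bytes"

print(round(max_bytes / 2**30, 1))   # largest partition in GiB -> 9.6
print(round(max_bytes / mean_bytes)) # multiples of the mean partition -> 41
```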



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959217#comment-14959217
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

Ok [~mishail] I will re-run with heap dump enabled (we had it turned off for 
some reason) and post it.



[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-12 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954097#comment-14954097
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

Per [~zznate]'s suggestion I also tried setting compaction throughput to 0, 
with no effect.  He also suggested taking one of the large sstables and 
loading it with {{sstableloader}}; I will try that tomorrow.



[jira] [Updated] (CASSANDRA-10497) NPE on removeUnfinishedCompactionLeftovers after sstablesplit

2015-10-09 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-10497:
--
Description: 
After stopping the node and running {{sstablesplit}} on a single table, 
restarting Cassandra results in an NPE:

{noformat}
INFO  [SSTableBatchOpen:2] 2015-10-09 13:15:38,745 SSTableReader.java:471 - 
Opening 
/var/lib/cassandra/xvdd/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-514
 (175 bytes)
INFO  [main] 2015-10-09 13:15:38,747 AutoSavingCache.java:146 - reading saved 
cache 
/var/lib/cassandra/xvdb/cache/system-schema_keyspaces-b0f2235744583cdb9631c43e59ce3676-KeyCache-b.db
ERROR [main] 2015-10-09 13:15:39,114 CassandraDaemon.java:541 - Exception 
encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:641)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
Caused by: java.lang.NullPointerException: null
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:633)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
... 3 common frames omitted
{noformat}

The node would only come back up after deleting all the files in 
compactions_in_progress.

  was:
After stopping the node and running {{sstablesplit}} on a single table, 
restarting Cassandra results in an NPE:

{noformat}
INFO  [SSTableBatchOpen:2] 2015-10-09 13:15:38,745 SSTableReader.java:471 - 
Opening 
/var/lib/cassandra/xvdd/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-514
 (175 bytes)
INFO  [main] 2015-10-09 13:15:38,747 AutoSavingCache.java:146 - reading saved 
cache 
/var/lib/cassandra/xvdb/cache/system-schema_keyspaces-b0f2235744583cdb9631c43e59ce3676-KeyCache-b.db
ERROR [main] 2015-10-09 13:15:39,114 CassandraDaemon.java:541 - Exception 
encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:641)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
Caused by: java.lang.NullPointerException: null
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:633)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
... 3 common frames omitted
{noformat}


> NPE on removeUnfinishedCompactionLeftovers after sstablesplit
> -
>
> Key: CASSANDRA-10497
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10497
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, Tools
> Environment: Ubuntu 14.04
>Reporter: Robbie Strickland
> Attachments: npe_system.log
>
>
> After stopping the node and running {{sstablesplit}} on a single table, 
> restarting Cassandra results in an NPE:
> {noformat}
> INFO  [SSTableBatchOpen:2] 2015-10-09 13:15:38,745 SSTableReader.java:471 - 
> Opening 
> /var/lib/cassandra/xvdd/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-514
>  (175 bytes)
> INFO  [main] 2015-10-09 13:15:38,747 AutoSavingCache.java:146 - reading saved 
> cache 
> /var/lib/cassandra/xvdb/cache/system-schema_keyspaces-b0f2235744583cdb9631c43e59ce3676-KeyCache-b.db
> ERROR [main] 2015-10-09 13:15:39,114 CassandraDaemon.java:541 - Exception 
> encountered during startup
> org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
> at 
> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:641)
>  ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) 
> [apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
>

[jira] [Created] (CASSANDRA-10497) NPE on removeUnfinishedCompactionLeftovers after sstablesplit

2015-10-09 Thread Robbie Strickland (JIRA)
Robbie Strickland created CASSANDRA-10497:
-

 Summary: NPE on removeUnfinishedCompactionLeftovers after 
sstablesplit
 Key: CASSANDRA-10497
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10497
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Tools
 Environment: Ubuntu 14.04
Reporter: Robbie Strickland
 Attachments: npe_system.log

After running {{sstablesplit}} on a single table, restarting Cassandra results 
in an NPE:

{noformat}
INFO  [SSTableBatchOpen:2] 2015-10-09 13:15:38,745 SSTableReader.java:471 - 
Opening 
/var/lib/cassandra/xvdd/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-514
 (175 bytes)
INFO  [main] 2015-10-09 13:15:38,747 AutoSavingCache.java:146 - reading saved 
cache 
/var/lib/cassandra/xvdb/cache/system-schema_keyspaces-b0f2235744583cdb9631c43e59ce3676-KeyCache-b.db
ERROR [main] 2015-10-09 13:15:39,114 CassandraDaemon.java:541 - Exception 
encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:641)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
Caused by: java.lang.NullPointerException: null
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:633)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
... 3 common frames omitted
{noformat}





[jira] [Updated] (CASSANDRA-10497) NPE on removeUnfinishedCompactionLeftovers after sstablesplit

2015-10-09 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-10497:
--
Description: 
After stopping the node and running {{sstablesplit}} on a single table, 
restarting Cassandra results in an NPE:

{noformat}
INFO  [SSTableBatchOpen:2] 2015-10-09 13:15:38,745 SSTableReader.java:471 - 
Opening 
/var/lib/cassandra/xvdd/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-514
 (175 bytes)
INFO  [main] 2015-10-09 13:15:38,747 AutoSavingCache.java:146 - reading saved 
cache 
/var/lib/cassandra/xvdb/cache/system-schema_keyspaces-b0f2235744583cdb9631c43e59ce3676-KeyCache-b.db
ERROR [main] 2015-10-09 13:15:39,114 CassandraDaemon.java:541 - Exception 
encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:641)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
Caused by: java.lang.NullPointerException: null
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:633)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
... 3 common frames omitted
{noformat}

  was:
After running {{sstablesplit}} on a single table, restarting Cassandra results 
in an NPE:

{noformat}
INFO  [SSTableBatchOpen:2] 2015-10-09 13:15:38,745 SSTableReader.java:471 - 
Opening 
/var/lib/cassandra/xvdd/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-514
 (175 bytes)
INFO  [main] 2015-10-09 13:15:38,747 AutoSavingCache.java:146 - reading saved 
cache 
/var/lib/cassandra/xvdb/cache/system-schema_keyspaces-b0f2235744583cdb9631c43e59ce3676-KeyCache-b.db
ERROR [main] 2015-10-09 13:15:39,114 CassandraDaemon.java:541 - Exception 
encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:641)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) 
[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
Caused by: java.lang.NullPointerException: null
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:633)
 ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
... 3 common frames omitted
{noformat}


> NPE on removeUnfinishedCompactionLeftovers after sstablesplit
> -
>
> Key: CASSANDRA-10497
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10497
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core, Tools
> Environment: Ubuntu 14.04
>Reporter: Robbie Strickland
> Attachments: npe_system.log
>
>
> After stopping the node and running {{sstablesplit}} on a single table, 
> restarting Cassandra results in an NPE:
> {noformat}
> INFO  [SSTableBatchOpen:2] 2015-10-09 13:15:38,745 SSTableReader.java:471 - 
> Opening 
> /var/lib/cassandra/xvdd/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-514
>  (175 bytes)
> INFO  [main] 2015-10-09 13:15:38,747 AutoSavingCache.java:146 - reading saved 
> cache 
> /var/lib/cassandra/xvdb/cache/system-schema_keyspaces-b0f2235744583cdb9631c43e59ce3676-KeyCache-b.db
> ERROR [main] 2015-10-09 13:15:39,114 CassandraDaemon.java:541 - Exception 
> encountered during startup
> org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
> at 
> org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:641)
>  ~[apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:302) 
> [apache-cassandra-2.1.7-SNAPSHOT.jar:2.1.7-SNAPSHOT]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524)
>  

[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-09 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950528#comment-14950528
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

After numerous GC tweaks, including reverting to default CMS settings, the 
symptoms remain the same.  I would appreciate any additional pointers, as I'm 
grasping at straws now.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-08 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948566#comment-14948566
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

Unfortunately increasing streaming_socket_timeout_in_ms and 
memtable_flush_writers resulted in OOMing again instead of hanging.  It seems 
to be hanging when it gets to larger sstables (30GB+).  I will poke around some 
more today.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Comment Edited] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-08 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948566#comment-14948566
 ] 

Robbie Strickland edited comment on CASSANDRA-10449 at 10/8/15 2:00 PM:


Unfortunately increasing streaming_socket_timeout_in_ms and 
memtable_flush_writers resulted in OOMing again instead of hanging.  It seems 
to be hanging/OOMing when it gets to larger sstables (30GB+).  I will poke 
around some more today.


was (Author: rstrickland):
Unfortunately increasing streaming_socket_timeout_in_ms and 
memtable_flush_writers resulted in OOMing again instead of hanging.  It seems 
to be hanging when it gets to larger sstables (30GB+).  I will poke around some 
more today.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-07 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947486#comment-14947486
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

I am going to try again after increasing streaming_socket_timeout_in_ms and 
memtable_flush_writers.  I had not touched these values, so it's possible that 
was hurting me.
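
Both knobs live in cassandra.yaml. A hedged sketch with illustrative values 
(the ticket does not state the actual numbers tried):

```yaml
# Hypothetical cassandra.yaml excerpt -- values are illustrative assumptions
streaming_socket_timeout_in_ms: 3600000   # e.g. 1 hour instead of no timeout
memtable_flush_writers: 8                 # more writers for multi-disk nodes
```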

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Commented] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-07 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946624#comment-14946624
 ] 

Robbie Strickland commented on CASSANDRA-10449:
---

I increased max heap to 96GB and tried again.  Now nodetool netstats shows 
that progress has ground to a halt:

9pm:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
/52.1.155.147 (using /10.239.209.15)
Receiving 139 files, 36548040412 bytes total. Already received 139 
files, 36548040412 bytes total
/52.2.9.34 (using /10.239.209.17)
Receiving 171 files, 6431853 bytes total. Already received 171 
files, 6431853 bytes total
/52.0.152.88 (using /10.239.209.44)
Receiving 147 files, 78458709168 bytes total. Already received 79 
files, 55003961646 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db
 955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
/52.2.0.164 (using /10.239.209.16)
Receiving 141 files, 36700837768 bytes total. Already received 141 
files, 36700837768 bytes total
/54.152.177.161 (using /10.239.209.93)
/54.172.174.48 (using /10.239.209.49)
Receiving 176 files, 79676288976 bytes total. Already received 98 
files, 55932809644 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db
 174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
/52.2.75.82 (using /10.239.208.88)
/54.165.111.69 (using /10.239.209.47)
Receiving 170 files, 85920995638 bytes total. Already received 94 
files, 54985226700 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-265-Data.db
 4875660361/22821083384 bytes(21%) received from idx:0/54.165.111.69
/52.6.136.30 (using /10.239.209.45)
Receiving 174 files, 87064163973 bytes total. Already received 91 
files, 53930233899 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-157-Data.db
 17064156850/25823860172 bytes(66%) received from idx:0/52.6.136.30
/52.7.14.201 (using /10.239.209.46)
Receiving 164 files, 46351636573 bytes total. Already received 164 
files, 46351636573 bytes total
/52.2.30.66 (using /10.239.209.18)
Receiving 158 files, 62899520151 bytes total. Already received 158 
files, 62899520151 bytes total
/54.175.138.33 (using /10.239.209.96)
/54.88.44.178 (using /10.239.209.91)
/52.2.109.194 (using /10.239.208.89)
/54.172.81.117 (using /10.239.209.95)
/54.172.103.46 (using /10.239.209.48)
Receiving 164 files, 48771232182 bytes total. Already received 164 
files, 48771232182 bytes total
/54.164.172.164 (using /10.239.209.94)
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Commandsn/a19 56
Responses   n/a 0   35515795
{noformat}

6am:

{noformat}
ubuntu@eventcass4x024:~$ nodetool netstats | grep -v 100%
Mode: JOINING
Bootstrap 45d8dec0-6c12-11e5-90ef-f7a8e02e59c0
/52.1.155.147 (using /10.239.209.15)
Receiving 139 files, 36548040412 bytes total. Already received 139 
files, 36548040412 bytes total
/52.2.9.34 (using /10.239.209.17)
Receiving 171 files, 6431853 bytes total. Already received 171 
files, 6431853 bytes total
/52.0.152.88 (using /10.239.209.44)
Receiving 147 files, 78458709168 bytes total. Already received 79 
files, 55003961646 bytes total

/var/lib/cassandra/xvdd/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-295-Data.db
 955162267/4105438496 bytes(23%) received from idx:0/52.0.152.88
/52.2.0.164 (using /10.239.209.16)
Receiving 141 files, 36700837768 bytes total. Already received 141 
files, 36700837768 bytes total
/54.152.177.161 (using /10.239.209.93)
/54.172.174.48 (using /10.239.209.49)
Receiving 176 files, 79676288976 bytes total. Already received 98 
files, 55932809644 bytes total

/var/lib/cassandra/xvdb/data/prod_analytics_events/wuevents-ffa99ad05af911e596f05987bbaaffad/prod_analytics_events-wuevents-tmp-ka-329-Data.db
 174070078/7326235809 bytes(2%) received from idx:0/54.172.174.48
/52.2.75.82 (using /10.239.208.88)
/54.165.111.69 (using /10.239.209.47)
Receiving 170 files, 85920995638 bytes total. Already received 94 
files, 54985226700 bytes total


[jira] [Updated] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-07 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-10449:
--
Attachment: thread_dump.log

I've attached a thread dump taken after the streaming hangs.

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05, thread_dump.log
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Updated] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-06 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-10449:
--
Attachment: system.log.10-05

> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
> Fix For: 2.1.x
>
> Attachments: system.log.10-05
>
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Updated] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-05 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-10449:
--
Environment: Ubuntu 14.04, AWS  (was: Ubuntu 14.04)
Description: 
I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
provision additional nodes, but bootstrapping OOMs every time after about 10 
hours with a sudden long GC pause:

{noformat}
INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
...
ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
CassandraDaemon.java:223 - Exception in thread 
Thread[MemtableFlushWriter:454,5,main]
java.lang.OutOfMemoryError: Java heap space
{noformat}

I have tried increasing max heap to 48G just to get through the bootstrap, to 
no avail.

  was:
I have a 20-node cluster with vnodes (default of 256) and 500-700GB per node.  
SSTable counts are <10 per table.  I am attempting to provision additional 
nodes, but bootstrapping OOMs every time after about 10 hours with a sudden 
long GC pause:

{noformat}
INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
...
ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
CassandraDaemon.java:223 - Exception in thread 
Thread[MemtableFlushWriter:454,5,main]
java.lang.OutOfMemoryError: Java heap space
{noformat}

I have tried increasing max heap to 48G just to get through the bootstrap, to 
no avail.


> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04, AWS
>Reporter: Robbie Strickland
>  Labels: gc
>
> I have a 20-node cluster (i2.4xlarge) with vnodes (default of 256) and 
> 500-700GB per node.  SSTable counts are <10 per table.  I am attempting to 
> provision additional nodes, but bootstrapping OOMs every time after about 10 
> hours with a sudden long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Updated] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-05 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-10449:
--
Description: 
I have a 20-node cluster with vnodes (default of 256) and 500-700GB per node.  
SSTable counts are <10 per table.  I am attempting to provision additional 
nodes, but bootstrapping OOMs every time after about 10 hours with a sudden 
long GC pause:

{noformat}
INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
...
ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
CassandraDaemon.java:223 - Exception in thread 
Thread[MemtableFlushWriter:454,5,main]
java.lang.OutOfMemoryError: Java heap space
{noformat}

I have tried increasing max heap to 48G just to get through the bootstrap, to 
no avail.

  was:
I have a 20-node cluster with vnodes (default of 256) and 500-700GB per node.  
SSTable counts are <10 per node.  I am attempting to provision additional 
nodes, but bootstrapping OOMs every time after about 10 hours with a sudden 
long GC pause:

{noformat}
INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
...
ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
CassandraDaemon.java:223 - Exception in thread 
Thread[MemtableFlushWriter:454,5,main]
java.lang.OutOfMemoryError: Java heap space
{noformat}

I have tried increasing max heap to 48G just to get through the bootstrap, to 
no avail.


> OOM on bootstrap due to long GC pause
> -
>
> Key: CASSANDRA-10449
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04
>Reporter: Robbie Strickland
>  Labels: gc
>
> I have a 20-node cluster with vnodes (default of 256) and 500-700GB per node. 
>  SSTable counts are <10 per table.  I am attempting to provision additional 
> nodes, but bootstrapping OOMs every time after about 10 hours with a sudden 
> long GC pause:
> {noformat}
> INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
> Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
> ...
> ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[MemtableFlushWriter:454,5,main]
> java.lang.OutOfMemoryError: Java heap space
> {noformat}
> I have tried increasing max heap to 48G just to get through the bootstrap, to 
> no avail.





[jira] [Created] (CASSANDRA-10449) OOM on bootstrap due to long GC pause

2015-10-05 Thread Robbie Strickland (JIRA)
Robbie Strickland created CASSANDRA-10449:
-

 Summary: OOM on bootstrap due to long GC pause
 Key: CASSANDRA-10449
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10449
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04
Reporter: Robbie Strickland


I have a 20-node cluster with vnodes (default of 256) and 500-700GB per node.  
SSTable counts are <10 per node.  I am attempting to provision additional 
nodes, but bootstrapping OOMs every time after about 10 hours with a sudden 
long GC pause:

{noformat}
INFO  [Service Thread] 2015-10-05 23:33:33,373 GCInspector.java:252 - G1 Old 
Generation GC in 1586126ms.  G1 Old Gen: 49213756976 -> 49072277176;
...
ERROR [MemtableFlushWriter:454] 2015-10-05 23:33:33,380 
CassandraDaemon.java:223 - Exception in thread 
Thread[MemtableFlushWriter:454,5,main]
java.lang.OutOfMemoryError: Java heap space
{noformat}

I have tried increasing max heap to 48G just to get through the bootstrap, to 
no avail.





[jira] [Updated] (CASSANDRA-9938) Significant GC pauses

2015-07-30 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9938:
-
Description: 
We have an 18-node analytics cluster, running 2.1.7 patched with 
CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
especially in old gen, and little space is reclaimed.  Eventually these nodes 
OOM:

{code}
ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
forcefully due to: java.lang.OutOfMemoryError: Java heap space
{code}

We use G1 with the following settings:
Max heap = 16G
New size = 1.6G
+UseTLAB
+ResizeTLAB
+PerfDisableSharedMem
-UseBiasedLocking

The nodes in question have average load profiles for the cluster, and caches 
are disabled on all tables.  There is no obvious difference with the 
problematic nodes.  Unfortunately we're currently getting an assertion error 
when trying to get a heap dump, or I would post that.


  was:
We have an 18-node analytics cluster, running 2.1.7 patched with 
CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
especially in old gen.  Eventually these nodes OOM:

{code}
ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
forcefully due to: java.lang.OutOfMemoryError: Java heap space
{code}

We use G1 with the following settings:
Max heap = 16G
New size = 1.6G
+UseTLAB
+ResizeTLAB
+PerfDisableSharedMem
-UseBiasedLocking

The nodes in question have average load profiles for the cluster, and caches 
are disabled on all tables.  There is no obvious difference with the 
problematic nodes.  Unfortunately we're currently getting an assertion error 
when trying to get a heap dump, or I would post that.



 Significant GC pauses
 -

 Key: CASSANDRA-9938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9938
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04, Java 1.8.0u45
Reporter: Robbie Strickland
  Labels: gc
 Attachments: gc_log.txt


 We have an 18-node analytics cluster, running 2.1.7 patched with 
 CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
 especially in old gen, and little space is reclaimed.  Eventually these nodes 
 OOM:
 {code}
 ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to: java.lang.OutOfMemoryError: Java heap space
 {code}
 We use G1 with the following settings:
 Max heap = 16G
 New size = 1.6G
 +UseTLAB
 +ResizeTLAB
 +PerfDisableSharedMem
 -UseBiasedLocking
 The nodes in question have average load profiles for the cluster, and caches 
 are disabled on all tables.  There is no obvious difference with the 
 problematic nodes.  Unfortunately we're currently getting an assertion error 
 when trying to get a heap dump, or I would post that.





[jira] [Created] (CASSANDRA-9938) Significant GC pauses

2015-07-30 Thread Robbie Strickland (JIRA)
Robbie Strickland created CASSANDRA-9938:


 Summary: Significant GC pauses
 Key: CASSANDRA-9938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9938
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04, Java 1.8.0u45
Reporter: Robbie Strickland
 Attachments: gc_log.txt

We have an 18-node analytics cluster, running 2.1.7 patched with 
CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
especially in old gen.  Eventually these nodes OOM:

{code}
ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
forcefully due to: java.lang.OutOfMemoryError: Java heap space
{code}

We use G1 with the following settings:
Max heap = 16G
New size = 1.6G
+UseTLAB
+ResizeTLAB
+PerfDisableSharedMem
-UseBiasedLocking

The nodes in question have average load profiles for the cluster, and caches 
are disabled on all tables.  There is no obvious difference with the 
problematic nodes.  Unfortunately we're currently getting an assertion error 
when trying to get a heap dump, or I would post that.






[jira] [Updated] (CASSANDRA-9938) Significant GC pauses

2015-07-30 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9938:
-
Description: 
We have an 18-node analytics cluster, running 2.1.7 patched with 
CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
especially in old gen, and little space is reclaimed.  Eventually these nodes 
OOM:

{code}
ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
forcefully due to: java.lang.OutOfMemoryError: Java heap space
{code}

We use G1 with the following settings:
Max heap = 16G
New size = 1.6G
+UseTLAB
+ResizeTLAB
+PerfDisableSharedMem
-UseBiasedLocking

The nodes in question have average load profiles for the cluster, and caches 
are disabled on all tables.  There is no obvious difference with the 
problematic nodes, and no other clear signs of trouble.  Unfortunately we're 
currently getting an assertion error when trying to get a heap dump, or I would 
post that.


  was:
We have an 18-node analytics cluster, running 2.1.7 patched with 
CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
especially in old gen, and little space is reclaimed.  Eventually these nodes 
OOM:

{code}
ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
forcefully due to: java.lang.OutOfMemoryError: Java heap space
{code}

We use G1 with the following settings:
Max heap = 16G
New size = 1.6G
+UseTLAB
+ResizeTLAB
+PerfDisableSharedMem
-UseBiasedLocking

The nodes in question have average load profiles for the cluster, and caches 
are disabled on all tables.  There is no obvious difference with the 
problematic nodes.  Unfortunately we're currently getting an assertion error 
when trying to get a heap dump, or I would post that.



 Significant GC pauses
 -

 Key: CASSANDRA-9938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9938
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04, Java 1.8.0u45
Reporter: Robbie Strickland
  Labels: gc
 Attachments: gc_log.txt


 We have an 18-node analytics cluster, running 2.1.7 patched with 
 CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
 especially in old gen, and little space is reclaimed.  Eventually these nodes 
 OOM:
 {code}
 ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to: java.lang.OutOfMemoryError: Java heap space
 {code}
 We use G1 with the following settings:
 Max heap = 16G
 New size = 1.6G
 +UseTLAB
 +ResizeTLAB
 +PerfDisableSharedMem
 -UseBiasedLocking
 The nodes in question have average load profiles for the cluster, and caches 
 are disabled on all tables.  There is no obvious difference with the 
 problematic nodes, and no other clear signs of trouble.  Unfortunately we're 
 currently getting an assertion error when trying to get a heap dump, or I 
 would post that.





[jira] [Updated] (CASSANDRA-9938) Significant GC pauses

2015-07-30 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9938:
-
Attachment: gc_log.txt

 Significant GC pauses
 -

 Key: CASSANDRA-9938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9938
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04, Java 1.8.0u45
Reporter: Robbie Strickland
  Labels: gc
 Attachments: gc_log.txt


 We have an 18-node analytics cluster, running 2.1.7 patched with 
 CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
 especially in old gen.  Eventually these nodes OOM:
 {code}
 ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to: java.lang.OutOfMemoryError: Java heap space
 {code}
 We use G1 with the following settings:
 Max heap = 16G
 New size = 1.6G
 +UseTLAB
 +ResizeTLAB
 +PerfDisableSharedMem
 -UseBiasedLocking
 The nodes in question have average load profiles for the cluster, and caches 
 are disabled on all tables.  There is no obvious difference with the 
 problematic nodes.  Unfortunately we're currently getting an assertion error 
 when trying to get a heap dump, or I would post that.





[jira] [Commented] (CASSANDRA-9938) Significant GC pauses

2015-07-30 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648213#comment-14648213
 ] 

Robbie Strickland commented on CASSANDRA-9938:
--

I found the culprit.  We had a very wide row due to an application error.
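For anyone hitting the same symptom: abnormally wide partitions can be hunted down by scanning {{nodetool cfstats}} output for the maximum compacted partition size per table. A rough sketch, assuming the 2.1-style "Compacted partition maximum bytes" line (field names and sample values here are illustrative and may differ between versions):

```python
import re

STAT = re.compile(r"Compacted partition maximum bytes:\s*(\d+)")
TABLE = re.compile(r"Table: (\S+)")  # 2.1 prints "Table:"; older versions print "Column Family:"

def wide_partitions(cfstats_text, threshold_bytes=100 * 1024 * 1024):
    """Return {table: max_partition_bytes} for tables whose widest
    partition exceeds the threshold (default 100MB)."""
    wide = {}
    table = None
    for line in cfstats_text.splitlines():
        t = TABLE.search(line)
        if t:
            table = t.group(1)
        s = STAT.search(line)
        if s and table and int(s.group(1)) > threshold_bytes:
            wide[table] = int(s.group(1))
    return wide

# Hypothetical cfstats excerpt: one ~20GB partition and one normal table.
sample = """
Table: events
Compacted partition maximum bytes: 21474836480
Table: users
Compacted partition maximum bytes: 43388628
"""
print(wide_partitions(sample))  # -> {'events': 21474836480}
```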

 Significant GC pauses
 -

 Key: CASSANDRA-9938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9938
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04, Java 1.8.0u45
Reporter: Robbie Strickland
  Labels: gc
 Attachments: gc_log.txt


 We have an 18-node analytics cluster, running 2.1.7 patched with 
 CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
 especially in old gen, and little space is reclaimed.  Eventually these nodes 
 OOM:
 {code}
 ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to: java.lang.OutOfMemoryError: Java heap space
 {code}
 We use G1 with the following settings:
 Max heap = 16G
 New size = 1.6G
 +UseTLAB
 +ResizeTLAB
 +PerfDisableSharedMem
 -UseBiasedLocking
 The nodes in question have average load profiles for the cluster, and caches 
 are disabled on all tables.  There is no obvious difference with the 
 problematic nodes, and no other clear signs of trouble.  Unfortunately we're 
 currently getting an assertion error when trying to get a heap dump, or I 
 would post that.





[jira] [Updated] (CASSANDRA-9938) Significant GC pauses

2015-07-30 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9938:
-
Fix Version/s: (was: 2.1.x)

 Significant GC pauses
 -

 Key: CASSANDRA-9938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9938
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04, Java 1.8.0u45
Reporter: Robbie Strickland
  Labels: gc
 Attachments: gc_log.txt


 We have an 18-node analytics cluster, running 2.1.7 patched with 
 CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
 especially in old gen, and little space is reclaimed.  Eventually these nodes 
 OOM:
 {code}
 ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to: java.lang.OutOfMemoryError: Java heap space
 {code}
 We use G1 with the following settings:
 Max heap = 16G
 New size = 1.6G
 +UseTLAB
 +ResizeTLAB
 +PerfDisableSharedMem
 -UseBiasedLocking
 The nodes in question have average load profiles for the cluster, and caches 
 are disabled on all tables.  There is no obvious difference with the 
 problematic nodes, and no other clear signs of trouble.  Unfortunately we're 
 currently getting an assertion error when trying to get a heap dump, or I 
 would post that.





[jira] [Resolved] (CASSANDRA-9938) Significant GC pauses

2015-07-30 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland resolved CASSANDRA-9938.
--
Resolution: Invalid

 Significant GC pauses
 -

 Key: CASSANDRA-9938
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9938
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04, Java 1.8.0u45
Reporter: Robbie Strickland
  Labels: gc
 Fix For: 2.1.x

 Attachments: gc_log.txt


 We have an 18-node analytics cluster, running 2.1.7 patched with 
 CASSANDRA-9662.  On a couple of the nodes we are seeing very long GC pauses, 
 especially in old gen, and little space is reclaimed.  Eventually these nodes 
 OOM:
 {code}
 ERROR [SharedPool-Worker-167] 2015-07-30 00:36:20,746 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to: java.lang.OutOfMemoryError: Java heap space
 {code}
 We use G1 with the following settings:
 Max heap = 16G
 New size = 1.6G
 +UseTLAB
 +ResizeTLAB
 +PerfDisableSharedMem
 -UseBiasedLocking
 The nodes in question have average load profiles for the cluster, and caches 
 are disabled on all tables.  There is no obvious difference with the 
 problematic nodes, and no other clear signs of trouble.  Unfortunately we're 
 currently getting an assertion error when trying to get a heap dump, or I 
 would post that.





[jira] [Commented] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-29 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646274#comment-14646274
 ] 

Robbie Strickland commented on CASSANDRA-9914:
--

I'm going to say this was resolved by CASSANDRA-9662.

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
Assignee: Marcus Eriksson
 Fix For: 2.1.x

 Attachments: cass_high_cpu.png, high_pending_compactions.txt


 We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* 
 and about 10GB of data on each node.  It's showing millions of pending 
 compaction tasks (but no actual work in progress), and the CPUs are pegged on 
 all three nodes.  The task count goes down rapidly, but then jumps back up 
 again seconds later.  All tables are set to STCS.  The issue persists after 
 restart, but takes a few minutes before it becomes a problem.  SSTable counts 
 are below 10 for every table.  We're also seeing 20s Old Gen GC pauses about 
 every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
 but we observed the same behavior.





[jira] [Resolved] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-29 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland resolved CASSANDRA-9914.
--
Resolution: Duplicate

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
Assignee: Marcus Eriksson
 Fix For: 2.1.x

 Attachments: cass_high_cpu.png, high_pending_compactions.txt


 We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* 
 and about 10GB of data on each node.  It's showing millions of pending 
 compaction tasks (but no actual work in progress), and the CPUs are pegged on 
 all three nodes.  The task count goes down rapidly, but then jumps back up 
 again seconds later.  All tables are set to STCS.  The issue persists after 
 restart, but takes a few minutes before it becomes a problem.  SSTable counts 
 are below 10 for every table.  We're also seeing 20s Old Gen GC pauses about 
 every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
 but we observed the same behavior.





[jira] [Updated] (CASSANDRA-9662) compactionManager reporting wrong pendingtasks

2015-07-29 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9662:
-
Reproduced In: 2.1.8, 2.0.16  (was: 2.0.16)

 compactionManager reporting wrong pendingtasks
 --

 Key: CASSANDRA-9662
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9662
 Project: Cassandra
  Issue Type: Bug
  Components: API
 Environment: OS: Amazon Linux AMI release 2015.03
 Cassandra: 2.0.16
 JVM: Java HotSpot(TM) 64-Bit Server VM (25.40-b25, mixed mode)
 Java: version 1.8.0_40, vendor Oracle Corporation
 CPU: 8 core Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Memory: 32G
Reporter: Tony Xu
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 2.1.9, 2.0.17

 Attachments: node1.jpg

   Original Estimate: 168h
  Remaining Estimate: 168h

 Yesterday I upgraded my Cassandra cluster from 2.0.14 to 2.0.16. After the 
 upgrade, I started seeing some strange behaviour in PendingTasks reporting.
 The Cassandra repository I am using is DataStax; the step I performed for the 
 upgrade:
 yum update -y cassandra20
 The upgrade went fine, and afterwards the cluster is operating okay. nodetool 
 info and nodetool status results looked fine. nodetool version is 
 reporting the correct version.
 But our monitoring system started reporting some crazy pending tasks. For 
 example, pending tasks for node1 sometimes jump from 0 to 15K for about 1 
 minute, then drop back to 0. This issue keeps occurring; we didn't have this 
 issue with 2.0.14. Our monitoring system is checking the value of MBeans - 
 CompactionManager - PendingTasks.
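Transient jumps like 0 -> 15K -> 0 can be separated from a genuinely sustained compaction backlog by looking at how long the reading stays elevated. A minimal sketch over sampled values of the CompactionManager PendingTasks attribute (the JMX sampling plumbing is assumed to exist elsewhere; only the spike logic is shown, and the thresholds are illustrative):

```python
def transient_spikes(samples, threshold=1000, max_width=3):
    """Find runs of consecutive samples above `threshold` that last at most
    `max_width` samples -- i.e. spikes that vanish on their own, as opposed
    to a sustained backlog.  Returns (start_index, run_length) pairs."""
    spikes = []
    run_start = None
    for i, value in enumerate(samples + [0]):  # sentinel closes a trailing run
        if value > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            length = i - run_start
            if length <= max_width:
                spikes.append((run_start, length))
            run_start = None
    return spikes

# One-minute samples: a brief 15K spike (as described above) and a sustained backlog.
readings = [0, 0, 15000, 15000, 0, 0, 5000, 5000, 5000, 5000, 5000, 0]
print(transient_spikes(readings))  # -> [(2, 2)]
```

An alerting rule built this way fires only on the sustained case, which is the one that actually needs operator attention.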





[jira] [Commented] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-29 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646083#comment-14646083
 ] 

Robbie Strickland commented on CASSANDRA-9914:
--

[~krummas] I'm now running 2.1.8 patched with CASSANDRA-9662, and early 
indications are positive. I'll let it sit for a while to make sure.

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
Assignee: Marcus Eriksson
 Fix For: 2.1.x

 Attachments: cass_high_cpu.png, high_pending_compactions.txt


 We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* 
 and about 10GB of data on each node.  It's showing millions of pending 
 compaction tasks (but no actual work in progress), and the CPUs are pegged on 
 all three nodes.  The task count goes down rapidly, but then jumps back up 
 again seconds later.  All tables are set to STCS.  The issue persists after 
 restart, but takes a few minutes before it becomes a problem.  SSTable counts 
 are below 10 for every table.  We're also seeing 20s Old Gen GC pauses about 
 every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
 but we observed the same behavior.





[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Attachment: high_pending_compactions.txt

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
 Attachments: high_pending_compactions.txt


 We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
 each node.  It's showing millions of pending compaction tasks (but no actual 
 work in progress), and the CPUs are pegged on all three nodes.  The task 
 count goes down rapidly, but then jumps back up again seconds later.  All 
 tables are set to STCS.  The issue persists after restart, but takes a few 
 minutes before it becomes a problem.  SSTable counts are below 10 for every 
 table.  We're also seeing 20s Old Gen GC pauses about every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
 but we observed the same behavior.





[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Description: 
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again seconds later.  All tables are 
set to STCS.  The issue persists after restart, but takes a few minutes before 
it becomes a problem.  SSTable counts are below 10 for every table.  We're also 
seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with 
[CASSANDRA-9637|https://issues.apache.org/jira/browse/CASSANDRA-9637], but we 
observed the same behavior.

  was:
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again.  All tables are set to STCS.  
The issue persists after restart, but takes a few minutes before it becomes a 
problem.  SSTable counts are below 10 for every table.  We're also seeing 20s 
Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with 
[CASSANDRA-9637|https://issues.apache.org/jira/browse/CASSANDRA-9637], but we 
observed the same behavior.


 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland

 We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
 each node.  It's showing millions of pending compaction tasks (but no actual 
 work in progress), and the CPUs are pegged on all three nodes.  The task 
 count goes down rapidly, but then jumps back up again seconds later.  All 
 tables are set to STCS.  The issue persists after restart, but takes a few 
 minutes before it becomes a problem.  SSTable counts are below 10 for every 
 table.  We're also seeing 20s Old Gen GC pauses about every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with 
 [CASSANDRA-9637|https://issues.apache.org/jira/browse/CASSANDRA-9637], but we 
 observed the same behavior.





[jira] [Created] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)
Robbie Strickland created CASSANDRA-9914:


 Summary: Millions of fake pending compaction tasks + high CPU
 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland


We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again.  All tables are set to STCS.  
The issue persists after restart, but takes a few minutes before it becomes a 
problem.  SSTable counts are below 10 for every table.  We're also seeing 20s 
Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with 
[CASSANDRA-9637|https://issues.apache.org/jira/browse/CASSANDRA-9637], but we 
observed the same behavior.





[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Description: 
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again seconds later.  All tables are 
set to STCS.  The issue persists after restart, but takes a few minutes before 
it becomes a problem.  SSTable counts are below 10 for every table.  We're also 
seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, but we 
observed the same behavior.

  was:
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again seconds later.  All tables are 
set to STCS.  The issue persists after restart, but takes a few minutes before 
it becomes a problem.  SSTable counts are below 10 for every table.  We're also 
seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with 
[CASSANDRA-9637|https://issues.apache.org/jira/browse/CASSANDRA-9637], but we 
observed the same behavior.


 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland

 We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
 each node.  It's showing millions of pending compaction tasks (but no actual 
 work in progress), and the CPUs are pegged on all three nodes.  The task 
 count goes down rapidly, but then jumps back up again seconds later.  All 
 tables are set to STCS.  The issue persists after restart, but takes a few 
 minutes before it becomes a problem.  SSTable counts are below 10 for every 
 table.  We're also seeing 20s Old Gen GC pauses about every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
 but we observed the same behavior.





[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Attachment: cass_high_cpu.png

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
 Attachments: cass_high_cpu.png, high_pending_compactions.txt


 We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
 each node.  It's showing millions of pending compaction tasks (but no actual 
 work in progress), and the CPUs are pegged on all three nodes.  The task 
 count goes down rapidly, but then jumps back up again seconds later.  All 
 tables are set to STCS.  The issue persists after restart, but takes a few 
 minutes before it becomes a problem.  SSTable counts are below 10 for every 
 table.  We're also seeing 20s Old Gen GC pauses about every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
 but we observed the same behavior.





[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9914:
-
Description: 
We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* and 
about 10GB of data on each node.  It's showing millions of pending compaction 
tasks (but no actual work in progress), and the CPUs are pegged on all three 
nodes.  The task count goes down rapidly, but then jumps back up again seconds 
later.  All tables are set to STCS.  The issue persists after restart, but 
takes a few minutes before it becomes a problem.  SSTable counts are below 10 
for every table.  We're also seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, but we 
observed the same behavior.

  was:
We have a 3-node test cluster with *zero traffic* and about 10GB of data on 
each node.  It's showing millions of pending compaction tasks (but no actual 
work in progress), and the CPUs are pegged on all three nodes.  The task count 
goes down rapidly, but then jumps back up again seconds later.  All tables are 
set to STCS.  The issue persists after restart, but takes a few minutes before 
it becomes a problem.  SSTable counts are below 10 for every table.  We're also 
seeing 20s Old Gen GC pauses about every 2-3 mins.

This started happening after bulk loading some old data.  We started seeing 
very long GC pauses (sometimes 30 min or more) that would bring down the nodes. 
 We then truncated this table, which resulted in the current behavior.  We 
attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, but we 
observed the same behavior.


 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
 Attachments: cass_high_cpu.png, high_pending_compactions.txt


 We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* 
 and about 10GB of data on each node.  It's showing millions of pending 
 compaction tasks (but no actual work in progress), and the CPUs are pegged on 
 all three nodes.  The task count goes down rapidly, but then jumps back up 
 again seconds later.  All tables are set to STCS.  The issue persists after 
 restart, but takes a few minutes before it becomes a problem.  SSTable counts 
 are below 10 for every table.  We're also seeing 20s Old Gen GC pauses about 
 every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
 but we observed the same behavior.





[jira] [Commented] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645037#comment-14645037
 ] 

Robbie Strickland commented on CASSANDRA-9914:
--

[~yukim] Possibly, but I don't think so.  The symptoms in CASSANDRA-9662 are 
momentary, whereas this persists indefinitely (and gets worse over time) 
with much higher task counts.

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
Assignee: Marcus Eriksson
 Fix For: 2.1.x

 Attachments: cass_high_cpu.png, high_pending_compactions.txt


 We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* 
 and about 10GB of data on each node.  It's showing millions of pending 
 compaction tasks (but no actual work in progress), and the CPUs are pegged on 
 all three nodes.  The task count goes down rapidly, but then jumps back up 
 again seconds later.  All tables are set to STCS.  The issue persists after 
 restart, but takes a few minutes before it becomes a problem.  SSTable counts 
 are below 10 for every table.  We're also seeing 20s Old Gen GC pauses about 
 every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
 but we observed the same behavior.





[jira] [Comment Edited] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645037#comment-14645037
 ] 

Robbie Strickland edited comment on CASSANDRA-9914 at 7/28/15 8:57 PM:
---

[~yukim] Possibly, but I don't think so.  The symptoms in CASSANDRA-9662 are 
momentary, whereas this persists indefinitely (and gets worse over time) with 
much higher task counts.


was (Author: rstrickland):
[~yukim] Possibly, but I don't think so.  The symptoms in CASSANDRA-9662 are 
momentary, whereas this is persists indefinitely (and gets worse over time) 
with much higher task counts.

 Millions of fake pending compaction tasks + high CPU
 

 Key: CASSANDRA-9914
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: CentOS
Reporter: Robbie Strickland
Assignee: Marcus Eriksson
 Fix For: 2.1.x

 Attachments: cass_high_cpu.png, high_pending_compactions.txt


 We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* 
 and about 10GB of data on each node.  It's showing millions of pending 
 compaction tasks (but no actual work in progress), and the CPUs are pegged on 
 all three nodes.  The task count goes down rapidly, but then jumps back up 
 again seconds later.  All tables are set to STCS.  The issue persists after 
 restart, but takes a few minutes before it becomes a problem.  SSTable counts 
 are below 10 for every table.  We're also seeing 20s Old Gen GC pauses about 
 every 2-3 mins.
 This started happening after bulk loading some old data.  We started seeing 
 very long GC pauses (sometimes 30 min or more) that would bring down the 
 nodes.  We then truncated this table, which resulted in the current behavior. 
  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
 but we observed the same behavior.





[jira] [Commented] (CASSANDRA-9666) Provide an alternative to DTCS

2015-07-10 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622822#comment-14622822
 ] 

Robbie Strickland commented on CASSANDRA-9666:
--

I'd like to second the changes [~krummas] suggested at the very least, as I 
agree that it's a saner scheme than tiers.  DTCS basically never stops 
compacting under normal conditions, and for most use cases there's little 
real benefit in compacting older data into larger sstables.  

 Provide an alternative to DTCS
 --

 Key: CASSANDRA-9666
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeff Jirsa
Assignee: Jeff Jirsa
 Fix For: 2.1.x, 2.2.x


 DTCS is great for time series data, but it comes with caveats that make it 
 difficult to use in production (typical operator behaviors such as bootstrap, 
 removenode, and repair have MAJOR caveats as they relate to 
 max_sstable_age_days, and hints/read repair break the selection algorithm).
 I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
 the tiered nature of DTCS in order to address some of DTCS' operational 
 shortcomings. I believe it is necessary to propose an alternative rather than 
 simply adjusting DTCS, because it fundamentally removes the tiered nature in 
 order to remove the parameter max_sstable_age_days - the result is very very 
 different, even if it is heavily inspired by DTCS. 
 Specifically, rather than creating a number of windows of ever increasing 
 sizes, this strategy allows an operator to choose the window size, compact 
 with STCS within the first window of that size, and aggressively compact down 
 to a single sstable once that window is no longer current. The window size is 
 a combination of unit (minutes, hours, days) and size (1, etc), such that an 
 operator can expect all data using a block of that size to be compacted 
 together (that is, if your unit is hours, and size is 6, you will create 
 roughly 4 sstables per day, each one containing roughly 6 hours of data). 
 The result addresses a number of the problems with 
 DateTieredCompactionStrategy:
 - At the present time, DTCS’s first window is compacted using an unusual 
 selection criteria, which prefers files with earlier timestamps, but ignores 
 sizes. In TimeWindowCompactionStrategy, the first window data will be 
 compacted with the well tested, fast, reliable STCS. All STCS options can be 
 passed to TimeWindowCompactionStrategy to configure the first window’s 
 compaction behavior.
 - HintedHandoff may put old data in new sstables, but it will have little 
 impact other than slightly reduced efficiency (sstables will cover a wider 
 range, but the old timestamps will not impact sstable selection criteria 
 during compaction)
 - ReadRepair may put old data in new sstables, but it will have little impact 
 other than slightly reduced efficiency (sstables will cover a wider range, 
 but the old timestamps will not impact sstable selection criteria during 
 compaction)
 - Small, old sstables resulting from streams of any kind will be swiftly and 
 aggressively compacted with the other sstables matching their similar 
 maxTimestamp, without causing sstables in neighboring windows to grow in size.
 - The configuration options are explicit and straightforward - the tuning 
 parameters leave little room for error. The window is set in common, easily 
 understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
 minute/hour/day options are granular enough for users keeping data for hours, 
 and users keeping data for years. 
 - There is no explicitly configurable max sstable age, though sstables will 
 naturally stop compacting once new data is written in that window. 
 - Streaming operations can create sstables with old timestamps, and they'll 
 naturally be joined together with sstables in the same time bucket. This is 
 true for bootstrap/repair/sstableloader/removenode. 
 - It remains true that if old data and new data are written into the memtable 
 at the same time, the resulting sstables will be treated as if they were new 
 sstables; however, that no longer negatively impacts the compaction 
 strategy’s selection criteria for older windows. 
 Patch provided for both 2.1 ( 
 https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 ) and 2.2 ( 
 https://github.com/jeffjirsa/cassandra/commits/twcs )
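The fixed-window bucketing described above (unit of hours, size of 6, roughly 4 sstables per day) can be sketched as follows. This is a minimal illustration of the arithmetic only, not the actual Cassandra implementation; all function and variable names here are hypothetical:

```python
# Illustrative sketch of TWCS-style time-window bucketing (names are
# hypothetical, not taken from the Cassandra source). Each sstable is
# assigned to a fixed-width window based on its max timestamp, so with
# unit="hours" and size=6 a full day of data spans roughly 4 windows.
from collections import defaultdict

UNIT_MILLIS = {"minutes": 60_000, "hours": 3_600_000, "days": 86_400_000}

def window_start(ts_millis, unit="hours", size=6):
    """Round a timestamp down to the start of its compaction window."""
    width = UNIT_MILLIS[unit] * size
    return (ts_millis // width) * width

def bucket_sstables(max_timestamps, unit="hours", size=6):
    """Group sstables (represented by their max timestamps) into windows."""
    buckets = defaultdict(list)
    for ts in max_timestamps:
        buckets[window_start(ts, unit, size)].append(ts)
    return buckets
```

With 6-hour windows, timestamps at 0h, 1h, and 7h land in two buckets: the first two share the window starting at 0, while the third falls into the window starting at hour 6, so only sstables within the same window are candidates for compaction together.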





[jira] [Commented] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-24 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599684#comment-14599684
 ] 

Robbie Strickland commented on CASSANDRA-9607:
--

2.1.7 with the patch works great.  Thanks!

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.x, 2.2.x

 Attachments: GC_state.png, cassandra.yaml, client_blocked_thread.png, 
 cpu_profile.png, dump.tdump, load.png, log.zip, schema.zip, vm_monitor.png


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Commented] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-22 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596022#comment-14596022
 ] 

Robbie Strickland commented on CASSANDRA-9607:
--

[~tjake] I'll go wrangle someone from devops to get me set up to do that.  BTW, 
we did just downgrade the production cluster from 2.1.5 to 2.1.4, and our Spark 
jobs all work now.  Two weeks of downtime.  Yay.

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.x, 2.2.0 rc2

 Attachments: cassandra.yaml, load.png, log.zip, schema.zip


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Updated] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-22 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9607:
-
Attachment: vm_monitor.png
GC_state.png
dump.tdump
cpu_profile.png
client_blocked_thread.png

I was able to bring the data down to my local machine and replicate the issue 
on a fresh install while profiling.  I've attached screen shots of the session, 
as well as the thread state on the client while it's happening.  You can see 
the server blocking on select and the client blocking on accept, which of 
course causes both ends to become unresponsive.

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.x, 2.2.x

 Attachments: GC_state.png, cassandra.yaml, client_blocked_thread.png, 
 cpu_profile.png, dump.tdump, load.png, log.zip, schema.zip, vm_monitor.png


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Updated] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-22 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9607:
-
Since Version: 2.1.5  (was: 2.1.6)

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.x, 2.2.x

 Attachments: GC_state.png, cassandra.yaml, client_blocked_thread.png, 
 cpu_profile.png, dump.tdump, load.png, log.zip, schema.zip, vm_monitor.png


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Comment Edited] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-22 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596605#comment-14596605
 ] 

Robbie Strickland edited comment on CASSANDRA-9607 at 6/22/15 8:43 PM:
---

I was able to bring the data down to my local machine and replicate the issue 
on a fresh 2.1.5 install while profiling.  I've attached screen shots of the 
session, as well as the thread state on the client while it's happening.  You 
can see the server blocking on select and the client blocking on accept, which 
of course causes both ends to become unresponsive.


was (Author: rstrickland):
I was able to bring the data down to my local machine and replicate the issue 
on a fresh install while profiling.  I've attached screen shots of the session, 
as well as the thread state on the client while it's happening.  You can see 
the server blocking on select and the client blocking on accept, which of 
course causes both ends to become unresponsive.

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.x, 2.2.x

 Attachments: GC_state.png, cassandra.yaml, client_blocked_thread.png, 
 cpu_profile.png, dump.tdump, load.png, log.zip, schema.zip, vm_monitor.png


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Commented] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-22 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595975#comment-14595975
 ] 

Robbie Strickland commented on CASSANDRA-9607:
--

I did some further testing on this, to at least isolate the affected versions 
and nail down steps to reproduce.  I started with a test cluster, 3 x 
i2.4xlarge instances running 2.1.4 and Spark 1.2.2.  I loaded a few sstables 
(using sstableloader) from one of my problematic production tables, and each 
test involved the following simple statement in the Spark shell:

{code}
sc.cassandraTable("prod_analytics_events", "profileevents").take(100).foreach(println)
{code}

I then upgraded to 2.1.6, loaded about 100G of data, and downgraded back to 
2.1.4.  The net result is that 2.1.4 works with both the small and larger data 
sets, while 2.1.5 and 2.1.6 work only with the smaller data sets.  The 
larger set caused the hang in both cases.

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.x, 2.2.0 rc2

 Attachments: cassandra.yaml, load.png, log.zip, schema.zip


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Comment Edited] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-22 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595975#comment-14595975
 ] 

Robbie Strickland edited comment on CASSANDRA-9607 at 6/22/15 2:25 PM:
---

I did some further testing on this, to at least isolate the affected versions 
and nail down steps to reproduce.  I started with a test cluster, 3 x 
i2.4xlarge instances running 2.1.4 and Spark 1.2.2.  I loaded a few sstables 
(using sstableloader) from one of my problematic production tables, and each 
test involved the following simple statement in the Spark shell:

{code}
sc.cassandraTable("mykeyspace", "mytable").take(100).foreach(println)
{code}

I then upgraded to 2.1.6, loaded about 100G of data, and downgraded back to 
2.1.4.  The net result is that 2.1.4 works with both the small and larger data 
sets, while 2.1.5 and 2.1.6 work only with the smaller data sets (the 
2.1.5 result comes from my production cluster).  The larger set caused the hang 
in both cases.


was (Author: rstrickland):
I did some further testing on this, to at least isolate the affected versions 
and nail down steps to reproduce.  I started with a test cluster, 3 x 
i2.4xlarge instances running 2.1.4 and Spark 1.2.2.  I loaded a few sstables 
(using sstableloader) from one of my problematic production tables, and each 
test involved the following simple statement in the Spark shell:

{code}
sc.cassandraTable("prod_analytics_events", "profileevents").take(100).foreach(println)
{code}

I then upgraded to 2.1.6, loaded about 100G of data, and downgraded back to 
2.1.4.  The net result is that 2.1.4 works with both the small and larger data 
sets, while 2.1.5 and 2.1.6 work only with the smaller data sets (the 
2.1.5 result comes from my production cluster).  The larger set caused the hang 
in both cases.

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.x, 2.2.0 rc2

 Attachments: cassandra.yaml, load.png, log.zip, schema.zip


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Comment Edited] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-22 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595975#comment-14595975
 ] 

Robbie Strickland edited comment on CASSANDRA-9607 at 6/22/15 2:24 PM:
---

I did some further testing on this, to at least isolate the affected versions 
and nail down steps to reproduce.  I started with a test cluster, 3 x 
i2.4xlarge instances running 2.1.4 and Spark 1.2.2.  I loaded a few sstables 
(using sstableloader) from one of my problematic production tables, and each 
test involved the following simple statement in the Spark shell:

{code}
sc.cassandraTable(prod_analytics_events, 
profileevents).take(100).foreach(println)
{code}

I then upgraded to 2.1.6, loaded about 100G of data, and downgraded back to 
2.1.4.  The net result is that 2.1.4 works with both the small and larger data 
sets, while 2.1.5 and 2.1.6 work only with the smaller data sets (the 
2.1.5 result comes from my production cluster).  The larger set caused the hang 
in both cases.


was (Author: rstrickland):
I did some further testing on this, to at least isolate the affected versions 
and nail down steps to reproduce.  I started with a test cluster, 3 x 
i2.4xlarge instances running 2.1.4 and Spark 1.2.2.  I loaded a few sstables 
(using sstableloader) from one of my problematic production tables, and each 
test involved the following simple statement in the Spark shell:

{code}
sc.cassandraTable(prod_analytics_events, 
profileevents).take(100).foreach(println)
{code}

I then upgraded to 2.1.6, loaded about 100G of data, and downgraded back to 
2.1.4.  The net result is that 2.1.4 works with both the small and larger data 
sets, while 2.1.5 and 2.1.6 work only with the smaller data sets.  The 
larger set caused the hang in both cases.

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Assignee: Tyler Hobbs
Priority: Critical
 Fix For: 2.1.x, 2.2.0 rc2

 Attachments: cassandra.yaml, load.png, log.zip, schema.zip


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Commented] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-19 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593716#comment-14593716
 ] 

Robbie Strickland commented on CASSANDRA-9607:
--

This is interesting, because we are currently troubleshooting these exact 
symptoms on our analytics cluster when querying large tables using Spark.  We 
had suspected sstable corruption, since some tables do work.  But size appears 
to matter.  Further, as I read your comment we had just finished loading one of 
the problematic tables into a test cluster running 2.1.4, and the same Spark 
job runs problem-free.  I am quite sure there's a correlation here.

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Assignee: Benedict
Priority: Critical
 Attachments: cassandra.yaml, load.png, log.zip


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Commented] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-18 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591977#comment-14591977
 ] 

Robbie Strickland commented on CASSANDRA-9607:
--

Yes it does.  It seems from our observation to be related to 1) our heavy use 
of DTCS combined with 
[CASSANDRA-9549|https://issues.apache.org/jira/browse/CASSANDRA-9549] and 2) 
severe GC pauses (such that it was GCing constantly).  

We were able to make things stable by moving to G1 with the following 
modifications:

{code}
#JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=1000"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
{code}

This is still a work in progress, but it has allowed us to reach a stable state.

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7
Reporter: Study Hsueh
Priority: Critical
 Attachments: load.png


 After upgrading cassandra version from 2.1.3 to 2.1.6, the average load of my 
 cassandra cluster grows from 0.x~1.x to 3.x~6.x. 
 What kind of additional information should I provide for this problem?





[jira] [Comment Edited] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-18 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591977#comment-14591977
 ] 

Robbie Strickland edited comment on CASSANDRA-9607 at 6/18/15 3:32 PM:
---

Yes it does.  It seems from our observation to be related to 1) our heavy use 
of DTCS combined with 
[CASSANDRA-9549|https://issues.apache.org/jira/browse/CASSANDRA-9549] and 2) 
severe GC pauses (such that it was GCing constantly).  

We were able to make things stable by moving to G1 with the following 
modifications:

{code}
#JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=1000"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
{code}

This is still a work in progress, but it has allowed us to reach a stable state.


was (Author: rstrickland):
Yes it does.  It seems from our observation to be related to 1) our heavy use 
of DTCS combined with 
[CASSANDRA-9549|https://issues.apache.org/jira/browse/CASSANDRA-9549] and 2) 
severe GC pauses (such that it was GCing constantly).  

We were able to make things stable by moving to G1 with the following 
modifications:

{{code}}
#JVM_OPTS=$JVM_OPTS -Xmn${HEAP_NEWSIZE}
JVM_OPTS=$JVM_OPTS -XX:+UseG1GC
JVM_OPTS=$JVM_OPTS -XX:MaxGCPauseMillis=1000
JVM_OPTS=$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB
JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking
{{code}}

This is still a work in progress, but it has allowed us to reach a stable state.

 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7
Reporter: Study Hsueh
Priority: Critical
 Attachments: load.png


 After upgrading Cassandra from 2.1.3 to 2.1.6, the average load of my 
 cluster grew from 0.x~1.x to 3.x~6.x. 
 What additional information should I provide for this problem?





[jira] [Updated] (CASSANDRA-9607) Get high load after upgrading from 2.1.3 to cassandra 2.1.6

2015-06-18 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-9607:
-
Environment: 
OS: 
CentOS 6 * 4
Ubuntu 14.04 * 2

JDK: Oracle JDK 7, Oracle JDK 8

VM: Azure VM Standard A3 * 6
RAM: 7 GB
Cores: 4

  was:
OS: 
CentOS 6 * 4
Ubuntu 14.04 * 2

JDK: Oracle JDK 7

VM: Azure VM Standard A3 * 6
RAM: 7 GB
Cores: 4


 Get high load after upgrading from 2.1.3 to cassandra 2.1.6
 ---

 Key: CASSANDRA-9607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9607
 Project: Cassandra
  Issue Type: Bug
 Environment: OS: 
 CentOS 6 * 4
 Ubuntu 14.04 * 2
 JDK: Oracle JDK 7, Oracle JDK 8
 VM: Azure VM Standard A3 * 6
 RAM: 7 GB
 Cores: 4
Reporter: Study Hsueh
Priority: Critical
 Attachments: load.png


 After upgrading Cassandra from 2.1.3 to 2.1.6, the average load of my 
 cluster grew from 0.x~1.x to 3.x~6.x. 
 What additional information should I provide for this problem?





[jira] [Commented] (CASSANDRA-9549) Memory leak

2015-06-15 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586460#comment-14586460
 ] 

Robbie Strickland commented on CASSANDRA-9549:
--

We also experience this issue on 2.1.5, and also running DTCS.

 Memory leak 
 

 Key: CASSANDRA-9549
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9549
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Cassandra 2.1.5. 9 node cluster in EC2 (m1.large nodes, 
 2 cores 7.5G memory, 800G platter for cassandra data, root partition and 
 commit log are on SSD EBS with sufficient IOPS), 3 nodes/availablity zone, 1 
 replica/zone
 JVM: /usr/java/jdk1.8.0_40/jre/bin/java 
 JVM Flags besides CP: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar 
 -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities 
 -XX:ThreadPriorityPolicy=42 -Xms2G -Xmx2G -Xmn200M 
 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 
 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
 -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
 -XX:+UseTLAB -XX:CompileCommandFile=/etc/cassandra/conf/hotspot_compiler 
 -XX:CMSWaitDuration=1 -XX:+CMSParallelInitialMarkEnabled 
 -XX:+CMSEdenChunksRecordAlways -XX:CMSWaitDuration=1 -XX:+UseCondCardMark 
 -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 
 -Dcom.sun.management.jmxremote.rmi.port=7199 
 -Dcom.sun.management.jmxremote.ssl=false 
 -Dcom.sun.management.jmxremote.authenticate=false 
 -Dlogback.configurationFile=logback.xml -Dcassandra.logdir=/var/log/cassandra 
 -Dcassandra.storagedir= -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid 
 Kernel: Linux 2.6.32-504.16.2.el6.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
Reporter: Ivar Thorson
Priority: Critical
 Fix For: 2.1.x

 Attachments: c4_system.log, c7fromboot.zip, cassandra.yaml, 
 cpu-load.png, memoryuse.png, ref-java-errors.jpeg, suspect.png, two-loads.png


 We have been experiencing a severe memory leak with Cassandra 2.1.5 that, 
 over the period of a couple of days, eventually consumes all of the available 
 JVM heap space, putting the JVM into GC hell where it keeps trying CMS 
 collection but can't free up any heap space. This pattern happens for every 
 node in our cluster and is requiring rolling cassandra restarts just to keep 
 the cluster running. We upgraded the cluster from the 2.0 branch per DataStax 
 docs a couple of months ago and have been using data from this cluster for 
 more than a year without problems.
 As the heap fills up with non-GC-able objects, the CPU/OS load average grows 
 along with it. Heap dumps reveal an increasing number of 
 java.util.concurrent.ConcurrentLinkedQueue$Node objects. We took heap dumps 
 over a 2 day period, and watched the number of Node objects go from 4M, to 
 19M, to 36M, and eventually about 65M objects before the node stops 
 responding. The screen capture of our heap dump is from the 19M measurement.
 Load on the cluster is minimal. We can see this effect even with only a 
 handful of writes per second. (See attachments for Opscenter snapshots during 
 very light loads and heavier loads). Even with only 5 reads a sec we see this 
 behavior.
 Log files show repeated errors in Ref.java:181 and Ref.java:279 and LEAK 
 detected messages:
 {code}
 ERROR [CompactionExecutor:557] 2015-06-01 18:27:36,978 Ref.java:279 - Error 
 when closing class 
 org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@1302301946:/data1/data/ourtablegoeshere-ka-1150
 java.util.concurrent.RejectedExecutionException: Task 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@32680b31 
 rejected from 
 org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@573464d6[Terminated,
  pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1644]
 {code}
 {code}
 ERROR [Reference-Reaper:1] 2015-06-01 18:27:37,083 Ref.java:181 - LEAK 
 DETECTED: a reference 
 (org.apache.cassandra.utils.concurrent.Ref$State@74b5df92) to class 
 org.apache.cassandra.io.sstable.SSTableReader$DescriptorTypeTidy@2054303604:/data2/data/ourtablegoeshere-ka-1151
  was not released before the reference was garbage collected
 {code}
 This might be related to [CASSANDRA-8723]?





[jira] [Commented] (CASSANDRA-8061) tmplink files are not removed

2015-06-12 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584007#comment-14584007
 ] 

Robbie Strickland commented on CASSANDRA-8061:
--

[~benedict] I can confirm that I made an invalid assumption that the issue was 
related, but in fact the files are transient and the issue is CASSANDRA-9850.

 tmplink files are not removed
 -

 Key: CASSANDRA-8061
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8061
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Linux
Reporter: Gianluca Borello
Assignee: Joshua McKenzie
 Fix For: 2.1.x

 Attachments: 8061_v1.txt, 8248-thread_dump.txt


 After installing 2.1.0, I'm experiencing a bunch of tmplink files that are 
 filling my disk. I found https://issues.apache.org/jira/browse/CASSANDRA-7803 
 and that is very similar, and I confirm it happens both on 2.1.0 as well as 
 from the latest commit on the cassandra-2.1 branch 
 (https://github.com/apache/cassandra/commit/aca80da38c3d86a40cc63d9a122f7d45258e4685
  from the cassandra-2.1)
 Even starting with a clean keyspace, after a few hours I get:
 {noformat}
 $ sudo find /raid0 | grep tmplink | xargs du -hs
 2.7G  
 /raid0/cassandra/data/draios/protobuf1-ccc6dce04beb11e4abf997b38fbf920b/draios-protobuf1-tmplink-ka-4515-Data.db
 13M   
 /raid0/cassandra/data/draios/protobuf1-ccc6dce04beb11e4abf997b38fbf920b/draios-protobuf1-tmplink-ka-4515-Index.db
 1.8G  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-1788-Data.db
 12M   
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-1788-Index.db
 5.2M  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-2678-Index.db
 822M  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-2678-Data.db
 7.3M  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3283-Index.db
 1.2G  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3283-Data.db
 6.7M  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3951-Index.db
 1.1G  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3951-Data.db
 11M   
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-4799-Index.db
 1.7G  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-4799-Data.db
 812K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-234-Index.db
 122M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-208-Data.db
 744K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-739-Index.db
 660K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-193-Index.db
 796K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-230-Index.db
 137M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-230-Data.db
 161M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-269-Data.db
 139M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-234-Data.db
 940K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-786-Index.db
 936K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-269-Index.db
 161M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-786-Data.db
 672K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-197-Index.db
 113M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-193-Data.db
 116M  
 

[jira] [Commented] (CASSANDRA-8061) tmplink files are not removed

2015-06-12 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583927#comment-14583927
 ] 

Robbie Strickland commented on CASSANDRA-8061:
--

I can also verify I am seeing this after upgrading to 2.1.6.  It breaks 
nodetool cfstats with an AssertionError:

{noformat}
error: 
/var/lib/cassandra/xvdb/data/prod_analytics_events/locationupdateevents-52f73af0fd5111e489f75b9deb90b453/prod_analytics_events-locationupdateevents-tmplink-ka-1460-Data.db
-- StackTrace --
java.lang.AssertionError: 
/var/lib/cassandra/xvdb/data/prod_analytics_events/locationupdateevents-52f73af0fd5111e489f75b9deb90b453/prod_analytics_events-locationupdateevents-tmplink-ka-1460-Data.db
at 
org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:270)
at 
org.apache.cassandra.metrics.ColumnFamilyMetrics$9.value(ColumnFamilyMetrics.java:296)
at 
org.apache.cassandra.metrics.ColumnFamilyMetrics$9.value(ColumnFamilyMetrics.java:290)
at 
com.yammer.metrics.reporting.JmxReporter$Gauge.getValue(JmxReporter.java:63)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at 
com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:83)
at 
com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:206)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:647)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1443)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
at 
javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:637)
at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$241(TCPTransport.java:683)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$1/602091790.run(Unknown
 Source)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}

 tmplink files are not removed
 -

 Key: CASSANDRA-8061
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8061
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Linux
Reporter: Gianluca Borello
Assignee: Joshua McKenzie
 Fix For: 2.1.x

 Attachments: 8061_v1.txt, 8248-thread_dump.txt


 After installing 2.1.0, I'm experiencing a bunch of tmplink files that are 
 filling my disk. I found https://issues.apache.org/jira/browse/CASSANDRA-7803 
 and that is very similar, and I confirm it 

[jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes

2015-03-27 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383640#comment-14383640
 ] 

Robbie Strickland commented on CASSANDRA-8717:
--

FWIW, I spoke with several other teams at Spark Summit last week that would
really like this patch for the same reason.



 Top-k queries with custom secondary indexes
 ---

 Key: CASSANDRA-8717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Andrés de la Peña
Assignee: Andrés de la Peña
Priority: Minor
  Labels: 2i, secondary_index, sort, sorting, top-k
 Fix For: 3.0

 Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch


 As presented in [Cassandra Summit Europe 
 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be 
 modified to support general top-k queries with minimal changes to the Cassandra 
 codebase. This way, custom 2i implementations could provide relevance search, 
 sorting by columns, etc.
 Top-k queries retrieve the k best results for a certain query. That implies 
 querying the k best rows in each token range and then sorting them to obtain 
 the k globally best rows. 
 To do that, we propose two additional methods in the class 
 SecondaryIndexSearcher:
 {code:java}
 public boolean requiresFullScan(List<IndexExpression> clause)
 {
     return false;
 }

 public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
 {
     return rows;
 }
 {code}
 The first method indicates whether a query against the index requires querying 
 all the nodes in the ring; this is necessary for top-k queries because we do 
 not know in advance which nodes hold the best results. The second method 
 specifies how to sort the partial per-node results according to the query.
 Then we add two similar methods to the class AbstractRangeCommand:
 {code:java}
 this.searcher = 
     Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);

 public boolean requiresFullScan()
 {
     return searcher == null ? false : searcher.requiresFullScan(rowFilter);
 }

 public List<Row> combine(List<Row> rows)
 {
     return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
 }
 {code}
 Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as 
 shown in the attached patch.
 We think the proposed approach provides very useful functionality with minimal 
 impact on the current codebase.
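The combine step described above, gathering each token range's k best rows and keeping the k globally best, can be sketched as a standalone example. This is a hedged illustration: the class and method names (TopKMerge, combine) are invented here and are not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the top-k "combine" step: each token range returns
// its k best (row, score) pairs; we merge them and keep the k globally best.
public class TopKMerge {
    public static List<Map.Entry<String, Double>> combine(
            List<List<Map.Entry<String, Double>>> perRangeResults, int k) {
        List<Map.Entry<String, Double>> all = new ArrayList<>();
        for (List<Map.Entry<String, Double>> partial : perRangeResults)
            all.addAll(partial);
        // Sort by score, best first, then trim to the global top k.
        all.sort(Comparator.comparingDouble(
                (Map.Entry<String, Double> e) -> e.getValue()).reversed());
        return all.subList(0, Math.min(k, all.size()));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> range1 =
                List.of(Map.entry("a", 0.9), Map.entry("b", 0.4));
        List<Map.Entry<String, Double>> range2 =
                List.of(Map.entry("c", 0.7), Map.entry("d", 0.1));
        // The two highest-scoring rows across both ranges: "a", then "c".
        System.out.println(combine(List.of(range1, range2), 2));
    }
}
```

Note that each range only needs to ship its own top k, not all its rows, which is why the per-range sort methods above suffice.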





[jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes

2015-02-02 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301452#comment-14301452
 ] 

Robbie Strickland commented on CASSANDRA-8717:
--

[~iamaleksey] Have you looked at the patch?  There's barely anything to it, and 
yet it opens up the door for guys like Stratio to plug in more advanced index 
implementations without breaking anything (i.e. no need for their fork, which 
is a good thing).  Plus who knows when 3.0 will go mainstream?  I think you 
should reconsider, or at least get some other input.

 Top-k queries with custom secondary indexes
 ---

 Key: CASSANDRA-8717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Andrés de la Peña
Assignee: Andrés de la Peña
Priority: Minor
  Labels: 2i, secondary_index, sort, sorting, top-k
 Fix For: 3.0

 Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch


 As presented in [Cassandra Summit Europe 
 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be 
 modified to support general top-k queries with minimal changes to the Cassandra 
 codebase. This way, custom 2i implementations could provide relevance search, 
 sorting by columns, etc.
 Top-k queries retrieve the k best results for a certain query. That implies 
 querying the k best rows in each token range and then sorting them to obtain 
 the k globally best rows. 
 To do that, we propose two additional methods in the class 
 SecondaryIndexSearcher:
 {code:java}
 public boolean requiresFullScan(List<IndexExpression> clause)
 {
     return false;
 }

 public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
 {
     return rows;
 }
 {code}
 The first method indicates whether a query against the index requires querying 
 all the nodes in the ring; this is necessary for top-k queries because we do 
 not know in advance which nodes hold the best results. The second method 
 specifies how to sort the partial per-node results according to the query.
 Then we add two similar methods to the class AbstractRangeCommand:
 {code:java}
 this.searcher = 
     Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);

 public boolean requiresFullScan()
 {
     return searcher == null ? false : searcher.requiresFullScan(rowFilter);
 }

 public List<Row> combine(List<Row> rows)
 {
     return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
 }
 {code}
 Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as 
 shown in the attached patch.
 We think the proposed approach provides very useful functionality with minimal 
 impact on the current codebase.





[jira] [Commented] (CASSANDRA-8717) Top-k queries with custom secondary indexes

2015-02-02 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301395#comment-14301395
 ] 

Robbie Strickland commented on CASSANDRA-8717:
--

Prior to this patch being submitted, I went through this same exercise and 
patched 2.1 mainline with these changes.  I couldn't see where it broke 
anything, and it allows users to drop in Stratio's (or their own) custom index 
implementation.  This is a big win!

 Top-k queries with custom secondary indexes
 ---

 Key: CASSANDRA-8717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8717
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Andrés de la Peña
Priority: Minor
  Labels: 2i, secondary_index, sort, sorting, top-k
 Fix For: 2.1.3

 Attachments: 0001-Add-support-for-top-k-queries-in-2i.patch


 As presented in [Cassandra Summit Europe 
 2014|https://www.youtube.com/watch?v=Hg5s-hXy_-M], secondary indexes can be 
 modified to support general top-k queries with minimal changes to the Cassandra 
 codebase. This way, custom 2i implementations could provide relevance search, 
 sorting by columns, etc.
 Top-k queries retrieve the k best results for a certain query. That implies 
 querying the k best rows in each token range and then sorting them to obtain 
 the k globally best rows. 
 To do that, we propose two additional methods in the class 
 SecondaryIndexSearcher:
 {code:java}
 public boolean requiresFullScan(List<IndexExpression> clause)
 {
     return false;
 }

 public List<Row> sort(List<IndexExpression> clause, List<Row> rows)
 {
     return rows;
 }
 {code}
 The first method indicates whether a query against the index requires querying 
 all the nodes in the ring; this is necessary for top-k queries because we do 
 not know in advance which nodes hold the best results. The second method 
 specifies how to sort the partial per-node results according to the query.
 Then we add two similar methods to the class AbstractRangeCommand:
 {code:java}
 this.searcher = 
     Keyspace.open(keyspace).getColumnFamilyStore(columnFamily).indexManager.searcher(rowFilter);

 public boolean requiresFullScan()
 {
     return searcher == null ? false : searcher.requiresFullScan(rowFilter);
 }

 public List<Row> combine(List<Row> rows)
 {
     return searcher == null ? trim(rows) : trim(searcher.sort(rowFilter, rows));
 }
 {code}
 Finally, we modify StorageProxy#getRangeSlice to use the previous methods, as 
 shown in the attached patch.
 We think the proposed approach provides very useful functionality with minimal 
 impact on the current codebase.





[jira] [Assigned] (CASSANDRA-4476) Support 2ndary index queries with only non-EQ clauses

2014-10-14 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland reassigned CASSANDRA-4476:


Assignee: Robbie Strickland

 Support 2ndary index queries with only non-EQ clauses
 -

 Key: CASSANDRA-4476
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4476
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Reporter: Sylvain Lebresne
Assignee: Robbie Strickland
Priority: Minor
  Labels: cql
 Fix For: 2.1.1


 Currently, a query that uses 2ndary indexes must have at least one EQ clause 
 (on an indexed column). Given that indexed CFs are local (and use a 
 LocalPartitioner that orders the rows by the type of the indexed column), we 
 should extend 2ndary indexes to allow querying indexed columns even when no 
 EQ clause is provided.
 As far as I can tell, the main problem to solve for this is to update 
 KeysSearcher.highestSelectivityPredicate(), i.e. how do we estimate the 
 selectivity of non-EQ clauses? I note however that if we can make that 
 estimate reasonably accurately, this might provide better performance even for 
 index queries that have both EQ and non-EQ clauses, because some non-EQ 
 clauses may have much better selectivity than EQ ones (say you index both the 
 user country and birth date; for SELECT * FROM users WHERE country = 'US' AND 
 birthdate > 'Jan 2009' AND birthdate < 'July 2009', you'd better use the 
 birthdate index first).





[jira] [Updated] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-16 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7930:
-
Attachment: cassandra-2.0-v6.txt

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
  Labels: bootcamp, jmx
 Attachments: cassandra-2.0-v2.txt, cassandra-2.0-v3.txt, 
 cassandra-2.0-v4.txt, cassandra-2.0-v5.txt, cassandra-2.0-v6.txt, 
 cassandra-2.0.txt, cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.
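The eviction behavior the ticket wants surfaced can be illustrated with a minimal LRU sketch. Hedged: PreparedStatementCache here is an invented stand-in, not Cassandra's implementation, and it caps by entry count rather than by the maxMemory / 256 byte budget the ticket mentions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache that counts and warns on evictions; a real integration
// would expose the counter as a JMX metric, as the ticket suggests.
public class PreparedStatementCache extends LinkedHashMap<String, String> {
    private final int maxEntries;
    private int evictions = 0;

    public PreparedStatementCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true gives LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        if (size() > maxEntries) {
            evictions++;
            System.err.println("WARN: evicting prepared statement: " + eldest.getKey());
            return true;
        }
        return false;
    }

    public int evictionCount() { return evictions; }

    public static void main(String[] args) {
        PreparedStatementCache cache = new PreparedStatementCache(2);
        cache.put("q1", "SELECT * FROM t1");
        cache.put("q2", "SELECT * FROM t2");
        cache.put("q3", "SELECT * FROM t3"); // evicts q1, the least recently used
        System.out.println(cache.evictionCount()); // prints 1
    }
}
```

The point of the warning is that eviction is otherwise silent: a driver re-preparing evicted statements just sees extra round trips, not an error.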





[jira] [Commented] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-16 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135855#comment-14135855
 ] 

Robbie Strickland commented on CASSANDRA-7930:
--

I simply backported the metrics class to 2.0 so I could add the cache-related 
stuff; I'm happy to change it as long as it's consistent with whatever you 
ultimately decide for 2.1.

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
  Labels: bootcamp, jmx
 Attachments: cassandra-2.0-v2.txt, cassandra-2.0-v3.txt, 
 cassandra-2.0-v4.txt, cassandra-2.0-v5.txt, cassandra-2.0-v6.txt, 
 cassandra-2.0.txt, cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.





[jira] [Commented] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-16 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135992#comment-14135992
 ] 

Robbie Strickland commented on CASSANDRA-7930:
--

Wow, you're right.  This looks like a duplicate of that ticket, just 
implemented slightly differently.  I brought up the issue at boot camp, and it 
looks like Nate and I both worked on it simultaneously...

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
  Labels: bootcamp, jmx
 Attachments: cassandra-2.0-v2.txt, cassandra-2.0-v3.txt, 
 cassandra-2.0-v4.txt, cassandra-2.0-v5.txt, cassandra-2.0-v6.txt, 
 cassandra-2.0.txt, cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.





[jira] [Commented] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-16 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136023#comment-14136023
 ] 

Robbie Strickland commented on CASSANDRA-7930:
--

Sure, I'll wait until you get things sorted out.

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
  Labels: bootcamp, jmx
 Attachments: cassandra-2.0-v2.txt, cassandra-2.0-v3.txt, 
 cassandra-2.0-v4.txt, cassandra-2.0-v5.txt, cassandra-2.0-v6.txt, 
 cassandra-2.0.txt, cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.





[jira] [Updated] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-15 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7930:
-
Attachment: cassandra-2.0-v5.txt

Changed the scheduler to use StorageService instead of a new executor.

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
 Attachments: cassandra-2.0-v2.txt, cassandra-2.0-v3.txt, 
 cassandra-2.0-v4.txt, cassandra-2.0-v5.txt, cassandra-2.0.txt, 
 cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.





[jira] [Created] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-13 Thread Robbie Strickland (JIRA)
Robbie Strickland created CASSANDRA-7930:


 Summary: Warn when evicting prepared statements from cache
 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland


The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
There is currently no warning when statements are evicted, which could be 
problematic if the user is unaware that this is happening.

At the very least, we should provide a JMX metric and possibly a log message 
indicating this is happening.  At some point it may also be worthwhile to make 
this tunable for users with large numbers of statements.





[jira] [Updated] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-13 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7930:
-
Attachment: cassandra-2.1.txt

The attached patch adds a metric for evicted statements and logs a warning when 
eviction occurs.

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
 Attachments: cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.





[jira] [Updated] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-13 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7930:
-
Attachment: cassandra-2.0.txt

Rebased against 2.0, and moved logging to a scheduled executor that logs new 
evictions once per minute.
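The batched logging this patch describes can be sketched as an aggregator that counts evictions as they happen and reports the total when a scheduler flushes it; the once-per-minute cadence and all names here are illustrative, not the patch's actual code.

```python
class EvictionLogAggregator:
    """Counts evictions between flushes so a scheduled task can log
    one summary line per interval instead of one line per eviction."""

    def __init__(self):
        self._since_last_flush = 0

    def record_eviction(self):
        # Called on the hot path; just bump a counter, no logging here.
        self._since_last_flush += 1

    def flush(self):
        """Invoked by a periodic scheduler (once per minute in the
        patch). Logs only if anything was evicted since the last run."""
        n, self._since_last_flush = self._since_last_flush, 0
        if n:
            print(f"Evicted {n} prepared statements in the last interval")
        return n
```

Keeping the hot path down to a counter increment is the point of moving the logging onto a scheduled executor.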

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
 Attachments: cassandra-2.0.txt, cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.





[jira] [Updated] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-13 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7930:
-
Attachment: cassandra-2.0-v2.txt

Changed log level to debug.

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
 Attachments: cassandra-2.0-v2.txt, cassandra-2.0.txt, 
 cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.





[jira] [Updated] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-13 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7930:
-
Attachment: cassandra-2.0-v3.txt

One last change to include the size in bytes in the log message, so it's not confused with 
a maximum number of statements.

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
 Attachments: cassandra-2.0-v2.txt, cassandra-2.0-v3.txt, 
 cassandra-2.0.txt, cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.





[jira] [Updated] (CASSANDRA-7930) Warn when evicting prepared statements from cache

2014-09-13 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7930:
-
Attachment: cassandra-2.0-v4.txt

Changed log level to info since it's an infrequent message now.

 Warn when evicting prepared statements from cache
 -

 Key: CASSANDRA-7930
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7930
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robbie Strickland
Assignee: Robbie Strickland
 Attachments: cassandra-2.0-v2.txt, cassandra-2.0-v3.txt, 
 cassandra-2.0-v4.txt, cassandra-2.0.txt, cassandra-2.1.txt


 The prepared statement cache is an LRU, with a max size of maxMemory / 256.  
 There is currently no warning when statements are evicted, which could be 
 problematic if the user is unaware that this is happening.
 At the very least, we should provide a JMX metric and possibly a log message 
 indicating this is happening.  At some point it may also be worthwhile to 
 make this tunable for users with large numbers of statements.





[jira] [Commented] (CASSANDRA-7252) RingCache cannot be configured to use local DC only

2014-08-17 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100040#comment-14100040
 ] 

Robbie Strickland commented on CASSANDRA-7252:
--

[~iamaleksey] Do you want me to create another ticket for the address option?

 RingCache cannot be configured to use local DC only
 ---

 Key: CASSANDRA-7252
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7252
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Robbie Strickland
Assignee: Robbie Strickland
  Labels: patch
 Fix For: 2.0.10, 2.1.1

 Attachments: cassandra-2.0.7-7252-2.txt, cassandra-2.0.7-7252.txt


 RingCache always calls describe_ring, returning the entire cluster.  
 Considering it's used in the context of writing from Hadoop (which is 
 typically in a multi-DC configuration), this is often not desirable behavior. 
  In some cases there may be high-latency connections between the analytics DC 
 and other DCs.
 I am attaching a patch that adds an optional config value to tell RingCache 
 to use local nodes only, in which case it calls describe_local_ring instead.  
 It also adds helpful failed host information to IOExceptions thrown in 
 AbstractColumnFamilyOutputFormat.createAuthenticatedClient, CqlRecordWriter, 
 and ColumnFamilyRecordWriter.  This allows a user to more easily solve 
 related connectivity issues.
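The local-ring switch described above amounts to choosing between two Thrift calls based on a config flag. A minimal sketch, assuming `client` is any object exposing `describe_ring` / `describe_local_ring` methods (a hypothetical stand-in for the Thrift client):

```python
def ring_endpoints(client, keyspace, local_dc_only=False):
    """Return the token ring to use for writes. With local_dc_only
    set (the patch's new config value), ask the connected node for
    its local DC's ring only instead of the whole cluster."""
    if local_dc_only:
        return client.describe_local_ring(keyspace)
    return client.describe_ring(keyspace)
```

With the flag set, Hadoop writers never route over the high-latency links to remote DCs.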



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7252) RingCache cannot be configured to use local DC only

2014-05-20 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003408#comment-14003408
 ] 

Robbie Strickland commented on CASSANDRA-7252:
--

[~iamaleksey] I see you changed the fix version to 2.0.9.  Do you want me to 
rebase the patch?

 RingCache cannot be configured to use local DC only
 ---

 Key: CASSANDRA-7252
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7252
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Robbie Strickland
Assignee: Jonathan Ellis
  Labels: patch
 Fix For: 2.0.9

 Attachments: cassandra-2.0.7-7252-2.txt, cassandra-2.0.7-7252.txt


 RingCache always calls describe_ring, returning the entire cluster.  
 Considering it's used in the context of writing from Hadoop (which is 
 typically in a multi-DC configuration), this is often not desirable behavior. 
  In some cases there may be high-latency connections between the analytics DC 
 and other DCs.
 I am attaching a patch that adds an optional config value to tell RingCache 
 to use local nodes only, in which case it calls describe_local_ring instead.  
 It also adds helpful failed host information to IOExceptions thrown in 
 AbstractColumnFamilyOutputFormat.createAuthenticatedClient, CqlRecordWriter, 
 and ColumnFamilyRecordWriter.  This allows a user to more easily solve 
 related connectivity issues.





[jira] [Updated] (CASSANDRA-7252) RingCache cannot be configured to use local DC only

2014-05-16 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7252:
-

Attachment: cassandra-2.0.7-7252.txt

 RingCache cannot be configured to use local DC only
 ---

 Key: CASSANDRA-7252
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7252
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Robbie Strickland
Assignee: Robbie Strickland
  Labels: patch
 Fix For: 2.0.7

 Attachments: cassandra-2.0.7-7252.txt


 RingCache always calls describe_ring, returning the entire cluster.  
 Considering it's used in the context of writing from Hadoop (which is 
 typically in a multi-DC configuration), this is often not desirable behavior. 
  In some cases there may be high-latency connections between the analytics DC 
 and other DCs.
 I am attaching a patch that adds an optional config value to tell RingCache 
 to use local nodes only.  It also adds helpful failed host information to 
 IOExceptions thrown in 
 AbstractColumnFamilyOutputFormat.createAuthenticatedClient, CqlRecordWriter, 
 and ColumnFamilyRecordWriter.  This allows a user to more easily solve 
 related connectivity issues.





[jira] [Updated] (CASSANDRA-7252) RingCache cannot be configured to use local DC only

2014-05-16 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7252:
-

Description: 
RingCache always calls describe_ring, returning the entire cluster.  
Considering it's used in the context of writing from Hadoop (which is typically 
in a multi-DC configuration), this is often not desirable behavior.  In some 
cases there may be high-latency connections between the analytics DC and other 
DCs.

I am attaching a patch that adds an optional config value to tell RingCache to 
use local nodes only, in which case it calls describe_local_ring instead.  It 
also adds helpful failed host information to IOExceptions thrown in 
AbstractColumnFamilyOutputFormat.createAuthenticatedClient, CqlRecordWriter, 
and ColumnFamilyRecordWriter.  This allows a user to more easily solve related 
connectivity issues.

  was:
RingCache always calls describe_ring, returning the entire cluster.  
Considering it's used in the context of writing from Hadoop (which is typically 
in a multi-DC configuration), this is often not desirable behavior.  In some 
cases there may be high-latency connections between the analytics DC and other 
DCs.

I am attaching a patch that adds an optional config value to tell RingCache to 
use local nodes only.  It also adds helpful failed host information to 
IOExceptions thrown in 
AbstractColumnFamilyOutputFormat.createAuthenticatedClient, CqlRecordWriter, 
and ColumnFamilyRecordWriter.  This allows a user to more easily solve related 
connectivity issues.


 RingCache cannot be configured to use local DC only
 ---

 Key: CASSANDRA-7252
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7252
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Robbie Strickland
Assignee: Robbie Strickland
  Labels: patch
 Fix For: 2.0.7

 Attachments: cassandra-2.0.7-7252.txt


 RingCache always calls describe_ring, returning the entire cluster.  
 Considering it's used in the context of writing from Hadoop (which is 
 typically in a multi-DC configuration), this is often not desirable behavior. 
  In some cases there may be high-latency connections between the analytics DC 
 and other DCs.
 I am attaching a patch that adds an optional config value to tell RingCache 
 to use local nodes only, in which case it calls describe_local_ring instead.  
 It also adds helpful failed host information to IOExceptions thrown in 
 AbstractColumnFamilyOutputFormat.createAuthenticatedClient, CqlRecordWriter, 
 and ColumnFamilyRecordWriter.  This allows a user to more easily solve 
 related connectivity issues.





[jira] [Created] (CASSANDRA-7252) RingCache cannot be configured to use local DC only

2014-05-16 Thread Robbie Strickland (JIRA)
Robbie Strickland created CASSANDRA-7252:


 Summary: RingCache cannot be configured to use local DC only
 Key: CASSANDRA-7252
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7252
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Robbie Strickland
Assignee: Robbie Strickland


RingCache always calls describe_ring, returning the entire cluster.  
Considering it's used in the context of writing from Hadoop (which is typically 
in a multi-DC configuration), this is often not desirable behavior.  In some 
cases there may be high-latency connections between the analytics DC and other 
DCs.

I am attaching a patch that adds an optional config value to tell RingCache to 
use local nodes only.  It also adds helpful failed host information to 
IOExceptions thrown in 
AbstractColumnFamilyOutputFormat.createAuthenticatedClient, CqlRecordWriter, 
and ColumnFamilyRecordWriter.  This allows a user to more easily solve related 
connectivity issues.





[jira] [Updated] (CASSANDRA-7252) RingCache cannot be configured to use local DC only

2014-05-16 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland updated CASSANDRA-7252:
-

Attachment: cassandra-2.0.7-7252-2.txt

Updated patch to address the issue of RingCache always using broadcast_address. 
 This adds optional config to switch to using rpc_address, which is important 
for multi-region EC2 configurations where you have to specify the public IP for 
broadcast_address, whereas you may not want Hadoop using that IP.
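The address selection this second patch adds reduces to one config-driven choice per endpoint. A sketch under assumed names (the field names here are illustrative, not the patch's actual API):

```python
def connect_address(endpoint_details, use_rpc_address=False):
    """Pick which node address the Hadoop client should dial.
    In multi-region EC2, broadcast_address must be the public IP,
    so the new config option lets Hadoop use rpc_address (typically
    the private IP) instead."""
    if use_rpc_address:
        return endpoint_details["rpc_address"]
    return endpoint_details["broadcast_address"]
```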

 RingCache cannot be configured to use local DC only
 ---

 Key: CASSANDRA-7252
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7252
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Robbie Strickland
Assignee: Robbie Strickland
  Labels: patch
 Fix For: 2.0.7

 Attachments: cassandra-2.0.7-7252-2.txt, cassandra-2.0.7-7252.txt


 RingCache always calls describe_ring, returning the entire cluster.  
 Considering it's used in the context of writing from Hadoop (which is 
 typically in a multi-DC configuration), this is often not desirable behavior. 
  In some cases there may be high-latency connections between the analytics DC 
 and other DCs.
 I am attaching a patch that adds an optional config value to tell RingCache 
 to use local nodes only, in which case it calls describe_local_ring instead.  
 It also adds helpful failed host information to IOExceptions thrown in 
 AbstractColumnFamilyOutputFormat.createAuthenticatedClient, CqlRecordWriter, 
 and ColumnFamilyRecordWriter.  This allows a user to more easily solve 
 related connectivity issues.





[jira] [Assigned] (CASSANDRA-7252) RingCache cannot be configured to use local DC only

2014-05-16 Thread Robbie Strickland (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robbie Strickland reassigned CASSANDRA-7252:


Assignee: Jonathan Ellis  (was: Robbie Strickland)

 RingCache cannot be configured to use local DC only
 ---

 Key: CASSANDRA-7252
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7252
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Robbie Strickland
Assignee: Jonathan Ellis
  Labels: patch
 Fix For: 2.0.7

 Attachments: cassandra-2.0.7-7252-2.txt, cassandra-2.0.7-7252.txt


 RingCache always calls describe_ring, returning the entire cluster.  
 Considering it's used in the context of writing from Hadoop (which is 
 typically in a multi-DC configuration), this is often not desirable behavior. 
  In some cases there may be high-latency connections between the analytics DC 
 and other DCs.
 I am attaching a patch that adds an optional config value to tell RingCache 
 to use local nodes only, in which case it calls describe_local_ring instead.  
 It also adds helpful failed host information to IOExceptions thrown in 
 AbstractColumnFamilyOutputFormat.createAuthenticatedClient, CqlRecordWriter, 
 and ColumnFamilyRecordWriter.  This allows a user to more easily solve 
 related connectivity issues.





[jira] [Created] (CASSANDRA-7102) CqlStorage loads incorrect schema

2014-04-28 Thread Robbie Strickland (JIRA)
Robbie Strickland created CASSANDRA-7102:


 Summary: CqlStorage loads incorrect schema
 Key: CASSANDRA-7102
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7102
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
 Environment: OS-X, Hadoop 1.2.1, C* 2.0.0, Pig 0.12
Reporter: Robbie Strickland


When running C* 2.0.0 with Hadoop 1.2.1 and Pig 0.12, attempting to load a 
table using CqlStorage produces an invalid schema.

Given the following table:

CREATE TABLE checkins (
  user text,
  time bigint,
  checkinid text,
  address text,
  geohash text,
  lat double,
  lon double,
  PRIMARY KEY (user, time, checkinid)
);

I load my table in Pig as follows:

checkins = LOAD 'cql://wxcheckin/checkins' USING CqlStorage();

... which produces the following schema:

(user:chararray,time:long,checkinid:chararray,address:chararray,checkinid:chararray,geohash:chararray,lat:double,lon:double,time:long,user:chararray)

As you can see it repeats the fields in the PK, and it throws an error when any 
field is referenced:

Invalid field projection. Projected field [time] does not exist in schema: 
user:chararray,time:long,checkinid:chararray,address:chararray,checkinid:chararray,geohash:chararray,lat:double,lon:double,time:long,user:chararray.

Simply upgrading to C* 2.0.7 fixes the issue.
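The fix in 2.0.7 effectively amounts to not re-emitting the primary-key columns when building the Pig schema. A hedged sketch of the idea, keeping each column once in first-seen order (not the actual CqlStorage code):

```python
def dedupe_schema(fields):
    """Collapse a schema whose primary-key columns appear twice
    (once as keys, once as regular columns) into one entry per
    column, preserving first-seen order."""
    seen = set()
    result = []
    for name, ftype in fields:
        if name not in seen:
            seen.add(name)
            result.append((name, ftype))
    return result
```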






[jira] [Created] (CASSANDRA-6915) Show storage rows in cqlsh

2014-03-24 Thread Robbie Strickland (JIRA)
Robbie Strickland created CASSANDRA-6915:


 Summary: Show storage rows in cqlsh
 Key: CASSANDRA-6915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6915
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Robbie Strickland


In Cassandra it's super important to understand how your CQL schema translates 
to the underlying storage rows.  Right now the only way to see this is to 
create the schema in cqlsh, write some data, then query it using the CLI.  
Obviously we don't want to be encouraging people to use the CLI when it's 
supposed to be deprecated.  So I'd like to see a function in cqlsh to do this.





[jira] [Commented] (CASSANDRA-6915) Show storage rows in cqlsh

2014-03-24 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945172#comment-13945172
 ] 

Robbie Strickland commented on CASSANDRA-6915:
--

It's the compound key (whether composite partition key or composite column) 
case that makes this useful--and I would still argue really important.  Yes you 
can read the documentation to understand the mapping, but I think this remains 
one of the most misunderstood concepts in CQL.  I would argue that it's 
important to understand the storage layer difference between PRIMARY KEY ((id, 
timestamp), event) and PRIMARY KEY (id, timestamp, event), and that the best 
way to see the difference is to visualize it.  People still don't seem to get 
the difference between partition keys and composite column names, and this 
obviously has huge implications for what sorts of queries you can run and how 
wide your rows will get.  

Perhaps something along the lines of:

CREATE TABLE MyTable (
id uuid,
timestamp int,
event text,
details text,
userId text,
PRIMARY KEY (id, timestamp, event)
);

EXPLAIN MyTable;

Partition Key: id (uuid)
Columns: 
timestamp:event:details (int:text:text)
timestamp:event:userId (int:text:text)

CREATE TABLE MyTable (
id uuid,
timestamp int,
event text,
details text,
userId text,
PRIMARY KEY ((id, timestamp), event)
);

EXPLAIN MyTable;

Partition Key: id:timestamp (uuid:int)
Columns: 
event:details (text)
event:userId (text)
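The mapping sketched above is mechanical: the partition-key columns form the storage-row key, and every non-key column becomes a composite cell named by the clustering values plus the column name. A rough model of the proposed EXPLAIN output (names and shape are this commenter's sketch, not an existing cqlsh feature):

```python
def explain_layout(columns, partition_key, clustering):
    """Given (name, type) pairs plus the primary-key spec, return the
    storage-row partition key and the composite column-name patterns."""
    key_cols = set(partition_key) | set(clustering)
    value_cols = [(n, t) for n, t in columns if n not in key_cols]
    part = ":".join(partition_key)
    cells = [":".join(clustering + [name]) for name, _ in value_cols]
    return part, cells
```

Running it on the two table definitions above makes the difference visible: the single-column partition key yields wide rows with `timestamp:event:...` cells, while the composite partition key folds `timestamp` into the row key.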

 Show storage rows in cqlsh
 --

 Key: CASSANDRA-6915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6915
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Robbie Strickland
  Labels: cqlsh

 In Cassandra it's super important to understand how your CQL schema 
 translates to the underlying storage rows.  Right now the only way to see 
 this is to create the schema in cqlsh, write some data, then query it using 
 the CLI.  Obviously we don't want to be encouraging people to use the CLI 
 when it's supposed to be deprecated.  So I'd like to see a function in cqlsh 
 to do this.



