[jira] [Updated] (CASSANDRA-5922) Delete doesn't delete data.

2013-08-22 Thread Chris Eineke (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Eineke updated CASSANDRA-5922:


Description: 
In a nutshell, I'm running several test cases against my Cassandra JPA 
implementation (astyanax-jpa, see https://github.com/ceineke/astyanax-jpa) and 
sometimes (!) batched deletes seem not to delete all rows specified in the 
batch. 
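
For reference, a table definition along the following lines would match the 
column names used in the statements below. This DDL is only a reconstruction; 
the report doesn't include the schema, and the column types and the split 
between partition key and clustering columns are assumptions:

CREATE TABLE compositeentity (
    compositekeypartone text,
    compositekeyparttwo text,
    compositekeypartthree text,
    astring text,
    auuid uuid,
    PRIMARY KEY (compositekeypartone, compositekeyparttwo, compositekeypartthree)
);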

Here's the sequence of prepared CQL3 statements that is causing the issue to 
appear:

TRUNCATE compositeentity;

(delete all records so we have a clean slate)

INSERT INTO compositeentity (compositekeypartone, compositekeyparttwo, 
compositekeypartthree, astring, auuid) VALUES (?, ?, ?, ?, ?);
INSERT INTO compositeentity (compositekeypartone, compositekeyparttwo, 
compositekeypartthree, astring, auuid) VALUES (?, ?, ?, ?, ?);

(insert two unique rows into the table)

SELECT compositekeypartone, compositekeyparttwo, compositekeypartthree FROM 
compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND 
compositekeypartthree = ?;
SELECT compositekeypartone, compositekeyparttwo, compositekeypartthree FROM 
compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND 
compositekeypartthree = ?;

(load both rows from the table to validate their existence)

SELECT COUNT(1) FROM compositeentity;

(counts rows to validate the number of records in the table)

BEGIN BATCH
  DELETE FROM compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND compositekeypartthree = ?;
  DELETE FROM compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND compositekeypartthree = ?;
APPLY BATCH;

(uses a logged batch to delete the two rows from the table)

SELECT compositekeypartone, compositekeyparttwo, compositekeypartthree FROM 
compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND 
compositekeypartthree = ?;
SELECT compositekeypartone, compositekeyparttwo, compositekeypartthree FROM 
compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND 
compositekeypartthree = ?;

(try to load the rows from the table to check that they no longer exist)

After the batch delete, Cassandra has deleted only the first row, so the second 
SELECT here actually returns data. So far, this behaviour occurs randomly.

This happens even if there's a long sleep (1s, 10s) between the batch delete 
and the selects. It is always the second row that isn't deleted, never the 
first.
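
One way to narrow this down (a diagnostic sketch, not part of the original test 
sequence) is to read back the write timestamps of the surviving row's non-key 
columns and compare them with the client-side time at which the batch delete was 
issued. If the insert carries a newer timestamp than the delete, the tombstone 
would be shadowed and the row would keep coming back:

SELECT compositekeypartone, compositekeyparttwo, compositekeypartthree,
       WRITETIME(astring), WRITETIME(auuid)
FROM compositeentity
WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND compositekeypartthree = ?;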

Thinking it might be a timing issue (based on 
http://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/),
I've set up NTP to keep the clocks synchronized across all nodes (one node acts 
as the master, which syncs to time.nrc.ca and {0,1,2,3}.ca.pool.ntp.org, whereas 
the remaining ones sync against the master). This hasn't reduced the number of 
times this behaviour crops up.
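
A related experiment that takes the server clocks out of the picture entirely is 
to attach an explicit client-supplied timestamp (in microseconds) to the batch, 
chosen to be newer than the inserts. This is a sketch, not something from the 
original test, and the literal value is only an example; CQL3 allows a 
batch-level timestamp via USING TIMESTAMP:

BEGIN BATCH USING TIMESTAMP 1377200000000000
  DELETE FROM compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND compositekeypartthree = ?;
  DELETE FROM compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND compositekeypartthree = ?;
APPLY BATCH;

If the second row still survives a delete with a forced, clearly newer timestamp, 
clock skew between coordinators can be ruled out as the cause.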

(I am executing all statements with QUORUM level consistency.)

I'm open to suggestions as to why this occurs and how I can fix it, if this can 
be fixed.


[jira] [Created] (CASSANDRA-5922) Delete doesn't delete data.

2013-08-22 Thread Chris Eineke (JIRA)
Chris Eineke created CASSANDRA-5922:
---

 Summary: Delete doesn't delete data.
 Key: CASSANDRA-5922
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5922
 Project: Cassandra
  Issue Type: Bug
 Environment: 4-node cluster w/ Cassandra v1.2.8
Oracle JDK 1.6.0_45
Netflix Astyanax 1.56.42
Quorum read and write consistency level
Reporter: Chris Eineke


In a nutshell, I'm running several test cases against my Cassandra JPA 
implementation (astyanax-jpa, see https://github.com/ceineke/astyanax-jpa) and 
sometimes (!) batched deletes seem not to delete all of the rows in the batch. 

Here's the sequence of prepared CQL3 statements that is causing the issue to 
appear:

TRUNCATE compositeentity;

(delete all records so we have a clean slate)

INSERT INTO compositeentity (compositekeypartone, compositekeyparttwo, 
compositekeypartthree, astring, auuid) VALUES (?, ?, ?, ?, ?);
INSERT INTO compositeentity (compositekeypartone, compositekeyparttwo, 
compositekeypartthree, astring, auuid) VALUES (?, ?, ?, ?, ?);

(insert two unique rows into the table)

SELECT compositekeypartone, compositekeyparttwo, compositekeypartthree FROM 
compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND 
compositekeypartthree = ?;
SELECT compositekeypartone, compositekeyparttwo, compositekeypartthree FROM 
compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND 
compositekeypartthree = ?;

(load both rows from the table to validate their existence)

SELECT COUNT(1) FROM compositeentity;

(counts rows to validate the number of records in the table)

BEGIN BATCH
  DELETE FROM compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND compositekeypartthree = ?;
  DELETE FROM compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND compositekeypartthree = ?;
APPLY BATCH;

(uses a logged batch to delete the two rows from the table)

SELECT compositekeypartone, compositekeyparttwo, compositekeypartthree FROM 
compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND 
compositekeypartthree = ?;
SELECT compositekeypartone, compositekeyparttwo, compositekeypartthree FROM 
compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND 
compositekeypartthree = ?;

(try to load the rows from the table to check that they no longer exist)

After the batch delete, Cassandra has deleted only the first row, so the second 
SELECT here actually returns data. So far, this behaviour occurs randomly.

This happens even if there's a long sleep (1s, 10s) between the batch delete 
and the selects. It is always the second row that isn't deleted, never the 
first.
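
One way to check whether the batching itself is involved (a test sketch, not part 
of the original report) would be to issue the same two deletes as plain individual 
statements, still at QUORUM, and see whether the second row survives that as well:

DELETE FROM compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND compositekeypartthree = ?;
DELETE FROM compositeentity WHERE compositekeypartone = ? AND compositekeyparttwo = ? AND compositekeypartthree = ?;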

Thinking it might be a timing issue (based on 
http://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/),
I've set up NTP to keep the clocks synchronized across all nodes (one node acts 
as the master, which syncs to time.nrc.ca and {0,1,2,3}.ca.pool.ntp.org, whereas 
the remaining ones sync against the master). This hasn't reduced the number of 
times this behaviour crops up.

(I am executing all statements with QUORUM level consistency.)
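
Since the deletes go through a logged batch, another thing worth checking (a 
diagnostic sketch; system.batchlog is the system table Cassandra 1.2 uses to hold 
logged batches until they are replayed) is whether entries are lingering there on 
any node after the test run:

SELECT * FROM system.batchlog;

An empty result on every node would suggest the batch itself was applied rather 
than left pending for replay.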

I'm open to suggestions as to why this occurs and how I can fix it, if this can 
be fixed.



[jira] [Commented] (CASSANDRA-5748) When flushing, nodes spent almost 100% in AbstractCompositeType.compare

2013-07-15 Thread Chris Eineke (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708444#comment-13708444
 ] 

Chris Eineke commented on CASSANDRA-5748:
-

Sylvain,

Thank you for your response. That's great to hear! Is there an anticipated 
release date yet?

> When flushing, nodes spent almost 100% in AbstractCompositeType.compare
> ---
>
> Key: CASSANDRA-5748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5748
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.2.5, 1.2.6
> Environment: Apache Cassandra v1.2.6
> 4-node cluster, mostly the same hardware
> # java -version
> java version "1.6.0_37"
> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>Reporter: Chris Eineke
>Priority: Critical
> Attachments: thread_dump
>
>
> We're pretty heavy users of CQL3 and CQL3 collection types. Occasionally, 
> some nodes of the cluster will become extremely sluggish and the cluster as a 
> whole starts to become unresponsive, reads will time out, and nodes will drop 
> mutation messages. This happens when nodes flush Memtables to disk (based on 
> my tail of the system.log on each node).
> I'm a curious guy, so I attached jvisualvm (v1.3.3) to the JVMs that were 
> having this problem. These nodes are spending up to 98% of CPU in 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78).
>  I will attach a thread dump.
> This is causing us quite a headache, because we're unable to figure out what 
> could be causing this. We tried tuning several configuration settings (column cache 
> size, row key cache size), but the cluster exhibits the same issues even with 
> the default configuration (except for a modified num_tokens and 
> listen_address).



[jira] [Updated] (CASSANDRA-5748) When flushing, nodes spent almost 100% in AbstractCompositeType.compare

2013-07-12 Thread Chris Eineke (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Eineke updated CASSANDRA-5748:


Attachment: thread_dump

> When flushing, nodes spent almost 100% in AbstractCompositeType.compare
> ---
>
> Key: CASSANDRA-5748
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5748
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.2.5, 1.2.6
> Environment: Apache Cassandra v1.2.6
> 4-node cluster, mostly the same hardware
> # java -version
> java version "1.6.0_37"
> Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
>Reporter: Chris Eineke
>Priority: Critical
> Attachments: thread_dump
>
>
> We're pretty heavy users of CQL3 and CQL3 collection types. Occasionally, 
> some nodes of the cluster will become extremely sluggish and the cluster as a 
> whole starts to become unresponsive, reads will time out, and nodes will drop 
> mutation messages. This happens when nodes flush Memtables to disk (based on 
> my tail of the system.log on each node).
> I'm a curious guy, so I attached jvisualvm (v1.3.3) to the JVMs that were 
> having this problem. These nodes are spending up to 98% of CPU in 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78).
>  I will attach a thread dump.
> This is causing us quite a headache, because we're unable to figure out what 
> could be causing this. We tried tuning several configuration settings (column cache 
> size, row key cache size), but the cluster exhibits the same issues even with 
> the default configuration (except for a modified num_tokens and 
> listen_address).



[jira] [Created] (CASSANDRA-5747) When flushing, nodes spent almost 100% in AbstractCompositeType.compare

2013-07-12 Thread Chris Eineke (JIRA)
Chris Eineke created CASSANDRA-5747:
---

 Summary: When flushing, nodes spent almost 100% in 
AbstractCompositeType.compare
 Key: CASSANDRA-5747
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5747
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.6, 1.2.5
 Environment: Apache Cassandra v1.2.6

4-node cluster, mostly the same hardware

# java -version
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)


Reporter: Chris Eineke
Priority: Critical


We're pretty heavy users of CQL3 and CQL3 collection types. Occasionally, some 
nodes of the cluster will become extremely sluggish and the cluster as a whole 
starts to become unresponsive, reads will time out, and nodes will drop 
mutation messages. This happens when nodes flush Memtables to disk (based on my 
tail of the system.log on each node).

I'm a curious guy, so I attached jvisualvm (v1.3.3) to the JVMs that were 
having this problem. These nodes are spending up to 98% of CPU in 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78).
 I will attach a thread dump.

This is causing us quite a headache, because we're unable to figure out what 
could be causing this. We tried tuning several configuration settings (column cache 
size, row key cache size), but the cluster exhibits the same issues even with 
the default configuration (except for a modified num_tokens and listen_address).
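
For context on why AbstractCompositeType.compare can dominate during a flush, here 
is a sketch of the kind of table that exercises that code path (the names are 
invented for illustration; only the general shape, a compound primary key plus a 
CQL3 collection, matches the kind of schema described above):

CREATE TABLE events (
    tenant_id uuid,
    event_time timestamp,
    event_id uuid,
    payload text,
    tags set<text>,
    PRIMARY KEY (tenant_id, event_time, event_id)
);

Every cell in a table like this is stored under a composite column name (the 
clustering columns plus the CQL3 column name, plus the set element for the 
collection), so both keeping the Memtable sorted and writing it out involve a 
very large number of composite comparisons.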



[jira] [Created] (CASSANDRA-5748) When flushing, nodes spent almost 100% in AbstractCompositeType.compare

2013-07-12 Thread Chris Eineke (JIRA)
Chris Eineke created CASSANDRA-5748:
---

 Summary: When flushing, nodes spent almost 100% in 
AbstractCompositeType.compare
 Key: CASSANDRA-5748
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5748
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.6, 1.2.5
 Environment: Apache Cassandra v1.2.6

4-node cluster, mostly the same hardware

# java -version
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)


Reporter: Chris Eineke
Priority: Critical


We're pretty heavy users of CQL3 and CQL3 collection types. Occasionally, some 
nodes of the cluster will become extremely sluggish and the cluster as a whole 
starts to become unresponsive, reads will time out, and nodes will drop 
mutation messages. This happens when nodes flush Memtables to disk (based on my 
tail of the system.log on each node).

I'm a curious guy, so I attached jvisualvm (v1.3.3) to the JVMs that were 
having this problem. These nodes are spending up to 98% of CPU in 
org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:78).
 I will attach a thread dump.

This is causing us quite a headache, because we're unable to figure out what 
could be causing this. We tried tuning several configuration settings (column cache 
size, row key cache size), but the cluster exhibits the same issues even with 
the default configuration (except for a modified num_tokens and listen_address).
