[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2

2014-04-29 Thread Ryan McGuire (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McGuire updated CASSANDRA-5789:


Labels: qa-resolved  (was: )

> Data not fully replicated with 2 nodes and replication factor 2
> ---
>
> Key: CASSANDRA-5789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5789
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 1.2.2, 1.2.6
> Environment: Official Datastax Cassandra 1.2.6, running on Linux RHEL 6.2.
>   I've seen the same behavior with Cassandra 1.2.2.
>   Sun Java 1.7.0_10-b18 64-bit
>   Java heap settings: -Xms8192M -Xmx8192M -Xmn2048M
> Reporter: James Lee
> Assignee: Russ Hatch
> Labels: qa-resolved
> Attachments: 5789.py, CassBugRepro.py, CassTestData.py
>
>
> I'm seeing a problem with a 2-node Cassandra test deployment, where it seems 
> that data isn't being replicated among the nodes as I would expect.
> The setup and test are as follows:
> - Two Cassandra nodes in the cluster (they each have themselves and the other 
> node as seeds in cassandra.yaml).
> - Create 40 keyspaces, each with simple replication strategy and 
> replication factor 2.
> - Populate 125,000 rows into each keyspace, using a pycassa client with a 
> connection pool pointed at both nodes.  These are populated with writes using 
> consistency level of 1.
> - Wait until nodetool on each node reports that there are no hinted handoffs 
> outstanding (see output below).
> - Do random reads of the rows in the keyspaces, again using a pycassa client 
> with a connection pool pointed at both nodes.  These are read using 
> consistency level 1.
> I'm finding that the vast majority of reads are successful, but a small 
> proportion (~0.1%) are returned as Not Found.  If I manually try to look up 
> those keys using cassandra-cli, I see that they are returned when querying 
> one of the nodes, but not when querying the other.  So it seems like some of 
> the rows have simply not been replicated, even though the write for these 
> rows was reported to the client as successful.
> If I reduce the rate at which the test tool initially writes data into the 
> database then I don't see any failed reads, so this seems like a load-related 
> issue.  My understanding is that if all writes were successful and there are 
> no pending hinted handoffs, then the data should be fully replicated and 
> reads should return it (even with read and write consistency of 1).
> Here's the output from nodetool on the two nodes:
> comet-mvs01:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                Active   Pending   Completed   Blocked   All time blocked
> ReadStage                     0         0           2         0                  0
> RequestResponseStage          0         0      878494         0                  0
> MutationStage                 0         0     2869107         0                  0
> ReadRepairStage               0         0           0         0                  0
> ReplicateOnWriteStage         0         0           0         0                  0
> GossipStage                   0         0        2208         0                  0
> AntiEntropyStage              0         0           0         0                  0
> MigrationStage                0         0         994         0                  0
> MemtablePostFlusher           0         0        4399         0                  0
> FlushWriter                   0         0        2264         0                556
> MiscStage                     0         0           0         0                  0
> commitlog_archiver            0         0           0         0                  0
> InternalResponseStage         0         0         153         0                  0
> HintedHandoff                 0         0           2         0                  0
>
> Message type        Dropped
> RANGE_SLICE               0
> READ_REPAIR               0
> BINARY                    0
> READ                      0
> MUTATION              87655
> _TRACE                    0
> REQUEST_RESPONSE          0
>
> comet-mvs02:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                Active   Pending   Completed   Blocked   All time blocked
> ReadStage                     0         0         868         0                  0
> RequestResponseStage          0         0     3919665         0                  0
> MutationStage                 0         0     8177325         0                  0
> ReadRepairStage               0         0         113         0                  0
> ReplicateOnWrit
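
The "Message type / Dropped" section of the tpstats output quoted above is easy to miss next to the pool counters. The following is a minimal sketch, not part of the attached scripts, of how that section could be pulled out of captured `nodetool tpstats` text. The function names and the SAMPLE string (a trimmed copy of the comet-mvs01 output) are illustrative assumptions, and whether the dropped MUTATION count explains the missing rows is not established in this thread.

```python
# Trimmed copy of the comet-mvs01 output from the report (illustrative input).
SAMPLE = """\
Pool Name                Active   Pending   Completed   Blocked   All time blocked
MutationStage                 0         0     2869107         0                  0
HintedHandoff                 0         0           2         0                  0
Message type        Dropped
RANGE_SLICE               0
READ_REPAIR               0
BINARY                    0
READ                      0
MUTATION              87655
_TRACE                    0
REQUEST_RESPONSE          0
"""


def dropped_messages(tpstats_output):
    """Return {message_type: dropped_count} from captured tpstats text."""
    counts = {}
    in_dropped = False
    for line in tpstats_output.splitlines():
        fields = line.split()
        if fields[:2] == ["Pool", "Name"]:
            # A new pool table starts; we are no longer in the dropped section.
            in_dropped = False
        elif fields[:2] == ["Message", "type"]:
            in_dropped = True
        elif in_dropped and len(fields) == 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts


def nonzero_dropped(tpstats_output):
    """Keep only the message types that were dropped at least once."""
    return {k: v for k, v in dropped_messages(tpstats_output).items() if v}


if __name__ == "__main__":
    print(nonzero_dropped(SAMPLE))
```

Run against output captured from each node, this would make a non-zero dropped count stand out without reading the whole table.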

[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2

2014-01-08 Thread Russ Hatch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russ Hatch updated CASSANDRA-5789:
--

Attachment: 5789.py


[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2

2013-08-08 Thread James Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Lee updated CASSANDRA-5789:
-

Attachment: CassTestData.py

Attaching missing module.


[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2

2013-08-07 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-5789:
--

Assignee: Brandon Williams


[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2

2013-08-07 Thread James Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Lee updated CASSANDRA-5789:
-

Attachment: CassBugRepro.py

Repro script for the bug. Run it as follows:
-- The script assumes you have a two-node Cassandra cluster set up and running.
-- The system running the test should have Python (I used 2.7) with pycassa 
installed.
-- Run the setup stage as follows: "python CassBugRepro.py -c ip1,ip2 -s -f".  
This creates keyspaces and writes 2M rows into them.
-- Once the above has completed, wait until all hints have been delivered (I 
checked using nodetool).
-- Then run the next stage, which does random reads/writes: "python 
CassBugRepro.py -c ip1,ip2 -r".
-- If the bug has been reproduced, you'll see output like "NotFoundException for 
DN 11055691", where we haven't found something we'd previously successfully 
written.
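
When the read stage produces a lot of output, it can help to collect the failing keys automatically. The following is a minimal sketch that assumes the script's output has been captured as text and that failures appear exactly in the "NotFoundException for DN <number>" form quoted above. The helper names are hypothetical and not part of CassBugRepro.py.

```python
import re

# Matches the failure line format quoted in the repro instructions.
NOT_FOUND = re.compile(r"NotFoundException for DN (\d+)")


def missing_dns(output):
    """Return the DN values reported as not found, in order of appearance."""
    return [int(m.group(1)) for m in NOT_FOUND.finditer(output)]


def miss_rate(misses, reads):
    """Fraction of reads that came back Not Found (0.001 corresponds to 0.1%)."""
    return float(misses) / reads


if __name__ == "__main__":
    # Only the NotFoundException line comes from the report; the other
    # line is a made-up placeholder for unrelated output.
    captured = "some other output\nNotFoundException for DN 11055691\n"
    print(missing_dns(captured))
```

The collected DNs can then be looked up on each node separately to confirm which replica is missing the row.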


[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2

2013-07-23 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-5789:


Assignee: (was: Alex Zarutin)

Check if hints were generated and force hint delivery if so.
