[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2
[ https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McGuire updated CASSANDRA-5789:
    Labels: qa-resolved  (was: )

> Data not fully replicated with 2 nodes and replication factor 2
> ---
>
>                 Key: CASSANDRA-5789
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5789
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.2, 1.2.6
>         Environment: Official DataStax Cassandra 1.2.6, running on Linux RHEL 6.2.
>                      I've seen the same behavior with Cassandra 1.2.2.
>                      Sun Java 1.7.0_10-b18 64-bit
>                      Java heap settings: -Xms8192M -Xmx8192M -Xmn2048M
>            Reporter: James Lee
>            Assignee: Russ Hatch
>              Labels: qa-resolved
>         Attachments: 5789.py, CassBugRepro.py, CassTestData.py
>
>
> I'm seeing a problem with a 2-node Cassandra test deployment, where it seems
> that data isn't being replicated among the nodes as I would expect.
> The setup and test are as follows:
> - Two Cassandra nodes in the cluster (each has itself and the other node as
>   seeds in cassandra.yaml).
> - Create 40 keyspaces, each with simple replication strategy and
>   replication factor 2.
> - Populate 125,000 rows into each keyspace, using a pycassa client with a
>   connection pool pointed at both nodes. These are populated with writes using
>   consistency level 1.
> - Wait until nodetool on each node reports that there are no hinted handoffs
>   outstanding (see output below).
> - Do random reads of the rows in the keyspaces, again using a pycassa client
>   with a connection pool pointed at both nodes. These are read using
>   consistency level 1.
> I'm finding that the vast majority of reads are successful, but a small
> proportion (~0.1%) are returned as Not Found. If I manually look up those
> keys using cassandra-cli, they are returned when querying one of the nodes,
> but not when querying the other. So it seems like some of the rows have
> simply not been replicated, even though the write for these rows was
> reported to the client as successful.
> If I reduce the rate at which the test tool initially writes data into the
> database, I don't see any failed reads, so this seems like a load-related
> issue. My understanding is that if all writes were successful and there are
> no pending hinted handoffs, then the data should be fully replicated and
> reads should return it (even with read and write consistency of 1).
> Here's the output from nodetool on the two nodes:
> comet-mvs01:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0              2         0                 0
> RequestResponseStage              0         0         878494         0                 0
> MutationStage                     0         0        2869107         0                 0
> ReadRepairStage                   0         0              0         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0           2208         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MigrationStage                    0         0            994         0                 0
> MemtablePostFlusher               0         0           4399         0                 0
> FlushWriter                       0         0           2264         0               556
> MiscStage                         0         0              0         0                 0
> commitlog_archiver                0         0              0         0                 0
> InternalResponseStage             0         0            153         0                 0
> HintedHandoff                     0         0              2         0                 0
>
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> BINARY                       0
> READ                         0
> MUTATION                 87655
> _TRACE                       0
> REQUEST_RESPONSE             0
>
> comet-mvs02:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0            868         0                 0
> RequestResponseStage              0         0        3919665         0                 0
> MutationStage                     0         0        8177325         0                 0
> ReadRepairStage                   0         0            113         0                 0
> ReplicateOnWriteStage             0         0              0
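The reporter's expectation rests on background replication completing: with replication factor 2 and both reads and writes at consistency level ONE, the replica that acknowledged a write and the replica that serves a later read are not guaranteed to be the same node. The usual quorum-overlap rule (every read set meets every write set only when R + W > RF) can be sketched and brute-force checked as below; the function names are illustrative and are not part of Cassandra or pycassa.

```python
from itertools import combinations


def overlap_guaranteed(read_cl: int, write_cl: int, rf: int) -> bool:
    """Closed-form quorum rule: every read set meets every write set iff R + W > RF."""
    if not (1 <= read_cl <= rf and 1 <= write_cl <= rf):
        raise ValueError("consistency levels must be between 1 and rf")
    return read_cl + write_cl > rf


def overlap_bruteforce(read_cl: int, write_cl: int, rf: int) -> bool:
    """Check the same property by enumerating every possible read and write replica set."""
    replicas = range(rf)
    return all(
        set(reads) & set(writes)  # empty set (falsy) means a read can miss the write
        for reads in combinations(replicas, read_cl)
        for writes in combinations(replicas, write_cl)
    )


if __name__ == "__main__":
    # This ticket's setup: RF=2, writes and reads at consistency level ONE.
    print(overlap_guaranteed(1, 1, 2), overlap_bruteforce(1, 1, 2))
    # QUORUM with RF=2 means 2 replicas, so QUORUM reads and writes always overlap.
    print(overlap_guaranteed(2, 2, 2), overlap_bruteforce(2, 2, 2))
```

With ONE/ONE on RF 2 both checks return False, so a read served by a lagging replica can return Not Found until hints, read repair, or `nodetool repair` bring that replica up to date; this is why the report's expectation hinges on hint delivery having fully caught up.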
[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2
[ https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Russ Hatch updated CASSANDRA-5789:
    Attachment: 5789.py
[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2
[ https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Lee updated CASSANDRA-5789:
    Attachment: CassTestData.py

Attaching missing module.
[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2
[ https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-5789:
    Assignee: Brandon Williams
[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2
[ https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Lee updated CASSANDRA-5789:
    Attachment: CassBugRepro.py

Repro script for the bug. Run it as follows:
-- The script assumes you have a two-node Cassandra cluster set up and running.
-- The system running the test should have Python (I used 2.7) with pycassa installed.
-- Run the setup stage: "python CassBugRepro.py -c ip1,ip2 -s -f". This creates the keyspaces and writes 2M rows into them.
-- Once the above has completed, wait until all hints have been delivered (I checked using nodetool).
-- Then run the next stage, which does random reads/writes: "python CassBugRepro.py -c ip1,ip2 -r".
-- If the bug has been reproduced, you'll see output like "NotFoundException for DN 11055691", meaning a row we previously wrote successfully was not found.
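The final check in the repro, like the reporter's manual cassandra-cli lookups, amounts to comparing the keys whose writes were acknowledged against what each node actually returns. A minimal sketch of that comparison, assuming each node's key set has already been fetched by some means; the helper `find_under_replicated`, the node names, and the keys are hypothetical and are not taken from CassBugRepro.py.

```python
from typing import Dict, Iterable, Set


def find_under_replicated(
    acked_keys: Iterable[str], node_contents: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Return {key: nodes missing it} for acknowledged keys absent from any node.

    Assumes every node should hold every key, which holds for a two-node
    cluster with replication factor 2.
    """
    missing: Dict[str, Set[str]] = {}
    for key in sorted(acked_keys):
        absent = {node for node, keys in node_contents.items() if key not in keys}
        if absent:
            missing[key] = absent
    return missing


if __name__ == "__main__":
    # Hypothetical data: node2 never received key-2.
    acked = {"key-1", "key-2", "key-3"}
    nodes = {
        "node1": {"key-1", "key-2", "key-3"},
        "node2": {"key-1", "key-3"},
    }
    print(find_under_replicated(acked, nodes))  # {'key-2': {'node2'}}
```

Reporting which node lacks each key, rather than only a Not Found count, distinguishes a row missing on one replica (the symptom described here) from a row that never reached either node.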
[jira] [Updated] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2
[ https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-5789:
    Assignee: (was: Alex Zarutin)

Check if hints were generated and force hint delivery if so.
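The reporter judged hint status from `nodetool tpstats`, whose comet-mvs01 output shows zero pending HintedHandoff tasks alongside 87655 dropped MUTATION messages. As far as we understand it, the HintedHandoff pool counts delivery tasks rather than stored hints, so an idle pool alone may not answer whether hints were generated. A small parser can at least surface both numbers together; `parse_tpstats` is a hypothetical helper that assumes the whitespace-separated 1.2.6 layout quoted in the issue (other versions may print different columns).

```python
from typing import Dict, Tuple

# Trimmed copy of the comet-mvs01 output quoted in the issue description.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0              2         0                 0
MutationStage                     0         0        2869107         0                 0
FlushWriter                       0         0           2264         0               556
HintedHandoff                     0         0              2         0                 0

Message type           Dropped
READ                         0
MUTATION                 87655
"""

POOL_COLUMNS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(text: str) -> Tuple[Dict[str, Dict[str, int]], Dict[str, int]]:
    """Split tpstats text into (per-pool counters, dropped-message counts)."""
    pools: Dict[str, Dict[str, int]] = {}
    dropped: Dict[str, int] = {}
    section = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank separator line
        if fields[0] == "Pool":
            section = "pools"  # header of the thread-pool table
        elif fields[:2] == ["Message", "type"]:
            section = "dropped"  # header of the dropped-message table
        elif section == "pools" and len(fields) == 1 + len(POOL_COLUMNS):
            pools[fields[0]] = dict(zip(POOL_COLUMNS, map(int, fields[1:])))
        elif section == "dropped" and len(fields) == 2:
            dropped[fields[0]] = int(fields[1])
    return pools, dropped


if __name__ == "__main__":
    pools, dropped = parse_tpstats(SAMPLE_TPSTATS)
    hh = pools["HintedHandoff"]
    print("HintedHandoff active/pending:", hh["active"], hh["pending"])
    print("Dropped MUTATION messages:", dropped.get("MUTATION", 0))
```

Run against the sample, this prints `HintedHandoff active/pending: 0 0` and `Dropped MUTATION messages: 87655`, which is the combination that makes "no pending hints" an incomplete signal here.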