[ 
https://issues.apache.org/jira/browse/CASSANDRA-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hazel Bobrins updated CASSANDRA-9715:
-------------------------------------
    Environment: RHEL 6.2 2.6.32-220.13.1.el6.x86_64
    Description: 
On 2.0.15 (we moved from 2.0.8 hoping this problem would go away) we are seeing 
intermittent issues where a secondary index is getting out of sync.

The setup is a 6-node cluster across 3 data centers, two nodes in each, with an 
RF of 2 in each data center.
So far I have been unable to reproduce this synthetically but have seen 
multiple instances across all nodes within the cluster.

The data set is very small: ~40K keys and >100MB of data. We add maybe 1000 
records a day, delete ~500 and update ~200, so it is not a write-heavy system. 
Reads we can push out to ~2000/sec.

Writes are done at CL ALL and reads at CL ONE.

All examples so far have been triggered when a record has been deleted and then 
another added with the same index cardinality; I think it has also always been 
the last record in the set which was deleted before the addition.

On a flushed keyspace, an sstable2json export of the primary index shows all 
records correctly; however, an export of the secondary index is missing the 
records.
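
A minimal sketch of how the two exports could be compared to list the entries 
missing from the index. It assumes each export has been loaded as a list of 
{"key": ..., "columns": [[name, value, timestamp], ...]} rows with keys and 
values already decoded, and that an index row is keyed by the indexed value 
with one column per primary key. The function names and the "group" column 
are hypothetical, and deletion markers are ignored for simplicity.

```python
def expected_index(primary_rows, indexed_column):
    """Build {indexed value: set of primary keys} from the primary export."""
    expected = {}
    for row in primary_rows:
        for column in row["columns"]:
            name, value = column[0], column[1]
            if name == indexed_column:
                expected.setdefault(value, set()).add(row["key"])
    return expected


def actual_index(index_rows):
    """Build {indexed value: set of primary keys} from the index export.

    Assumption: each index row is keyed by the indexed value and its
    column names are the primary keys that carry that value.
    """
    return {row["key"]: {column[0] for column in row["columns"]}
            for row in index_rows}


def missing_from_index(primary_rows, index_rows, indexed_column):
    """Return {indexed value: primary keys present in the data but not indexed}."""
    actual = actual_index(index_rows)
    missing = {}
    for value, keys in expected_index(primary_rows, indexed_column).items():
        absent = keys - actual.get(value, set())
        if absent:
            missing[value] = absent
    return missing


# Hypothetical data: k2 and k3 exist in the primary export but not in the index.
primary = [
    {"key": "k1", "columns": [["group", "g1", 1]]},
    {"key": "k2", "columns": [["group", "g1", 2]]},
    {"key": "k3", "columns": [["group", "g2", 3]]},
]
index = [{"key": "g1", "columns": [["k1", "", 1]]}]
result = missing_from_index(primary, index, "group")
```

Running the same comparison against exports from each node would show which 
replicas hold the incomplete index.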

nodetool rebuild_index does not resolve the problem.

Neither does a compaction or a repair.

A select on the primary key at CL ALL also has no impact.

However, a select at CL ALL on the secondary index does resolve the problem.

There is currently a non-critical record which is out of the index on one of 
our nodes. If another key is added with the same index cardinality, it is added 
to the index correctly. If this is then removed, the index once again returns 
empty.

We have checked all the obvious OS bits and confirmed our time sync (NTP-based).

At DEBUG level we see nothing obviously wrong when adding/removing keys to the 
above broken entry.

Due to the very intermittent nature of this problem it has been impossible so 
far to gather any DEBUG logs of it failing; we have also been unsuccessful so 
far in reproducing this in our QA.

I know this is not much to go on. If there is anything we can provide to help 
narrow down what the issue might be, please let me know and we'll provide it 
ASAP.
    Component/s: Core

> Secondary index out of sync
> ---------------------------
>
>                 Key: CASSANDRA-9715
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9715
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: RHEL 6.2 2.6.32-220.13.1.el6.x86_64
>            Reporter: Hazel Bobrins



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
