[jira] [Commented] (CASSANDRA-3070) counter repair

ivan (JIRA) Thu, 01 Sep 2011 13:24:34 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095582#comment-13095582
 ]


ivan commented on CASSANDRA-3070:
---------------------------------

{quote}
The last txt file you attached is just a copy of your comment.
{quote}

Sorry. :) A new log has been uploaded.

I caught that with the original "broken" sstables from our production environment.

{quote}
There seems to be a value in there that shouldn't exist though.
{quote}

If you mean {1bb7ba60-9748-11e0-0000-01970cd1a6ff, -1314104014113, 0}, the 
original value was {1bb7ba60-9748-11e0-0000-01970cd1a6ff, 24, 24}, as far as I 
remember. The value changed while we were debugging.

{quote}
I'll continue to look the code in the eyes, see if I find something.
{quote}

Many thanks. ;)

Answers

1. We always increment these counters by one. I have no other source of data 
for these counters.
Based on our previous experience, the "right" value should be the greater one.

We had similar issues with an earlier version of the 0.8 series (0.8.1, as far 
as I remember). That time nodetool repair solved our issue. In that case we had 
another source. Interestingly, one server out of 6 returned the correct answer, 
and after the repair the counters were correct on all servers.

In our experience, QUORUM/LOCAL_QUORUM reads return the correct result in 
out-of-sync situations.

2. As I mentioned, we experienced similar issues earlier, so I suspect there 
are many counters out of sync.
I have no reports of any other "bad" counters yet; I will try to collect some 
problematic ones.

3. That is a really good question. We started using counters with 0.8.0.
nodetool repair solved the out-of-sync errors with the earlier version.
The bad counters were created with earlier versions (0.8.0 or 0.8.1; we didn't 
use 0.8.2).
I think we ran into https://issues.apache.org/jira/browse/CASSANDRA-3006 . We 
noticed that some counters changed after a server restart. We found that 
counter mutations read from the commit log caused problems, so that may be the 
root of our out-of-sync situation.

4. The CFs were not truncated and, as far as I know, no sstables were deleted.

I will try to collect as many problematic counters as I can and will provide 
the output log.
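
For illustration only, the {id, clock, count} triples mentioned above and the 
expectation from answer 1 that the greater value is the "right" one can be 
sketched as below. This is a simplified model, not Cassandra's actual code: it 
assumes that for each shard id the copy with the highest clock wins and that 
the counter value is the sum of the surviving counts. The names Shard, 
merge_shards and counter_value, and the stale {.., 20, 20} copy on the second 
replica, are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Shard(NamedTuple):
    node_id: str  # id of the node that owns this shard
    clock: int    # logical clock of the shard
    count: int    # value contributed by this shard


def merge_shards(replicas: Iterable[Iterable[Shard]]) -> Dict[str, Shard]:
    """Keep, per node_id, the shard with the highest clock (assumed rule)."""
    merged: Dict[str, Shard] = {}
    for shards in replicas:
        for shard in shards:
            current = merged.get(shard.node_id)
            if current is None or shard.clock > current.clock:
                merged[shard.node_id] = shard
    return merged


def counter_value(merged: Dict[str, Shard]) -> int:
    """The counter value is the sum of the counts of all surviving shards."""
    return sum(shard.count for shard in merged.values())


NODE = "1bb7ba60-9748-11e0-0000-01970cd1a6ff"
replica_a = [Shard(NODE, 24, 24)]  # value quoted in this comment
replica_b = [Shard(NODE, 20, 20)]  # hypothetical stale copy on another replica

merged = merge_shards([replica_a, replica_b])
total = counter_value(merged)
```

Under these assumptions a read that sees both replicas resolves to the shard 
with the higher clock, which matches the observation that QUORUM/LOCAL_QUORUM 
reads return the greater value even while the replicas disagree on disk.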

{quote}
But the thing is, there is nothing wrong with the way the counter values are 
resolved.
{quote}

I can accept this, but I don't see how I can synchronize the values across the 
replica nodes. (In this case QUORUM/LOCAL_QUORUM always triggers a 
DigestMismatch.)
Is there any way to synchronize the replicas?

Regards,
ivan


> counter repair
> --------------
>
>                 Key: CASSANDRA-3070
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3070
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.4
>            Reporter: ivan
>            Assignee: Sylvain Lebresne
>         Attachments: counter_local_quroum_maybeschedulerepairs.txt, 
> counter_local_quroum_maybeschedulerepairs_2.txt, 
> counter_local_quroum_maybeschedulerepairs_3.txt
>
>
> Hi!
> We have some counters out of sync but repair doesn't sync values.
> We tried nodetool repair.
> We use LOCAL_QUORUM for reads. A repair row mutation is sent to the other 
> nodes while reading a bad row, but the counters weren't repaired by the 
> mutation.
> Output from two nodes was uploaded. (Some new debug messages were added.)

