[jira] [Comment Edited] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549412#comment-13549412 ]

Janne Jalkanen edited comment on CASSANDRA-4417 at 1/10/13 7:58 AM:

I'm seeing this while running repair -pr. Three-node cluster, RF 3. Straight upgrade from 1.0.12 to 1.1.8; no topology changes. I see two invalid shard IDs whose counts differ by more than one - sometimes even by 3000 or more. It seems random to my eyes. Our counters are in a composite column family, no TTLs in use. We *mostly* increment by one, but sometimes by more.

I did disablegossip, disablethrift, drain, shutdown, upgrade, restart on every node in a rolling fashion. Then I ran upgradesstables and repair -pr on every node once the entire cluster had been upgraded.

Environment is Ubuntu Linux 12.04 LTS; the JVM is OpenJDK 7u9.

The last repair picked up 497 invalid counter shards, and we have approximately 8 million counters, of which about a hundred are incremented each second (and sometimes subtracted from if our read repair kicks in - we have our own in-app repair for certain low values). All the counter writes are batched at 100 increments/batch. So this only affects a really small subset, though it's rather annoying when it happens, as it means that you can never really trust the counters to be even in the ballpark :-/

invalid counter shard detected
---

Key: CASSANDRA-4417
URL: https://issues.apache.org/jira/browse/CASSANDRA-4417
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.1
Environment: Amazon Linux
Reporter: Senthilvel Rangaswamy
Attachments: cassandra-mck.log.bz2, err.txt

Seeing errors like these:

2012-07-06_07:00:27.22662 ERROR 07:00:27,226 invalid counter shard detected; (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 13) and (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 1) differ only in count; will pick highest to self-heal; this indicates a bug or corruption generated a bad counter shard

What does it mean?

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
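To make the quoted error message concrete, here is a minimal Python sketch (not Cassandra's actual Java implementation) of the shard reconciliation rule the log describes: each shard is a (counter id, clock, count) triple; two shards with the same id and the same clock must carry the same count, and when they differ only in count Cassandra picks the highest to self-heal.

```python
def merge_shards(a, b):
    """Reconcile two counter shards, each a (counter_id, clock, count) tuple.
    A toy model of the rule stated in the 'invalid counter shard' log line."""
    id_a, clock_a, count_a = a
    id_b, clock_b, count_b = b
    assert id_a == id_b, "shards belong to different counters"
    if clock_a != clock_b:
        # Normal case: the shard with the higher clock supersedes the other.
        return a if clock_a > clock_b else b
    if count_a != count_b:
        # Invalid case reported by the error: same clock, different count.
        # The node logs the error and picks the highest count to self-heal.
        return a if count_a > count_b else b
    return a  # identical shards

# The two shards from the quoted error log:
healed = merge_shards(("17bfd850-ac52-11e1--6ecd0b5b61e7", 1, 13),
                      ("17bfd850-ac52-11e1--6ecd0b5b61e7", 1, 1))
```

Here `healed` is the (…, 1, 13) shard, which is why the reporters above see the surviving count jump to the larger of the two conflicting values.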
[jira] [Comment Edited] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492245#comment-13492245 ]

Mck SembWever edited comment on CASSANDRA-4417 at 11/7/12 10:20 AM:

Sylvain, here's the log from one node. For most of the log we were running 1.0.8, and then at line 2883399 we upgraded (this was the first node to upgrade) to 1.1.6. The error message comes every few seconds. Our counters are sub-columns inside supercolumns.

We completed the upgrade on all nodes, then restarted again (because JNA was missing). We are now running upgradesstables, but that's not in this logfile. The error messages still appear.

An operational problem we've had recently is that we had one node down for about a month (faulty RAID controller), and when we finally brought the node back into the cluster, nightly repairs would never finish. In the end we just disabled nightly repairs (we don't have tombstones) with the plan that an upgrade and upgradesstables would bring us back to a state where repairs would work again. I have no idea if this can be related.
[jira] [Comment Edited] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485333#comment-13485333 ]

Eric Lubow edited comment on CASSANDRA-4417 at 10/27/12 2:22 AM:

We are getting this on DSE 2.2 (C* 1.1.5) on a new node during bootstrap. We upgraded the cluster from C* 1.0.10 about 10 days ago; upgradesstables was run on every node and we repaired the entire cluster. We've been getting this error sporadically on various nodes at various points, but it's not consistent. I've double- and triple-checked every node looking for sstable files named *-hd-* and I don't see any (assuming that's enough to tell that the sstable has been upgraded). If this error is an effect of requiring one to run upgradesstables, then how would it happen during a bootstrap? All nodes involved in this cluster are 1.1.5.
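The filename check Eric describes can be sketched in Python: scan the data directories for SSTable files whose names still contain the `-hd-` format marker he grepped for. The data directory path below is an assumption (a common default); substitute whatever your `data_file_directories` setting points at.

```python
import glob
import os

def old_format_sstables(data_dir="/var/lib/cassandra/data"):
    """Return SSTable files whose names still contain the '-hd-' marker,
    i.e. files that upgradesstables has not yet rewritten (per the check
    described in the comment above). data_dir is an assumed default path."""
    pattern = os.path.join(data_dir, "*", "*-hd-*")
    return sorted(glob.glob(pattern))
```

An empty result is what Eric observed, which is what makes the error during bootstrap surprising if stale on-disk format were the cause.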
[jira] [Comment Edited] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453537#comment-13453537 ]

Omid Aladini edited comment on CASSANDRA-4417 at 9/12/12 10:18 AM:

{quote}
A simple workaround is to use batch commit log, but that has a potentially important performance impact.
{quote}

I'm a bit confused about why the batch commit log would solve the problem. If Cassandra crashes before the batch is fsynced, the counter mutations in the batch which it was the leader for will still be lost, although they might have been applied on other replicas. The difference would be that the mutations won't be acknowledged to the client, and since counters aren't idempotent, the client won't know whether to retry or not. Am I missing something?
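Omid's point about non-idempotence can be shown with a toy Python sketch (not Cassandra code): a counter increment applied twice adds twice, so a client whose write was not acknowledged cannot safely retry — it must choose between a possible undercount (give up) and a possible overcount (retry a write that in fact survived on other replicas).

```python
class Counter:
    """Toy counter illustrating why increments are not idempotent."""
    def __init__(self):
        self.value = 0

    def increment(self, delta):
        self.value += delta  # applying the same increment twice adds twice

c = Counter()
c.increment(1)   # the write succeeds, but suppose the ack is lost
c.increment(1)   # the client retries "the same" increment
assert c.value == 2  # overcounted relative to the intended total of 1
```

Contrast this with an idempotent write (e.g. setting a cell to a value), where a blind retry is always safe.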
[jira] [Comment Edited] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450766#comment-13450766 ]

Charles Brophy edited comment on CASSANDRA-4417 at 9/8/12 3:39 AM:

We have a six-node cluster [1.1.3, JDK 1.6.33, CentOS 6] with even key range balance, random partitioner, and replication factor 2. I get these errors immediately after running nodetool repair, but ONLY if a streaming repair happens as a result. We are serving live updates to our counters from our clickstream. My guess is that the sstable being streamed between the servers winds up becoming out of date for the duration of the streaming process and ends up containing these duplicates that are vetted during the subsequent compaction. In any case, for us it is 100% reproducible via: nodetool repair -> streaming repair -> subsequent compaction. Let me know if you need more details. Hope this helps!