[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827750#comment-13827750 ]

Quentin Conner commented on CASSANDRA-6127:
---

Yes, both use case 1 and use case 2 (detailed in an earlier comment above) were cured by patch #3. Zero flaps were recorded in multiple trials in both use cases. Patch #3 cures the flaps, but does not address the CPU usage symptom. This was tested against the cassandra-1.2 branch. I am conducting the same test against use case 2 today, but using the current cassandra-2.0 branch of source.

vnodes don't scale to hundreds of nodes
---
Key: CASSANDRA-6127
URL: https://issues.apache.org/jira/browse/CASSANDRA-6127
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Any cluster that has vnodes and consists of hundreds of physical nodes.
Reporter: Tupshin Harper
Assignee: Jonathan Ellis
Attachments: 2013-11-05_18-04-03_no_compression_cpu_time.png, 2013-11-05_18-09-38_compression_on_cpu_time.png, 6000vnodes.patch, AdjustableGossipPeriod.patch, cpu-vs-token-graph.png, delayEstimatorUntilStatisticallyValid.patch, flaps-vs-tokens.png

There are a lot of gossip-related issues related to very wide clusters that also have vnodes enabled. Let's use this ticket as a master in case there are sub-tickets. The most obvious symptom I've seen is with 1000 nodes in EC2 on m1.xlarge instances, each node configured with 32 vnodes. Without vnodes, the cluster spins up fine and is ready to handle requests within 30 minutes or less. With vnodes, nodes report constant up/down flapping messages with no external load on the cluster. After a couple of hours, they were still flapping, had very high CPU load, and the cluster never looked like it was going to stabilize or be useful for traffic.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826949#comment-13826949 ]

Jonathan Ellis commented on CASSANDRA-6127:
---

bq. Untested patch #3. Delays output from FailureDetector until statistically valid number of samples have been obtained.

Did we ever find a scenario where we can demonstrate this patch making a difference? Because I think it's a good idea in theory.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820669#comment-13820669 ]

Jonathan Ellis commented on CASSANDRA-6127:
---

bq. ISTM that FD processing Gossip updates synchronously is a fundamental problem. Any hiccup in processing will cause FD false positives.

I've pulled a fix for this out to CASSANDRA-6338.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816037#comment-13816037 ]

Quentin Conner commented on CASSANDRA-6127:
---

Good morning. We saw the same CPU usage profile with cassandra-1.2 8e7d7285cdeac4f2527c933280d595bbddd26935 (which included the patch to not flush the peers CF). CPU time was spent looking up EndpointState or in the phi calculation. No surprises were found: no race conditions, no deadlocks, no mutex/monitor contention.

I do not know if flapping happens in 1.2 head without vnodes. I will find out today, if I can get the nodes (having trouble this morning allocating from EC2). Will keep trying (Fridays seem better), but it could slip into the weekend...
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816044#comment-13816044 ]

Quentin Conner commented on CASSANDRA-6127:
---

Tupshin, can you further quantify the CPU usage you observed, in terms of user CPU and kernel CPU? Also, can you confirm the number of nodes and vnodes for those observations?

I've seen about 25% user CPU @ 256 nodes and 60% @ 512 nodes. Kernel CPU was under 5% for both in my trials.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815286#comment-13815286 ]

Jonathan Ellis commented on CASSANDRA-6127:
---

ISTM that FD processing Gossip updates synchronously is a fundamental problem. Any hiccup in processing will cause FD false positives. (And even if our own code is perfect, GC pauses can still do this to us.)

Wouldn't it be better if we:
- time heartbeats based on when they arrive instead of when Gossip processes them
- teach FD to recognize that its information is only good up to the most recently processed message -- the absence of messages after that doesn't mean everyone is down unless the Gossip stage is empty
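The two bullets above can be sketched as follows. This is a minimal, illustrative Python sketch, not Cassandra's actual Java implementation: the real FailureDetector uses a full phi-accrual estimate, and the names (`ArrivalWindow`, `should_convict`, the threshold of 8.0) are assumptions chosen to mirror the discussion.

```python
from collections import deque

class ArrivalWindow:
    """Sliding window of heartbeat inter-arrival intervals for one endpoint."""
    def __init__(self, size=1000):
        self.intervals = deque(maxlen=size)
        self.last_arrival = None

    def add(self, arrival_ts):
        # First bullet: arrival_ts is stamped when the message comes off the
        # wire, not when the (possibly backlogged) Gossip stage finally
        # processes it, so processing hiccups don't inflate the intervals.
        if self.last_arrival is not None:
            self.intervals.append(arrival_ts - self.last_arrival)
        self.last_arrival = arrival_ts

    def phi(self, now):
        # Simplified phi: time since last heartbeat divided by mean interval.
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        return (now - self.last_arrival) / mean

def should_convict(window, now, gossip_stage_empty, threshold=8.0):
    # Second bullet: a high phi only means "down" once the Gossip stage has
    # caught up; while a backlog exists, our information is merely stale.
    return gossip_stage_empty and window.phi(now) > threshold
```

With heartbeats at t=0..3 and silence afterwards, phi grows with elapsed time, but `should_convict` stays False as long as unprocessed gossip messages remain queued.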
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815297#comment-13815297 ]

Tupshin Harper commented on CASSANDRA-6127:
---

+1. Strongly agree with Jonathan's analysis and proposal.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815379#comment-13815379 ]

Brandon Williams commented on CASSANDRA-6127:
---

At this point, I think we should:
* see if the flapping happens with vnodes (maybe Quentin already knows from his last test)
* see if the flapping happens without vnodes but with the same number of nodes

Because if sum() in ArrivalWindow is burning the most CPU in the Gossiper task (note: not bottlenecking -- each call was at most ~3ms, there were just lots of them), then the problem is no longer tied to vnodes (if it ever was, since sum is per-node, not per-token) and we should probably open a new ticket ("can't start a cluster of size >= X all at once", or similar) and discuss there. We know that clusters much larger than any discussed on this ticket exist, but I don't think any of them have all rebooted at once.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813928#comment-13813928 ]

Quentin Conner commented on CASSANDRA-6127:
---

Good CPU profile results were obtained last night with the 1.2.9 code line. Switching over to the cassandra-1.2 HEAD this morning for up-to-date analysis.

The CPU profile of 1.2.9 showed the bottleneck was computing the sum of the ArrivalWindow deque members (inter-arrival times of gossip messages).
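The bottleneck described here -- re-summing the whole deque on every phi calculation -- has a standard remedy: maintain the sum incrementally as samples enter and leave the window, making the mean O(1) per call. A minimal Python sketch of that idea (the class name and sizes are illustrative, not Cassandra's actual code):

```python
from collections import deque

class RunningSumWindow:
    """Bounded window that maintains its sum incrementally, so mean() costs
    O(1) instead of re-summing the entire deque on every phi calculation."""
    def __init__(self, size=1000):
        self.size = size
        self.values = deque()
        self.total = 0.0

    def add(self, v):
        # Add the new sample and evict the oldest one if the window is full,
        # adjusting the running total on both ends.
        self.values.append(v)
        self.total += v
        if len(self.values) > self.size:
            self.total -= self.values.popleft()

    def mean(self):
        return self.total / len(self.values) if self.values else 0.0
```

With gossip arriving roughly once per second per node, a 1000-sample window across hundreds of endpoints makes the difference between O(n) and O(1) per phi call significant in aggregate, even when each individual sum is only a few milliseconds.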
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813134#comment-13813134 ]

Matt Stump commented on CASSANDRA-6127:
---

As another datapoint/use case: create a 32-node ring with vnodes, decommission one of the nodes, and observe the logs. Every node in the ring will be marked as down by the gossiper, then immediately re-added as up/available.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813147#comment-13813147 ]

Brandon Williams commented on CASSANDRA-6127:
---

bq. Every node in the ring will be marked as down by the gossiper

In which node's view? (or all of them?)
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813160#comment-13813160 ]

Matt Stump commented on CASSANDRA-6127:
---

We're observing the logs of a random sample of nodes, and on all observed nodes the entire ring is marked as down, so I assume it's happening on all nodes.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813184#comment-13813184 ]

Jonathan Ellis commented on CASSANDRA-6127:
---

How heavy is the read/write load?
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813198#comment-13813198 ]

Matt Stump commented on CASSANDRA-6127:
---

Zero to minimal load: 177 writes/second, 0 reads against the entire ring. m2.4xlarge instances.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813309#comment-13813309 ]

Brandon Williams commented on CASSANDRA-6127:
---

With CASSANDRA-6244 and CASSANDRA-6297 in 1.2 head, I think we need to re-verify that this is still a problem.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811823#comment-13811823 ]

Quentin Conner commented on CASSANDRA-6127:
---

Monday (11/4) I will start capturing CPU profiles with a 256 or 512 node cluster. The plan is to capture with internode compression and without.

I was able to get a semi-reproduction this week in a 256-node cluster -- one node had twice the CPU utilization of the others (20% user versus 10% user). But I had too much logging enabled, which skewed the results.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808260#comment-13808260 ]

Quentin Conner commented on CASSANDRA-6127:
---

Brandon,

You said patch #3 will make it take much longer for a rebooted node to know who's actually up or down, exacerbating CASSANDRA-4288. I've given this some thought and want to see if I understand your concern.

Patch #3 serves to send a zero value for phi, for newly-discovered nodes, until an accurate calculation of variance is complete. This takes 40 seconds and applies to new nodes only. However (and this is what I'm looking for you to confirm): if a new node comes online but is stopped again within 40 seconds of start-up, the FD will not convict it until the end of that 40 seconds. I suspect this occurs less frequently than adding a node to a cluster, but it probably depends on your use case (dev vs. prod).

In my view, we can't escape the math and the need to amass 40 samples. That is why the bug exists today. I agree we should look at tying thrift to a healthy startup as a compensating measure.

Instead of a fixed amount of time (gossip rounds), perhaps we should consider adding a hold-down timer based on a statistical measure? This hold-down timer could be implemented for newly discovered nodes to suppress interaction until Gossip stabilizes. Just as we have a high-water mark for phi to denote failure, we could set a low-water mark and call it a trust threshold. We wouldn't enable thrift communications to the new node until its phi value is below this low-water mark. So the condition for recognizing a new node for thrift purposes could be two-fold:
1. a valid computation of variance (40 samples obtained in the 1000-sample window)
2. an accurate phi value that is indeed below the low-water mark
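The two-fold condition above can be sketched as a small gate in front of the failure detector. This is an illustrative Python sketch, not the actual patch: `MIN_SAMPLES` of 40 comes from the comment, the convict threshold of 8.0 mirrors Cassandra's default phi_convict_threshold, and `PHI_TRUST` is a hypothetical value for the proposed low-water mark.

```python
class TrustGatedEstimator:
    """Gate FD output and thrift eligibility on statistical validity of phi."""
    MIN_SAMPLES = 40     # variance is only meaningful after this many samples
    PHI_CONVICT = 8.0    # existing high-water mark: convict as down above this
    PHI_TRUST = 0.5      # hypothetical low-water mark ("trust threshold")

    def __init__(self):
        self.sample_count = 0

    def add_sample(self):
        # One inter-arrival sample recorded per gossip heartbeat.
        self.sample_count += 1

    def estimate_valid(self):
        return self.sample_count >= self.MIN_SAMPLES

    def reported_phi(self, raw_phi):
        # Patch #3 behavior: report phi as zero until the estimate is valid,
        # so the FD never convicts on a statistically meaningless value.
        return raw_phi if self.estimate_valid() else 0.0

    def trusted_for_thrift(self, raw_phi):
        # Proposed two-fold condition: (1) valid variance computation, and
        # (2) phi actually below the low-water trust threshold.
        return self.estimate_valid() and raw_phi < self.PHI_TRUST
```

The trade-off Brandon raised falls out directly: during the first 40 samples, `reported_phi` is pinned at zero, so a node that dies in that interval cannot be convicted until the window fills.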
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808328#comment-13808328 ]

Brandon Williams commented on CASSANDRA-6127:
---

Let's move that discussion to CASSANDRA-4288, since that change is orthogonal to the actual problem we have here, regardless of whether it fixes it or just papers over the problem.

What we need to do next on this ticket is either correlate a thread dump to what is burning up CPU, or attach a debugger and see where the time is being spent.
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808335#comment-13808335 ] Jonathan Ellis commented on CASSANDRA-6127: --- bq. it would be better to limit that in the config instead of failing at an assert later on. Split that out to CASSANDRA-6267.
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806703#comment-13806703 ] Chris Burroughs commented on CASSANDRA-6127: bq. I'd just set a max of 1024. No one could ever need more than that. (Famous last words) Isn't that equivalent to saying no one will have a heterogeneous cluster with more than a 1024/256 = 4 performance delta between physical nodes? SSD vs spinny could account for more than that.
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13806796#comment-13806796 ] Jonathan Ellis commented on CASSANDRA-6127: --- I'm okay with that limitation. Intuitively it's reasonable that C* can't compensate for really ridiculous performance differences. (Of course, you could also reduce the weak nodes below 256.)
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805267#comment-13805267 ] Chris Burroughs commented on CASSANDRA-6127: It would be helpful to dump the interval times for a node that is flapping (dumpInterArrivalTimes on the FD) so we can see how long the heartbeats are taking. A per-endpoint histogram of heartbeat arrival latency seems a worthwhile o.a.c.Metric to have all the time. [~qconner] On the topic of "wait until there is enough data before doing stuff", you might also be interested in the heuristic report from CASSANDRA-4288
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805838#comment-13805838 ] Quentin Conner commented on CASSANDRA-6127: --- I grabbed some sample log files from 10 nodes of 256 in a run today. [flap-intervals.tar.gz|http://qconner.s3.amazonaws.com/flap-intervals.tar.gz] Convictions are happening with only 1 to 5 intervals recorded. Patch #3 is looking like the winner but we should do the math by hand to be sure (volunteers?). Also, I just tested [Patch #3|https://issues.apache.org/jira/secure/attachment/12610117/delayEstimatorUntilStatisticallyValid.patch] and found 0 flaps for the same setup as yesterday (256 nodes, phi=8, normal 1000 ms gossip period).
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805880#comment-13805880 ] Jonathan Ellis commented on CASSANDRA-6127: --- Patch 1 will break things since later on we write the length of the string as two bytes. I think we're fine with 1700 vnodes per machine TBH, although it would be better to limit that in the config instead of failing at an assert later on.
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805890#comment-13805890 ] Tupshin Harper commented on CASSANDRA-6127: --- I'd just set a max of 1024. No one could ever need more than that. (Famous last words)
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805928#comment-13805928 ] Brandon Williams commented on CASSANDRA-6127: - Patch #3 will make it take much longer for a rebooted node to know who's actually up or down, exacerbating CASSANDRA-4288. I'd still like to know *why* things are taking longer with vnodes, and I'm especially hesitant to make any adjustments to the gossiper or FD since we know they work fine with single tokens, and also because they *have no knowledge about tokens*, it's just another opaque state to them. I suspect something in StorageService is blocking the gossiper long enough to cause this, perhaps CASSANDRA-6244 or something similar.
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805977#comment-13805977 ] Jonathan Ellis commented on CASSANDRA-6127: --- Couldn't we tie the thrift/native server startup to "I have enough gossip data now"?
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805981#comment-13805981 ] Brandon Williams commented on CASSANDRA-6127: - That might confuse autodiscovery clients, at least without further changes.
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804361#comment-13804361 ] Quentin Conner commented on CASSANDRA-6127: --- *Background and Reproduction* The symptom is evident from the presence of "is now DOWN" messages in the Cassandra system.log file. The recording of a node DOWN is often followed by a node UP a few seconds later. Users have coined this phenomenon a "gossip flap", and the occurrence of gossip flaps has both a machine and a human consequence. Humans react strongly to the (temporary) marking of a node down; automated monitoring may trigger SNMP traps, etc. A busy node that doesn't transmit heartbeat gossip messages on time will be marked as down though it may still be performing useful work. Machine reactions include other C* nodes buffering row mutations and storing hints on disk when another node is marked down. I have not explored the machine reactions, but imagine the endpointSnitch could also be affected from the client frame of reference. One piece of good news is that I was able to reproduce two different use cases that elicit the "is now DOWN" message in Log4J log files.

Use Case #1 is as follows:
- provision 256 or 512 nodes in EC2
- install Cassandra 1.2.9
- take defaults except specify num_tokens=256 in c*.yaml
- start one node at a time

Use Case #2 is as follows:
- provision 32 nodes in EC2
- install Cassandra 1.2.9
- take defaults in c*.yaml
- configure rack
- start one node at a time
- when all nodes are up, create about 1GB of data, e.g. tools/bin/cassandra-stress -c 20 -l 3 -n 100
- provision a 33rd (extra) node in EC2
- install Cassandra 1.2.9
- take defaults except specify num_tokens=256
- start the node (auto_bootstrap=true)
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804365#comment-13804365 ] Quentin Conner commented on CASSANDRA-6127: --- *Feature Suggestion* The current Gossip failure detector is characterized by a sliding window of elapsed time, a heartbeat message period and a PHI threshold used to make the continuous random variable (lower-case phi) into a dichotomous (binary) random variable. That PHI (upper-case) threshold is called phi_convict_threshold. I don't have a better mathematical theory or derivation at this writing, but I do have an easy workaround for your consideration. While phi_convict_threshold is adjustable, the period (or frequency) of Gossip messages is not. Adjusting the gossip period to integrate over a longer time baseline reduced false positives from the Gossip failure detector. The side effect is an increase in the elapsed time to detect a legitimately-failed node. Depending on user workload characteristics, and the related sources of latency (CPU, disk and network activity or transient delays) cited above, a System Architect could present a reasonable use case for controlling the Gossip message period. The goal would be to set a detection window that accommodates common occurrences for a given deployment scenario. Not all data centers are created equal. Patches and results from implementation will follow in subsequent posts.

*Potential Next Steps*
- Explore concern about sensitivity to gossip period. Do the vnode gossip messages exceed capacity for peers to ingest?
- Explore concern about phi estimates from un-filled (new) deques.
- Explore concern about assuming a Gaussian PDF. Networks (not computers) generally characterize expected arrival time by a Poisson distribution, not Gaussian.
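The interaction between the gossip period and phi_convict_threshold can be sketched numerically. Below is a minimal, hypothetical illustration (not Cassandra's actual FailureDetector class) assuming the exponential inter-arrival simplification, under which phi grows linearly with the silence since the last heartbeat divided by the mean observed interval:

```java
// Hypothetical sketch of a phi accrual calculation under an exponential
// inter-arrival assumption: phi = -log10(P(still silent at t))
//                               = (timeSinceLastHeartbeat / meanInterval) / ln(10).
public class PhiSketch {
    // msSinceLast: milliseconds since the last heartbeat was received.
    // meanIntervalMs: mean inter-arrival time from the sliding window.
    static double phi(double msSinceLast, double meanIntervalMs) {
        return msSinceLast / meanIntervalMs / Math.log(10.0);
    }

    public static void main(String[] args) {
        // Doubling the gossip period (and hence the mean interval) halves phi
        // for the same silence, so a fixed phi_convict_threshold tolerates
        // proportionally more jitter before convicting a peer.
        System.out.println(phi(9000, 1000)); // ~3.9
        System.out.println(phi(9000, 2000)); // ~2.0
    }
}
```

This is the arithmetic behind the workaround: lengthening the period stretches the detection window, trading slower detection of genuinely dead nodes for fewer false convictions.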
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804362#comment-13804362 ] Quentin Conner commented on CASSANDRA-6127: --- *Analysis* My first experiments aimed to quantify the length of Gossip messages and determine what factors drive the message length. I found the size of certain gossip messages increases proportionally with the number of vnodes (num_tokens in c*.yaml). I recorded message size over the num_tokens and number-of-nodes domains (64, 128, 256, 512, ...) for tokens and (32, 64, 128, 256, 512) for nodes. I also made non-rigorous observations of user and kernel CPU (Ubuntu 10.04 LTS). My hunch is that both vnode count and node count have a mild effect on user CPU resource usage.

What is the rough estimate of bytes sent for certain Gossip messages, and why does this matter? The Phi Accrual Failure Detector (Hayashibara, et al.) assumes fixed-length heartbeat messages while Cassandra uses variable-length messages. I observed a correlation between larger messages, higher vnode counts and false positive detections by the Gossip FailureDetector. These observations, IMHO, are not explained by the research paper. I formed a hypothesis that the false positives are due to jitter in the interval values. I wondered if perhaps integrating over a longer baseline would reduce the jitter.

I have a second theory to follow up on. A newly added node will not have a long history of Gossip heartbeat inter-arrival times. At least 40 samples are needed to compute mean and variance with any statistical significance. It's possible the phi estimation algorithm is simply invalid for newly created nodes, and that is why we see them flap shortly after creation.

In any case, the message of interest is the GossipDigestAck2 (GDA2) because it is the largest of the Gossip messages. GDA2 contains the set of EndpointStateMaps (node metadata) for newly-discovered nodes, i.e. those nodes just added to an existing cluster. When each node becomes aware of a joining node, it gossips it to three randomly-chosen other nodes. The GDA2 message is tailored to contain the delta of new node metadata the receiving node is unaware of. For a single node, the upper limit on GDA message size is roughly 3 * N * k * V, where N is the number of nodes in the cluster, V is the number of tokens (vnodes) per node, and k is a constant value, approximately 64 bytes, that represents a serialized token plus some other endpoint metadata.

If one is running hundreds of nodes in a cluster, the Gossip message traffic created when a node joins can be significant and increases with the number of nodes. I believe this to be the first-order effect, and it probably violates one of the assumptions of PHI Accrual Failure Detection: that heartbeat messages are small enough not to consume a relevant amount of compute or communication resources. The variable transmission time (due to variable-length messages) is a clear violation of assumptions, if I've read the source code correctly.

On a related topic, there is a hard-coded limitation on the number of vnodes due to the serialization of the GDA messages. No more than 1720 vnodes can be configured without creating a greater-than-32K serialized String vnode message. A patch is provided below for future use should this become an issue.

In clusters with hundreds of nodes, GDA2 messages can be 200 KB, or 2 MB if many nodes join simultaneously. This is not an issue if the computer experiences no latency from competing workloads. In the real world, nodes are added because the cluster load has grown in terms of retained data, or in terms of a high transaction arrival rate. This means node resources may be fully utilized when adding new nodes is typically attempted.

It occurred to me that we have another use case to accommodate. It is common to experience transient failure modes, even in modern data centers with disciplined maintenance practices. Ethernet cables get moved, switches and routers rebooted. BGP route errors and other temporary interruptions may occur with the network fabric in real-world scenarios. People make mistakes, plans change, and preventative maintenance often causes short-lived interruptions with network, CPU and disk subsystems.
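The size bound quoted in this comment lends itself to a quick back-of-envelope calculation. The following sketch simply encodes the 3 * N * k * V formula as stated; the class and method names are hypothetical, and the 64-byte k is the comment's rough assumption, not a measured constant:

```java
// Back-of-envelope estimate of the GDA2 traffic bound described above:
// 3 * N * k * V, with N nodes in the cluster, V vnodes per node, and
// k ~ 64 bytes of serialized token plus endpoint metadata (an assumption).
public class GossipSizeEstimate {
    static long gda2UpperBoundBytes(long nodes, long vnodesPerNode, long bytesPerToken) {
        return 3L * nodes * bytesPerToken * vnodesPerNode;
    }

    public static void main(String[] args) {
        // A single node's state at 256 vnodes is ~ 64 * 256 = 16 KB before
        // the 3 * N fan-out factor; across a 256-node cluster the bound
        // lands in the megabyte range.
        long bytes = gda2UpperBoundBytes(256, 256, 64);
        System.out.println(bytes / (1024 * 1024) + " MB upper bound"); // 12 MB upper bound
    }
}
```

The point of the estimate is the scaling, not the exact constant: the bound grows with both the node count and the vnode count, which is why joins in large vnode clusters generate disproportionate gossip traffic.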
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804459#comment-13804459 ] Quentin Conner commented on CASSANDRA-6127: --- First results with workaround patch #2. No load, no data; only the system keyspace and Gossip on a 256-node m1.medium cluster in EC2. Nodes started in rapid succession.

*phi=8, variable gossip_period*
- 1154 flaps for 1 sec
- 685 flaps for 2 sec
- 146 flaps for 3 sec
- 88 flaps for 4 sec
- 70 flaps for 5 sec
- 100 flaps for 10 sec

*phi=12*
- 1289 flaps for 1 sec
- 77 flaps for 2 sec
- 6 flaps for 3 sec
- 1 flap for 4 sec
- 3 flaps for 5 sec
- 1 flap for 6 sec
- 0 flaps for 8 sec
- 1 flap for 10 sec
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804545#comment-13804545 ] Brandon Williams commented on CASSANDRA-6127: - It would be helpful to dump the interval times for a node that is flapping (dumpInterArrivalTimes on the FD) so we can see how long the heartbeats are taking. If some are excessively long, we need to get thread dumps/debugger timings from the gossiper to see if something is blocking it or taking a long time before changing any fundamentals (gossip interval, FD formula) that we already know work in principle without vnodes. Increasing the payload size to 32k shouldn't cause these problems, since that is only sent during initial state synchronization and isn't all that large to begin with.
[jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804770#comment-13804770 ] Brandon Williams commented on CASSANDRA-6127: - Can you see if adding -Dcassandra.unsafesystem=true allows the cluster to stabilize at some point?
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783372#comment-13783372 ] Jonathan Ellis commented on CASSANDRA-6127: --- bq. After a couple of hours, they were still flapping, had very high cpu load. To clarify, this is a bit of a mashup of multiple observations: when there was zero traffic on the cluster, we were seeing flapping without very high cpu; on smaller tests, we saw much higher cpu than expected when under load.
[ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783481#comment-13783481 ] Darla Baker commented on CASSANDRA-6127: Per Jonathan's request, I'm adding an update here regarding eBay's experience on https://support.datastax.com/tickets/6928, which was the result of the first stage of executing the plan from https://support.datastax.com/requests/6636. He had an existing 32-node DSE 3.1.0 cluster in their PHX data center. The plan was to add a second, 50-node data center to the cluster in SLC with vnodes enabled. They began by bringing all the nodes up with auto bootstrapping turned off, to prevent any data streaming until they were ready to make the other changes needed to bring the data center fully online. Almost immediately after the SLC nodes came up, the nodes in PHX began reporting as down, and he began receiving SMS messages and calls from application engineers reporting that the application which uses that cluster was down. As we were in triage mode, the most expedient course of action was to shut down the SLC nodes and remove them from gossip. Upon trying to execute the nodetool removenode command we hit CASSANDRA-5857, although up to that point we had thought nodetool decommission was responsible for the issue. In any case, we started executing the workaround described in that ticket. When we parted, the process was going slowly, but he reported it was working: the nodes were disappearing from the ring, and the application engineers were reporting that the application was back online. At some point during the weekend, Alex reached out to Jeremy, who was on call and was finally able to get the nodes removed from gossip, fully stabilize the 32-node PHX data center, and fully decommission the SLC data center. Alex attached some logs to the ticket during the event. We were seeing node flapping and NPEs during the event.
Ticket https://support.datastax.com/tickets/6917 contains some additional details on the test cases. Ticket https://support.datastax.com/tickets/6939 contains the alternate plan that eBay is considering in light of the difficulties encountered with bringing SLC online.