Hallo Jeff,
very interesting stuff, thank you for sharing!
Indeed, I am storing time-series data. The table has 67 columns. Writing is
done in two steps: first 43 fields (3 primary-key fields and 40 data fields),
then 27 fields (3 primary-key fields and 24 data fields) in a second step,
always one row (one timestamp) at a time. The two write steps happen within
milliseconds of each other, i.e. in the vast majority of cases the columns
should be consolidated in the memtable before hitting the disk. Extrapolating
from your example, I would think that this should not be the cause of the
excessive bandwidth usage?
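For reference, the field counts above work out as follows (a quick sketch; the column-group names are mine, only the counts come from the description above):

```python
# Sketch of the two-step write pattern: one 67-column row per timestamp,
# written as two statements that share the same 3-column primary key.
PK_COLS = 3          # shared primary-key fields
STEP1_DATA = 40      # data fields in the first write
STEP2_DATA = 24      # data fields in the second write

step1_fields = PK_COLS + STEP1_DATA   # 43 fields in step one
step2_fields = PK_COLS + STEP2_DATA   # 27 fields in step two

# Both writes target the same row, so the distinct columns touched per
# timestamp are the shared key plus both data sets:
total_cols = PK_COLS + STEP1_DATA + STEP2_DATA
print(step1_fields, step2_fields, total_cols)  # 43 27 67
```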
Best
Jens
On 15. Dec 2020, at 17:47, Jeff Jirsa <jji...@gmail.com> wrote:
There's a small amount of overhead on each packet for serialization - e.g.,
each mutation is tied to a column family (uuid) and gets serialized with sizes
and checksums, so I guess there's a point where your updates are small enough
that the overhead of the mutations starts being visible.
You mentioned you're storing time-series. Hard to know what your time series
actually is, but pretending it's recording weather over time (easy throwaway
example): if you're writing 100kb chunks of text or json at various timestamps
(e.g. "wind speed, wind direction, low temp, high temp, precipitation volume,
precipitation type, small craft advisory, hurricane warning"), you won't notice
the serialized sizes or uuid overhead. But if you're setting
location=temperature values, that's pretty small, and the overhead starts
showing up.
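A rough back-of-envelope of this effect (the 40-byte per-mutation overhead below is an assumed figure for illustration only, not a measured Cassandra constant):

```python
# Fixed per-mutation framing overhead (table UUID, sizes, checksums)
# dominates small payloads but vanishes for large ones.
OVERHEAD_BYTES = 40  # assumed, for illustration

def overhead_fraction(payload_bytes: int) -> float:
    """Share of the serialized mutation taken up by framing overhead."""
    return OVERHEAD_BYTES / (OVERHEAD_BYTES + payload_bytes)

# A 100 kB JSON blob barely notices the overhead...
print(round(overhead_fraction(100_000), 4))  # 0.0004
# ...while a tiny location=temperature cell is mostly overhead.
print(round(overhead_fraction(16), 2))       # 0.71
```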
On Tue, Dec 15, 2020 at 8:39 AM Jens Fischer <j.fisc...@sonnen.de> wrote:
Hi Scott,
Thank you for your help. There was an error, or at least an ambiguity, in my
second mail! I wrote:
I still see outgoing cross-DC traffic of ~ 2x the “write size A”
What I wanted to say was: I still see outgoing cross-DC traffic of ~2x the
“write size A” per remote DC, or 4x the “write size A” in total.
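In numbers, measured in units of “write size A”:

```python
# Expected vs. observed cross-DC traffic, in units of "write size A".
remote_dcs = 2                       # DC2 and DC3
expected_total = remote_dcs * 1      # one forwarded copy per remote DC -> 2x
observed_total = remote_dcs * 2      # ~2x per remote DC observed -> 4x
print(observed_total / expected_total)  # 2.0, i.e. twice the expected traffic
```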
Your response underlines that this is way more than expected. Any idea what
could cause this or how to debug further? As mentioned, I have already checked
for anti-entropy repair (not running), read repairs (read_repair_chance and
dclocal_read_repair_chance set to 0), and hints (no hint replay according to
the logs, hint directory empty).
Best
Jens
On 10. Dec 2020, at 01:28, Scott Hirleman <scott.hirle...@gmail.com> wrote:
2x makes sense tho. If you have 3 DCs, you write locally to DC1; the write
gets replicated once within DC1, and then replicated to DC2 AND DC3 (at
consistency LOCAL_ONE) via cross-DC traffic to one node in each remote DC,
then replicated within each of those DCs to a second node via local traffic.
A write comes in to DC1 node 1; it replicates to DC1 node 2, DC2 node 1, and
DC3 node 1. So the outgoing cross-DC traffic is 2x the write size, one copy
each to DC2 and DC3. Once it is written to DC2 node 1, it gets replicated
locally to DC2 node 2; same for DC3 with DC3 node 2.
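The fan-out described above works out to simple arithmetic (a sketch assuming the topology in this thread: 3 DCs, RF=2 per DC, writes at LOCAL_ONE):

```python
# Fan-out sketch: a write arrives at a coordinator node in DC1.
n_dcs = 3
rf_per_dc = 2

# Cross-DC: the coordinator forwards one copy to one node in each remote DC.
cross_dc_copies = n_dcs - 1
# Intra-DC: each DC then makes one additional local copy (RF=2).
intra_dc_copies = rf_per_dc - 1

print(cross_dc_copies)  # 2 -> outgoing cross-DC traffic is 2x the write size
print(intra_dc_copies)  # 1 -> plus one local replication hop inside each DC
```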
On Wed, Dec 2, 2020 at 9:36 AM Jens Fischer <j.fisc...@sonnen.de> wrote:
Hi,
I checked for all the other factors mentioned - anti-entropy repair, hints,
read repair - and I still see outgoing cross-DC traffic of ~2x the “write
size A” (as defined below). Given Jeff's answers this is not to be expected,
i.e. there is something wrong here. Does anybody have an idea how to debug
this?
I define the "write size A” as follows: Take the incoming traffic from all
nodes inserting into DC1 and sum it up.
Best
Jens
On 30. Nov 2020, at 12:00, Jens Fischer <j.fisc...@sonnen.de> wrote:
Hi Jeff,
Thank you for your answer, very helpful already!
All writes are done with `LOCAL_ONE` and we have RF=2 in each data center.
To compare our examples we need to come to an agreement on what you are calling
“write size A”. I gave two different write sizes:
I call the bandwidth for receiving the data on Node A the “base bandwidth”
This is the inbound traffic at Node A. Data to Node A is transmitted as
Protobuf inside VPN tunnels. A very rough estimate of data size, I know. Node A
is not a Cassandra node!
Inserting into Cassandra (in one DC) takes 2-3 times the base bandwidth
I looked at all the Cassandra nodes in DC1 and the traffic coming from Node A.
I then summed up this traffic.
@Jeff: I assume this is closer to what you call “write size A”?
Best
Jens
On 26. Nov 2020, at 17:12, Jeff Jirsa <jji...@gmail.com> wrote:
On Nov 26, 2020, at 9:53 AM, Jens Fischer <j.fisc...@sonnen.de> wrote:
Hi,
we run a Cassandra cluster with three DCs. We noticed that the traffic
incurred by running the cluster is significant.
Consider the following simplified IoT scenario:
* time series data from devices in the field is received at Node A
* Node A inserts the data into DC 1
* DC 1 replicates the data within the DC and to the other two DCs
The traffic this produces is significant. The numbers below are based on
observing the incoming and outgoing traffic on the node level:
* I call the bandwidth