[ https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eddy Truyen updated CASSANDRA-15717:
------------------------------------
Description:

This is my first JIRA issue. Sorry if I do something wrong in the reporting. I experienced a performance degradation when running a single Cassandra instance inside Kubernetes in comparison with running the same Docker container stand-alone. I used the image decomads/cassandra:2.2.16, which uses cassandra:2.2.16 as its base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of docker inspect is as similar as possible in both setups. First, we ran the YCSB benchmark in a container that is co-located with the Cassandra container in one pod. Kubernetes starts these containers with network mode "net=container:...": a separate container links the ycsb and cassandra containers into the same network namespace so they can talk via localhost. By this we hope to avoid interference from the CNI network plugin. The Docker-only container was run on the Kubernetes node itself, using the default bridge network.

The experiment was repeated on minikube+VirtualBox (12 GB, 4 CPU cores, 30 GB) on a physical laptop with 4 cores/8 logical processors and 16 GB RAM, and on an OpenStack VM running Ubuntu 16.04 (4 GB, 4 CPU cores, 50 GB) that runs on a physical node with 16 CPU cores. Storage is Ceph.
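For reference, the co-location described above corresponds to a two-container pod sharing one network namespace. The following is a hypothetical sketch, not the actual manifest used in the experiment: the ycsb image name, its command, and the readinessProbe command/timings are assumptions, since the real values are not shown in this issue.

```yaml
# Hypothetical pod sketch (not the manifest actually used).
# Both containers share the pod's network namespace, so the ycsb client
# reaches Cassandra on localhost:9042 without traversing the CNI plugin.
apiVersion: v1
kind: Pod
metadata:
  name: cassandra-ycsb
spec:
  containers:
    - name: cassandra
      image: decomads/cassandra:2.2.16
      ports:
        - containerPort: 9042
      readinessProbe:               # probe command and timings are assumed
        exec:
          command: ["cqlsh", "-e", "describe keyspaces"]
        initialDelaySeconds: 30
        periodSeconds: 10
    - name: ycsb
      image: ycsb-client:latest     # illustrative client image name
      command: ["sleep", "infinity"]
```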
* A write-only workload (YCSB benchmark workload A, load phase) using the following user table:

cqlsh> create keyspace ycsb
    WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE ycsb;
cqlsh> create table usertable (
    y_id varchar primary key,
    field0 varchar, field1 varchar, field2 varchar, field3 varchar,
    field4 varchar, field5 varchar, field6 varchar, field7 varchar,
    field8 varchar, field9 varchar);

* And using the following script:

python ./bin/ycsb load cassandra2-cql -P workloads/workloada -p recordcount=1500000 -p operationcount=1500000 -p measurementtype=raw -p cassandra.connecttimeoutmillis=60000 -p cassandra.readtimeoutmillis=60000 -target 1500 -threads 20 -p hosts=localhost > results/cassandra-docker/cassandra-docker-load-workloada-1-records-1500000-rnd-1762034446.txt
sleep 15

Observations (on Ubuntu-OpenStack):
* Docker:
** Mean response latency of the YCSB benchmark: 1.5-1.7 ms
* Kubernetes:
** Mean response latency of the YCSB benchmark: 2.7-3 ms
* CPU usage of the Cassandra daemon JVM is considerably lower under Docker than under Kubernetes (see my position paper: [https://lirias.kuleuven.be/2788169?limo=0])

Possible causes:
* Network overhead of the virtual bridge in the container orchestrator is, in our opinion, not the cause of the problem.
** We repeated the experiment running the Docker-only containers inside a Kubernetes node, linking the containers with the --net=container: mechanism as similarly as we could. The YCSB latency stayed the same.
* Disk I/O bottleneck: nodetool tablestats output is very similar.
** The Cassandra containers are configured to write data to a filesystem that is mounted from the host into the container. Exactly the same Docker mount type is used.
** Write latency is very stable over multiple runs:
*** Kubernetes, ycsb usertable: 0.0167 ms
*** Docker, ycsb usertable: 0.0150 ms
** Compaction_history/compaction_in_progress is also very similar (as opposed to earlier versions of this issue; sorry for the confusion!).

Do you know of any other causes that might explain the difference in the reported YCSB response latency?
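A quick back-of-the-envelope comparison of the numbers above (a sketch; the client-side midpoints are my own arithmetic on the reported ranges) shows that the server-side write path explains almost none of the client-side gap, which supports the argument that disk I/O is not the bottleneck:

```python
# Compare client-side YCSB latency with server-side Cassandra write latency,
# using the figures reported in this issue (all values in milliseconds).
docker_client = 1.6      # midpoint of the 1.5-1.7 ms YCSB range under Docker
k8s_client = 2.85        # midpoint of the 2.7-3.0 ms YCSB range under Kubernetes
docker_write = 0.0150    # nodetool tablestats write latency, Docker
k8s_write = 0.0167       # nodetool tablestats write latency, Kubernetes

client_gap = k8s_client - docker_client   # extra latency seen by the YCSB client
server_gap = k8s_write - docker_write     # extra latency inside Cassandra's write path

print(f"client-side gap: {client_gap:.2f} ms")    # ~1.25 ms
print(f"server-side gap: {server_gap:.4f} ms")    # ~0.0017 ms
# The write path accounts for well under 1% of the client-side gap, so the
# missing ~1.2 ms must sit between the client and the storage engine.
print(f"share of gap explained by write path: {server_gap / client_gap:.2%}")

# Sanity check on offered load: -target 1500 with 20 threads means each
# thread issues one operation roughly every 20/1500 s = 13.3 ms, so even a
# 3 ms per-op latency leaves headroom and neither setup is saturated.
interarrival_ms = 20 / 1500 * 1000
print(f"per-thread inter-arrival time: {interarrival_ms:.1f} ms")
```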
> Benchmark performance difference between Docker and Kubernetes when running
> Cassandra:2.2.16 official Docker image
> ------------------------------------------------------------------------------------------------------------------
>
>         Key: CASSANDRA-15717
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
>     Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>    Reporter: Eddy Truyen
>    Priority: Normal
> Attachments: nodetool-compaction-history-docker-cassandra.txt,
>              nodetool-compaction-history-kubeadm-cassandra.txt

--
This message was sent by Atlassian Jira (v8.3.4#803005)