[ https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088741#comment-17088741 ]

Eddy Truyen commented on CASSANDRA-15717:
-----------------------------------------

Hi,

The performance overhead turned out to be caused by a {{--cgroup-parent}} 
option that was set on the Kubernetes-orchestrated container but not on the 
stand-alone Docker container. For more information, see the following 
[Kubernetes issue|https://github.com/kubernetes/kubernetes/issues/90133].
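
For anyone who wants to verify this on their own setup, the cgroup parent of 
the two containers can be compared directly on the node; a minimal sketch, 
assuming hypothetical container names:

{{# Compare the cgroup parent assigned to each container
docker inspect kube-cassandra | grep -i CgroupParent
docker inspect docker-cassandra | grep -i CgroupParent

# Alternatively, read the effective cgroup from inside a container
docker exec docker-cassandra cat /proc/1/cgroup}}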

> Benchmark performance difference between Docker and Kubernetes when running 
> Cassandra:2.2.16 official Docker image
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/benchmark
>            Reporter: Eddy Truyen
>            Priority: Normal
>         Attachments: nodetool-compaction-history-docker-cassandra.txt, 
> nodetool-compaction-history-kubeadm-cassandra.txt
>
>
> Sorry for the slightly off-topic post. This is not an issue with Cassandra 
> itself but possibly with the interaction between Cassandra and Kubernetes.
> We experienced a performance degradation when running a single Cassandra 
> instance inside kubeadm 1.14 compared with running the Docker container 
> stand-alone.
> We ran a write-only workload (YCSB benchmark workload A, load phase) against 
> the following user table:
>  
> {{cqlsh> CREATE KEYSPACE ycsb
>     WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
> cqlsh> USE ycsb;
> cqlsh> CREATE TABLE usertable (
>     y_id varchar PRIMARY KEY,
>     field0 varchar,
>     field1 varchar,
>     field2 varchar,
>     field3 varchar,
>     field4 varchar,
>     field5 varchar,
>     field6 varchar,
>     field7 varchar,
>     field8 varchar,
>     field9 varchar);}}
> We used the following load script:
>  
> {{python ./bin/ycsb load cassandra2-cql -P workloads/workloada \
>     -p recordcount=1500000 -p operationcount=1500000 \
>     -p measurementtype=raw \
>     -p cassandra.connecttimeoutmillis=60000 \
>     -p cassandra.readtimeoutmillis=60000 \
>     -target 1500 -threads 20 -p hosts=localhost \
>     > results/cassandra-docker/cassandra-docker-load-workloada-1-records-1500000-rnd-1762034446.txt
> sleep 15}}
> We used the following image: {{decomads/cassandra:2.2.16}}, which uses the 
> official {{cassandra:2.2.16}} as base image and adds a readinessProbe to it.
> We used identical Docker configuration parameters by ensuring that the output 
> of {{docker inspect}} is as similar as possible. In Kubernetes, we ran the 
> YCSB benchmark in a container that is co-located with the Cassandra container 
> in one pod. Kubernetes then starts these containers with network mode 
> {{net=container:...}}, i.e. a separate infrastructure container links the 
> ycsb and cassandra containers into the same network namespace so they can 
> talk via localhost. By this we hoped to avoid interference from the CNI 
> network plugin.
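> The equivalent wiring with plain Docker looks as follows; a minimal sketch, 
> assuming hypothetical container and image names:
>  
> {{# Start Cassandra on the default bridge network
> docker run -d --name cassandra decomads/cassandra:2.2.16
> 
> # Run YCSB in Cassandra's network namespace so that
> # -p hosts=localhost resolves to the Cassandra container
> docker run --rm --net=container:cassandra ycsb-image \
>     ./bin/ycsb load cassandra2-cql -P workloads/workloada -p hosts=localhost}}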
> We ran the Docker-only container within the Kubernetes node using the 
> default bridge network.
> We first performed the experiment on an OpenStack VM with Ubuntu 16.04 (4 GB 
> RAM, 4 CPU cores, 50 GB disk) that runs on a physical node with 16 CPU 
> cores. Storage is Ceph, however, and therefore distributed.
> To rule out Ceph's distributed storage, we repeated the experiment on 
> minikube+VirtualBox (12 GB RAM, 4 CPU cores, 30 GB disk) on a Windows 10 
> laptop with 4 cores/8 logical processors and 16 GB RAM. The same performance 
> degradation was measured.
> Observations (on Ubuntu/OpenStack):
>  * Docker:
>  ** Mean response latency of the YCSB benchmark: 1.5 ms-1.7 ms
>  * Kubernetes:
>  ** Mean response latency of the YCSB benchmark: 2.7 ms-3 ms
>  * CPU usage of the Cassandra daemon JVM is much lower under Docker than 
> under Kubernetes (see my position paper: 
> [https://lirias.kuleuven.be/2788169?limo=0]).
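> For reference, the container-level CPU usage on the Docker side can be 
> sampled as sketched below (container name hypothetical):
>  
> {{# One-shot CPU/memory snapshot of the running container
> docker stats --no-stream cassandra}}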
> Possible causes:
>  * Network overhead of the virtual bridge in Kubernetes is, in our opinion, 
> not the cause of the problem.
>  ** We repeated the experiment running the Docker-only containers inside a 
> Kubernetes node, linking the containers with the {{--net=container:}} 
> mechanism as similarly as we could. The YCSB latency stayed the same.
>  * Disk I/O bottleneck: {{nodetool tablestats}} output is very similar. 
> Cassandra containers are configured to write data to a filesystem that is 
> mounted from the host inside the container, and exactly the same Docker 
> mount type is used (see the sketch after this list).
>  ** Write latency is very stable over multiple runs:
>  *** Kubernetes, ycsb usertable: 0.0167 ms.
>  *** Docker, ycsb usertable: 0.0150 ms.
>  ** Compaction_history/compaction_in_progress is also very similar (see 
> attached files).
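> For completeness, the host mount we refer to above corresponds to the 
> following Docker invocation; a minimal sketch with a hypothetical host path:
>  
> {{# Bind-mount a host directory as the Cassandra data directory
> docker run -d --name cassandra \
>     -v /data/cassandra:/var/lib/cassandra decomads/cassandra:2.2.16}}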
> Do you know of any other causes that might explain the difference in 
> reported YCSB response latency? Could it be that the Cassandra session is 
> closed by Kubernetes after each request? How can I diagnose this?
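> One check for the session hypothesis would be to watch the established TCP 
> connections on the CQL native port (9042) while the load runs; if the 
> session were torn down after each request, the client source ports would 
> churn constantly instead of staying stable. A minimal sketch:
>  
> {{# Repeatedly list established connections on the CQL port
> watch -n 1 'ss -tn state established "( sport = :9042 )"'}}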
>  


