On 13/07/2020 10:43, Frank Schilder wrote:
To anyone who is following this thread, we found a possible explanation for
(some of) our observations.
If someone is following this, they probably want the possible
explanation and not the knowledge of you having the possible
explanation.
So you are saying that if you do e.g. a core installation (without GUI) of
2016/2019 and disable all services, the fio test results are significantly
different to e.g. a CentOS 7 VM doing the same fio test? Are you sure
this is not related to other processes writing to disk?
Right, it's not an explanation but rather a further observation. We don't really 
have an explanation yet.

It's an identical installation of both server versions, with the same services 
configured. Our operators are not really into debugging Windows, which is why we 
were asking here. Their hypothesis is that the VD driver for accessing RBD 
images has problems with Windows Server versions newer than 2016. I'm not a 
Windows guy, so I can't really comment on this.

The test we do is a simple copy test of a single 10 GB file, and we monitor the 
transfer speed. This info was cut out of this e-mail; the original report for 
reference is: 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/ANHJQZLJT474B457VVM4ZZZ6HBXW4OPO/

We are very sure that it is not related to other processes writing to disk, we 
monitor that too. There is also no competition on the RBD pool at the time of 
testing.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Marc Roos <m.r...@f1-outsourcing.eu>
Sent: 13 July 2020 10:24
To: ceph-users; Frank Schilder
Subject: RE: [ceph-users] Re: Poor Windows performance on ceph RBD.

To anyone who is following this thread, we found a possible
explanation for
(some of) our observations.
If someone is following this, they probably want the possible
explanation and not the knowledge of you having the possible
explanation.

So you are saying that if you do e.g. a core installation (without GUI) of
2016/2019 and disable all services, the fio test results are significantly
different to e.g. a CentOS 7 VM doing the same fio test? Are you sure
this is not related to other processes writing to disk?



-----Original Message-----
From: Frank Schilder [mailto:fr...@dtu.dk]
Sent: maandag 13 juli 2020 9:28
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Poor Windows performance on ceph RBD.

To anyone who is following this thread, we found a possible explanation
for (some of) our observations.

We are running Windows servers version 2016 and 2019 as storage servers
exporting data on an rbd image/disk. We recently found that Windows
server 2016 runs fine. It is still not as fast as Linux + SAMBA share on
an rbd image (ca. 50%), but runs with a reasonable sustained bandwidth.
With Windows server 2019, however, we observe near-complete stall of
file transfers and time-outs using standard copy tools (robocopy). We
don't have an explanation yet and are downgrading Windows servers where
possible.

If anyone has a hint what we can do, please let us know.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


I am not sure exactly how you are testing the speed on Windows, but two possible factors are block size and caching.

Block size depends on the client application: a Windows file copy from the UI will use a 512k block size, which is different from xcopy or robocopy; the latter can change block size depending on flags / restart mode, etc. Similarly, the dd command on Linux will give different speeds depending on block size.
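As a rough illustration (my own sketch, not from the thread; file names and sizes are arbitrary placeholders), dd makes the block-size effect easy to see on Linux:

```shell
# Write the same 64 MiB twice, once in 4k blocks (16384 write requests)
# and once in 512k blocks (128 requests); the reported speeds usually
# differ noticeably. conv=fsync forces the data to disk before dd reports.
# Point the output files at the disk under test.
dd if=/dev/zero of=testfile_4k   bs=4k   count=16384 conv=fsync 2>&1 | tail -n 1
dd if=/dev/zero of=testfile_512k bs=512k count=128   conv=fsync 2>&1 | tail -n 1
```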

Caching: Caching will make a big difference for sequential writes, as it merges smaller blocks. In some cases it is not obvious whether caching is being used, since it could happen at different layers: in your Linux Samba export test, for example, there could be caching done at the gateway, while a clustered high-availability setup may explicitly turn caching off. What you report as an initially high speed that then decreases could indicate writing to a cache buffer at first, then slowing down once it fills.
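A minimal sketch of that cache-buffer effect (again my own example, file names are placeholders): the same sequential write looks much faster when it only lands in the page cache than when it is forced out to the device:

```shell
# Buffered write: dd returns once the data is in the page cache, so the
# reported speed is essentially RAM speed.
dd if=/dev/zero of=buffered.img bs=512k count=128 2>&1 | tail -n 1
# Flushed write: conv=fsync forces the data to the device before dd
# reports, giving the sustained speed the copy tests are actually after.
dd if=/dev/zero of=flushed.img bs=512k count=128 conv=fsync 2>&1 | tail -n 1
```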

It will help to quantify/compare latency (IOPS at queue depth 1) via:
on Linux:
rbd bench --io-type write POOL_NAME/IMAGE_NAME --io-threads=1 --io-size 4K --io-pattern rand --rbd_cache=false
fio --name=xx --filename=FILE_NAME --iodepth=1 --rw=randwrite --bs=4k --direct=1 --runtime=30 --time_based
On Windows vm:
diskspd -b4k -d30 -o1 -t1 -r -Su -w100 -c1G  FILE_NAME
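At queue depth 1 the IOPS these tools report translate directly into average per-request latency (1/IOPS). For example, with a hypothetical 250 IOPS from the 4k random-write run (substitute the number your own test reports):

```shell
# Convert a qd=1 IOPS figure into average latency; 250 is a made-up
# value, replace it with the result from rbd bench / fio / diskspd.
awk 'BEGIN { iops = 250; printf "avg latency: %.1f ms\n", 1000 / iops }'
# prints: avg latency: 4.0 ms
```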


Measure/compare sequential writes with a 512k block size:
on Linux:
rbd bench --io-type write POOL_NAME/IMAGE_NAME --io-threads=1 --io-size 512K --io-pattern seq --rbd_cache=false
fio --name=xx --filename=FILE_NAME --iodepth=1 --rw=write --bs=512k --direct=1 --runtime=30 --time_based
On Windows vm:
diskspd -b512k -d30 -o1 -t1  -Su -w100 -c1G  FILE_NAME

/Maged

