Doing some tests with iperf, our network has a bandwidth between nodes of 940 Mbits/sec. According to our metrics of network use in this cluster, hosts with OSDs have peak traffic of about 200 Mbits/sec each, and the client running FIO about 300 Mbits/sec. The network does not seem to be saturated.
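For reference, 940 Mbits/sec is essentially line rate for a 1GbE link once TCP overhead is accounted for. The measurements were along these lines (exact flags may have differed; this is the usual iperf client/server pair):

  # on one node (server side):
  iperf -s
  # on another node (client side), against the server's IP:
  iperf -c <server-ip> -t 30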
On Wed, Oct 5, 2016 at 4:16 PM, Will.Boege <will.bo...@target.com> wrote:
> Because you do not have segregated networks, the cluster traffic is most
> likely drowning out the FIO user traffic. This is especially exacerbated
> by the fact that there is only a 1Gb link between the cluster nodes.
>
> If you are planning on using this cluster for anything other than
> testing, you'll want to re-evaluate your network architecture:
>
> + >= 10GbE
> + Dedicated cluster network
>
> From: Mario Rodríguez Molins <mariorodrig...@tuenti.com>
> Date: Wednesday, October 5, 2016 at 8:38 AM
> To: "Will.Boege" <will.bo...@target.com>
> Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
> Subject: Re: [EXTERNAL] [ceph-users] Benchmarks using fio tool gets stuck
>
> Hi,
>
> Currently, we do not have a separate cluster network, and our setup is:
> - 3 nodes for OSDs with 1Gbps links. Each node is running a single OSD
>   daemon, although we plan to increase the number of OSDs per host.
> - 3 virtual machines, also with 1Gbps links, where each VM is running
>   one monitor daemon (two of them are running a metadata server too).
> - The two clients used for testing purposes are also VMs.
>
> In each run of the FIO tool, we do the following steps, all of them on
> the client (a shell sketch of these steps follows below):
> 1.- Create an rbd image of 1GB within a pool and map this image to a
>     block device
> 2.- Create the ext4 filesystem on this block device
> 3.- Unmap the device from the client
> 4.- Before testing, drop caches (echo 3 | tee /proc/sys/vm/drop_caches && sync)
> 5.- Perform the fio test, setting the pool and name of the rbd image.
>     In each run, the block size used is changed.
> 6.- Remove the image from the pool
>
> Thanks in advance!
>
> On Wed, Oct 5, 2016 at 2:57 PM, Will.Boege <will.bo...@target.com> wrote:
>
> What does your network setup look like? Do you have a separate cluster
> network?
>
> Can you explain how you are performing the FIO test? Are you mounting a
> volume through krbd and testing that from a different server?
>
> On Oct 5, 2016, at 3:11 AM, Mario Rodríguez Molins
> <mariorodrig...@tuenti.com> wrote:
>
> Hello,
>
> We are setting up a new Ceph cluster and doing some benchmarks on it.
> At this moment, our cluster consists of:
> - 3 nodes for OSDs. In our current configuration, one daemon per node.
> - 3 nodes for monitors (MON). In two of these nodes, there is a
>   metadata server (MDS).
>
> Benchmarks are performed with the tools that Ceph/RADOS provides as
> well as with the fio benchmark tool. Our benchmark tests are based on
> this tutorial:
> http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance
>
> Using the fio benchmark tool, we are having some issues.
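For concreteness, the per-run procedure described in the quoted steps 1-6 above boils down to roughly the following. Pool and image names are the ones from our script (scbench/image01); the exact rbd invocations are from memory:

  rbd create scbench/image01 --size 1024          # step 1: 1GB image (the script also varies the object size here)
  DEV=$(rbd map scbench/image01)                  # step 1: map to a block device via krbd
  mkfs.ext4 "$DEV"                                # step 2: create the filesystem
  rbd unmap "$DEV"                                # step 3: unmap from the client
  echo 3 | tee /proc/sys/vm/drop_caches && sync   # step 4: drop caches
  # step 5: run fio against the image (see the script further down)
  rbd rm scbench/image01                          # step 6: remove the image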
> After some executions, the fio process gets stuck with a
> futex_wait_queue_me call:
>
> # cat /proc/14413/stack
> [<ffffffffa7af6622>] futex_wait_queue_me+0xd2/0x140
> [<ffffffffa7af74bf>] futex_wait+0xff/0x260
> [<ffffffffa7aa3a6d>] wake_up_q+0x2d/0x60
> [<ffffffffa7af7d11>] futex_requeue+0x2c1/0x930
> [<ffffffffa7af8fd1>] do_futex+0x2b1/0xb20
> [<ffffffffa7badfb1>] handle_mm_fault+0x14e1/0x1cd0
> [<ffffffffa7aa48e8>] wake_up_new_task+0x108/0x1a0
> [<ffffffffa7af98c3>] SyS_futex+0x83/0x180
> [<ffffffffa7a63981>] __do_page_fault+0x221/0x510
> [<ffffffffa7fda736>] system_call_fast_compare_end+0xc/0x96
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Logs of the osd and mon daemons do not show any information or error
> about what the problem could be.
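The traces above and below were gathered roughly like this (14413 is the stuck fio process from above; the -f flag makes strace follow all threads, which is why several pids appear in the output):

  cat /proc/14413/stack     # kernel-side stack of the stuck process
  strace -f -p 14413        # attach to the running fio, follow all threads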
> Executing the strace command to trace the execution of the fio process
> shows the following:
>
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632809, {1475609725, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609725, 98347}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 345690227}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632811, {1475609725, 348199000}, ffffffff <unfinished ...>
> [pid 14429] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid 14429] clock_gettime(CLOCK_REALTIME, {1475609725, 127563261}) = 0
> [pid 14429] futex(0x7cefc8, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14429] futex(0x7cf01c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 79103, {1475609727, 127563261}, ffffffff <unfinished ...>
> [pid 14416] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609725, 348403}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 595788486}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632813, {1475609725, 598199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609725, 598360}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 845712817}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632815, {1475609725, 848199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609725, 848353}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 95705677}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632817, {1475609726, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609726, 98359}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 345711731}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632819, {1475609726, 348199000}, ffffffff <unfinished ...>
> [pid 14418] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
> [pid 14418] futex(0x7c1f08, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14418] clock_gettime(CLOCK_REALTIME, {1475609726, 103526543}) = 0
> [pid 14418] futex(0x7c1f5c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 31641, {1475609731, 103526543}, ffffffff <unfinished ...>
> [pid 14419] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
> ....
> [pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730557149}) = 0
> [pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730727417}) = 0
> [pid 14423] futex(0x7c8c34, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7c8b60, 15902 <unfinished ...>
> [pid 14425] <... futex resumed> ) = 0
> [pid 14425] futex(0x7c8b60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 14423] <... futex resumed> ) = 1
> [pid 14423] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 14425] <... futex resumed> ) = 0
> [pid 14425] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14425] clock_gettime(CLOCK_REALTIME, {1475609728, 731160249}) = 0
> [pid 14425] sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"\200\4\364W\271\236\224+", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
> [pid 14425] futex(0x7c8c34, FUTEX_WAIT_PRIVATE, 15903, NULL <unfinished ...>
> [pid 14423] <... futex resumed> ) = 1
> [pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 731811246}) = 0
> [pid 14423] futex(0x775430, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14423] futex(0x775494, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 15823, {1475609738, 731811246}, ffffffff <unfinished ...>
> [pid 14426] <... restart_syscall resumed> ) = 1
> [pid 14426] recvfrom(3, "\17\200\4\364W\271\236\224+", 4096, MSG_DONTWAIT, NULL, NULL) = 9
> [pid 14426] clock_gettime(CLOCK_REALTIME, {1475609728, 732608460}) = 0
> [pid 14426] poll([{fd=3, events=POLLIN|0x2000}], 1, 900000 <unfinished ...>
> [pid 14417] <... futex resumed> ) = 0
> [pid 14417] futex(0x771e28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14417] futex(0x771eac, FUTEX_WAIT_PRIVATE, 32223, NULL <unfinished ...>
> [pid 14416] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
>
> This issue has appeared on both of our clients. These two clients are
> running Debian Jessie, each one with a different kernel:
> - kernel 3.16.7-ckt25-2+deb8u3
> - kernel 4.7.2-1~bpo8+1
>
> And the following combinations of package versions have been used on
> both clients:
> - Ceph cluster 10.2.2 & FIO 2.1.11-2
> - Ceph cluster 10.2.3 & FIO 2.1.11-2
> - Ceph cluster 10.2.3 & FIO 2.14
>
> We launch the fio tool varying different settings such as block size
> and operation type. This is a simplified snippet of the shell script
> used:
>
> for operation in read write randread randwrite; do
>   for rbd in 4K 64K 1M 4M; do
>     for bs in 4k 64k 1M 4M; do
>       # create rbd image with block size $rbd
>       # drop caches
>
>       fio --name=global \
>           --ioengine=rbd \
>           --clientname=admin \
>           --pool=scbench \
>           --rbdname=image01 \
>           --bs=${bs} \
>           --name=rbd_iodeph32 \
>           --iodepth=32 \
>           --rw=${operation} \
>           --output-format=json
>
>       sleep 10
>       # delete rbd image
>     done
>   done
> done
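In case it is relevant, the fio command line above should be roughly equivalent to a job file like the following (a sketch for one example iteration, bs=4k and rw=randwrite; per the fio HOWTO, options given under --name=global on the command line behave like a [global] section in a job file):

  cat > rbd.fio <<'EOF'
  [global]
  ioengine=rbd
  clientname=admin
  pool=scbench
  rbdname=image01
  bs=4k

  [rbd_iodeph32]
  iodepth=32
  rw=randwrite
  EOF
  fio rbd.fio --output-format=json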
> Any ideas why this could be happening? Are we missing some settings in
> the fio tool?
>
> Regards,
>
> Mario Rodríguez

--
Mario Rodríguez
SRE
mariorodrig...@tuenti.com
+34 914 294 039 — 645 756 437
C/ Gran Vía, nº 28, 6ª planta — 28013 Madrid
Tuenti Technologies, S.L.
www.tuenti.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com