Doing some tests with iperf, our network has a bandwidth between nodes of
940 Mbits/sec.
According to our network usage metrics for this cluster, the OSD hosts
have a peak traffic of about 200 Mbits/sec each, and the client that runs
FIO about 300 Mbits/sec.
The network does not seem to be saturated.
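
For reference, a minimal sketch of how such an iperf measurement is run
(the host name is a placeholder):

  # on one node, start an iperf server
  iperf -s
  # on another node, measure throughput towards that node for 30 seconds
  iperf -c osd-node-1 -t 30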





On Wed, Oct 5, 2016 at 4:16 PM, Will.Boege <will.bo...@target.com> wrote:

> Because you do not have segregated networks, the cluster traffic is most
> likely drowning out the FIO user traffic. This is especially exacerbated
> by the fact that there is only a 1 Gb link between the cluster nodes.
>
>
>
> If you are planning on using this cluster for anything other than testing,
> you’ll want to re-evaluate your network architecture.
>
>
>
> + >= 10 GbE
>
> + Dedicated cluster network (a ceph.conf sketch follows)
>
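> As a minimal sketch, a dedicated cluster network is configured in
> ceph.conf roughly like this (the subnets are placeholders):
>
> [global]
>   # client <-> cluster (public) traffic
>   public network = 192.168.1.0/24
>   # OSD replication and heartbeat traffic on its own links
>   cluster network = 192.168.2.0/24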
>
>
>
>
> From: Mario Rodríguez Molins <mariorodrig...@tuenti.com>
> Date: Wednesday, October 5, 2016 at 8:38 AM
> To: "Will.Boege" <will.bo...@target.com>
> Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
> Subject: Re: [EXTERNAL] [ceph-users] Benchmarks using fio tool gets stuck
>
>
>
> Hi,
>
>
>
> Currently, we do not have a separate cluster network, and our setup is:
>
>  - 3 OSD nodes with 1 Gbps links. Each node runs a single OSD daemon,
> although we plan to increase the number of OSDs per host.
>
>  - 3 virtual machines, also with 1 Gbps links, where each VM runs one
> monitor daemon (two of them also run a metadata server).
>
>  - The two clients used for testing are also VMs.
>
>
>
> In each run of the FIO tool, we perform the following steps (all of them
> on the client); a rough sketch of the commands follows the list:
>
>  1.- Create a 1 GB rbd image within a pool and map this image to a
> block device
>
>  2.- Create an ext4 filesystem on this block device
>
>  3.- Unmap the device from the client
>
>  4.- Before testing, drop caches (echo 3 | tee /proc/sys/vm/drop_caches &&
> sync)
>
>  5.- Run the fio test, setting the pool and name of the rbd image. The
> block size used is changed on each run.
>
>  6.- Remove the image from the pool
>
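> As a minimal sketch of those steps (pool name, image name, and device
> path are illustrative assumptions):
>
>   rbd create scbench/image01 --size 1024   # 1 GB image
>   rbd map scbench/image01                  # maps to e.g. /dev/rbd0
>   mkfs.ext4 /dev/rbd0
>   rbd unmap /dev/rbd0
>   echo 3 | tee /proc/sys/vm/drop_caches && sync
>   fio ... --pool=scbench --rbdname=image01 ...  # full options in the script below
>   rbd rm scbench/image01
>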
>
>
>
>
>
>
> Thanks in advance!
>
>
>
> On Wed, Oct 5, 2016 at 2:57 PM, Will.Boege <will.bo...@target.com> wrote:
>
> What does your network setup look like?  Do you have a separate cluster
> network?
>
>
>
> Can you explain how you are performing the FIO test? Are you mounting a
> volume through krbd and testing that from a different server?
>
>
> On Oct 5, 2016, at 3:11 AM, Mario Rodríguez Molins <
> mariorodrig...@tuenti.com> wrote:
>
> Hello,
>
>
>
> We are setting up a new Ceph cluster and running some benchmarks on it.
>
> At this moment, our cluster consists of:
>
>  - 3 OSD nodes. In our current configuration, one daemon per node.
>
>  - 3 monitor (MON) nodes. Two of these nodes also run a metadata server
> (MDS).
>
>
>
> Benchmarks are performed with the tools that ceph/rados provides, as well
> as with the fio benchmark tool.
>
> Our benchmark tests are based on this tutorial:
> http://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance
>
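> For reference, the rados-level tests from that tutorial look roughly like
> this (the pool name is an assumption):
>
>   # write benchmark for 10 seconds, keeping objects for the read tests
>   rados bench -p scbench 10 write --no-cleanup
>   # sequential and random read benchmarks against the same objects
>   rados bench -p scbench 10 seq
>   rados bench -p scbench 10 rand
>   # remove the benchmark objects afterwards
>   rados -p scbench cleanup
>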
>
>
> Using the fio benchmark tool, we are having some issues. After some
> executions, the fio process gets stuck in a futex_wait_queue_me call:
>
> # cat /proc/14413/stack
> [<ffffffffa7af6622>] futex_wait_queue_me+0xd2/0x140
> [<ffffffffa7af74bf>] futex_wait+0xff/0x260
> [<ffffffffa7aa3a6d>] wake_up_q+0x2d/0x60
> [<ffffffffa7af7d11>] futex_requeue+0x2c1/0x930
> [<ffffffffa7af8fd1>] do_futex+0x2b1/0xb20
> [<ffffffffa7badfb1>] handle_mm_fault+0x14e1/0x1cd0
> [<ffffffffa7aa48e8>] wake_up_new_task+0x108/0x1a0
> [<ffffffffa7af98c3>] SyS_futex+0x83/0x180
> [<ffffffffa7a63981>] __do_page_fault+0x221/0x510
> [<ffffffffa7fda736>] system_call_fast_compare_end+0xc/0x96
> [<ffffffffffffffff>] 0xffffffffffffffff
>
>
>
> The logs of the OSD and MON daemons do not show any information or errors
> about what the problem could be.
>
>
>
> Tracing the execution of the fio process with strace shows the following:
>
>
>
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632809, {1475609725, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609725, 98347}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 345690227}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632811, {1475609725, 348199000}, ffffffff <unfinished ...>
> [pid 14429] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> [pid 14429] clock_gettime(CLOCK_REALTIME, {1475609725, 127563261}) = 0
> [pid 14429] futex(0x7cefc8, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14429] futex(0x7cf01c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 79103, {1475609727, 127563261}, ffffffff <unfinished ...>
> [pid 14416] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609725, 348403}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 595788486}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632813, {1475609725, 598199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609725, 598360}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125063, 845712817}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632815, {1475609725, 848199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609725, 848353}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 95705677}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632817, {1475609726, 98199000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 14416] gettimeofday({1475609726, 98359}, NULL) = 0
> [pid 14416] futex(0x7fffdffa16d0, FUTEX_WAKE, 1) = 0
> [pid 14416] clock_gettime(CLOCK_MONOTONIC_RAW, {125064, 345711731}) = 0
> [pid 14416] futex(0x7fffdffa16fc, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 632819, {1475609726, 348199000}, ffffffff <unfinished ...>
> [pid 14418] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> [pid 14418] futex(0x7c1f08, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14418] clock_gettime(CLOCK_REALTIME, {1475609726, 103526543}) = 0
> [pid 14418] futex(0x7c1f5c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 31641, {1475609731, 103526543}, ffffffff <unfinished ...>
> [pid 14419] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> ....
>
>
>
> [pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730557149}) = 0
> [pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 730727417}) = 0
> [pid 14423] futex(0x7c8c34, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x7c8b60, 15902 <unfinished ...>
> [pid 14425] <... futex resumed> )       = 0
> [pid 14425] futex(0x7c8b60, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 14423] <... futex resumed> )       = 1
> [pid 14423] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 14425] <... futex resumed> )       = 0
> [pid 14425] futex(0x7c8b60, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14425] clock_gettime(CLOCK_REALTIME, {1475609728, 731160249}) = 0
> [pid 14425] sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"\16", 1}, {"\200\4\364W\271\236\224+", 8}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9
> [pid 14425] futex(0x7c8c34, FUTEX_WAIT_PRIVATE, 15903, NULL <unfinished ...>
> [pid 14423] <... futex resumed> )       = 1
> [pid 14423] clock_gettime(CLOCK_REALTIME, {1475609728, 731811246}) = 0
> [pid 14423] futex(0x775430, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14423] futex(0x775494, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 15823, {1475609738, 731811246}, ffffffff <unfinished ...>
> [pid 14426] <... restart_syscall resumed> ) = 1
> [pid 14426] recvfrom(3, "\17\200\4\364W\271\236\224+", 4096, MSG_DONTWAIT, NULL, NULL) = 9
> [pid 14426] clock_gettime(CLOCK_REALTIME, {1475609728, 732608460}) = 0
> [pid 14426] poll([{fd=3, events=POLLIN|0x2000}], 1, 900000 <unfinished ...>
> [pid 14417] <... futex resumed> )       = 0
> [pid 14417] futex(0x771e28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 14417] futex(0x771eac, FUTEX_WAIT_PRIVATE, 32223, NULL <unfinished ...>
> [pid 14416] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
>
>
>
>
>
> This issue has appeared on both of our clients. Both clients run Debian
> Jessie, each with a different kernel:
>
>  - kernel 3.16.7-ckt25-2+deb8u3
>  - kernel 4.7.2-1~bpo8+1
>
> The following package versions have been used on both clients:
>
> - Ceph cluster 10.2.2 & FIO 2.1.11-2
> - Ceph cluster 10.2.3 & FIO 2.1.11-2
> - Ceph cluster 10.2.3 & FIO 2.14
>
>
>
> We launch the fio tool varying different settings, such as block size and
> operation type.
> This is a simplified snippet of the shell script used:
>
>
>
> for operation in read write randread randwrite; do
>   for rbd in 4K 64K 1M 4M; do
>     for bs in 4k 64k 1M 4M; do
>       # create rbd image with block size $rbd
>       # drop caches
>
>       fio --name=global \
>           --ioengine=rbd \
>           --clientname=admin \
>           --pool=scbench \
>           --rbdname=image01 \
>           --bs=${bs} \
>           --name=rbd_iodeph32 \
>           --iodepth=32 \
>           --rw=${operation} \
>           --output-format=json
>
>       sleep 10
>       # delete rbd image
>     done
>   done
> done
>
>
>
>
>
>
>
> Any ideas why this could be happening? Are we missing some settings in the
> fio tool?
>
>
>
> Regards,
>
>
>
>
>
>



-- 

Mario Rodríguez
SRE
mariorodrig...@tuenti.com

+34 914 294 039 — 645 756 437
C/ Gran Vía, nº 28, 6ª planta — 28013 Madrid
Tuenti Technologies, S.L.
www.tuenti.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
