Re: [ceph-users] Ceph performance is too good (impossible..)...
Hi,
if you write from a client, the data is written to one or more placement groups in 4 MB chunks. These PGs are written to the journal and the OSD disk, and as a result the data also ends up in the Linux page cache on the OSD node (until the OS needs that memory for something else, be it file buffers or anything else). If the client then reads the data back, the OSD node serves it from the page cache instead of reading it again from the slow disks. This is the reason why lots of RAM in the OSD nodes speeds up Ceph ;-) Normally nice, but difficult for benchmarking.

Udo

On 2016-12-12 05:51, V Plus wrote:
> Hi Udo, I am not sure I understood what you said. Did you mean that the
> data written by the 'dd' command also got cached on the OSD node?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
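The OSD-side read cache Udo describes can be taken out of the picture by dropping the page cache on every OSD node between benchmark runs. A minimal sketch for one node (needs root; looping over the OSD hosts via ssh is left out):

```shell
# Drop the Linux page cache so subsequent reads hit the disks, not RAM.
# Run on each OSD node, not on the client.
sync                                    # flush dirty pages first
if [ -w /proc/sys/vm/drop_caches ]; then
  echo 3 > /proc/sys/vm/drop_caches     # 3 = pagecache + dentries + inodes
  msg="page cache dropped"
else
  msg="need root: sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'"
fi
echo "$msg"
```

Note this only resets the cache state; the very next read still repopulates the cache, so it has to be repeated before every run.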
Re: [ceph-users] Ceph performance is too good (impossible..)...
Hi Udo, I am not sure I understood what you said. Did you mean that the data written by the 'dd' command also got cached on the OSD node?

On Sun, Dec 11, 2016 at 10:46 PM, Udo Lembke <ulem...@polarzone.de> wrote:
> Hi,
> but I assume you are also measuring cache in this scenario - the OSD nodes
> have cached the writes in the page cache (which is why the latency should
> be very small).
>
> Udo
Re: [ceph-users] Ceph performance is too good (impossible..)...
Hi,
but I assume you are also measuring cache in this scenario - the OSD nodes have cached the writes in the page cache (which is why the latency should be very small).

Udo

On 12.12.2016 03:00, V Plus wrote:
> Thanks Somnath!
> As you recommended, I executed:
> dd if=/dev/zero bs=1M count=4096 of=/dev/rbd0
> dd if=/dev/zero bs=1M count=4096 of=/dev/rbd1
>
> Then the output results look more reasonable!
> Could you tell me why??
>
> Btw, the purpose of my run is to test the performance of rbd in ceph.
> Does my case mean that before every test, I have to "initialize" all
> the images???
>
> Great thanks!!
Re: [ceph-users] Ceph performance is too good (impossible..)...
I generally do a 1M sequential write to fill up the device. Block size does not matter for correctness here, but a bigger block size fills the device faster, and that is why people use it.

From: V Plus [mailto:v.plussh...@gmail.com]
Sent: Sunday, December 11, 2016 7:03 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph performance is too good (impossible..)...

Thanks! One more question: what do you mean by "bigger"? Do you mean a bigger block size (say, if I will run a read test with bs=4K, do I need to first write the rbd with bs>4K?), or a size that is big enough to cover the area where the test will be executed?
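The preconditioning step Somnath recommends can be scripted. A minimal sketch using dd with sequential 1M writes; `/tmp/rbd_sim.img` and the 8 MB size are stand-ins for a real device such as `/dev/rbd0` and its full size:

```shell
# Sequentially fill a device (or file) with 1M writes before any read test,
# so that every block read later has actually been written once.
precondition() {
  target=$1; size_mb=$2
  # O_DIRECT avoids polluting the client page cache; fall back if the
  # filesystem does not support it
  dd if=/dev/zero of="$target" bs=1M count="$size_mb" oflag=direct status=none 2>/dev/null \
    || dd if=/dev/zero of="$target" bs=1M count="$size_mb" status=none
}

precondition /tmp/rbd_sim.img 8    # e.g. precondition /dev/rbd0 4096
```

The fio jobs from this thread would serve equally well with rw=write; the tool does not matter as long as the whole area the read test will touch gets written.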
Re: [ceph-users] Ceph performance is too good (impossible..)...
Thanks! One more question: what do you mean by "bigger"? Do you mean a bigger block size (say, if I will run a read test with bs=4K, do I need to first write the rbd with bs>4K?), or a size that is big enough to cover the area where the test will be executed?

On Sun, Dec 11, 2016 at 9:54 PM, Somnath Roy <somnath@sandisk.com> wrote:
> So, to get a predictable result you should always be reading a block that
> has been written. It is always recommended to precondition (fill up) an
> rbd image with big-block sequential writes before you run any synthetic
> test on it.
Re: [ceph-users] Ceph performance is too good (impossible..)...
A block needs to be written before it is read, otherwise you will get funny results. For example, in the case of flash (depending on how the FW is implemented), it will mostly return zeros if a block has not been written. I have seen some flash FW that is really inefficient at manufacturing this data (say, zeros) for unwritten blocks, and some that is really fast.

So, to get a predictable result you should always be reading a block that has been written. If, say, half of a device's blocks are written and you do full-device random reads, you will get unpredictable/spiky/imbalanced results.

The same applies to rbd: consider it a storage device and the behavior is similar. So, it is always recommended to precondition (fill up) an rbd image with big-block sequential writes before you run any synthetic test on it. For the filestore backend, an added advantage of preconditioning the rbd is that the files in the filesystem will have been created beforehand.

Thanks & Regards
Somnath

From: V Plus [mailto:v.plussh...@gmail.com]
Sent: Sunday, December 11, 2016 6:01 PM
To: Somnath Roy
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph performance is too good (impossible..)...

Thanks Somnath!
As you recommended, I executed:
dd if=/dev/zero bs=1M count=4096 of=/dev/rbd0
dd if=/dev/zero bs=1M count=4096 of=/dev/rbd1

Then the output results look more reasonable!
Could you tell me why??
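Somnath's point about unwritten blocks can be seen locally with a sparse file: blocks that were never written read back as zeros and are served without touching the media, which is why reads of an unfilled image can look impossibly fast. A small demonstration (file path is an example):

```shell
# A sparse file allocates no data blocks, much like an unwritten rbd image.
truncate -s 16M /tmp/sparse.img
# Its contents nevertheless read back as 16 MB of zeros, straight from memory.
dd if=/dev/zero bs=1M count=16 status=none | cmp -s - /tmp/sparse.img \
  && echo "unwritten blocks read as all zeros"
```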
Re: [ceph-users] Ceph performance is too good (impossible..)...
Thanks. Then how can we avoid this if I want to test ceph rbd performance?
BTW, it seems that is not the case here: I followed what Somnath said and got reasonable results. But I am still confused.

On Sun, Dec 11, 2016 at 8:59 PM, JiaJia Zhong <zhongjia...@haomaiyi.com> wrote:
> >> 3. After the test, in a.txt, we got bw=1162.7MB/s, in b.txt, we get
> >> bw=3579.6MB/s.
>
> mostly, this is due to the kernel buffer on your client host
Re: [ceph-users] Ceph performance is too good (impossible..)...
Thanks Somnath!
As you recommended, I executed:
dd if=/dev/zero bs=1M count=4096 of=/dev/rbd0
dd if=/dev/zero bs=1M count=4096 of=/dev/rbd1

Then the output results look more reasonable!
Could you tell me why??

Btw, the purpose of my run is to test the performance of rbd in ceph. Does my case mean that before every test, I have to "initialize" all the images???

Great thanks!!

On Sun, Dec 11, 2016 at 8:47 PM, Somnath Roy <somnath@sandisk.com> wrote:
> Fill up the image with a big write (say 1M) first before reading and you
> should see sane throughput.
Re: [ceph-users] Ceph performance is too good (impossible..)...
>> 3. After the test, in a.txt, we got bw=1162.7MB/s, in b.txt, we get
>> bw=3579.6MB/s.

mostly, this is due to the kernel buffer on your client host

-- Original --
From: "Somnath Roy" <somnath@sandisk.com>
Date: Mon, Dec 12, 2016 09:47 AM
To: "V Plus" <v.plussh...@gmail.com>; "CEPH list" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Ceph performance is too good (impossible..)...

Fill up the image with a big write (say 1M) first before reading and you should see sane throughput.
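JiaJia's remark is worth unpacking: direct=1 in the fio jobs opens the device with O_DIRECT, which bypasses the *client's* kernel buffer, but it does nothing about the page cache on the OSD nodes, so reads can still be served from server-side RAM. What O_DIRECT does cover can be checked with dd against a local file as a stand-in (not every filesystem supports O_DIRECT, hence the fallback):

```shell
# Write a small test file, then read it back with O_DIRECT, bypassing the
# local page cache - the equivalent of fio's direct=1 on the client side.
dd if=/dev/zero of=/tmp/direct_demo.img bs=1M count=4 status=none
sync
if dd if=/tmp/direct_demo.img of=/dev/null bs=1M iflag=direct status=none 2>/dev/null; then
  msg="O_DIRECT read ok (client page cache bypassed)"
else
  msg="this filesystem does not support O_DIRECT"
fi
echo "$msg"
rm -f /tmp/direct_demo.img
```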
Re: [ceph-users] Ceph performance is too good (impossible..)...
Fill up the image with a big write (say 1M) first before reading and you should see sane throughput.

Thanks & Regards
Somnath

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
[ceph-users] Ceph performance is too good (impossible..)...
Hi Guys,

we have a ceph cluster with 6 machines (6 OSDs per host).

1. I created 2 images in Ceph and mapped them to another host A (outside the Ceph cluster). On host A, I got /dev/rbd0 and /dev/rbd1.

2. I start two fio jobs to perform a READ test on rbd0 and rbd1 (fio job descriptions can be found below):

"sudo fio fioA.job -output a.txt & sudo fio fioB.job -output b.txt & wait"

3. After the test, in a.txt we got bw=1162.7MB/s, and in b.txt we got bw=3579.6MB/s.

The results do NOT make sense, because there is only one NIC on host A and its limit is 10 Gbps (1.25 GB/s).

I suspect it is because of the cache setting. But I am sure that in the file /etc/ceph/ceph.conf on host A, I already added:

[client]
rbd cache = false

Could anyone give me a hint what is missing? Why?

Thank you very much.

fioA.job:
[A]
direct=1
group_reporting=1
unified_rw_reporting=1
size=100%
time_based=1
filename=/dev/rbd0
rw=read
bs=4MB
numjobs=16
ramp_time=10
runtime=20

fioB.job:
[B]
direct=1
group_reporting=1
unified_rw_reporting=1
size=100%
time_based=1
filename=/dev/rbd1
rw=read
bs=4MB
numjobs=16
ramp_time=10
runtime=20

Thanks...