[ceph-users] Re: read latency

2020-11-02 Thread Tony Liu
Thanks Vladimir for the clarification!

Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: read latency

2020-11-02 Thread Vladimir Prokofev
With sequential reads you get "read ahead" mechanics working for you, which
helps a lot.
Let's say you do 4KB sequential reads with fio.
By default, Ubuntu, for example, uses a 128KB read-ahead size. That means
when you request those 4KB of data, the driver actually requests 128KB. By
the time your IO is served and you request the next sequential 4KB, it is
already in the VM's memory, so no new read IO is necessary.
All of those 128KB will likely reside on the same OSD, depending on your
Ceph object size.
When you reach the end of those 128KB and request the next chunk, it will
again most likely live in the same RBD object as before, assuming a 4MB
object size, so depending on internal mechanics I'm not really familiar
with, that data may already be in the host's memory, or at least in the OSD
node's memory, so no real physical IO is needed.
What you're thinking of is the worst case: the 128KB is split between 2
objects residing on 2 different OSDs. Then you simply get 2 real physical
IOs for your 1 virtual IO, and that particular request is slower, but after
that read ahead again covers a long run of sequential IOs.
In the end, read ahead with sequential IOs results in far fewer real
physical reads than random reads do, hence the IOPS difference.
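
(For illustration - one way to check the guest's read-ahead window and to
compare the two access patterns; the device, mount point and file names
below are placeholders, not taken from this thread:

  # inside the VM: read-ahead window of the guest disk
  cat /sys/block/vda/queue/read_ahead_kb   # 128 (KB) by default on Ubuntu
  blockdev --getra /dev/vda                # same value in 512-byte sectors

  # 4KB sequential vs. random buffered reads at queue depth 1;
  # buffered IO is what lets the guest kernel's read ahead kick in
  # (--direct=1 would bypass the guest page cache and its read ahead)
  fio --name=seqread --filename=/mnt/test/fio.dat --size=4G --rw=read \
      --bs=4k --iodepth=1 --ioengine=psync --runtime=30 --time_based
  fio --name=randread --filename=/mnt/test/fio.dat --size=4G --rw=randread \
      --bs=4k --iodepth=1 --ioengine=psync --runtime=30 --time_based
)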

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: read latency

2020-11-01 Thread Tony Liu
Another point of confusion about sequential read vs. random read. My
understanding is that when fio does a sequential read, it reads the test
file sequentially, and when it does a random read, it reads the test file
at random offsets.
A file read inside the VM comes down to a volume read handled by the RBD
client, which distributes the reads across PGs and eventually to OSDs. So a
sequential file read inside the VM won't be a sequential read on the OSD
disk. Is that right?
Then what difference do sequential and random reads make on the OSD disk?
Is it a random read on the OSD disk in both cases?
If so, how to explain the performance difference between sequential and
random reads inside the VM? (Sequential read IOPS is 20x that of random
read; the cluster has 21 HDDs across 3 nodes, 7 per node.)
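
(For reference, the object size of an image and the object-to-OSD mapping
can be inspected like this; the pool, image and object names are just
placeholders:

  # RBD image object size (reported as an "order" / object size line)
  rbd info volumes/<image>

  # which PG and which OSDs a particular RADOS object maps to
  ceph osd map volumes rbd_data.<image-id>.0000000000000000
)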

Thanks!
Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: read latency

2020-11-01 Thread Vladimir Prokofev
Not exactly. You can also tune the network and the software side.
Network - go for lower-latency interfaces. If you have 10G, go to 25G or
100G. 40G won't do though; AFAIK it's just 4x10G, so its latency is the
same as 10G.
Software - it's closely tied to your network card queues and processor
cores. In short, tune affinity so that the packet receive queues and the
OSD processes run on the same corresponding cores. Disabling CPU
power-saving features helps a lot. Also watch out for NUMA interference.
But overall, all of these tricks will save you less than switching from
HDD to SSD.
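
(A rough sketch of that kind of tuning - the interface name, IRQ number
and CPU mask below are placeholders and depend entirely on the host:

  # which IRQs the NIC receive queues are using
  grep eth0 /proc/interrupts

  # pin one receive queue's IRQ to a specific core (hex CPU mask)
  echo 4 > /proc/irq/120/smp_affinity

  # curb CPU power saving: performance governor, shallow idle states
  cpupower frequency-set -g performance
  cpupower idle-set -D 0

  # check that the NIC and the OSD processes sit on the same NUMA node
  cat /sys/class/net/eth0/device/numa_node
  numactl --hardware
)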

Mon, 2 Nov 2020 at 02:45, Tony Liu :

> Hi,
>
> AFAIK, read latency primarily depends on HW latency, and
> not much can be tuned in SW. Is that right?
>
> I ran a fio random read with iodepth 1 inside a VM backed by
> Ceph with HDD OSDs, and here is what I got.
> =
>read: IOPS=282, BW=1130KiB/s (1157kB/s)(33.1MiB/30001msec)
> slat (usec): min=4, max=181, avg=14.04, stdev=10.16
> clat (usec): min=178, max=393831, avg=3521.86, stdev=5771.35
>  lat (usec): min=188, max=393858, avg=3536.38, stdev=5771.51
> =
> I checked that the HDD's average latency is 2.9 ms. Looks like
> the test result makes perfect sense, doesn't it?
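>
> (For reference, a job along these lines would produce output like the
> above; the file name, size and ioengine are assumptions, not the exact
> command that was used:
>
>   fio --name=randread --filename=/root/fio.dat --size=4G --rw=randread \
>       --bs=4k --iodepth=1 --ioengine=libaio --direct=1 \
>       --runtime=30 --time_based
>
> At queue depth 1, an average completion latency of ~3.5 ms works out to
> about 1 / 0.00354 s ≈ 283 IOPS, which matches the result above.)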
>
> If I want lower latency (more IOPS), I will have to go for
> better disks, e.g. SSDs. Right?
>
>
> Thanks!
> Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io