On Wed, Feb 6, 2019 at 7:00 AM Poornima Gurusiddaiah <pguru...@redhat.com> wrote:
>
> On Tue, Feb 5, 2019, 10:53 PM Xavi Hernandez <xhernan...@redhat.com> wrote:
>
>> On Fri, Feb 1, 2019 at 1:51 PM Xavi Hernandez <xhernan...@redhat.com> wrote:
>>
>>> On Fri, Feb 1, 2019 at 1:25 PM Poornima Gurusiddaiah <pguru...@redhat.com> wrote:
>>>
>>>> Can the threads be categorised to do certain kinds of fops?
>>>
>>> Could be, but creating multiple thread groups for different tasks is generally bad because many times you end up with lots of idle threads which waste resources and could increase contention. I think we should only differentiate threads if it's absolutely necessary.
>>>
>>>> Affinitise read/write fops to a certain set of threads and the other metadata fops to another set of threads. So we limit the read/write threads and not the metadata threads? Also, if AIO is enabled in the backend, the threads will not be blocked on disk I/O, right?
>>>
>>> If we don't block the thread but we don't prevent more requests from going to the disk, then we'll probably have the same problem. Anyway, I'll try to run some tests with AIO to see if anything changes.
>>
>> I've run some simple tests with AIO enabled and the results are not good. A simple dd takes >25% more time. Multiple parallel dd take 35% more time to complete.
>
> Thank you. That is strange! I had a few questions: what tests are you running to measure io-threads performance (not particularly AIO)? Is it dd from multiple clients?

Yes, it's a bit strange. What I see is that many threads from the thread pool are active but using very little CPU. I also see an AIO thread for each brick, but its CPU usage is not big either. Wait time is always 0 (I think this is a side effect of AIO activity). However, system load grows very high: I've seen around 50, while on the normal test without AIO it stays around 20-25.

Right now I'm running the tests on a single machine (no real network communication) using an NVMe disk as storage. I use a single mount point. The tests I'm running are these:

- Single dd, 128 GiB, blocks of 1 MiB
- 16 parallel dd, 8 GiB per dd, blocks of 1 MiB
- fio in sequential write mode, direct I/O, blocks of 128k, 16 threads, 8 GiB per file
- fio in sequential read mode, direct I/O, blocks of 128k, 16 threads, 8 GiB per file
- fio in random write mode, direct I/O, blocks of 128k, 16 threads, 8 GiB per file
- fio in random read mode, direct I/O, blocks of 128k, 16 threads, 8 GiB per file
- smallfile create, 16 threads, 256 files per thread, 32 MiB per file (with one brick down, for the following test)
- self-heal of an entire brick (from the previous smallfile test)
- pgbench init phase with scale 100

I run all these tests for a replica 3 volume and a disperse 4+2 volume.

Xavi

> Regards,
> Poornima
>
>> Xavi
>>
>>>> All this is based on the assumption that a large number of parallel reads/writes makes disk performance bad, but not a large number of dentry and metadata ops. Is that true?
>>>
>>> It depends. If metadata is not cached, it's as bad as a read or write, since it requires a disk access (a clear example of this is the bad performance of 'ls' with a cold cache, which is basically metadata reads). In fact, cached data reads are also very fast, and data writes could go to the cache and be updated later in the background, so I think the important point is whether things are cached or not, instead of whether they are data or metadata. Since we don't have this information from the user side, it's hard to tell what's better.
>>> My opinion is that we shouldn't differentiate requests of data/metadata. If metadata requests happen to be faster, then that thread will be able to handle other requests immediately, which seems good enough.
>>>
>>> However, there's one thing that I would do: I would differentiate reads (data or metadata) from writes. Normally writes come from cached information that is flushed to disk at some point, so this normally happens in the background. But reads tend to be in the foreground, meaning that someone (user or application) is waiting for them. So I would give preference to reads over writes. To do so effectively, we need to not saturate the backend, otherwise when we need to send a read, it will still need to wait for all pending requests to complete. If disks are not saturated, we can have the answer to the read quite fast, and then continue processing the remaining writes.
>>>
>>> Anyway, I may be wrong, since all these things depend on too many factors. I haven't done any specific tests about this. It's more like a brainstorming. As soon as I can, I would like to experiment with this and get some empirical data.
>>>
>>> Xavi
>>>
>>>> Thanks,
>>>> Poornima
>>>>
>>>> On Fri, Feb 1, 2019, 5:34 PM Emmanuel Dreyfus <m...@netbsd.org> wrote:
>>>>
>>>>> On Thu, Jan 31, 2019 at 10:53:48PM -0800, Vijay Bellur wrote:
>>>>> > Perhaps we could throttle both aspects - number of I/O requests per disk
>>>>>
>>>>> While we're at it, it would be nice to detect and report a disk with lower than peer performance: that happens sometimes when a disk is dying, and the last time I was hit by that performance problem, I had a hard time finding the culprit.
>>>>>
>>>>> --
>>>>> Emmanuel Dreyfus
>>>>> m...@netbsd.org
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel@gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel
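
PS: to make the read-preference idea quoted above a bit more concrete, below is a minimal sketch. This is NOT the io-threads code; it's a toy pthread pool, and all names and the MAX_INFLIGHT value are made up for illustration only.

/* read_pref_pool.c - toy sketch, compile with: gcc -pthread read_pref_pool.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_INFLIGHT 8          /* made-up per-backend limit to keep some headroom */

struct req {
    int is_read;                /* 1 = read (foreground), 0 = write (background) */
    int id;
    struct req *next;
};

static struct req *read_q, *write_q;                      /* two FIFO queues */
static struct req **read_tail = &read_q, **write_tail = &write_q;
static int inflight;            /* requests currently submitted to the backend */
static int stop;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

static void enqueue(struct req *r)
{
    pthread_mutex_lock(&lock);
    if (r->is_read) { *read_tail = r; read_tail = &r->next; }
    else            { *write_tail = r; write_tail = &r->next; }
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
}

static struct req *next_req(void)
{
    struct req *r = NULL;

    pthread_mutex_lock(&lock);
    while (!stop) {
        /* Reads always win; writes only run when no read is queued. The
         * inflight cap keeps the backend unsaturated, so a newly arrived
         * read does not wait behind a long line of submitted writes. */
        if (inflight < MAX_INFLIGHT && (read_q || write_q)) {
            if (read_q) { r = read_q; read_q = r->next; if (!read_q) read_tail = &read_q; }
            else        { r = write_q; write_q = r->next; if (!write_q) write_tail = &write_q; }
            inflight++;
            break;
        }
        pthread_cond_wait(&cond, &lock);
    }
    pthread_mutex_unlock(&lock);
    return r;
}

static void *worker(void *arg)
{
    struct req *r;

    (void)arg;
    while ((r = next_req()) != NULL) {
        /* A real worker would do the pread()/pwrite() (or io_submit) here. */
        printf("%-5s %d\n", r->is_read ? "read" : "write", r->id);
        usleep(1000);
        pthread_mutex_lock(&lock);
        inflight--;                       /* free a backend slot */
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
        free(r);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    int i;

    for (i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);

    for (i = 0; i < 32; i++) {
        struct req *r = calloc(1, sizeof(*r));
        r->id = i;
        r->is_read = (i % 4 == 0);        /* one read for every three writes */
        enqueue(r);
    }

    sleep(1);                             /* let the queues drain */
    pthread_mutex_lock(&lock);
    stop = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    for (i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}

The only point of the sketch is that a queued read never waits behind queued writes, and the in-flight cap leaves the backend some headroom so a read can be serviced quickly. Whether that actually helps in practice is exactly what needs to be measured.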
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-devel