I defer to those with more operational experience of ken and smoosh, but
wouldn't those new subsystems radically impact performance if IOQ is
completely bypassed (assuming ken/smoosh are enabled by default)?

On Wed, 11 Sep 2019 at 22:04, Adam Kocoloski <kocol...@apache.org> wrote:

> A few months ago a bunch of code landed on master around IO QoS and
> prioritization. I think we need to have a conversation about the defaults
> for that system and what we want to allow users to enable.
>
> First topic - there are actually two different generations of the IOQ
> system: IOQ and IOQ2. Only one can be active at a given time, and the
> configurations are not compatible. The best use case for this queueing
> system is to de-prioritize IO for bookkeeping tasks like internal
> replication and compaction in favor of IO to respond to client requests.
>
> The original and currently default IOQ system primarily works by
> classifying the IO based on whether it’s serving an interactive read or
> write request, an index build, a compaction job, etc. It builds queues for
> each of these IO classes and allows for relative prioritization of the
> different classes of IO. The main downside of this system is that it can
> only sustain a total throughput of about 20,000 operations/sec/node.
> Heavily-loaded systems frequently have to configure “bypasses” for certain
> classes of IO to keep latencies low.
>
> IOQ2 was conceived to deliver higher throughput without resorting to
> bypasses and thus defeating the QoS. It’s a significantly more complex
> system. Tenants are a first-class concept in IOQ2, but of course they’re
> not in the rest of CouchDB, so some of the code in there that computes
> per-user priorities will not work correctly. As far as I can tell it will
> fail gracefully (i.e., it will bucket every database as belonging to the
> same “user”), but I doubt this has been tested. IOQ2 definitely can sustain
> higher throughputs, though it has been known to enqueue so many more IO
> requests than it can issue that it effectively causes an outage anyway. It
> also still adds material overhead compared to bypassing the QoS entirely.
>
> I think there are a few possible paths forward:
>
> 1) Switch to IOQ2 and only document that one.
> 2) Document IOQ, installing bypasses across the board by default to avoid
> a big performance regression on upgrade.
> 3) Just bypass the whole thing and don’t document it, to avoid introducing
> a big new admin capability in 3.0 and removing it in 4.0.
>
> Personally I think I’m leaning towards 3) at this point, but could be
> convinced otherwise. Regards,
>
> Adam
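
For anyone following along, here is a toy Python sketch of the class-based
scheme described above for the original IOQ: one queue per IO class, relative
prioritization between classes, and per-class bypasses. It is not the actual
IOQ code. The class names, the weights, and the weighted-random pick between
non-empty queues are all assumptions made for illustration.

```python
import random
from collections import deque


class ClassBasedIOQueue:
    """Toy model of per-class IO queues with relative priorities.

    Hypothetical illustration only; names and scheduling policy are assumed.
    """

    def __init__(self, weights, bypass=(), rng=None):
        # weights: IO class name -> positive relative priority
        self.weights = dict(weights)
        # IO classes listed here skip the queues entirely
        self.bypass = set(bypass)
        self.queues = {name: deque() for name in self.weights}
        self.rng = rng or random.Random()

    def submit(self, io_class, request):
        """Return the request at once if its class is bypassed, else enqueue it."""
        if io_class in self.bypass:
            return request
        self.queues[io_class].append(request)
        return None

    def next_request(self):
        """Pick a non-empty class in proportion to its weight, FIFO within it."""
        ready = [name for name, queue in self.queues.items() if queue]
        if not ready:
            return None
        ready_weights = [self.weights[name] for name in ready]
        chosen = self.rng.choices(ready, weights=ready_weights)[0]
        return self.queues[chosen].popleft()


if __name__ == "__main__":
    # Assumed example weights: client IO strongly favored over compaction.
    ioq = ClassBasedIOQueue({"interactive": 100.0, "compaction": 1.0})
    for i in range(5):
        ioq.submit("interactive", ("read", i))
        ioq.submit("compaction", ("copy", i))
    print([ioq.next_request() for _ in range(10)])
```

Bypassing a class means its requests never wait behind anything, which keeps
latency low but also means the queue can no longer de-prioritize them. With
every class bypassed, there is no QoS left at all.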
