Yeah, basically all I did was use the create-demo-table tool to create a
"twitter" table and pound on it for about a minute with the
insert-loadgen tool from kudu-examples. Then I killed the processes,
swapped the binaries for Raft-only ones, and started the load generator
back up. Everything seemed fine.

If no one has concerns I'll wrap up the patches for this and try a longer
run on a cluster with a more thorough verification step.

Mike

--
Mike Percy
Software Engineer, Cloudera


On Tue, Jun 7, 2016 at 6:08 PM, David Alves <davidral...@gmail.com> wrote:

> I think the slowdown is a reasonable tradeoff for the simplicity and code
> cleaning, so I'd be +1 on merging with it.
> How "quick" was the migration test? Was it a cluster that we'd be hammering
> with writes on local and then boot with raft? Ideally we'd do this a few
> times on a reasonably sized cluster.
>
> -david
>
> On Tue, Jun 7, 2016 at 5:56 PM, Mike Percy <mpe...@apache.org> wrote:
>
> > I did some more benchmarking yesterday and today and got the following
> > results:
> >
> > # 4 runs each of the YCSB Workload A "load" job, in QPS (inserts only)
> > local_nums = [72031.838, 73134.16462, 69772.715379, 72666.4971115]
> > raft_nums = [67037.6080981, 66876.2121313, 65779.7365522, 65876.1528327]
> >
> > Min slowdown:  3.9200241327 %
> > Max slowdown:  10.0560772192 %
> > Average slowdown:  7.66171972498 %
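> >
> > In case anyone wants to double-check the arithmetic, here's a quick sketch
> > that reproduces those three percentages from the two lists above (min and
> > max compare the closest and farthest pairs of runs; average compares the
> > means):
> >
> > from statistics import mean
> >
> > local_nums = [72031.838, 73134.16462, 69772.715379, 72666.4971115]
> > raft_nums = [67037.6080981, 66876.2121313, 65779.7365522, 65876.1528327]
> >
> > def slowdown_pct(local, raft):
> >     # Percent throughput drop going from local consensus to Raft.
> >     return 100.0 * (local - raft) / local
> >
> > print("Min slowdown: ", slowdown_pct(min(local_nums), max(raft_nums)))      # ~3.92
> > print("Max slowdown: ", slowdown_pct(max(local_nums), min(raft_nums)))      # ~10.06
> > print("Average slowdown: ", slowdown_pct(mean(local_nums), mean(raft_nums)))  # ~7.66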
> >
> > So it looks like a 4-10% write slowdown on tables with replication
> > disabled if we remove LocalConsensus.
> >
> > FWIW, this is a pure insert workload. When comparing performance on YCSB
> > runs with a mixed read/write workload, there is essentially no difference.
> >
> > Worth mentioning the settings used: same hardware as before, with the
> > following configuration:
> >
> >   ycsb_opts:
> >     recordcount:    4000000
> >     operationcount: 1000000
> >     threads:        16
> >     max_execution_time: 1800
> >     load_sync: true
> >   ts_flags:
> >     cfile_do_on_finish: "flush"
> >     flush_threshold_mb: "1000"
> >     maintenance_manager_num_threads: "2"
> >
> > (I also tuned election timeouts to be near zero to make leader election
> > instantaneous)
> >
> > I did a quick test of migrating from a version of Kudu with support for
> > LocalConsensus to a version without support for it, and it worked.
> >
> > What do you guys think? Is this too large of a hit to take to remove our
> > "fake" version of Consensus?
> >
> > As mentioned previously, the drawback to keeping LocalConsensus is that
> > there is currently no way to add nodes to a system running with it. It's
> > currently the default choice for people who set replication factor to 1
> > on a table.
> >
> > Mike
> >
> >
> > On Thu, Jun 2, 2016 at 12:45 AM, Mike Percy <mpe...@apache.org> wrote:
> >
> > > To spare you the wall of text let me quickly summarize the scale factor
> > > 10 results:
> > >
> > > Insert: local avg 279 sec, raft avg 282 sec (raft has a 1% slowdown),
> > > but there's quite a bit of variance in there.
> > > Query: local avg 13.99 sec, raft avg 13.53 sec (raft has a 3% speedup),
> > > but again, there's a bit of variance.
> > >
> > > Doesn't really look any different to me.
> > >
> > > Mike
> > >
> > > On Thu, Jun 2, 2016 at 12:37 AM, Mike Percy <mpe...@apache.org> wrote:
> > >
> > >> I still have to test migration (pretty sure it's a no-op though).
> > >> However, I got all tests passing with LocalConsensus disabled in
> > >> TabletPeer.
> > >>
> > >> To test performance, I ran TPC-H Q1 on a single node (via MiniCluster)
> > >> using the tpch.sh default settings (except for scale factor).
> > >> The summary is that the perf looks pretty similar between the two
> > >> Consensus implementations. I don't really see a major difference.
> > >>
> > >> Machine specs:
> > >>
> > >> CPU(s): 48 (4x6 core w/ HT)
> > >> RAM: 96 GB
> > >> OS: Centos 6.6 (final)
> > >> Kernel: Linux 2.6.32-504.30.3.el6.x86_64 #1 SMP Wed Jul 15 10:13:09 UTC 2015 x86_64 x86_64 GNU/Linux
> > >>
> > >> The numbers:
> > >>
> > >> *INSERT*
> > >>
> > >> Consensus   Scale factor   Time (sec)   Avg (sec)   Std. dev (sec)   Ratio of averages
> > >> local       1              26.557       26.557      -
> > >> raft        1              25.843       25.843      -                1.027628371
> > >> local       10             271.410
> > >> local       10             282.738
> > >> local       10             283.580      279.243     6.79634029
> > >> raft        10             281.986
> > >> raft        10             281.551
> > >> raft        10             283.049      282.195     0.7706272337     0.9895367984
> > >>
> > >> *QUERY*
> > >>
> > >> Consensus   Scale factor   Time (sec)   Avg (sec)   Std. dev (sec)   Ratio of averages
> > >> local       1              1.281
> > >> local       1              1.325
> > >> local       1              1.340
> > >> local       1              1.280        1.31        0.03
> > >> raft        1              1.304
> > >> raft        1              1.334
> > >> raft        1              1.293
> > >> raft        1              1.331        1.32        0.02             0.9931584949
> > >> local       10             14.879
> > >> local       10             14.333
> > >> local       10             14.397
> > >> local       10             14.040
> > >> local       10             13.573
> > >> local       10             13.216
> > >> local       10             13.597
> > >> local       10             13.858       13.99       0.54
> > >> raft        10             12.455
> > >> raft        10             13.998
> > >> raft        10             13.367
> > >> raft        10             13.759
> > >> raft        10             14.301
> > >> raft        10             13.919
> > >> raft        10             13.036
> > >> raft        10             13.410       13.53       0.59             1.033701326
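> > >>
> > >> To be explicit about the columns: Std. dev is the sample standard
> > >> deviation, and Ratio of averages is the local average divided by the
> > >> raft average (so a value below 1.0 means local was faster). A quick
> > >> sketch that reproduces the raft scale-10 row of the INSERT table:
> > >>
> > >> from statistics import mean, stdev
> > >>
> > >> local_insert_10 = [271.410, 282.738, 283.580]
> > >> raft_insert_10 = [281.986, 281.551, 283.049]
> > >>
> > >> print(mean(raft_insert_10))                          # ~282.195
> > >> print(stdev(raft_insert_10))                         # ~0.771 (sample std dev)
> > >> print(mean(local_insert_10) / mean(raft_insert_10))  # ~0.9895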
> > >>
> > >> Is there some other measurement I should take or does this seem
> > >> sufficient from a performance perspective?
> > >>
> > >> Thanks,
> > >> Mike
> > >>
> > >>
> > >>
> > >> On Wed, Jun 1, 2016 at 2:01 PM, Mike Percy <mpe...@apache.org> wrote:
> > >>
> > >>> I don't think we want to take much of a perf hit. I'll check it out.
> > >>>
> > >>> Another reason to have one version of Consensus is that it's currently
> > >>> not possible to go from 1 node to 3.
> > >>>
> > >>> Mike
> > >>>
> > >>> On Wed, Jun 1, 2016 at 12:28 PM, Todd Lipcon <t...@cloudera.com> wrote:
> > >>>
> > >>>> I'm curious also what kind of perf impact we are willing to take for
> > >>>> the un-replicated case. I think single-node Kudu performing well is
> > >>>> actually nice from an adoption standpoint (many people have workloads
> > >>>> which fit on a single machine). Would be good to have some simple
> > >>>> verification that the write perf of single-node raft isn't
> > >>>> substantially worse.
> > >>>>
> > >>>> -Todd
> > >>>>
> > >>>> On Wed, Jun 1, 2016 at 7:41 PM, Mike Percy <mpe...@apache.org> wrote:
> > >>>>
> > >>>> > On Wed, Jun 1, 2016 at 11:20 AM, David Alves <davidral...@gmail.com> wrote:
> > >>>> >
> > >>>> > > My (and I suspect Todd's) fear here is that we _think_ it's ok but
> > >>>> > > we're not totally sure it works in all cases.
> > >>>> > >
> > >>>> >
> > >>>> > Yep, I'm in the same boat. I haven't seen recent evidence that it
> > >>>> > doesn't work, though.
> > >>>> >
> > >>>> >
> > >>>> > > Regarding the tests, I guess just flip it and see what happens on
> > >>>> > > ctest?
> > >>>> > >
> > >>>> >
> > >>>> > Yeah, it fails of course but mostly for silly reasons related to
> > >>>> > test setup. Working on that.
> > >>>> >
> > >>>> >
> > >>>> > > Regarding the upgrade path, I think we'd need to test this at some
> > >>>> > > scale, i.e. fill up a cluster using the current version, with local
> > >>>> > > consensus, and then replace the binaries with the new version,
> > >>>> > > without it.
> > >>>> > >
> > >>>> >
> > >>>> > +1 SGTM. I don't mind doing that.
> > >>>> >
> > >>>> > Mike
> > >>>> >
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Todd Lipcon
> > >>>> Software Engineer, Cloudera
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> >
>
