Yeah, basically all I did was use the create-demo-table tool to create a
"twitter" table and pound on it for about a minute with the insert-loadgen
tool from kudu-examples. Then I killed the processes, swapped the binaries
for Raft-only ones, and started the load generator back up. Everything
seemed fine.
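A rough sketch of how a sanity check around that binary swap could look, using the experimental kudu-python client: count the rows in the demo table before shutting down the LocalConsensus binaries, then run the same check after restarting on the Raft-only binaries and make sure nothing went missing. The master address/port are placeholders, the table name assumes the demo table described above, and this is only illustrative, not what was actually run.

    #!/usr/bin/env python
    # Row-count sanity check to run before and after swapping binaries.
    # Assumes the experimental kudu-python client is installed; the master
    # address and table name below are placeholders for this sketch.
    import sys
    import kudu

    client = kudu.connect(host='127.0.0.1', port=7051)
    table = client.table('twitter')        # created by create-demo-table

    # Scan everything and count. Good enough for a one-minute smoke test;
    # a longer run would scan in batches and spot-check keys instead.
    scanner = table.scanner()
    scanner.open()
    count = len(scanner.read_all_tuples())
    print('row count: %d' % count)

    # Pass the pre-swap count as argv[1] on the post-swap run.
    expected = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    if count < expected:
        sys.exit('FAIL: expected at least %d rows, found %d' % (expected, count))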
If no one has concerns I'll wrap up the patches for this and try a longer
run on a cluster with a more thorough verification step.

Mike

--
Mike Percy
Software Engineer, Cloudera

On Tue, Jun 7, 2016 at 6:08 PM, David Alves <davidral...@gmail.com> wrote:

> I think the slowdown is a reasonable tradeoff for the simplicity and code
> cleanup, so I'd be +1 on merging with it.
> How "quick" was the migration test? Was it a cluster that we'd be hammering
> with writes on local and then boot with raft? Ideally we'd do this a few
> times on a reasonably sized cluster.
>
> -david
>
> On Tue, Jun 7, 2016 at 5:56 PM, Mike Percy <mpe...@apache.org> wrote:
>
> > I did some more benchmarking yesterday and today and got the following
> > results:
> >
> > # 4 runs each of the YCSB Workload A "load" job, in QPS (inserts only)
> > local_nums = [72031.838, 73134.16462, 69772.715379, 72666.4971115]
> > raft_nums = [67037.6080981, 66876.2121313, 65779.7365522, 65876.1528327]
> >
> > Min slowdown: 3.9200241327 %
> > Max slowdown: 10.0560772192 %
> > Average slowdown: 7.66171972498 %
> >
> > So it looks like a 4-10% write slowdown on tables with replication
> > disabled if we remove LocalConsensus.
> >
> > FWIW, this is a pure insert workload only. When comparing performance on
> > YCSB runs with a mixed read / write workload there is essentially no
> > difference.
> >
> > Worth mentioning the settings used. Same hardware as before, with the
> > following flags:
> >
> > ycsb_opts:
> >   recordcount: 4000000
> >   operationcount: 1000000
> >   threads: 16
> >   max_execution_time: 1800
> >   load_sync: true
> > ts_flags:
> >   cfile_do_on_finish: "flush"
> >   flush_threshold_mb: "1000"
> >   maintenance_manager_num_threads: "2"
> >
> > (I also tuned election timeouts to be near zero to make leader election
> > instantaneous)
> >
> > I did a quick test of migrating from a version of Kudu with support for
> > LocalConsensus to a version without support for it, and it worked.
> >
> > What do you guys think? Is this too large of a hit to take to remove our
> > "fake" version of Consensus?
> >
> > As mentioned previously, the drawback to keeping LocalConsensus is that
> > there is currently no way to add nodes to a system running with it. It's
> > currently the default choice for people who set replication factor to 1
> > on a table.
> >
> > Mike
> >
> >
> > On Thu, Jun 2, 2016 at 12:45 AM, Mike Percy <mpe...@apache.org> wrote:
> >
> > > To spare you the wall of text let me quickly summarize the scale
> > > factor 10 results:
> > >
> > > Insert: local avg 268 sec, raft avg 282 sec (raft has a 1% slowdown),
> > > but there's quite a bit of variance in there
> > > Query: local avg 13.99 sec, raft avg 13.53 sec (raft has a 3% speedup),
> > > but again, there's a bit of variance
> > >
> > > Doesn't really look any different to me.
> > >
> > > Mike
> > >
> > > On Thu, Jun 2, 2016 at 12:37 AM, Mike Percy <mpe...@apache.org> wrote:
> > >
> > >> I still have to test migration (pretty sure it's a no-op though).
> > >> However, I got all tests passing with LocalConsensus disabled in
> > >> TabletPeer.
> > >>
> > >> To test performance, I ran TPC-H Q1 on a single node (via MiniCluster)
> > >> using the tpch.sh default settings (except for scale factor).
> > >> The summary is that the perf looks pretty similar between the two
> > >> Consensus implementations. I don't really see a major difference.
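For reference, the min/max/average slowdown figures quoted in the Jun 7 mail above follow directly from the two QPS lists: the average compares the two means, while the min and max appear to come from pairing the extremes of the two lists (worst local run vs. best raft run, and best local run vs. worst raft run). A quick check, not part of the original benchmark scripts:

    # Reproduces the YCSB "load" slowdown figures quoted in the thread above.
    # Slowdown = relative drop in insert throughput (QPS) going from
    # LocalConsensus to single-node Raft.
    local_nums = [72031.838, 73134.16462, 69772.715379, 72666.4971115]
    raft_nums = [67037.6080981, 66876.2121313, 65779.7365522, 65876.1528327]

    def pct_drop(local, raft):
        return (local - raft) / local * 100.0

    def avg(xs):
        return sum(xs) / len(xs)

    print('Min slowdown:     %s %%' % pct_drop(min(local_nums), max(raft_nums)))  # ~3.92
    print('Max slowdown:     %s %%' % pct_drop(max(local_nums), min(raft_nums)))  # ~10.06
    print('Average slowdown: %s %%' % pct_drop(avg(local_nums), avg(raft_nums)))  # ~7.66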
> > >>
> > >> Machine specs:
> > >>
> > >> CPU(s): 48 (4x6 core w/ HT)
> > >> RAM: 96 GB
> > >> OS: Centos 6.6 (final)
> > >> Kernel: Linux 2.6.32-504.30.3.el6.x86_64 #1 SMP Wed Jul 15 10:13:09 UTC
> > >> 2015 x86_64 x86_64 GNU/Linux
> > >>
> > >> The numbers:
> > >>
> > >> *INSERT*
> > >>
> > >> Consensus  Scale factor  Time (sec)  Avg (sec)  Std. dev (sec)  Ratio of averages
> > >> local      1             26.557      26.557     -
> > >> raft       1             25.843      25.843     -               1.027628371
> > >> local      10            271.410
> > >> local      10            282.738
> > >> local      10            283.580     279.243    6.79634029
> > >> raft       10            281.986
> > >> raft       10            281.551
> > >> raft       10            283.049     282.195    0.7706272337    0.9895367984
> > >>
> > >> *QUERY*
> > >>
> > >> Consensus  Scale factor  Time (sec)  Avg (sec)  Std. dev (sec)  Ratio of averages
> > >> local      1             1.281
> > >> local      1             1.325
> > >> local      1             1.340
> > >> local      1             1.280       1.31       0.03
> > >> raft       1             1.304
> > >> raft       1             1.334
> > >> raft       1             1.293
> > >> raft       1             1.331       1.32       0.02            0.9931584949
> > >> local      10            14.879
> > >> local      10            14.333
> > >> local      10            14.397
> > >> local      10            14.040
> > >> local      10            13.573
> > >> local      10            13.216
> > >> local      10            13.597
> > >> local      10            13.858      13.99      0.54
> > >> raft       10            12.455
> > >> raft       10            13.998
> > >> raft       10            13.367
> > >> raft       10            13.759
> > >> raft       10            14.301
> > >> raft       10            13.919
> > >> raft       10            13.036
> > >> raft       10            13.410      13.53      0.59            1.033701326
> > >>
> > >> Is there some other measurement I should take or does this seem
> > >> sufficient from a performance perspective?
> > >>
> > >> Thanks,
> > >> Mike
> > >>
> > >>
> > >> On Wed, Jun 1, 2016 at 2:01 PM, Mike Percy <mpe...@apache.org> wrote:
> > >>
> > >>> I don't think we want to take much of a perf hit. I'll check it out.
> > >>>
> > >>> Another reason to have one version of Consensus is that it's currently
> > >>> not possible to go from 1 node to 3.
> > >>>
> > >>> Mike
> > >>>
> > >>> On Wed, Jun 1, 2016 at 12:28 PM, Todd Lipcon <t...@cloudera.com> wrote:
> > >>>
> > >>>> I'm curious also what kind of perf impact we are willing to take for
> > >>>> the un-replicated case. I think single-node Kudu performing well is
> > >>>> actually nice from an adoption standpoint (many people have workloads
> > >>>> which fit on a single machine). Would be good to have some simple
> > >>>> verification that the write perf of single-node raft isn't
> > >>>> substantially worse.
> > >>>>
> > >>>> -Todd
> > >>>>
> > >>>> On Wed, Jun 1, 2016 at 7:41 PM, Mike Percy <mpe...@apache.org> wrote:
> > >>>>
> > >>>> > On Wed, Jun 1, 2016 at 11:20 AM, David Alves <davidral...@gmail.com>
> > >>>> > wrote:
> > >>>> >
> > >>>> > > My (and I suspect Todd's) fear here is that we _think_ it's ok but
> > >>>> > > we're not totally sure it works in all cases.
> > >>>> >
> > >>>> > Yep, I'm in the same boat. I haven't seen recent evidence that it
> > >>>> > doesn't work, though.
> > >>>> >
> > >>>> > > Regarding the tests, I guess just flip it and see what happens on
> > >>>> > > ctest?
> > >>>> >
> > >>>> > Yeah, it fails of course but mostly for silly reasons related to
> > >>>> > test setup. Working on that.
> > >>>> >
> > >>>> > > Regarding the upgrade path, I think we'd need to test this at some
> > >>>> > > scale, i.e. fill up a cluster using the current version, with
> > >>>> > > local consensus, and then replace the binaries with the new
> > >>>> > > version, without it.
> > >>>> >
> > >>>> > +1 SGTM. I don't mind doing that.
> > >>>> >
> > >>>> > Mike
> > >>>> >
> > >>>>
> > >>>> --
> > >>>> Todd Lipcon
> > >>>> Software Engineer, Cloudera
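As a cross-check on the Avg, Std. dev, and Ratio-of-averages columns in the TPC-H Q1 tables quoted above, the scale-factor-10 QUERY rows reproduce like this; the standard deviation matches the sample (n-1) form that a spreadsheet STDEV gives. This is just a recomputation of numbers already in the thread, not new measurements.

    # Recomputes the scale-factor-10 QUERY summary columns from the per-run
    # times (seconds) quoted in the thread above.
    import math

    local = [14.879, 14.333, 14.397, 14.040, 13.573, 13.216, 13.597, 13.858]
    raft = [12.455, 13.998, 13.367, 13.759, 14.301, 13.919, 13.036, 13.410]

    def mean(xs):
        return sum(xs) / len(xs)

    def sample_stddev(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

    print('local: avg %.2f  std dev %.2f' % (mean(local), sample_stddev(local)))  # 13.99, 0.54
    print('raft:  avg %.2f  std dev %.2f' % (mean(raft), sample_stddev(raft)))    # 13.53, 0.59
    print('ratio of averages (local/raft): %.4f' % (mean(local) / mean(raft)))    # ~1.0337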