Re: Review Request 58355: Removed unnecessary Registry copying.

2017-04-19 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58355/#review172422
---


Ship it!




Looks great, nice work Ilya! I'll make a few very minor edits based on the 
comments below and get this committed for you.


src/master/registrar.cpp
Line 22 (original)


Seems like this should be including  since we 
generally don't rely on transitives includes?



src/master/registrar.cpp
Lines 375 (patched)


Maybe s/value/deserialized/?

As an aside, it seems like the use of `::protobuf::deserialize` also incurs 
a copy due to immovability into the Try:

```
template 
Try deserialize(const std::string& value)
{
  T t;
  (void) static_cast(&t);

  google::protobuf::io::ArrayInputStream stream(value.data(), value.size());
  if (!t.ParseFromZeroCopyStream(&stream)) {
return Error("Failed to deserialize " + t.GetDescriptor()->full_name());
  }
  return t; // Try needs to implicitly be constructed.
}
```

I suppose we could avoid this with a TODO if you want to speed up the 
recovery path, not sure how much of a benefit we'll get:

```
template 
Try deserialize(const std::string& value)
{
  T t;
  (void) static_cast(&t);

  google::protobuf::io::ArrayInputStream stream(value.data(), value.size());
  if (!t.ParseFromZeroCopyStream(&stream)) {
return Error("Failed to deserialize " + t.GetDescriptor()->full_name());
  }
  
  // Since protobuf currently does not support move semantics,
  // we use Swap here to avoid copying `t` into the returned `Try`.
  Try result;
  result->Swap(&t);
  return result;
}
```



src/master/registrar.cpp
Lines 495 (patched)


How about s/value/serialized/?



src/tests/mock_registrar.cpp
Line 21 (original)


Seems like this include should be of  now? Or 
forward declare it?


- Benjamin Mahler


On April 18, 2017, 9:11 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58355/
> ---
> 
> (Updated April 18, 2017, 9:11 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7376
> https://issues.apache.org/jira/browse/MESOS-7376
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When the number of agents is large every `Registry` copy operation takes
> a lot of time (~0.4 sec with 55k agents), because it involves deep
> copying a big object tree. Because of that, the use of `protobuf::State`
> in `Registrar` incurs a dramatic performance cost from multiple protobuf
> copying.
> 
> This patch drops the use of `protobuf::State` in `Registrar` in favor of
> "untyped" `State` and manual serialization/deserialization in order to
> minimize `Registry` copying and keep registry update timings at
> acceptable values.
> 
> Performance improvements to `protobuf::State` should be explored in
> order to make it usable in the registrar without regressing on the
> performance of this approach.
> 
> 
> Diffs
> -
> 
>   src/master/main.cpp 90d159e248924f6806271dda1ced6d37dfa5f5c9 
>   src/master/registrar.hpp a70132b0901aedf2c562a561a9d140b58fdfb4a0 
>   src/master/registrar.cpp 0029cc77628b5bb2a7b1ff551fb42b3eaf7b4fb1 
>   src/tests/cluster.hpp 250b12fcffd035834817ff6060eb80c2cf3e0246 
>   src/tests/cluster.cpp 02590a27aadd0583177f21a57ec3d05fe1542f42 
>   src/tests/mock_registrar.hpp cdcc6990a183e8c4911d3353584ed808ee88dd72 
>   src/tests/mock_registrar.cpp 8643e4cb06cef44fa0e342df0aa0f00823e1e268 
>   src/tests/registrar_tests.cpp 5c6fb565ee48378224bab091ccdf9d5d8f3c7614 
> 
> 
> Diff: https://reviews.apache.org/r/58355/diff/2/
> 
> 
> Testing
> ---
> 
> Benchmark
> =
> Ran `MESOS_VERBOSE=1 ./src/mesos-tests --benchmark 
> --gtest_filter='*Registrar_BENCHMARK_Test.Performance/3'`.
> 
> Before:
> ```
> I0411 10:04:11.726016 11802 registrar.cpp:508] Successfully updated the 
> registry in 89.478144ms
> I0411 10:04:13.860827 11803 registrar.cpp:508] Successfully updated the 
> registry in 216.688896ms
> I0411 10:04:15.167768 11803 registrar.cpp:508] Successfully updated the 
> registry in 1.29364096secs
> I0411 10:04:18.967394 11803 registrar.cpp:508] Successfully updated the 
> registry in 3.696552192secs
> I0411 10:04:25.631009 11803 registrar.cpp:508] Successfully updated the 
> registry in 6.267425024secs
> I0411 10:04:42.625507 11803 registrar.cpp:508] Successfully updated the 
> registry in 15.876419072secs
> I0411 10:04:44.209377 11787 r

Re: Review Request 58355: Removed unnecessary Registry copying.

2017-04-18 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58355/#review172286
---



Patch looks great!

Reviews applied: [58355]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On April 18, 2017, 9:11 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58355/
> ---
> 
> (Updated April 18, 2017, 9:11 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7376
> https://issues.apache.org/jira/browse/MESOS-7376
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When the number of agents is large every `Registry` copy operation takes
> a lot of time (~0.4 sec with 55k agents), because it involves deep
> copying a big object tree. Because of that, the use of `protobuf::State`
> in `Registrar` incurs a dramatic performance cost from multiple protobuf
> copying.
> 
> This patch drops the use of `protobuf::State` in `Registrar` in favor of
> "untyped" `State` and manual serialization/deserialization in order to
> minimize `Registry` copying and keep registry update timings at
> acceptable values.
> 
> Performance improvements to `protobuf::State` should be explored in
> order to make it usable in the registrar without regressing on the
> performance of this approach.
> 
> 
> Diffs
> -
> 
>   src/master/main.cpp 90d159e248924f6806271dda1ced6d37dfa5f5c9 
>   src/master/registrar.hpp a70132b0901aedf2c562a561a9d140b58fdfb4a0 
>   src/master/registrar.cpp 0029cc77628b5bb2a7b1ff551fb42b3eaf7b4fb1 
>   src/tests/cluster.hpp 250b12fcffd035834817ff6060eb80c2cf3e0246 
>   src/tests/cluster.cpp 02590a27aadd0583177f21a57ec3d05fe1542f42 
>   src/tests/mock_registrar.hpp cdcc6990a183e8c4911d3353584ed808ee88dd72 
>   src/tests/mock_registrar.cpp 8643e4cb06cef44fa0e342df0aa0f00823e1e268 
>   src/tests/registrar_tests.cpp 5c6fb565ee48378224bab091ccdf9d5d8f3c7614 
> 
> 
> Diff: https://reviews.apache.org/r/58355/diff/2/
> 
> 
> Testing
> ---
> 
> Benchmark
> =
> Ran `MESOS_VERBOSE=1 ./src/mesos-tests --benchmark 
> --gtest_filter='*Registrar_BENCHMARK_Test.Performance/3'`.
> 
> Before:
> ```
> I0411 10:04:11.726016 11802 registrar.cpp:508] Successfully updated the 
> registry in 89.478144ms
> I0411 10:04:13.860827 11803 registrar.cpp:508] Successfully updated the 
> registry in 216.688896ms
> I0411 10:04:15.167768 11803 registrar.cpp:508] Successfully updated the 
> registry in 1.29364096secs
> I0411 10:04:18.967394 11803 registrar.cpp:508] Successfully updated the 
> registry in 3.696552192secs
> I0411 10:04:25.631009 11803 registrar.cpp:508] Successfully updated the 
> registry in 6.267425024secs
> I0411 10:04:42.625507 11803 registrar.cpp:508] Successfully updated the 
> registry in 15.876419072secs
> I0411 10:04:44.209377 11787 registrar_tests.cpp:1262] Admitted 5 agents 
> in 30.479743816secs
> I0411 10:05:04.446650 11820 registrar.cpp:508] Successfully updated the 
> registry in 18.338545152secs
> I0411 10:05:21.171001 11820 registrar.cpp:508] Successfully updated the 
> registry in 15.31903872secs
> I0411 10:05:37.592319 11820 registrar.cpp:508] Successfully updated the 
> registry in 14.863101952secs
> I0411 10:05:39.099174 11787 registrar_tests.cpp:1276] Marked 5 agents 
> reachable in 53.593596102secs
> ../../src/tests/registrar_tests.cpp:1287: Failure
> Failed to wait 15secs for registry
> ```
> After:
> ```
> I0411 15:19:12.228904 40643 registrar.cpp:524] Successfully updated the 
> registry in 91.262208ms
> I0411 15:19:14.543190 40660 registrar.cpp:524] Successfully updated the 
> registry in 377.45408ms
> I0411 15:19:15.707006 40660 registrar.cpp:524] Successfully updated the 
> registry in 1.138724096secs
> I0411 15:19:18.267305 40660 registrar.cpp:524] Successfully updated the 
> registry in 2.466145792secs
> I0411 15:19:19.092073 40660 registrar.cpp:524] Successfully updated the 
> registry in 523.11296ms
> I0411 15:19:20.809330 40648 registrar.cpp:524] Successfully updated the 
> registry in 892.141824ms
> I0411 15:19:21.194135 40622 registrar_tests.cpp:1262] Admitted 5 agents 
> in 6.938952085secs
> I0411 15:19:26.973904 40637 registrar.cpp:524] Successfully updated the 
> registry in 3.938064128secs
> I0411 15:19:28.631865 40637 registrar.cpp:524] Successfully updated the 
> registry in 1.116326144secs
> I0411 15:19:30.222944 40660 registrar.cpp:524] Successfully updated the 
> registry in 911.86688ms
> I0411 15:19:30.678509 40622 registrar_tests.cpp:1276] Marked 5 agents 
> reachable in 8.249523305secs
> I0411 15:19:35.138797 40645 registrar.cpp:524] Successfully updat

Re: Review Request 58355: Removed unnecessary Registry copying.

2017-04-18 Thread Ilya Pronin


> On April 18, 2017, 12:21 a.m., Benjamin Mahler wrote:
> > Thanks for taking this on Ilya! Just a few suggestions below per our 
> > offline discussion. It would be great to update the description to include 
> > the context, mainly that we avoid the `protobuf::State` wrapper in favor of 
> > performing manual serialization/deserialization in the registrar, in order 
> > to avoid expensive protobuf copying. Also that, we should explore 
> > performance improvements to `protobuf::State` in order to make it usable in 
> > the registrar without regressing on the performance of this approach.

Thanks for the review! Done.


> On April 18, 2017, 12:21 a.m., Benjamin Mahler wrote:
> > src/master/registrar.cpp
> > Line 215 (original), 217-218 (patched)
> > 
> >
> > It would be nice to store the `State` class alongside these and put the 
> > TODO here:
> > 
> > ```
> >   // TODO(): We use the "untyped" `State` class here and 
> > perform
> >   // the protobuf (de)serialization manually within the Registrar, 
> > because
> >   // the use of `protobuf::State` incurs a dramatic peformance cost from
> >   // protobuf copying. We should explore using `protobuf::State`, which 
> > will
> >   // require move support and other copy elimination to maintain the
> >   // performance of the current approach.
> >   State* state;
> >   
> >   // Per the TODO above, we store both serialized and deserialized 
> > versions
> >   // of the `Registry` protobuf. If we're able to move to 
> > `protobuf::State`,
> >   // we could just store a single `protobuf::state::Variable`.
> >   Option variable;
> >   Option registry;
> > ```
> > 
> > By the way, would be nice if these both use or don't use underscores 
> > consistently.

Done.

`registry` without underscore conflicted with private `registry()` method. 
Renamed the method.


> On April 18, 2017, 12:21 a.m., Benjamin Mahler wrote:
> > src/master/registrar.cpp
> > Line 334 (original), 337 (patched)
> > 
> >
> > Chatted with Ilya offline about using the raw untyped `State` class 
> > rather than using the protobuf class and bypassing the overridden method 
> > here.
> > 
> > We can then frame this patch as being an avoidance of the protobuf 
> > `State` wrapper due to the extra expense incurred from protobuf copying, 
> > and leave a TODO in the code that captures this for posterity.

Avoided `protobuf::State` in favor of "untyped" `State`.


> On April 18, 2017, 12:21 a.m., Benjamin Mahler wrote:
> > src/master/registrar.cpp
> > Lines 449-450 (original), 464-465 (patched)
> > 
> >
> > Should we say we use an Owned here to avoid copying, since protobuf 
> > doesn't support move construction?

Added a comment.


- Ilya


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58355/#review172147
---


On April 18, 2017, 10:11 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58355/
> ---
> 
> (Updated April 18, 2017, 10:11 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7376
> https://issues.apache.org/jira/browse/MESOS-7376
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When the number of agents is large every `Registry` copy operation takes
> a lot of time (~0.4 sec with 55k agents), because it involves deep
> copying a big object tree. Because of that, the use of `protobuf::State`
> in `Registrar` incurs a dramatic performance cost from multiple protobuf
> copying.
> 
> This patch drops the use of `protobuf::State` in `Registrar` in favor of
> "untyped" `State` and manual serialization/deserialization in order to
> minimize `Registry` copying and keep registry update timings at
> acceptable values.
> 
> Performance improvements to `protobuf::State` should be explored in
> order to make it usable in the registrar without regressing on the
> performance of this approach.
> 
> 
> Diffs
> -
> 
>   src/master/main.cpp 90d159e248924f6806271dda1ced6d37dfa5f5c9 
>   src/master/registrar.hpp a70132b0901aedf2c562a561a9d140b58fdfb4a0 
>   src/master/registrar.cpp 0029cc77628b5bb2a7b1ff551fb42b3eaf7b4fb1 
>   src/tests/cluster.hpp 250b12fcffd035834817ff6060eb80c2cf3e0246 
>   src/tests/cluster.cpp 02590a27aadd0583177f21a57ec3d05fe1542f42 
>   src/tests/mock_registrar.hpp cdcc6990a183e8c4911d3353584ed808ee88dd72 
>   src/tests/mock_registrar.cpp 8643e4cb06cef44fa0e342df0aa0f00823e1e268 
>   sr

Re: Review Request 58355: Removed unnecessary Registry copying.

2017-04-18 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58355/
---

(Updated April 18, 2017, 10:11 p.m.)


Review request for mesos and Benjamin Mahler.


Changes
---

Addressed review comments


Bugs: MESOS-7376
https://issues.apache.org/jira/browse/MESOS-7376


Repository: mesos


Description (updated)
---

When the number of agents is large every `Registry` copy operation takes
a lot of time (~0.4 sec with 55k agents), because it involves deep
copying a big object tree. Because of that, the use of `protobuf::State`
in `Registrar` incurs a dramatic performance cost from multiple protobuf
copying.

This patch drops the use of `protobuf::State` in `Registrar` in favor of
"untyped" `State` and manual serialization/deserialization in order to
minimize `Registry` copying and keep registry update timings at
acceptable values.

Performance improvements to `protobuf::State` should be explored in
order to make it usable in the registrar without regressing on the
performance of this approach.


Diffs (updated)
-

  src/master/main.cpp 90d159e248924f6806271dda1ced6d37dfa5f5c9 
  src/master/registrar.hpp a70132b0901aedf2c562a561a9d140b58fdfb4a0 
  src/master/registrar.cpp 0029cc77628b5bb2a7b1ff551fb42b3eaf7b4fb1 
  src/tests/cluster.hpp 250b12fcffd035834817ff6060eb80c2cf3e0246 
  src/tests/cluster.cpp 02590a27aadd0583177f21a57ec3d05fe1542f42 
  src/tests/mock_registrar.hpp cdcc6990a183e8c4911d3353584ed808ee88dd72 
  src/tests/mock_registrar.cpp 8643e4cb06cef44fa0e342df0aa0f00823e1e268 
  src/tests/registrar_tests.cpp 5c6fb565ee48378224bab091ccdf9d5d8f3c7614 


Diff: https://reviews.apache.org/r/58355/diff/2/

Changes: https://reviews.apache.org/r/58355/diff/1-2/


Testing
---

Benchmark
=
Ran `MESOS_VERBOSE=1 ./src/mesos-tests --benchmark 
--gtest_filter='*Registrar_BENCHMARK_Test.Performance/3'`.

Before:
```
I0411 10:04:11.726016 11802 registrar.cpp:508] Successfully updated the 
registry in 89.478144ms
I0411 10:04:13.860827 11803 registrar.cpp:508] Successfully updated the 
registry in 216.688896ms
I0411 10:04:15.167768 11803 registrar.cpp:508] Successfully updated the 
registry in 1.29364096secs
I0411 10:04:18.967394 11803 registrar.cpp:508] Successfully updated the 
registry in 3.696552192secs
I0411 10:04:25.631009 11803 registrar.cpp:508] Successfully updated the 
registry in 6.267425024secs
I0411 10:04:42.625507 11803 registrar.cpp:508] Successfully updated the 
registry in 15.876419072secs
I0411 10:04:44.209377 11787 registrar_tests.cpp:1262] Admitted 5 agents in 
30.479743816secs
I0411 10:05:04.446650 11820 registrar.cpp:508] Successfully updated the 
registry in 18.338545152secs
I0411 10:05:21.171001 11820 registrar.cpp:508] Successfully updated the 
registry in 15.31903872secs
I0411 10:05:37.592319 11820 registrar.cpp:508] Successfully updated the 
registry in 14.863101952secs
I0411 10:05:39.099174 11787 registrar_tests.cpp:1276] Marked 5 agents 
reachable in 53.593596102secs
../../src/tests/registrar_tests.cpp:1287: Failure
Failed to wait 15secs for registry
```
After:
```
I0411 15:19:12.228904 40643 registrar.cpp:524] Successfully updated the 
registry in 91.262208ms
I0411 15:19:14.543190 40660 registrar.cpp:524] Successfully updated the 
registry in 377.45408ms
I0411 15:19:15.707006 40660 registrar.cpp:524] Successfully updated the 
registry in 1.138724096secs
I0411 15:19:18.267305 40660 registrar.cpp:524] Successfully updated the 
registry in 2.466145792secs
I0411 15:19:19.092073 40660 registrar.cpp:524] Successfully updated the 
registry in 523.11296ms
I0411 15:19:20.809330 40648 registrar.cpp:524] Successfully updated the 
registry in 892.141824ms
I0411 15:19:21.194135 40622 registrar_tests.cpp:1262] Admitted 5 agents in 
6.938952085secs
I0411 15:19:26.973904 40637 registrar.cpp:524] Successfully updated the 
registry in 3.938064128secs
I0411 15:19:28.631865 40637 registrar.cpp:524] Successfully updated the 
registry in 1.116326144secs
I0411 15:19:30.222944 40660 registrar.cpp:524] Successfully updated the 
registry in 911.86688ms
I0411 15:19:30.678509 40622 registrar_tests.cpp:1276] Marked 5 agents 
reachable in 8.249523305secs
I0411 15:19:35.138797 40645 registrar.cpp:524] Successfully updated the 
registry in 815.439104ms
I0411 15:19:41.783651 40622 registrar_tests.cpp:1288] Recovered 5 agents 
(8238915B) in 10.963297677secs
I0411 15:19:47.431670 40657 registrar.cpp:524] Successfully updated the 
registry in 3.960920064secs
I0411 15:20:13.769872 40657 registrar.cpp:524] Successfully updated the 
registry in 1.169234944secs
I0411 15:21:49.685801 40657 registrar.cpp:524] Successfully updated the 
registry in 264.850688ms
Removed 5 agents in 2.12256788111667mins
```
Agents removal seems to be O(n^2) because of linear lookup, hence 2 mins.

Unit

Ran `make check`.


Thanks,

Ilya Pronin



Re: Review Request 58355: Removed unnecessary Registry copying.

2017-04-17 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58355/#review172166
---



Patch looks great!

Reviews applied: [58355]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On April 11, 2017, 11:33 a.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58355/
> ---
> 
> (Updated April 11, 2017, 11:33 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7376
> https://issues.apache.org/jira/browse/MESOS-7376
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When the number of agents is large every `Registry` copy operation takes
> a lot of time (~0.4 sec with 55k agents) because it involves deep
> copying a big object graph.
> 
> This patch minimizes `Registry` copying to keep registry update timings
> at acceptable values.
> 
> 
> Diffs
> -
> 
>   include/mesos/state/protobuf.hpp c15696ab79b61f5487ee4a849d62b34b91ca1614 
>   src/master/registrar.cpp 0029cc77628b5bb2a7b1ff551fb42b3eaf7b4fb1 
> 
> 
> Diff: https://reviews.apache.org/r/58355/diff/1/
> 
> 
> Testing
> ---
> 
> Benchmark
> =
> Ran `MESOS_VERBOSE=1 ./src/mesos-tests --benchmark 
> --gtest_filter='*Registrar_BENCHMARK_Test.Performance/3'`.
> 
> Before:
> ```
> I0411 10:04:11.726016 11802 registrar.cpp:508] Successfully updated the 
> registry in 89.478144ms
> I0411 10:04:13.860827 11803 registrar.cpp:508] Successfully updated the 
> registry in 216.688896ms
> I0411 10:04:15.167768 11803 registrar.cpp:508] Successfully updated the 
> registry in 1.29364096secs
> I0411 10:04:18.967394 11803 registrar.cpp:508] Successfully updated the 
> registry in 3.696552192secs
> I0411 10:04:25.631009 11803 registrar.cpp:508] Successfully updated the 
> registry in 6.267425024secs
> I0411 10:04:42.625507 11803 registrar.cpp:508] Successfully updated the 
> registry in 15.876419072secs
> I0411 10:04:44.209377 11787 registrar_tests.cpp:1262] Admitted 5 agents 
> in 30.479743816secs
> I0411 10:05:04.446650 11820 registrar.cpp:508] Successfully updated the 
> registry in 18.338545152secs
> I0411 10:05:21.171001 11820 registrar.cpp:508] Successfully updated the 
> registry in 15.31903872secs
> I0411 10:05:37.592319 11820 registrar.cpp:508] Successfully updated the 
> registry in 14.863101952secs
> I0411 10:05:39.099174 11787 registrar_tests.cpp:1276] Marked 5 agents 
> reachable in 53.593596102secs
> ../../src/tests/registrar_tests.cpp:1287: Failure
> Failed to wait 15secs for registry
> ```
> After:
> ```
> I0411 15:19:12.228904 40643 registrar.cpp:524] Successfully updated the 
> registry in 91.262208ms
> I0411 15:19:14.543190 40660 registrar.cpp:524] Successfully updated the 
> registry in 377.45408ms
> I0411 15:19:15.707006 40660 registrar.cpp:524] Successfully updated the 
> registry in 1.138724096secs
> I0411 15:19:18.267305 40660 registrar.cpp:524] Successfully updated the 
> registry in 2.466145792secs
> I0411 15:19:19.092073 40660 registrar.cpp:524] Successfully updated the 
> registry in 523.11296ms
> I0411 15:19:20.809330 40648 registrar.cpp:524] Successfully updated the 
> registry in 892.141824ms
> I0411 15:19:21.194135 40622 registrar_tests.cpp:1262] Admitted 5 agents 
> in 6.938952085secs
> I0411 15:19:26.973904 40637 registrar.cpp:524] Successfully updated the 
> registry in 3.938064128secs
> I0411 15:19:28.631865 40637 registrar.cpp:524] Successfully updated the 
> registry in 1.116326144secs
> I0411 15:19:30.222944 40660 registrar.cpp:524] Successfully updated the 
> registry in 911.86688ms
> I0411 15:19:30.678509 40622 registrar_tests.cpp:1276] Marked 5 agents 
> reachable in 8.249523305secs
> I0411 15:19:35.138797 40645 registrar.cpp:524] Successfully updated the 
> registry in 815.439104ms
> I0411 15:19:41.783651 40622 registrar_tests.cpp:1288] Recovered 5 agents 
> (8238915B) in 10.963297677secs
> I0411 15:19:47.431670 40657 registrar.cpp:524] Successfully updated the 
> registry in 3.960920064secs
> I0411 15:20:13.769872 40657 registrar.cpp:524] Successfully updated the 
> registry in 1.169234944secs
> I0411 15:21:49.685801 40657 registrar.cpp:524] Successfully updated the 
> registry in 264.850688ms
> Removed 5 agents in 2.12256788111667mins
> ```
> Agents removal seems to be O(n^2) because of linear lookup, hence 2 mins.
> 
> Unit
> 
> Ran `make check`.
> 
> 
> Thanks,
> 
> Ilya Pronin
> 
>



Re: Review Request 58355: Removed unnecessary Registry copying.

2017-04-17 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58355/#review172147
---



Thanks for taking this on Ilya! Just a few suggestions below per our offline 
discussion. It would be great to update the description to include the context, 
mainly that we avoid the `protobuf::State` wrapper in favor of performing 
manual serialization/deserialization in the registrar, in order to avoid 
expensive protobuf copying. Also that, we should explore performance 
improvements to `protobuf::State` in order to make it usable in the registrar 
without regressing on the performance of this approach.


src/master/registrar.cpp
Line 215 (original), 217-218 (patched)


It would be nice to store the `State` class alongside these and put the 
TODO here:

```
  // TODO(): We use the "untyped" `State` class here and perform
  // the protobuf (de)serialization manually within the Registrar, because
  // the use of `protobuf::State` incurs a dramatic peformance cost from
  // protobuf copying. We should explore using `protobuf::State`, which will
  // require move support and other copy elimination to maintain the
  // performance of the current approach.
  State* state;
  
  // Per the TODO above, we store both serialized and deserialized versions
  // of the `Registry` protobuf. If we're able to move to `protobuf::State`,
  // we could just store a single `protobuf::state::Variable`.
  Option variable;
  Option registry;
```

By the way, would be nice if these both use or don't use underscores 
consistently.



src/master/registrar.cpp
Line 215 (original), 217-218 (patched)






src/master/registrar.cpp
Line 334 (original), 337 (patched)


Chatted with Ilya offline about using the raw untyped `State` class rather 
than using the protobuf class and bypassing the overridden method here.

We can then frame this patch as being an avoidance of the protobuf `State` 
wrapper due to the extra expense incurred from protobuf copying, and leave a 
TODO in the code that captures this for posterity.



src/master/registrar.cpp
Lines 415 (patched)






src/master/registrar.cpp
Lines 449-450 (original), 464-465 (patched)


Should we say we use an Owned here to avoid copying, since protobuf doesn't 
support move construction?


- Benjamin Mahler


On April 11, 2017, 3:33 p.m., Ilya Pronin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58355/
> ---
> 
> (Updated April 11, 2017, 3:33 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-7376
> https://issues.apache.org/jira/browse/MESOS-7376
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When the number of agents is large every `Registry` copy operation takes
> a lot of time (~0.4 sec with 55k agents) because it involves deep
> copying a big object graph.
> 
> This patch minimizes `Registry` copying to keep registry update timings
> at acceptable values.
> 
> 
> Diffs
> -
> 
>   include/mesos/state/protobuf.hpp c15696ab79b61f5487ee4a849d62b34b91ca1614 
>   src/master/registrar.cpp 0029cc77628b5bb2a7b1ff551fb42b3eaf7b4fb1 
> 
> 
> Diff: https://reviews.apache.org/r/58355/diff/1/
> 
> 
> Testing
> ---
> 
> Benchmark
> =
> Ran `MESOS_VERBOSE=1 ./src/mesos-tests --benchmark 
> --gtest_filter='*Registrar_BENCHMARK_Test.Performance/3'`.
> 
> Before:
> ```
> I0411 10:04:11.726016 11802 registrar.cpp:508] Successfully updated the 
> registry in 89.478144ms
> I0411 10:04:13.860827 11803 registrar.cpp:508] Successfully updated the 
> registry in 216.688896ms
> I0411 10:04:15.167768 11803 registrar.cpp:508] Successfully updated the 
> registry in 1.29364096secs
> I0411 10:04:18.967394 11803 registrar.cpp:508] Successfully updated the 
> registry in 3.696552192secs
> I0411 10:04:25.631009 11803 registrar.cpp:508] Successfully updated the 
> registry in 6.267425024secs
> I0411 10:04:42.625507 11803 registrar.cpp:508] Successfully updated the 
> registry in 15.876419072secs
> I0411 10:04:44.209377 11787 registrar_tests.cpp:1262] Admitted 5 agents 
> in 30.479743816secs
> I0411 10:05:04.446650 11820 registrar.cpp:508] Successfully updated the 
> registry in 18.338545152secs
> I0411 10:05:21.171001 11820 registrar.cpp:508] Successfully updated the 
> registry in 15.31903872secs
> I0411 10:05:37.592319 11820 registrar.cpp:508] Successfully updated the 
> registry in 14.86310195

Re: Review Request 58355: Removed unnecessary Registry copying.

2017-04-11 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58355/
---

(Updated April 11, 2017, 4:33 p.m.)


Review request for mesos and Benjamin Mahler.


Changes
---

Fixed a typo.


Bugs: MESOS-7376
https://issues.apache.org/jira/browse/MESOS-7376


Repository: mesos


Description
---

When the number of agents is large every `Registry` copy operation takes
a lot of time (~0.4 sec with 55k agents) because it involves deep
copying a big object graph.

This patch minimizes `Registry` copying to keep registry update timings
at acceptable values.


Diffs
-

  include/mesos/state/protobuf.hpp c15696ab79b61f5487ee4a849d62b34b91ca1614 
  src/master/registrar.cpp 0029cc77628b5bb2a7b1ff551fb42b3eaf7b4fb1 


Diff: https://reviews.apache.org/r/58355/diff/1/


Testing (updated)
---

Benchmark
=
Ran `MESOS_VERBOSE=1 ./src/mesos-tests --benchmark 
--gtest_filter='*Registrar_BENCHMARK_Test.Performance/3'`.

Before:
```
I0411 10:04:11.726016 11802 registrar.cpp:508] Successfully updated the 
registry in 89.478144ms
I0411 10:04:13.860827 11803 registrar.cpp:508] Successfully updated the 
registry in 216.688896ms
I0411 10:04:15.167768 11803 registrar.cpp:508] Successfully updated the 
registry in 1.29364096secs
I0411 10:04:18.967394 11803 registrar.cpp:508] Successfully updated the 
registry in 3.696552192secs
I0411 10:04:25.631009 11803 registrar.cpp:508] Successfully updated the 
registry in 6.267425024secs
I0411 10:04:42.625507 11803 registrar.cpp:508] Successfully updated the 
registry in 15.876419072secs
I0411 10:04:44.209377 11787 registrar_tests.cpp:1262] Admitted 5 agents in 
30.479743816secs
I0411 10:05:04.446650 11820 registrar.cpp:508] Successfully updated the 
registry in 18.338545152secs
I0411 10:05:21.171001 11820 registrar.cpp:508] Successfully updated the 
registry in 15.31903872secs
I0411 10:05:37.592319 11820 registrar.cpp:508] Successfully updated the 
registry in 14.863101952secs
I0411 10:05:39.099174 11787 registrar_tests.cpp:1276] Marked 5 agents 
reachable in 53.593596102secs
../../src/tests/registrar_tests.cpp:1287: Failure
Failed to wait 15secs for registry
```
After:
```
I0411 15:19:12.228904 40643 registrar.cpp:524] Successfully updated the 
registry in 91.262208ms
I0411 15:19:14.543190 40660 registrar.cpp:524] Successfully updated the 
registry in 377.45408ms
I0411 15:19:15.707006 40660 registrar.cpp:524] Successfully updated the 
registry in 1.138724096secs
I0411 15:19:18.267305 40660 registrar.cpp:524] Successfully updated the 
registry in 2.466145792secs
I0411 15:19:19.092073 40660 registrar.cpp:524] Successfully updated the 
registry in 523.11296ms
I0411 15:19:20.809330 40648 registrar.cpp:524] Successfully updated the 
registry in 892.141824ms
I0411 15:19:21.194135 40622 registrar_tests.cpp:1262] Admitted 5 agents in 
6.938952085secs
I0411 15:19:26.973904 40637 registrar.cpp:524] Successfully updated the 
registry in 3.938064128secs
I0411 15:19:28.631865 40637 registrar.cpp:524] Successfully updated the 
registry in 1.116326144secs
I0411 15:19:30.222944 40660 registrar.cpp:524] Successfully updated the 
registry in 911.86688ms
I0411 15:19:30.678509 40622 registrar_tests.cpp:1276] Marked 5 agents 
reachable in 8.249523305secs
I0411 15:19:35.138797 40645 registrar.cpp:524] Successfully updated the 
registry in 815.439104ms
I0411 15:19:41.783651 40622 registrar_tests.cpp:1288] Recovered 5 agents 
(8238915B) in 10.963297677secs
I0411 15:19:47.431670 40657 registrar.cpp:524] Successfully updated the 
registry in 3.960920064secs
I0411 15:20:13.769872 40657 registrar.cpp:524] Successfully updated the 
registry in 1.169234944secs
I0411 15:21:49.685801 40657 registrar.cpp:524] Successfully updated the 
registry in 264.850688ms
Removed 5 agents in 2.12256788111667mins
```
Agents removal seems to be O(n^2) because of linear lookup, hence 2 mins.

Unit

Ran `make check`.


Thanks,

Ilya Pronin



Review Request 58355: Removed unnecessary Registry copying.

2017-04-11 Thread Ilya Pronin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58355/
---

Review request for mesos and Benjamin Mahler.


Bugs: MESOS-7376
https://issues.apache.org/jira/browse/MESOS-7376


Repository: mesos


Description
---

When the number of agents is large every `Registry` copy operation takes
a lot of time (~0.4 sec with 55k agents) because it involves deep
copying a big object graph.

This patch minimizes `Registry` copying to keep registry update timings
at acceptable values.


Diffs
-

  include/mesos/state/protobuf.hpp c15696ab79b61f5487ee4a849d62b34b91ca1614 
  src/master/registrar.cpp 0029cc77628b5bb2a7b1ff551fb42b3eaf7b4fb1 


Diff: https://reviews.apache.org/r/58355/diff/1/


Testing
---

Benchmark
=
Ran `MESOS_VERBOSE=1 ./src/mesos-tests --benchmark 
--gtest_filter='*Registrar_BENCHMARK_Test.Performance/3'`.

Before:
```
I0411 10:04:11.726016 11802 registrar.cpp:508] Successfully updated the 
registry in 89.478144ms
I0411 10:04:13.860827 11803 registrar.cpp:508] Successfully updated the 
registry in 216.688896ms
I0411 10:04:15.167768 11803 registrar.cpp:508] Successfully updated the 
registry in 1.29364096secs
I0411 10:04:18.967394 11803 registrar.cpp:508] Successfully updated the 
registry in 3.696552192secs
I0411 10:04:25.631009 11803 registrar.cpp:508] Successfully updated the 
registry in 6.267425024secs
I0411 10:04:42.625507 11803 registrar.cpp:508] Successfully updated the 
registry in 15.876419072secs
I0411 10:04:44.209377 11787 registrar_tests.cpp:1262] Admitted 5 agents in 
30.479743816secs
I0411 10:05:04.446650 11820 registrar.cpp:508] Successfully updated the 
registry in 18.338545152secs
I0411 10:05:21.171001 11820 registrar.cpp:508] Successfully updated the 
registry in 15.31903872secs
I0411 10:05:37.592319 11820 registrar.cpp:508] Successfully updated the 
registry in 14.863101952secs
I0411 10:05:39.099174 11787 registrar_tests.cpp:1276] Marked 5 agents 
reachable in 53.593596102secs
../../src/tests/registrar_tests.cpp:1287: Failure
Failed to wait 15secs for registry
```
After:
```
I0411 15:19:12.228904 40643 registrar.cpp:524] Successfully updated the 
registry in 91.262208ms
I0411 15:19:14.543190 40660 registrar.cpp:524] Successfully updated the 
registry in 377.45408ms
I0411 15:19:15.707006 40660 registrar.cpp:524] Successfully updated the 
registry in 1.138724096secs
I0411 15:19:18.267305 40660 registrar.cpp:524] Successfully updated the 
registry in 2.466145792secs
I0411 15:19:19.092073 40660 registrar.cpp:524] Successfully updated the 
registry in 523.11296ms
I0411 15:19:20.809330 40648 registrar.cpp:524] Successfully updated the 
registry in 892.141824ms
I0411 15:19:21.194135 40622 registrar_tests.cpp:1262] Admitted 5 agents in 
6.938952085secs
I0411 15:19:26.973904 40637 registrar.cpp:524] Successfully updated the 
registry in 3.938064128secs
I0411 15:19:28.631865 40637 registrar.cpp:524] Successfully updated the 
registry in 1.116326144secs
I0411 15:19:30.222944 40660 registrar.cpp:524] Successfully updated the 
registry in 911.86688ms
I0411 15:19:30.678509 40622 registrar_tests.cpp:1276] Marked 5 agents 
reachable in 8.249523305secs
I0411 15:19:35.138797 40645 registrar.cpp:524] Successfully updated the 
registry in 815.439104ms
I0411 15:19:41.783651 40622 registrar_tests.cpp:1288] Recovered 5 agents 
(8238915B) in 10.963297677secs
I0411 15:19:47.431670 40657 registrar.cpp:524] Successfully updated the 
registry in 3.960920064secs
I0411 15:20:13.769872 40657 registrar.cpp:524] Successfully updated the 
registry in 1.169234944secs
I0411 15:21:49.685801 40657 registrar.cpp:524] Successfully updated the 
registry in 264.850688ms
Removed 5 agents in 2.12256788111667mins
```
Agent removal seems to be O(n^2) because of lookup, hence 2 mins.

Unit

Ran `make check`.


Thanks,

Ilya Pronin