Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-21 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/#review205200
---


Ship it!




Ship It!

- Gilbert Song


On June 20, 2018, 1:17 p.m., bin zheng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67597/
> ---
> 
> (Updated June 20, 2018, 1:17 p.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8871
> https://issues.apache.org/jira/browse/MESOS-8871
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed an issue where the agent may fail to recover if it dies before the image 
> store cache is checkpointed. When the images file is empty, ignore it and 
> continue.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
> 98c8fc769f2525c66539f08e2aa82506912e8a59 
>   src/tests/containerizer/provisioner_docker_tests.cpp 
> 71247c308b205de3d20a41ceb06eed6aa70fb25d 
> 
> 
> Diff: https://reviews.apache.org/r/67597/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> bin zheng
> 
>



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-20 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/#review205138
---



Patch looks great!

Reviews applied: [67597]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On June 20, 2018, 1:17 p.m., bin zheng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67597/
> ---
> 
> (Updated June 20, 2018, 1:17 p.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8871
> https://issues.apache.org/jira/browse/MESOS-8871
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed an issue where the agent may fail to recover if it dies before the image 
> store cache is checkpointed. When the images file is empty, ignore it and 
> continue.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
> 98c8fc769f2525c66539f08e2aa82506912e8a59 
>   src/tests/containerizer/provisioner_docker_tests.cpp 
> 71247c308b205de3d20a41ceb06eed6aa70fb25d 
> 
> 
> Diff: https://reviews.apache.org/r/67597/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> bin zheng
> 
>



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-20 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/#review205128
---



PASS: Mesos patch 67597 was successfully built and tested.

Reviews applied: `['67597']`

All the build artifacts are available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67597

- Mesos Reviewbot Windows


On June 20, 2018, 8:17 p.m., bin zheng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67597/
> ---
> 
> (Updated June 20, 2018, 8:17 p.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8871
> https://issues.apache.org/jira/browse/MESOS-8871
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed an issue where the agent may fail to recover if it dies before the image 
> store cache is checkpointed. When the images file is empty, ignore it and 
> continue.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
> 98c8fc769f2525c66539f08e2aa82506912e8a59 
>   src/tests/containerizer/provisioner_docker_tests.cpp 
> 71247c308b205de3d20a41ceb06eed6aa70fb25d 
> 
> 
> Diff: https://reviews.apache.org/r/67597/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> bin zheng
> 
>



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-20 Thread bin zheng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/
---

(Updated June 20, 2018, 8:17 p.m.)


Review request for mesos and Gilbert Song.


Bugs: MESOS-8871
https://issues.apache.org/jira/browse/MESOS-8871


Repository: mesos


Description (updated)
---

Fixed an issue where the agent may fail to recover if it dies before the image 
store cache is checkpointed. When the images file is empty, ignore it and 
continue.


Diffs (updated)
-

  src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
98c8fc769f2525c66539f08e2aa82506912e8a59 
  src/tests/containerizer/provisioner_docker_tests.cpp 
71247c308b205de3d20a41ceb06eed6aa70fb25d 


Diff: https://reviews.apache.org/r/67597/diff/3/

Changes: https://reviews.apache.org/r/67597/diff/2-3/


Testing
---


Thanks,

bin zheng



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-18 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/#review204986
---


Fix it, then Ship it!




LGTM!


src/tests/containerizer/provisioner_docker_tests.cpp
Lines 244 (patched)


Could you add the comment "This is a regression test for MESOS-8871."?



src/tests/containerizer/provisioner_docker_tests.cpp
Lines 246 (patched)


I would name it `MetadataManagerRecoveryWithEmptyImagesFile`.



src/tests/containerizer/provisioner_docker_tests.cpp
Lines 252 (patched)


1. Use `const string ...`.

2. s/empty_images/storedImagesPath/g (we usually don't use `snake_case` 
for variable names).



src/tests/containerizer/provisioner_docker_tests.cpp
Lines 254 (patched)


Do we need this mkdir?



src/tests/containerizer/provisioner_docker_tests.cpp
Lines 260 (patched)


Add a newline above.
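
Taken together, the comments above suggest a test shaped roughly like the 
sketch below. It is only an illustration in standalone C++: 
`recoverStoredImages` is a hypothetical stand-in for the code under test, 
not the Mesos API, and the real test would use the project's gtest fixtures 
and the metadata manager itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical stand-in for the code under test (not the Mesos API): counts
// the non-empty lines in the file at `path`. A missing or empty file yields 0.
std::size_t recoverStoredImages(const std::string& path)
{
  std::ifstream file(path);
  std::size_t count = 0;
  std::string line;
  while (std::getline(file, line)) {
    if (!line.empty()) {
      ++count;
    }
  }
  return count;
}

// This is a regression test for MESOS-8871.
bool MetadataManagerRecoveryWithEmptyImagesFile()
{
  const std::string storedImagesPath = "storedImages";

  // Simulate an agent that died before the image store cache was written.
  { std::ofstream create(storedImagesPath, std::ios::trunc); }

  const bool recoveredNothing = recoverStoredImages(storedImagesPath) == 0;

  // The empty file should still be there (ignored, not removed).
  const bool stillExists = std::ifstream(storedImagesPath).is_open();

  std::remove(storedImagesPath.c_str());
  return recoveredNothing && stillExists;
}
```

The sketch writes the file into the current directory, so it needs no 
mkdir; whether the real test needs one depends on where the fixture places 
the store directory.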


- Gilbert Song


On June 15, 2018, 10:44 a.m., bin zheng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67597/
> ---
> 
> (Updated June 15, 2018, 10:44 a.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8871
> https://issues.apache.org/jira/browse/MESOS-8871
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed an issue where agent may fail to recover.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
> 98c8fc769f2525c66539f08e2aa82506912e8a59 
>   src/tests/containerizer/provisioner_docker_tests.cpp 
> 71247c308b205de3d20a41ceb06eed6aa70fb25d 
> 
> 
> Diff: https://reviews.apache.org/r/67597/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> bin zheng
> 
>



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-16 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/#review204893
---



Patch looks great!

Reviews applied: [67597]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On June 15, 2018, 10:44 a.m., bin zheng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67597/
> ---
> 
> (Updated June 15, 2018, 10:44 a.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8871
> https://issues.apache.org/jira/browse/MESOS-8871
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed an issue where agent may fail to recover.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
> 98c8fc769f2525c66539f08e2aa82506912e8a59 
>   src/tests/containerizer/provisioner_docker_tests.cpp 
> 71247c308b205de3d20a41ceb06eed6aa70fb25d 
> 
> 
> Diff: https://reviews.apache.org/r/67597/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> bin zheng
> 
>



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-15 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/#review204865
---



PASS: Mesos patch 67597 was successfully built and tested.

Reviews applied: `['67597']`

All the build artifacts are available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67597

- Mesos Reviewbot Windows


On June 15, 2018, 5:44 p.m., bin zheng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67597/
> ---
> 
> (Updated June 15, 2018, 5:44 p.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8871
> https://issues.apache.org/jira/browse/MESOS-8871
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed an issue where agent may fail to recover.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
> 98c8fc769f2525c66539f08e2aa82506912e8a59 
>   src/tests/containerizer/provisioner_docker_tests.cpp 
> 71247c308b205de3d20a41ceb06eed6aa70fb25d 
> 
> 
> Diff: https://reviews.apache.org/r/67597/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> bin zheng
> 
>



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-15 Thread bin zheng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/
---

(Updated June 15, 2018, 5:44 p.m.)


Review request for mesos and Gilbert Song.


Changes
---

OK, updated according to the new idea.


Bugs: MESOS-8871
https://issues.apache.org/jira/browse/MESOS-8871


Repository: mesos


Description (updated)
---

Fixed an issue where agent may fail to recover.


Diffs (updated)
-

  src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
98c8fc769f2525c66539f08e2aa82506912e8a59 
  src/tests/containerizer/provisioner_docker_tests.cpp 
71247c308b205de3d20a41ceb06eed6aa70fb25d 


Diff: https://reviews.apache.org/r/67597/diff/2/

Changes: https://reviews.apache.org/r/67597/diff/1-2/


Testing
---


Thanks,

bin zheng



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-15 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/#review204824
---



Patch looks great!

Reviews applied: [67597]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On June 14, 2018, 7:48 a.m., bin zheng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67597/
> ---
> 
> (Updated June 14, 2018, 7:48 a.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8871
> https://issues.apache.org/jira/browse/MESOS-8871
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed an issue where the agent may fail to recover if it dies before the image 
> store cache is checkpointed. When the images file is empty, remove it and 
> continue.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
> 98c8fc769f2525c66539f08e2aa82506912e8a59 
>   src/tests/containerizer/provisioner_docker_tests.cpp 
> 71247c308b205de3d20a41ceb06eed6aa70fb25d 
> 
> 
> Diff: https://reviews.apache.org/r/67597/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> bin zheng
> 
>



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-14 Thread Gilbert Song

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/#review204806
---




src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp
Lines 269-273 (patched)


I don't think we need to remove this file, since it will be overwritten on 
the next successful container launch.

Let's log a `WARNING` and unblock the agent's recovery.
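
To make the suggestion concrete, here is a minimal standalone sketch of the 
behavior I have in mind. It is not the actual `metadata_manager.cpp` code: 
`recoverImages` and `RecoverResult` are hypothetical names, and plain 
`std::cerr` stands in for the real logging macro.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

// Hypothetical outcome of recovering the checkpointed images file.
enum class RecoverResult { NoCheckpoint, EmptyIgnored, Loaded };

// Hypothetical helper, not the Mesos API: reads the checkpoint at `path`
// and fills `contents` only when there is something to load.
RecoverResult recoverImages(const std::string& path, std::string* contents)
{
  std::ifstream file(path, std::ios::binary);
  if (!file.is_open()) {
    // Nothing was ever checkpointed; start with an empty cache.
    return RecoverResult::NoCheckpoint;
  }

  std::string data(
      (std::istreambuf_iterator<char>(file)),
      std::istreambuf_iterator<char>());

  if (data.empty()) {
    // Leave the file in place and warn; do not fail recovery.
    std::cerr << "WARNING: ignoring empty images file '" << path << "'\n";
    return RecoverResult::EmptyIgnored;
  }

  *contents = data;
  return RecoverResult::Loaded;
}
```

The empty file is deliberately left on disk, on the assumption that the next 
successful checkpoint overwrites it.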



src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp
Lines 274 (patched)


Add a newline above.


- Gilbert Song


On June 14, 2018, 12:48 a.m., bin zheng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67597/
> ---
> 
> (Updated June 14, 2018, 12:48 a.m.)
> 
> 
> Review request for mesos and Gilbert Song.
> 
> 
> Bugs: MESOS-8871
> https://issues.apache.org/jira/browse/MESOS-8871
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed an issue where the agent may fail to recover if it dies before the image 
> store cache is checkpointed. When the images file is empty, remove it and 
> continue.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
> 98c8fc769f2525c66539f08e2aa82506912e8a59 
>   src/tests/containerizer/provisioner_docker_tests.cpp 
> 71247c308b205de3d20a41ceb06eed6aa70fb25d 
> 
> 
> Diff: https://reviews.apache.org/r/67597/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> bin zheng
> 
>



Re: Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-14 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/#review204765
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['67597']`

Failed command: `Start-MesosCITesting`

All the build artifacts are available at: 
http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67597

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67597/logs/mesos-tests-stdout.log):

```
[   OK ] Endpoint/SlaveEndpointTest.NoAuthorizer/2 (112 ms)
[--] 9 tests from Endpoint/SlaveEndpointTest (1018 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (44 ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (50 ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (94 ms total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (768 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (791 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (705 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (732 ms total)

[--] Global test environment tear-down
[==] 988 tests from 97 test cases ran. (464218 ms total)
[  PASSED  ] 987 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] SlaveTest.RestartSlaveRequireExecutorAuthentication

 1 FAILED TEST
  YOU HAVE 220 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67597/logs/mesos-tests-stderr.log):

```
I0614 10:14:26.748006  8396 slave.cpp:3939] Shutting down framework 272f115b-fed2-40e9-a311-f860ead0937b-
I0614 10:14:26.748006  2880 master.cpp:10863] Updating the state of task 08d3eead-f6ae-45cc-8681-f1c7fc798916 of framework 272f115b-fed2-40e9-a311-f860ead0937b-I0614 10:14:26.588014  8080 exec.cpp:162] Version: 1.7.0
I0614 10:14:26.615022  6024 exec.cpp:236] Executor registered on agent 272f115b-fed2-40e9-a311-f860ead0937b-S0
I0614 10:14:26.619009  3772 executor.cpp:178] Received SUBSCRIBED event
I0614 10:14:26.623009  3772 executor.cpp:182] Subscribed executor on windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net
I0614 10:14:26.624009  3772 executor.cpp:178] Received LAUNCH event
I0614 10:14:26.628007  3772 executor.cpp:665] Starting task 08d3eead-f6ae-45cc-8681-f1c7fc798916
I0614 10:14:26.709031  3772 executor.cpp:485] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0614 10:14:26.721025  3772 executor.cpp:678] Forked command at 7404
I0614 10:14:26.750013  1728 exec.cpp:445] Executor asked to shutdown
I0614 10:14:26.751024  3772 executor.cpp:178] Received SHUTDOWN event
I0614 10:14:26.751024  3772 executor.cpp:781] Shutting down
I0614 10:14:26.751024  3772 executor.cpp:894] Sending SIGTERM to process tree at pid 740 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0614 10:14:26.748006  8396 slave.cpp:6660] Shutting down executor '08d3eead-f6ae-45cc-8681-f1c7fc798916' of framework 272f115b-fed2-40e9-a311-f860ead0937b- at executor(1)@192.10.1.6:55351
I0614 10:14:26.750013  1464 slave.cpp:931] Agent terminating
W0614 10:14:26.751024  1464 slave.cpp:3935] Ignoring shutdown framework 272f115b-fed2-40e9-a311-f860ead0937b- because it is terminating
I0614 10:14:26.753013  2880 master.cpp:10962] Removing task 08d3eead-f6ae-45cc-8681-f1c7fc798916 with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework 272f115b-fed2-40e9-a311-f860ead0937b- on agent 272f115b-fed2-40e9-a311-f860ead0937b-S0 at slave(449)@192.10.1.6:55330 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0614 10:14:26.755023  2880 master.cpp:1293] Agent 272f115b-fed2-40e9-a311-f860ead0937b-S0 at slave(449)@192.10.1.6:55330 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net) disconnected
I0614 10:14:26.756023  2880 master.cpp:3303] Disconnecting agent 272f115b-fed2-40e9-a311-f860ead0937b-S0 at slave(449)@192.10.1.6:55330 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0614 10:14:26.756023  6884 containerizer.cpp:2406] Destroying container 8b9f84b5-a603-460b-bbc6-eaa8f0d5e5f4 in RUNNING state
I0614 10:14:26.756023  6884 containerizer.cpp:3020] Transitioning the state of container 8b9f84b5-a603-460b-bbc6-eaa8f0d5e5
```

Review Request 67597: Fixed an issue where agent may fail to recover.

2018-06-14 Thread bin zheng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67597/
---

Review request for mesos and Gilbert Song.


Bugs: MESOS-8871
https://issues.apache.org/jira/browse/MESOS-8871


Repository: mesos


Description
---

Fixed an issue where the agent may fail to recover if it dies before the image 
store cache is checkpointed. When the images file is empty, remove it and 
continue.


Diffs
-

  src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp 
98c8fc769f2525c66539f08e2aa82506912e8a59 
  src/tests/containerizer/provisioner_docker_tests.cpp 
71247c308b205de3d20a41ceb06eed6aa70fb25d 


Diff: https://reviews.apache.org/r/67597/diff/1/


Testing
---


Thanks,

bin zheng