[jira] [Commented] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-25 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167831#comment-15167831
 ] 

Joseph Wu commented on MESOS-4676:
--

{code}
commit d0d4d5a64e8aa17f0bc364060d98690b49037550
Author: Joseph Wu 
Date:   Thu Feb 25 12:22:07 2016 +0100

Fixed flakiness in DockerContainerizerTest.ROOT_DOCKER_Logs.

Adds the `unbuffer` utility in front of each `echo` in the test.
Since Docker appears to handle simultaneous stdout/stderr in a
non-robust fashion, this mitigates the amount of overlap the two
streams will have in the test.

Review: https://reviews.apache.org/r/43963/
{code}

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, test
> Fix For: 0.28.0
>
>
> {noformat}
> [18:06:25][Step 8/8] [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_Logs
> [18:06:25][Step 8/8] I0215 17:06:25.256103  1740 leveldb.cpp:174] Opened db 
> in 6.548327ms
> [18:06:25][Step 8/8] I0215 17:06:25.258002  1740 leveldb.cpp:181] Compacted 
> db in 1.837816ms
> [18:06:25][Step 8/8] I0215 17:06:25.258059  1740 leveldb.cpp:196] Created db 
> iterator in 22044ns
> [18:06:25][Step 8/8] I0215 17:06:25.258076  1740 leveldb.cpp:202] Seeked to 
> beginning of db in 2347ns
> [18:06:25][Step 8/8] I0215 17:06:25.258091  1740 leveldb.cpp:271] Iterated 
> through 0 keys in the db in 571ns
> [18:06:25][Step 8/8] I0215 17:06:25.258152  1740 replica.cpp:779] Replica 
> recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [18:06:25][Step 8/8] I0215 17:06:25.258936  1758 recover.cpp:447] Starting 
> replica recovery
> [18:06:25][Step 8/8] I0215 17:06:25.259177  1758 recover.cpp:473] Replica is 
> in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.260327  1757 replica.cpp:673] Replica in 
> EMPTY status received a broadcasted recover request from 
> (13608)@172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.260545  1758 recover.cpp:193] Received a 
> recover response from a replica in EMPTY status
> [18:06:25][Step 8/8] I0215 17:06:25.261065  1757 master.cpp:376] Master 
> 112363e2-c680-4946-8fee-d0626ed8b21e (ip-172-30-2-239.mesosphere.io) started 
> on 172.30.2.239:39785
> [18:06:25][Step 8/8] I0215 17:06:25.261209  1761 recover.cpp:564] Updating 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 17:06:25.261086  1757 master.cpp:378] Flags at 
> startup: --acls="" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate="true" 
> --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/HncLLj/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/HncLLj/master" 
> --zk_session_timeout="10secs"
> [18:06:25][Step 8/8] I0215 17:06:25.261446  1757 master.cpp:423] Master only 
> allowing authenticated frameworks to register
> [18:06:25][Step 8/8] I0215 17:06:25.261456  1757 master.cpp:428] Master only 
> allowing authenticated slaves to register
> [18:06:25][Step 8/8] I0215 17:06:25.261462  1757 credentials.hpp:35] Loading 
> credentials for authentication from '/tmp/HncLLj/credentials'
> [18:06:25][Step 8/8] I0215 17:06:25.261723  1757 master.cpp:468] Using 
> default 'crammd5' authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.261855  1757 master.cpp:537] Using 
> default 'basic' HTTP authenticator
> [18:06:25][Step 8/8] I0215 17:06:25.262022  1757 master.cpp:571] 
> Authorization enabled
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1755 hierarchical.cpp:144] 
> Initialized hierarchical allocator process
> [18:06:25][Step 8/8] I0215 17:06:25.262177  1758 whitelist_watcher.cpp:77] No 
> whitelist given
> [18:06:25][Step 8/8] I0215 17:06:25.262899  1760 leveldb.cpp:304] Persisting 
> metadata (8 bytes) to leveldb took 1.517992ms
> [18:06:25][Step 8/8] I0215 17:06:25.262924  1760 replica.cpp:320] Persisted 
> replica status to STARTING
> [18:06:25][Step 8/8] I0215 

[jira] [Commented] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-24 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166618#comment-15166618
 ] 

haosdent commented on MESOS-4676:
-

I also found this Docker issue: https://github.com/docker/docker/issues/19950. 
A comment there suggests the cause:
{quote}
This error is from stdcopy package which muxes stdout/stderr streams. It seems 
like now it writes something weird; I think it can also be golang version 
change.
{quote}
I could reproduce the problem with the example code from that issue: 
https://gist.github.com/dpiddy/0c460a8bb297ee19a7a0

I also verified that adding {{-t}} to {{docker run}} avoids this problem.

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, test

[jira] [Commented] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-24 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163720#comment-15163720
 ] 

Joseph Wu commented on MESOS-4676:
--

Based on the linked issue ("Bug report for Docker 1.9.1 on Fedora"), it looks 
like Docker has some sort of race when the containerized process writes to both 
stdout and stderr at the same time.

To keep the test from hitting this:
* Try separating the two {{echo}} commands.
* Try using the {{unbuffer}} utility, e.g. {{unbuffer echo foo; unbuffer echo bar 1>&2}}. See https://github.com/docker/docker/issues/1385
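
The second suggestion can be sketched as a small shell helper. This is only an illustration: {{wrap_writes}} is a hypothetical name, not part of the Mesos test, and it assumes {{unbuffer}} (shipped with the expect package) is installed in the container image, which may not be the case for a stock image.

```shell
#!/bin/sh
# wrap_writes WRAPPER OUT ERR   (hypothetical helper, for illustration only)
# Prints a command string that writes OUT to stdout and ERR to stderr,
# prefixing each echo with WRAPPER when WRAPPER is non-empty.
wrap_writes() {
  wrapper=$1
  # Expands to "WRAPPER " only when wrapper is set and non-empty.
  prefix=${wrapper:+"$wrapper "}
  printf '%secho %s; %secho %s 1>&2\n' "$prefix" "$2" "$prefix" "$3"
}

wrap_writes unbuffer foo bar   # prints: unbuffer echo foo; unbuffer echo bar 1>&2
wrap_writes "" foo bar         # prints: echo foo; echo bar 1>&2
```

Passing an empty wrapper yields the unwrapped command string, so both variants can be run side by side when comparing failure rates.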

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>  Labels: flaky, mesosphere, test

[jira] [Commented] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158030#comment-15158030
 ] 

Joseph Wu commented on MESOS-4676:
--

Confirmed that this is a Docker issue.  I fished out a command string from a 
failed test run with {{GLOG_v=1}}, then ran it repeatedly on its own, outside 
the test.

On Ubuntu 12 (another place we're seeing the failure):
{code}
sh -c 'while true; do
  docker -H unix:///var/run/docker.sock run --cpu-shares 2048 --memory 1073741824 \
  -e MESOS_SANDBOX=/mnt/mesos/sandbox \
  -e MESOS_CONTAINER_NAME=mesos-3672e44c-2c92-48d5-825e-e8475227ad88-S0.bdb7f52c-5d3e-46f9-b676-4e693fb0d1f2 \
  -v /tmp/DockerContainerizerTest_ROOT_DOCKER_Logs_vSwYXT/slaves/3672e44c-2c92-48d5-825e-e8475227ad88-S0/frameworks/3672e44c-2c92-48d5-825e-e8475227ad88-/executors/1/runs/bdb7f52c-5d3e-46f9-b676-4e693fb0d1f2:/mnt/mesos/sandbox \
  --net host \
  --entrypoint /bin/sh \
  alpine -c "echo outd5d895af-0c86-41bc-9f27-037ab12d8035 ; echo errd5d895af-0c86-41bc-9f27-037ab12d8035 1>&2";
done' 2>&1 | grep -v \
  -e "^outd5d895af-0c86-41bc-9f27-037ab12d8035$" \
  -e "^errd5d895af-0c86-41bc-9f27-037ab12d8035$" \
  -e "^WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.$"
{code}

After about an hour (I don't know exactly how many iterations), I got the 
following output:
{code}
(outd5d895af-0c86-41bc-9f27-037abUnrecognized input header
(outd5d895af-0c86-41bc-9f27-037abUnrecognized input header
(errd5d895af-0c86-41bc-9f27-037abUnrecognized input header
(errd5d895af-0c86-41bc-9f27-037abUnrecognized input header
...
{code}
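
Since the loop above does not record how many iterations ran before the failure, a counting variant may be useful. The following is a rough sketch, not the command that was actually run: {{run_until_unexpected}} is a hypothetical helper, and the demo call uses a plain {{sh -c}} stand-in rather than {{docker run}} so that it works without Docker.

```shell
#!/bin/sh
# run_until_unexpected MAX PATTERN CMD...   (hypothetical helper)
# Reruns CMD up to MAX times. Stops at the first iteration whose combined
# stdout/stderr contains a line NOT matching the extended regex PATTERN,
# and reports the iteration number together with the offending output.
run_until_unexpected() {
  max=$1
  pattern=$2
  shift 2
  i=0
  while [ "$i" -lt "$max" ]; do
    i=$((i + 1))
    # Merge stderr into stdout (like 2>&1 in the original pipeline),
    # then keep only the lines that do not match the expected pattern.
    bad=$("$@" 2>&1 | grep -v -E -e "$pattern")
    if [ -n "$bad" ]; then
      echo "iteration $i: $bad"
      return 1
    fi
  done
  echo "no unexpected output in $max iterations"
}

# Stand-in for the docker command: one line on stdout, one on stderr.
run_until_unexpected 3 '^(out|err)X$' sh -c 'echo outX; echo errX 1>&2'
```

To use it for the real reproduction, the stand-in would be replaced by the full {{docker ... run}} command line and the pattern by the expected {{out...}}, {{err...}} and kernel warning lines. Note that command substitution drops NUL bytes, so the reported output is only an approximation of the raw corrupted stream.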

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>  Labels: flaky, mesosphere, test

[jira] [Commented] (MESOS-4676) ROOT_DOCKER_Logs is flaky.

2016-02-17 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150895#comment-15150895
 ] 

haosdent commented on MESOS-4676:
-

I think this is related to a Docker issue. When I tried to reproduce this, I 
saw {{Unrecognized input header}} in stderr.
{code}
I0218 01:44:26.620074 17540 docker.cpp:766] Running docker -H unix:///var/run/docker.sock inspect mesos-d2bdb09d-f546-4c5a-9385-628ebce457d9-S0.5978ddad-425c-4641-a5e8-2a62d6c45753
Unrecognized input header
{code}

And stdout looks garbled:
{code}
Starting task 1
^B^@^@^@^@^@^@(errba84f3ab-0f60-4747-8451-56d82Shutting down
{code}
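
The control characters in that output look like a raw frame header from Docker's multiplexed stdout/stderr stream leaking into the log. Assuming the framing documented for Docker's attach stream (an 8-byte header: one byte for the stream type, three zero bytes, then a big-endian 32-bit payload length), the bytes can be decoded with a short sketch. {{decode_header}} is a hypothetical helper written for this illustration.

```shell
#!/bin/sh
# decode_header   (hypothetical helper)
# Reads an 8-byte frame header on stdin and prints "<stream> <length>".
# Assumed layout: byte 1 = stream type (0 stdin, 1 stdout, 2 stderr),
# bytes 2-4 = zero padding, bytes 5-8 = big-endian payload length.
decode_header() {
  # od prints the 8 bytes as unsigned decimals; set -- splits them into $1..$8.
  set -- $(od -An -tu1 -N8)
  [ "$#" -eq 8 ] || { echo "short header" >&2; return 1; }
  case "$1" in
    0) stream=stdin ;;
    1) stream=stdout ;;
    2) stream=stderr ;;
    *) stream=unknown ;;
  esac
  length=$(( ($5 << 24) | ($6 << 16) | ($7 << 8) | $8 ))
  echo "$stream $length"
}

# The bytes seen in the garbled stdout above: ^B ^@ ^@ ^@ ^@ ^@ ^@ (
printf '\002\000\000\000\000\000\000\050' | decode_header   # prints: stderr 40
```

Under that assumption the header announces a 40-byte stderr payload, which matches {{err}} plus a 36-character UUID plus a newline. That is consistent with a frame that was passed through undecoded instead of being demultiplexed.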

> ROOT_DOCKER_Logs is flaky.
> --
>
> Key: MESOS-4676
> URL: https://issues.apache.org/jira/browse/MESOS-4676
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
> Environment: CentOS 7 with SSL.
>Reporter: Bernd Mathiske
>  Labels: flaky, mesosphere, test