[jira] [Commented] (MESOS-6629) Add master validation of FrameworkInfo.roles.

2016-11-24 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692580#comment-15692580
 ] 

Jay Guo commented on MESOS-6629:


bq. {{FrameworkInfo.roles}} must not contain duplicate entries.
Do we want subscription to fail on duplicate roles, or should we simply 
deduplicate them and log a warning? In the latter case we could change 
{{roles::parse}} to return a {{std::set}} and reuse it for both the framework's 
roles and the master's {{--roles}} flag. Currently, the master logs a warning 
for duplicate roles in {{--roles}}: 
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L662
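
The deduplicate-and-warn option raised in this comment can be sketched as follows. This is a minimal Python illustration, not Mesos code: the function name `deduplicate_roles` and the logger name are hypothetical, and unlike a `std::set` it keeps first-seen order instead of sorting.

```python
import logging
from typing import Iterable, List

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("roles")


def deduplicate_roles(roles: Iterable[str]) -> List[str]:
    """Return the roles with duplicates dropped, keeping first-seen order."""
    seen = set()
    unique: List[str] = []
    for role in roles:
        if role in seen:
            # Warn instead of failing the subscription.
            logger.warning("Ignoring duplicate role '%s'", role)
            continue
        seen.add(role)
        unique.append(role)
    return unique
```

The alternative, failing the subscription, would replace the warning with an error returned to the framework.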

> Add master validation of FrameworkInfo.roles.
> -
>
> Key: MESOS-6629
> URL: https://issues.apache.org/jira/browse/MESOS-6629
> Project: Mesos
> Issue Type: Task
> Components: master
> Reporter: Benjamin Mahler
> Assignee: Jay Guo
>
> The master should disallow frameworks from subscribing based on the following:
> (1) Only one of {{FrameworkInfo.role}} and {{FrameworkInfo.roles}} must be 
> set at a time.
> (2) If {{FrameworkInfo.roles}} is set, then the MULTI_ROLE framework 
> capability must be provided.
> (3) If the MULTI_ROLE framework capability is provided, then 
> {{FrameworkInfo.role}} must not be set.
> (4) {{FrameworkInfo.roles}} must not contain duplicate entries.
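
The four checks listed in the description can be modelled as one small validation function. This is a minimal Python sketch under stated assumptions: the `FrameworkInfo` dataclass is a simplified stand-in for the protobuf message, and `validate_roles` and its error strings are hypothetical, not the master's actual validation code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MULTI_ROLE = "MULTI_ROLE"


@dataclass
class FrameworkInfo:
    """Simplified stand-in for the FrameworkInfo protobuf (assumption)."""
    role: Optional[str] = None
    roles: List[str] = field(default_factory=list)
    capabilities: List[str] = field(default_factory=list)


def validate_roles(info: FrameworkInfo) -> Optional[str]:
    """Return an error message if subscription should be rejected, else None."""
    multi_role = MULTI_ROLE in info.capabilities

    # (1) 'role' and 'roles' must not both be set.
    if info.role is not None and info.roles:
        return "Only one of 'role' and 'roles' may be set"

    # (2) 'roles' requires the MULTI_ROLE capability.
    if info.roles and not multi_role:
        return "'roles' is set but MULTI_ROLE capability is missing"

    # (3) MULTI_ROLE frameworks must not set 'role'.
    if multi_role and info.role is not None:
        return "'role' must not be set with MULTI_ROLE capability"

    # (4) 'roles' must not contain duplicates.
    if len(set(info.roles)) != len(info.roles):
        return "'roles' must not contain duplicate entries"

    return None
```

Returning an optional error string rather than raising keeps the sketch close to a validate-then-reject flow; a framework that sets neither field passes, since the rules only constrain what happens when a field is set.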



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6473) Build support for ATTACH_CONTAINER_OUTPUT into the Agent API in Mesos

2016-11-24 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15692583#comment-15692583
 ] 

Adam B commented on MESOS-6473:
---

https://reviews.apache.org/r/53995/

> Build support for ATTACH_CONTAINER_OUTPUT into the Agent API in Mesos
> -
>
> Key: MESOS-6473
> URL: https://issues.apache.org/jira/browse/MESOS-6473
> Project: Mesos
> Issue Type: Task
> Reporter: Kevin Klues
> Assignee: Vinod Kone
> Labels: debugging, mesosphere
>
> Coupled with the ATTACH_CONTAINER_INPUT call, this call will attach a remote 
> client to the input/output of the entrypoint of a container. All 
> input/output data will be packed into I/O messages and interleaved with 
> control messages sent between a client and the agent. A single chunked 
> request will be used to stream messages to the agent over the input stream, 
> and a single chunked response will be used to stream messages to the client 
> over the output stream.
> This call will integrate with the I/O switchboard to stream data between the 
> container and the HTTP stream.
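
The interleaving of I/O messages and control messages on a single stream can be pictured with a small model. This is a rough Python sketch only: the `Kind` enum, the `Message` dataclass, and `split_stream` are hypothetical names chosen for illustration, and the sketch does not model the HTTP chunking or the actual wire format used by the agent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class Kind(Enum):
    DATA = "DATA"        # carries input/output bytes
    CONTROL = "CONTROL"  # carries out-of-band control information


@dataclass
class Message:
    kind: Kind
    payload: bytes


def split_stream(messages: Iterable[Message]) -> Tuple[bytes, List[bytes]]:
    """Separate an interleaved stream into concatenated data and control payloads."""
    data = bytearray()
    control: List[bytes] = []
    for message in messages:
        if message.kind is Kind.DATA:
            data.extend(message.payload)
        else:
            control.append(message.payload)
    return bytes(data), control
```

A receiver along these lines would write the data bytes to the terminal while handling each control payload separately, in arrival order.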





[jira] [Updated] (MESOS-6002) The whiteout file cannot be removed correctly using aufs backend.

2016-11-24 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6002:
---
Target Version/s: 1.2.0

> The whiteout file cannot be removed correctly using aufs backend.
> -
>
> Key: MESOS-6002
> URL: https://issues.apache.org/jira/browse/MESOS-6002
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Environment: Ubuntu 14, Ubuntu 12
> Or any os with aufs module
> Reporter: Gilbert Song
> Assignee: Qian Zhang
> Labels: aufs, backend, containerizer
> Fix For: 1.1.1
>
> Attachments: whiteout.diff
>
>
> The whiteout file is not removed correctly when using the aufs backend in 
> the unified containerizer. This can be verified by running the unit test below 
> with the aufs backend manually specified.
> {noformat}
> [20:11:24] :   [Step 10/10] [ RUN  ] 
> ProvisionerDockerPullerTest.ROOT_INTERNET_CURL_Whiteout
> [20:11:24]W:   [Step 10/10] I0805 20:11:24.986734 24295 cluster.cpp:155] 
> Creating default 'local' authorizer
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.001153 24295 leveldb.cpp:174] 
> Opened db in 14.308627ms
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003731 24295 leveldb.cpp:181] 
> Compacted db in 2.558329ms
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003749 24295 leveldb.cpp:196] 
> Created db iterator in 3086ns
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003754 24295 leveldb.cpp:202] 
> Seeked to beginning of db in 595ns
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003758 24295 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 314ns
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003769 24295 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004086 24315 recover.cpp:451] 
> Starting replica recovery
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004251 24312 recover.cpp:477] 
> Replica is in EMPTY status
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004546 24314 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(5640)@172.30.2.105:36006
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004607 24312 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004762 24313 recover.cpp:568] 
> Updating replica status to STARTING
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004776 24314 master.cpp:375] 
> Master 21665992-d47e-402f-a00c-6f8fab613019 (ip-172-30-2-105.mesosphere.io) 
> started on 172.30.2.105:36006
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004787 24314 master.cpp:377] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/0z753P/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/0z753P/master" --zk_session_timeout="10secs"
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004920 24314 master.cpp:427] 
> Master only allowing authenticated frameworks to register
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004930 24314 master.cpp:441] 
> Master only allowing authenticated agents to register
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004935 24314 master.cpp:454] 
> Master only allowing authenticated HTTP frameworks to register
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004942 24314 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/0z753P/credentials'
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.005018 24314 master.cpp:499] Using 
> default 'crammd5' authenticator
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.005101 24314 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.005152 24314 http.cpp:883] Using 
> default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
> [20:11:25]W: 

[jira] [Updated] (MESOS-6360) The handling of whiteout files in provisioner is not correct.

2016-11-24 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6360:
---
Target Version/s: 1.2.0 (was: 1.1.1)
Fix Version/s: 1.2.0, 1.1.1

> The handling of whiteout files in provisioner is not correct.
> -
>
> Key: MESOS-6360
> URL: https://issues.apache.org/jira/browse/MESOS-6360
> Project: Mesos
> Issue Type: Bug
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Blocker
> Fix For: 1.1.1, 1.2.0
>
>
> Currently, when a user launches a container from a Docker image via the 
> universal containerizer, we always handle the whiteout files in 
> {{ProvisionerProcess::__provision()}}, regardless of which backend is used.
> However, this is not correct: whiteout handling is backend dependent, so 
> different backends need to handle whiteout files in different ways, e.g.:
> * AUFS backend: It seems the AUFS whiteout ({{.wh.}} and 
> {{.wh..wh..opq}}) is the whiteout standard in Docker (see [this comment | 
> https://github.com/docker/docker/blob/v1.12.1/pkg/archive/archive.go#L259:L262]
>  for details), so that means after the Docker image is pulled, its whiteout 
> files in the store are already in aufs format, then we do not need to do 
> anything about whiteout file handling because the aufs mount done in 
> {{AufsBackendProcess::provision()}} will handle it automatically.
> * Overlay backend: Overlayfs has its own whiteout files (see [this doc | 
> https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt] for 
> details), so we need to convert the aufs whiteout files to overlayfs whiteout 
> files before we do the overlay mount in {{OverlayBackendProcess::provision}} 
> which will automatically handle the overlayfs whiteout files.
> * Copy backend: We need to manually handle the aufs whiteout files when we 
> copy each layer in {{CopyBackendProcess::_provision()}}.
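
For the copy backend case, manual handling of aufs-style whiteouts while layering can be sketched as follows. This is a simplified Python model, not the provisioner's code: layers are plain dictionaries mapping file paths to contents, `apply_layers` is a hypothetical name, and only the two marker forms named in the description (a `.wh.` prefix and `.wh..wh..opq`) are modelled.

```python
import posixpath
from typing import Dict, List

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def apply_layers(layers: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge layers bottom-up, honouring aufs-style whiteout markers."""
    rootfs: Dict[str, str] = {}
    for layer in layers:
        # Pass 1: apply whiteouts against content from lower layers.
        for path in layer:
            directory, name = posixpath.split(path)
            if name == OPAQUE_MARKER:
                # Opaque directory: hide everything lower layers put in it.
                prefix = directory + "/" if directory else ""
                for existing in [p for p in rootfs if p.startswith(prefix)]:
                    del rootfs[existing]
            elif name.startswith(WHITEOUT_PREFIX):
                # Plain whiteout: hide one file or directory subtree.
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                for existing in [
                    p for p in rootfs
                    if p == target or p.startswith(target + "/")
                ]:
                    del rootfs[existing]
        # Pass 2: copy the layer's regular files; markers are never copied.
        for path, content in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                rootfs[path] = content
    return rootfs
```

Processing whiteouts before copying a layer's own files means a layer can both clear a directory and repopulate it, which is the behaviour an opaque marker is meant to express.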





[jira] [Updated] (MESOS-6002) The whiteout file cannot be removed correctly using aufs backend.

2016-11-24 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6002:
---
Fix Version/s: 1.1.1

> The whiteout file cannot be removed correctly using aufs backend.
> -
>
> Key: MESOS-6002
> URL: https://issues.apache.org/jira/browse/MESOS-6002
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Environment: Ubuntu 14, Ubuntu 12
> Or any os with aufs module
> Reporter: Gilbert Song
> Assignee: Qian Zhang
> Labels: aufs, backend, containerizer
> Fix For: 1.1.1
>
> Attachments: whiteout.diff
>

[jira] [Created] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)
Artem Harutyunyan created MESOS-6640:


Summary: mesos-local doesn't handle --work_dir correctly
Key: MESOS-6640
URL: https://issues.apache.org/jira/browse/MESOS-6640
Project: Mesos
Issue Type: Bug
Reporter: Artem Harutyunyan
Fix For: 1.2.0


After {{--work_dir}} was made a required flag for {{mesos-agent}}, it's only 
possible to launch {{mesos-local}} if the MESOS_WORK_DIR environment variable is 
set. 

Using {{--work_dir}} does not work:

{{{
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
}}}
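
The final error line suggests that the agent flags are loaded from the environment only, so a `--work_dir` given on the command line never reaches them. That reading can be modelled with a short Python sketch. Everything here is an assumption for illustration: `load_agent_flags`, `start_local`, `FlagError`, and the `forward_work_dir` switch are hypothetical names, not functions or options in Mesos.

```python
from typing import Dict, Mapping


class FlagError(Exception):
    """Raised when a required flag is missing."""


def load_agent_flags(environ: Mapping[str, str],
                     prefix: str = "MESOS_") -> Dict[str, str]:
    # Build the flags only from prefixed environment variables.
    flags = {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }
    if "work_dir" not in flags:
        raise FlagError("Flag 'work_dir' is required, but it was not provided")
    return flags


def start_local(cli_flags: Mapping[str, str],
                environ: Mapping[str, str],
                forward_work_dir: bool) -> Dict[str, str]:
    env = dict(environ)
    if forward_work_dir and "work_dir" in cli_flags:
        # Hypothetical remedy: pass the command line value on to the agent.
        env.setdefault("MESOS_WORK_DIR", cli_flags["work_dir"])
    return load_agent_flags(env)
```

With `forward_work_dir=False` the model reproduces the reported failure unless MESOS_WORK_DIR is already set; forwarding the command line value is one possible direction for a fix, not a confirmed one.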





[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{{--work_dir}}} was made required for {mesos-agent} it's only possible 
to launch {mesos-local} if MESOS_WORK_DIR environment variable is set. 

Using {--work_dir} does not work:

{{
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
}}


[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{work_dir}} was made a required flag for {{mesos-agent}}, it's only 
possible to launch {{mesos-local}} if the MESOS_WORK_DIR environment variable is 
set. 

Using {{work_dir}} does not work:

{quote}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
{quote}


[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{work_dir}} became a required command line flag for {{mesos-agent}}, it's 
only possible to launch {{mesos-local}} if the MESOS_WORK_DIR environment 
variable is set. Passing {{work_dir}} on the command line, which 
{{mesos-local}} presumably allows, does not work:

{quote}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
{quote}


[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{work_dir}} became a required command-line flag for {{mesos-agent}}, it 
is only possible to launch {{mesos-local}} if the MESOS_WORK_DIR environment 
variable is set. Passing {{work_dir}} on the command line, which 
{{mesos-local}} presumably allows, does not work:



{code}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
{code}

  was:
After {{work_dir}} became a required command-line flag for {{mesos-agent}}, it 
is only possible to launch {{mesos-local}} if the MESOS_WORK_DIR environment 
variable is set. Passing {{work_dir}} on the command line, which 
{{mesos-local}} presumably allows, does not work:

{code}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max

[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly

2016-11-24 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6640:
-
Description: 
After {{work_dir}} became a required command-line flag for {{mesos-agent}}, it 
is only possible to launch {{mesos-local}} if the MESOS_WORK_DIR environment 
variable is set. Passing {{work_dir}} on the command line, which 
{{mesos-local}} presumably allows, does not work:

{code}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
{code}

  was:
After {{work_dir}} became a required command-line flag for {{mesos-agent}}, it 
is only possible to launch {{mesos-local}} if the MESOS_WORK_DIR environment 
variable is set. Passing {{work_dir}} on the command line, which 
{{mesos-local}} presumably allows, does not work:

{quote}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_

[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly.

2016-11-24 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6640:
---
Summary: mesos-local doesn't handle --work_dir correctly.  (was: mesos-local 
doesn't handle --work_dir correctly)

> mesos-local doesn't handle --work_dir correctly.
> ---
>
> Key: MESOS-6640
> URL: https://issues.apache.org/jira/browse/MESOS-6640
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>  Labels: beginner, newbie
> Fix For: 1.2.0
>
>
> After {{work_dir}} became a required command-line flag for {{mesos-agent}}, 
> it is only possible to launch {{mesos-local}} if the MESOS_WORK_DIR 
> environment variable is set. Passing {{work_dir}} on the command line, which 
> {{mesos-local}} presumably allows, does not work:
> {code}
> ~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
> I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
> I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
> I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status 
> received a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
> I1124 13:26:42.617058 1064960 master.cpp:380] Master 
> 73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
> 10.204.3.193:5050
> I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="false" --authenticate_frameworks="false" 
> --authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticators="crammd5" 
> --authorizers="local" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
> --registry_max_agent_count="102400" --registry_store_timeout="20secs" 
> --registry_strict="false" --root_submissions="true" --user_sorter="drf" 
> --version="false" 
> --webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
> --work_dir="/tmp/foo" --zk_session_timeout="10secs"
> I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response 
> from a replica in EMPTY status
> I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
> frameworks to register
> I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
> agents to register
> I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
> to register without authentication
> I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
> authenticator
> W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
> authentication requests will be refused
> I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
> Failed to start a local cluster while loading agent flags from the 
> environment: Flag 'work_dir' is required, but it was not provided
> ~/src/mesos-install  $
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6640) mesos-local doesn't handle --work_dir correctly.

2016-11-24 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6640:
---
Description: 
After {{work_dir}} became a required command-line flag for {{mesos-agent}}, it 
is only possible to launch {{mesos-local}} if the {{MESOS_WORK_DIR}} 
environment variable is set. Passing {{work_dir}} on the command line, which 
{{mesos-local}} presumably allows, does not work:



{code}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="20secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" 
--webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
--work_dir="/tmp/foo" --zk_session_timeout="10secs"
I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response from 
a replica in EMPTY status
I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
frameworks to register
I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
agents to register
I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
to register without authentication
I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
authenticator
W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
authentication requests will be refused
I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
Failed to start a local cluster while loading agent flags from the environment: 
Flag 'work_dir' is required, but it was not provided
~/src/mesos-install  $
{code}

  was:
After {{work_dir}} became a required command-line flag for {{mesos-agent}}, it 
is only possible to launch {{mesos-local}} if the MESOS_WORK_DIR environment 
variable is set. Passing {{work_dir}} on the command line, which 
{{mesos-local}} presumably allows, does not work:



{code}
~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status received 
a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
I1124 13:26:42.617058 1064960 master.cpp:380] Master 
73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
10.204.3.193:5050
I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="false" --authenticate_frameworks="false" 
--authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--re

[jira] [Commented] (MESOS-6640) mesos-local doesn't handle --work_dir correctly.

2016-11-24 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693488#comment-15693488
 ] 

haosdent commented on MESOS-6640:
-

This is resolved at https://reviews.apache.org/r/52787/ 
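
For builds that predate that fix, the description quoted below implies a 
workaround: set {{MESOS_WORK_DIR}} in the environment instead of relying on the 
flag. The wrapper below is only a minimal sketch of that idea and is not part 
of Mesos. The function name run_with_work_dir is hypothetical; it copies the 
value of a --work_dir=PATH argument into MESOS_WORK_DIR before running the 
given command.

```shell
# Hypothetical helper (not part of Mesos): run a command with MESOS_WORK_DIR
# set from a --work_dir=PATH argument, if one is present.
run_with_work_dir() {
  cmd=$1
  shift
  work_dir=
  for arg in "$@"; do
    case $arg in
      --work_dir=*) work_dir=${arg#--work_dir=} ;;
    esac
  done
  if [ -n "$work_dir" ]; then
    # Set the variable only for this one command.
    MESOS_WORK_DIR=$work_dir "$cmd" "$@"
  else
    "$cmd" "$@"
  fi
}
```

An assumed invocation would be 
run_with_work_dir ./bin/mesos-local --work_dir=/tmp/foo, which passes the 
original arguments through unchanged.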

> mesos-local doesn't handle --work_dir correctly.
> ---
>
> Key: MESOS-6640
> URL: https://issues.apache.org/jira/browse/MESOS-6640
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>  Labels: beginner, newbie
> Fix For: 1.2.0
>
>
> After {{work_dir}} became a required command-line flag for {{mesos-agent}}, 
> it is only possible to launch {{mesos-local}} if the {{MESOS_WORK_DIR}} 
> environment variable is set. Passing {{work_dir}} on the command line, which 
> {{mesos-local}} presumably allows, does not work:
> {code}
> ~/src/mesos-install  $ ./bin/mesos-local --work_dir=/tmp/foo
> I1124 13:26:42.609170 2103623680 replica.cpp:776] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1124 13:26:42.610047 1601536 recover.cpp:451] Starting replica recovery
> I1124 13:26:42.610213 1601536 recover.cpp:477] Replica is in EMPTY status
> I1124 13:26:42.615016 2138112 replica.cpp:673] Replica in EMPTY status 
> received a broadcasted recover request from __req_res__(1)@10.204.3.193:5050
> I1124 13:26:42.617058 1064960 master.cpp:380] Master 
> 73762f1c-314b-4e7c-a7e9-b820bfd9dde7 (xkcd2358.railnet.train) started on 
> 10.204.3.193:5050
> I1124 13:26:42.617082 1064960 master.cpp:382] Flags at startup: 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="false" --authenticate_frameworks="false" 
> --authenticate_http_frameworks="false" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticators="crammd5" 
> --authorizers="local" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
> --registry_max_agent_count="102400" --registry_store_timeout="20secs" 
> --registry_strict="false" --root_submissions="true" --user_sorter="drf" 
> --version="false" 
> --webui_dir="/Users/xkcd2358/src/mesos-install/share/mesos/webui" 
> --work_dir="/tmp/foo" --zk_session_timeout="10secs"
> I1124 13:26:42.617246 2138112 recover.cpp:197] Received a recover response 
> from a replica in EMPTY status
> I1124 13:26:42.617292 1064960 master.cpp:434] Master allowing unauthenticated 
> frameworks to register
> I1124 13:26:42.617301 1064960 master.cpp:448] Master allowing unauthenticated 
> agents to register
> I1124 13:26:42.617306 1064960 master.cpp:462] Master allowing HTTP frameworks 
> to register without authentication
> I1124 13:26:42.617316 1064960 master.cpp:504] Using default 'crammd5' 
> authenticator
> W1124 13:26:42.617328 1064960 authenticator.cpp:512] No credentials provided, 
> authentication requests will be refused
> I1124 13:26:42.617334 1064960 authenticator.cpp:519] Initializing server SASL
> Failed to start a local cluster while loading agent flags from the 
> environment: Flag 'work_dir' is required, but it was not provided
> ~/src/mesos-install  $
> {code}





[jira] [Created] (MESOS-6641) Remove deprecated hooks from our module API.

2016-11-24 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-6641:
-

 Summary: Remove deprecated hooks from our module API.
 Key: MESOS-6641
 URL: https://issues.apache.org/jira/browse/MESOS-6641
 Project: Mesos
  Issue Type: Improvement
  Components: modules
Reporter: Till Toenshoff
Priority: Minor


Our modules API currently has at least one deprecated hook, 
{{slavePreLaunchDockerHook}}. 

A new change is coming in that deprecates 
{{slavePreLaunchDockerEnvironmentDecorator}}.

We need to actually remove these deprecated hooks and make the community aware 
of the removal. This ticket is meant for tracking that work.





[jira] [Created] (MESOS-6642) Provide a slave hook for adding volumes to a task.

2016-11-24 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-6642:
-

 Summary: Provide a slave hook for adding volumes to a task. 
 Key: MESOS-6642
 URL: https://issues.apache.org/jira/browse/MESOS-6642
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff








[jira] [Updated] (MESOS-6642) Provide a slave hook for adding volumes to a task.

2016-11-24 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6642:
--
 Labels: hooks  (was: )
Description: We should allow a hook to provide additional volumes before 
launching a task.
Component/s: modules

> Provide a slave hook for adding volumes to a task. 
> ---
>
> Key: MESOS-6642
> URL: https://issues.apache.org/jira/browse/MESOS-6642
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Till Toenshoff
>  Labels: hooks
>
> We should allow a hook to provide additional volumes before launching a task.





[jira] [Updated] (MESOS-6642) Provide a slave hook for adding volumes to a task.

2016-11-24 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6642:
--
Description: We should allow a hook to provide additional volumes before 
launching a container.  (was: We should allow a hook to provide additional 
volumes before launching a task.)

> Provide a slave hook for adding volumes to a task. 
> ---
>
> Key: MESOS-6642
> URL: https://issues.apache.org/jira/browse/MESOS-6642
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Till Toenshoff
>  Labels: hooks
>
> We should allow a hook to provide additional volumes before launching a 
> container.





[jira] [Updated] (MESOS-6642) Provide a slave hook for adding volumes to a task's container.

2016-11-24 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6642:
--
Summary: Provide a slave hook for adding volumes to a task's container.   
(was: Provide a slave hook for adding volumes to a task. )

> Provide a slave hook for adding volumes to a task's container. 
> ---
>
> Key: MESOS-6642
> URL: https://issues.apache.org/jira/browse/MESOS-6642
> Project: Mesos
>  Issue Type: Improvement
>  Components: modules
>Reporter: Till Toenshoff
>  Labels: hooks
>
> We should allow a hook to provide additional volumes before launching a 
> container.





[jira] [Commented] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-11-24 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694334#comment-15694334
 ] 

Alexander Rukletsov commented on MESOS-1802:


I can reproduce it relatively easily by running _parallel_ {{make check}}. Here 
is a fresh log:
{noformat}
[ RUN  ] HealthCheckTest.HealthStatusChange
I1124 23:20:48.351884 4284416 exec.cpp:162] Version: 1.2.0
I1124 23:20:48.375592 3747840 exec.cpp:237] Executor registered on agent 
6db7ef4d-7211-47be-98ba-ad590b528c69-S0
Received SUBSCRIBED event
Subscribed executor on alexr.speedportneo09012801000249
Received LAUNCH event
Starting task 1
/Users/alex/Projects/mesos/build/parallel/src/mesos-containerizer launch 
--command="{"shell":true,"value":"sleep 120"}" --help="false"
Forked command at 73286
Received task health update, healthy: true
rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory
W1124 23:20:48.544631 3211264 health_checker.cpp:245] Health check failed 1 
times consecutively: COMMAND health check failed: Command returned exited with 
status 1
Received task health update, healthy: false
Received task health update, healthy: true
rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory
../../../src/tests/health_check_tests.cpp:790: Failure
Value of: (find).get()
  Actual: 16-byte object <05-00 00-00 00-00 00-00 60-A9 62-1B B2-7F 00-00>
Expected: false
Which is: false
I1124 23:20:48.732457 4284416 exec.cpp:414] Executor asked to shutdown
Received SHUTDOWN event
Shutting down
Sending SIGTERM to process tree at pid 73286
W1124 23:20:48.747885 1064960 health_checker.cpp:245] Health check failed 1 
times consecutively: COMMAND health check failed: Command returned exited with 
status 1
rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory
W1124 23:20:48.948562 3747840 health_checker.cpp:245] Health check failed 1 
times consecutively: COMMAND health check failed: Command returned exited with 
status 1
Sent SIGTERM to the following process trees:
[ 
--- 73286 sleep 120
]
Scheduling escalation to SIGKILL in 3secs from now
[  FAILED  ] HealthCheckTest.HealthStatusChange (1639 ms)
{noformat}

These lines
{noformat}
Received task health update, healthy: true
rm: /private/tmp/z1PbfH/rG7Gha: No such file or directory
../../../src/tests/health_check_tests.cpp:790: Failure
{noformat}
obviously hint that we've queried the HTTP endpoint _after_ the next health 
status change.
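
The suspected interleaving can be modeled with a small shell sketch. This is 
not the Mesos test code: the two temporary files and the report_health function 
are hypothetical stand-ins for the stream of health updates and for the state 
served by the HTTP endpoint. It only shows how a query made after the next 
update can disagree with the update the test was waiting for.

```shell
# Hypothetical model of the suspected race; this is not the Mesos test code.
state_file=$(mktemp)   # stands in for the state served by the HTTP endpoint
update_log=$(mktemp)   # stands in for the stream of task health updates

# Record a health update and overwrite the state the endpoint would serve.
report_health() {
  echo "$1" >> "$update_log"
  echo "$1" > "$state_file"
}

report_health true
report_health false
report_health true     # the next change lands before the endpoint is queried

awaited_update=$(sed -n 2p "$update_log")  # the update the test waited for
endpoint_state=$(cat "$state_file")        # what a late query observes
echo "awaited update: $awaited_update, endpoint state: $endpoint_state"
rm -f "$state_file" "$update_log"
```

In this model the awaited update is false while the late query sees true, 
which matches the order of the "Received task health update" lines in the log 
above. Whether asserting on the update itself is the right fix for the test is 
not established here.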

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky, health-check, mesosphere
> Attachments: health_check_flaky_test_log.txt
>
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 2

[jira] [Commented] (MESOS-6467) Build a Container I/O Switchboard

2016-11-24 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694338#comment-15694338
 ] 

Jie Yu commented on MESOS-6467:
---

commit 9d73fe7e46251a624443962aad69d5a582a738e5
Author: Kevin Klues 
Date:   Thu Nov 24 11:46:38 2016 -0800

Added a level of indirection for logger through an IO Switchboard.

The purpose of this component is to feed stdin to a container from an
external source, as well as redirect the stdout/stderr of a container
to multiple targets.

In this commit, we simply add the IOSwitchboard as a component that
interposes on the fds set up to communicate between the logger and a
container.

In the future, we will expand this component to (optionally) launch a
sidecar HTTP server process which will be responsible for handling
'ATTACH_CONTAINER_INPUT' and 'ATTACH_CONTAINER_OUTPUT' calls on behalf
of a container to redirect the stdin/stdout/stderr of a container to
external clients.

Review: https://reviews.apache.org/r/53704/

> Build a Container I/O Switchboard
> -
>
> Key: MESOS-6467
> URL: https://issues.apache.org/jira/browse/MESOS-6467
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> In order to facilitate attach operations for a running container, we plan to 
> introduce a new component into Mesos known as an “I/O switchboard”. The goal 
> of this switchboard is to allow external components to *dynamically* 
> interpose on the {{stdin}}, {{stdout}} and {{stderr}} of the init process of 
> a running Mesos container. It will be implemented as a per-container, 
> stand-alone process launched by the mesos containerizer at the time a 
> container is first launched.
> Each per-container switchboard will be responsible for the following:
>  * Accepting a single dynamic request to register an fd for streaming data to 
> the {{stdin}} of a container’s init process.
>  * Accepting *multiple* dynamic requests to register fds for streaming data 
> from the {{stdout}} and {{stderr}} of a container’s init process to those fds.
>  * Allocating a pty for the new process (if requested), and directing data 
> through the master fd of the pty as necessary.
>  * Passing the *actual* set of file descriptors that should be dup’d onto the 
> {{stdin}}, {{stdout}} and {{stderr}} of a container’s init process back to 
> the containerizer. 
> The idea is that the switchboard maintains three asynchronous loops 
> (one each for {{stdin}}, {{stdout}} and {{stderr}}) that constantly pipe data 
> between a container’s init process and all of the file descriptors that 
> have been dynamically registered with it.
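
As an illustration of the loop described above, here is a minimal sketch of one output loop that copies data from a single source fd to every registered fd. It is not the Mesos implementation: the names {{FanOutLoop}}, {{register}} and {{demo}} are hypothetical, and plain pipes stand in for the container's {{stdout}} and for the attached clients.

```python
import os
import threading


class FanOutLoop:
    """Copies everything read from one source fd to every registered sink fd.

    Hypothetical stand-in for one of the switchboard's output loops.
    """

    def __init__(self, source_fd):
        self.source_fd = source_fd
        self.sinks = []
        self.lock = threading.Lock()

    def register(self, fd):
        # Sinks may be added while the loop is running (dynamic registration).
        with self.lock:
            self.sinks.append(fd)

    def run(self):
        while True:
            chunk = os.read(self.source_fd, 4096)
            if not chunk:  # EOF: the write side of the source was closed.
                break
            with self.lock:
                targets = list(self.sinks)
            for fd in targets:
                _write_all(fd, chunk)


def _write_all(fd, data):
    # os.write may write fewer bytes than requested, so loop until done.
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def demo():
    # One pipe plays the container's stdout; two more play attached clients.
    out_r, out_w = os.pipe()
    a_r, a_w = os.pipe()
    b_r, b_w = os.pipe()

    loop = FanOutLoop(out_r)
    loop.register(a_w)
    loop.register(b_w)
    worker = threading.Thread(target=loop.run)
    worker.start()

    os.write(out_w, b"hello from container\n")
    os.close(out_w)  # Signals EOF to the loop.
    worker.join()

    os.close(a_w)
    os.close(b_w)
    received = (os.read(a_r, 4096), os.read(b_r, 4096))
    for fd in (out_r, a_r, b_r):
        os.close(fd)
    return received
```

Calling {{demo()}} returns what each of the two stand-in clients received. A real switchboard would also need to handle slow or disconnected clients, which this sketch ignores.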





[jira] [Updated] (MESOS-1802) HealthCheckTest.HealthStatusChange is flaky on jenkins.

2016-11-24 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-1802:
---
 Shepherd: Alexander Rukletsov
   Sprint: Mesosphere Sprint 48
Affects Version/s: 0.27.3
   0.28.2
   1.0.0
   1.0.1
 Story Points: 5
 Target Version/s: 1.2.0
 Priority: Minor  (was: Major)

> HealthCheckTest.HealthStatusChange is flaky on jenkins.
> ---
>
> Key: MESOS-1802
> URL: https://issues.apache.org/jira/browse/MESOS-1802
> Project: Mesos
>  Issue Type: Bug
>  Components: test, tests
>Affects Versions: 0.26.0, 0.27.3, 0.28.2, 1.0.0, 1.0.1
>Reporter: Benjamin Mahler
>Assignee: haosdent
>Priority: Minor
>  Labels: flaky, health-check, mesosphere
> Attachments: health_check_flaky_test_log.txt
>
>
> https://builds.apache.org/job/Mesos-Trunk-Ubuntu-Build-Out-Of-Src-Disable-Java-Disable-Python-Disable-Webui/2374/consoleFull
> {noformat}
> [ RUN  ] HealthCheckTest.HealthStatusChange
> Using temporary directory '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2'
> I0916 22:56:14.034612 21026 leveldb.cpp:176] Opened db in 2.155713ms
> I0916 22:56:14.034965 21026 leveldb.cpp:183] Compacted db in 332489ns
> I0916 22:56:14.034984 21026 leveldb.cpp:198] Created db iterator in 3710ns
> I0916 22:56:14.034996 21026 leveldb.cpp:204] Seeked to beginning of db in 
> 642ns
> I0916 22:56:14.035006 21026 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 343ns
> I0916 22:56:14.035023 21026 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0916 22:56:14.035200 21054 recover.cpp:425] Starting replica recovery
> I0916 22:56:14.035403 21041 recover.cpp:451] Replica is in EMPTY status
> I0916 22:56:14.035888 21045 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0916 22:56:14.035969 21052 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0916 22:56:14.036118 21042 recover.cpp:542] Updating replica status to 
> STARTING
> I0916 22:56:14.036603 21046 master.cpp:286] Master 
> 20140916-225614-3125920579-47865-21026 (penates.apache.org) started on 
> 67.195.81.186:47865
> I0916 22:56:14.036634 21046 master.cpp:332] Master only allowing 
> authenticated frameworks to register
> I0916 22:56:14.036648 21046 master.cpp:337] Master only allowing 
> authenticated slaves to register
> I0916 22:56:14.036659 21046 credentials.hpp:36] Loading credentials for 
> authentication from 
> '/tmp/HealthCheckTest_HealthStatusChange_IYnlu2/credentials'
> I0916 22:56:14.036686 21045 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 480322ns
> I0916 22:56:14.036700 21045 replica.cpp:320] Persisted replica status to 
> STARTING
> I0916 22:56:14.036769 21046 master.cpp:366] Authorization enabled
> I0916 22:56:14.036826 21045 recover.cpp:451] Replica is in STARTING status
> I0916 22:56:14.036944 21052 master.cpp:120] No whitelist given. Advertising 
> offers for all slaves
> I0916 22:56:14.036968 21049 hierarchical_allocator_process.hpp:299] 
> Initializing hierarchical allocator process with master : 
> master@67.195.81.186:47865
> I0916 22:56:14.037284 21054 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0916 22:56:14.037312 21046 master.cpp:1212] The newly elected leader is 
> master@67.195.81.186:47865 with id 20140916-225614-3125920579-47865-21026
> I0916 22:56:14.037333 21046 master.cpp:1225] Elected as the leading master!
> I0916 22:56:14.037345 21046 master.cpp:1043] Recovering from registrar
> I0916 22:56:14.037504 21040 registrar.cpp:313] Recovering registrar
> I0916 22:56:14.037505 21053 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0916 22:56:14.037681 21047 recover.cpp:542] Updating replica status to VOTING
> I0916 22:56:14.038072 21052 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 330251ns
> I0916 22:56:14.038087 21052 replica.cpp:320] Persisted replica status to 
> VOTING
> I0916 22:56:14.038127 21053 recover.cpp:556] Successfully joined the Paxos 
> group
> I0916 22:56:14.038202 21053 recover.cpp:440] Recover process terminated
> I0916 22:56:14.038364 21048 log.cpp:656] Attempting to start the writer
> I0916 22:56:14.038812 21053 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0916 22:56:14.038925 21053 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 92623ns
> I0916 22:56:14.038944 21053 replica.cpp:342] Persisted promised to 1
> I0916 22:56:14.039201 21052 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0916 22:56:14.039676 21047 replica.cpp:37

[jira] [Updated] (MESOS-6467) Build a Container I/O Switchboard

2016-11-24 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6467:
--
Shepherd: Jie Yu


[jira] [Commented] (MESOS-6467) Build a Container I/O Switchboard

2016-11-24 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694455#comment-15694455
 ] 

Jie Yu commented on MESOS-6467:
---

commit 448a88d337fe01ac1c42c170147c33c249d48be6
Author: Jie Yu 
Date:   Thu Nov 24 14:48:08 2016 -0800

Redirected mesos containerizer 'local' flag through IOSwitchboard.

Review: https://reviews.apache.org/r/54067/


[jira] [Commented] (MESOS-6467) Build a Container I/O Switchboard

2016-11-24 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694477#comment-15694477
 ] 

Jie Yu commented on MESOS-6467:
---

commit bd1bd150dcb23d65dcadb4903822ffd88cfb
Author: Kevin Klues 
Date:   Thu Nov 24 16:04:15 2016 -0800

Added agent flags to enable/disable launching an io switchboard server.

Review: https://reviews.apache.org/r/53936/


[jira] [Created] (MESOS-6643) Improve logging of slave docker hook task decoration modules.

2016-11-24 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-6643:
-

 Summary: Improve logging of slave docker hook task decoration 
modules.
 Key: MESOS-6643
 URL: https://issues.apache.org/jira/browse/MESOS-6643
 Project: Mesos
  Issue Type: Improvement
  Components: modules
Reporter: Till Toenshoff
Priority: Minor


When multiple modules implementing the docker task environment variables hook 
are in effect, we should tell the user about any environment variable overrides 
that may occur. The logged information should include the module identifier.
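
A rough sketch of the kind of logging this asks for, assuming each module returns a plain mapping of environment variables and that later modules win. The function name {{merge_hook_environments}}, the logger name and the message wording are all hypothetical and not taken from the Mesos code base.

```python
import logging

logger = logging.getLogger("docker_env_hooks")


def merge_hook_environments(hook_results):
    """Merges per-module environments in order, logging each override.

    hook_results: list of (module_name, {variable: value}) pairs.
    Returns (merged_env, overrides), where overrides is a list of
    (variable, previous_module, overriding_module) tuples.
    """
    merged = {}
    owner = {}  # Which module last set each variable.
    overrides = []
    for module_name, env in hook_results:
        for name, value in env.items():
            if name in merged and merged[name] != value:
                logger.warning(
                    "Module '%s' overrides environment variable '%s' "
                    "previously set by module '%s'",
                    module_name, name, owner[name])
                overrides.append((name, owner[name], module_name))
            merged[name] = value
            owner[name] = module_name
    return merged, overrides
```

The returned {{overrides}} list is only there to make the behaviour easy to inspect; the warning that names both modules is the part relevant to this ticket.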





[jira] [Issue Comment Deleted] (MESOS-6629) Add master validation of FrameworkInfo.roles.

2016-11-24 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-6629:
---
Comment: was deleted

(was: bq. {{FrameworkInfo.roles}} must not contain duplicate entries.
Do we want subscription to fail for duplicate roles, or should we simply deduplicate them
and generate a warning? In the latter case we could change {{roles::parse}} to
return {{std::set}} and reuse it for both the framework and the master's {{--roles}} flag.
Currently, the master logs a warning for duplicate roles in {{--roles}}:
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L662)

> Add master validation of FrameworkInfo.roles.
> -
>
> Key: MESOS-6629
> URL: https://issues.apache.org/jira/browse/MESOS-6629
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Jay Guo
>
> The master should disallow frameworks from subscribing based on the following:
> (1) At most one of {{FrameworkInfo.role}} and {{FrameworkInfo.roles}} may be 
> set at a time.
> (2) If {{FrameworkInfo.roles}} is set, then the MULTI_ROLE framework 
> capability must be provided.
> (3) If the MULTI_ROLE framework capability is provided, then 
> {{FrameworkInfo.role}} must not be set.
> (4) {{FrameworkInfo.roles}} must not contain duplicate entries.
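
The four checks above can be sketched as a single validation function. This is only an illustration in Python, not the master's C++ code: the function name, the argument shapes ({{role}} as an optional string, {{roles}} as a list, {{capabilities}} as a set of names) and the error strings are assumptions. It rejects duplicates outright, which is one of the two options raised in the discussion.

```python
MULTI_ROLE = "MULTI_ROLE"


def validate_framework_roles(role, roles, capabilities):
    """Returns an error string, or None if all four checks pass.

    role: str or None (stands in for FrameworkInfo.role).
    roles: list of str (stands in for FrameworkInfo.roles).
    capabilities: set of capability names provided by the framework.
    """
    multi_role = MULTI_ROLE in capabilities
    # (1) role and roles are mutually exclusive.
    if role is not None and roles:
        return "FrameworkInfo.role and FrameworkInfo.roles must not both be set"
    # (2) roles requires the MULTI_ROLE capability.
    if roles and not multi_role:
        return "FrameworkInfo.roles requires the MULTI_ROLE capability"
    # (3) MULTI_ROLE frameworks must not set the singular role.
    if multi_role and role is not None:
        return "FrameworkInfo.role must not be set with the MULTI_ROLE capability"
    # (4) No duplicate entries in roles.
    if len(set(roles)) != len(roles):
        return "FrameworkInfo.roles must not contain duplicate entries"
    return None
```

For example, {{validate_framework_roles(None, ["a", "a"], {"MULTI_ROLE"})}} returns the duplicate-entries error, while the same call with {{["a", "b"]}} returns {{None}}.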


