[jira] [Commented] (MESOS-5491) Implement GET_AGENTS Call in v1 master API.

2016-06-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345919#comment-15345919
 ] 

Jay Guo commented on MESOS-5491:


[~vinodkone] In the {{GET_STATE_SUMMARY}} API, we also need the proto messages 
{{Agent}} and {{Framework}}. I think it makes more sense to make these two 
messages public and more inclusive. The fields required by {{GET_STATE_SUMMARY}} are:
{code}
message GetStateSummary {
  message Agent {
    optional string id = 1;
    optional string pid = 2;
    optional string hostname = 3;
    optional string registered_time = 4;
    optional string reregistered_time = 5;
    repeated Resource resources = 6;
    repeated Resource used_resources = 7;
    repeated Resource reserved_resources = 8;
    repeated Resource unreserved_resources = 9;
    repeated Attribute attributes = 10;
    optional bool active = 11;
    optional string version = 12;
  }

  message Framework {
    optional string id = 1;
    optional string name = 2;
    optional string pid = 3;
    optional string hostname = 4;
    repeated Resource used_resources = 5;
    repeated Resource offered_resources = 6;
    repeated FrameworkInfo.Capability capabilities = 7;
    optional string webui_url = 8;
    optional bool active = 9;
  }

  optional string hostname = 1;
  optional string cluster = 2;
  repeated Agent agents = 3;
  repeated Framework frameworks = 4;
}
{code}

> Implement GET_AGENTS Call in v1 master API.
> ---
>
> Key: MESOS-5491
> URL: https://issues.apache.org/jira/browse/MESOS-5491
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: zhou xing
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5684) Master captures `this` when creating authorization callback

2016-06-22 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345842#comment-15345842
 ] 

Greg Mann commented on MESOS-5684:
--

Review here: https://reviews.apache.org/r/49132/

> Master captures `this` when creating authorization callback
> ---
>
> Key: MESOS-5684
> URL: https://issues.apache.org/jira/browse/MESOS-5684
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> When exposing its log file, the master currently installs an authorization 
> callback for the log file which captures the master's {{this}} pointer. Such 
> captures have previously caused bugs (MESOS-5629), and this one should be 
> fixed as well. The callback should be dispatched to the master process, and 
> it should be dispatched via the {{self()}} PID.
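
The hazard described in the ticket can be illustrated outside of libprocess. The following is a minimal Python sketch, not Mesos code: `Master`, `Registry`, `spawn`, `terminate`, and `dispatch` are hypothetical names. It assumes that a call dispatched to the PID of a terminated process is simply dropped, whereas a callback that holds a direct reference to the object (the analogue of capturing {{this}}) still runs against it.

```python
from typing import Callable, Dict, Optional


class Master:
    """Stand-in for a process that owns an authorization rule."""

    def __init__(self) -> None:
        self.alive = True

    def authorize_log_access(self, principal: str) -> bool:
        # Touching state after termination is the bug being modeled.
        if not self.alive:
            raise RuntimeError("callback ran on a terminated process")
        return principal == "operator"


class Registry:
    """Hypothetical PID -> process table; dispatch() drops calls to dead PIDs."""

    def __init__(self) -> None:
        self._processes: Dict[str, Master] = {}

    def spawn(self, pid: str, process: Master) -> str:
        self._processes[pid] = process
        return pid

    def terminate(self, pid: str) -> None:
        process = self._processes.pop(pid)
        process.alive = False

    def dispatch(self, pid: str, call: Callable[[Master], bool]) -> Optional[bool]:
        process = self._processes.get(pid)
        if process is None:
            return None  # Process is gone: the call is dropped, not executed.
        return call(process)


registry = Registry()
master = Master()
pid = registry.spawn("master@127.0.0.1:5050", master)

# Unsafe: the closure keeps a direct reference to the object (like capturing `this`).
unsafe_callback = lambda principal: master.authorize_log_access(principal)

# Safer: the closure keeps only the PID and goes through dispatch().
safe_callback = lambda principal: registry.dispatch(
    pid, lambda process: process.authorize_log_access(principal))
```

After `registry.terminate(pid)`, `safe_callback` returns `None` without touching the object, while `unsafe_callback` still reaches the terminated instance.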





[jira] [Updated] (MESOS-5684) Master captures `this` when creating authorization callback

2016-06-22 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5684:
-
Shepherd: Vinod Kone

> Master captures `this` when creating authorization callback
> ---
>
> Key: MESOS-5684
> URL: https://issues.apache.org/jira/browse/MESOS-5684
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.0
>
>


[jira] [Commented] (MESOS-5685) The /files/download endpoint's authorization can be compromised

2016-06-22 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345719#comment-15345719
 ] 

Greg Mann commented on MESOS-5685:
--

Review here: https://reviews.apache.org/r/49131/

> The /files/download endpoint's authorization can be compromised
> ---
>
> Key: MESOS-5685
> URL: https://issues.apache.org/jira/browse/MESOS-5685
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
>
> If a forward slash is appended to the path of a file a user wishes to 
> download via {{/files/download}}, the authorization logic for that path will 
> be bypassed and the file will be downloaded regardless of permissions. This 
> is because we store the authorization callbacks for these paths in a map 
> which is keyed by the path name, so a request to {{/master/log/}} fails to 
> find the callback which is installed for {{/master/log}}. When the master 
> fails to find the callback, it assumes authorization is not required for that 
> path and authorizes the action.
> Consider the following excerpt:
> {code}
> gmann@gmac:~/src/mesos/build⚡  http GET http://127.0.0.1:5050/files/download\?path\=/master/log -a foo:bar
> HTTP/1.1 403 Forbidden
> Content-Length: 0
> Date: Wed, 22 Jun 2016 21:28:53 GMT
> gmann@gmac:~/src/mesos/build⚡  http GET http://127.0.0.1:5050/files/download\?path\=/master/log/ -a foo:bar
> HTTP/1.1 200 OK
> Content-Disposition: attachment; filename=mesos-master.gmac.gmann.log.INFO.20160622-142843.65615
> Content-Length: 14432
> Content-Type: application/octet-stream
> Date: Wed, 22 Jun 2016 21:28:56 GMT
> Log file created at: 2016/06/22 14:28:43
> Running on machine: gmac
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> I0622 14:28:43.476925 2080764672 logging.cpp:194] INFO level logging started!
> I0622 14:28:43.477522 2080764672 main.cpp:367] Using 'HierarchicalDRF' allocator
> I0622 14:28:43.480650 2080764672 leveldb.cpp:174] Opened db in 2961us
> I0622 14:28:43.481046 2080764672 leveldb.cpp:181] Compacted db in 372us
> I0622 14:28:43.481078 2080764672 leveldb.cpp:196] Created db iterator in 13us
> I0622 14:28:43.481096 2080764672 leveldb.cpp:202] Seeked to beginning of db in 9us
> I0622 14:28:43.48 2080764672 leveldb.cpp:271] Iterated through 0 keys in the db in 8us
> I0622 14:28:43.481165 2080764672 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0622 14:28:43.481967 219914240 recover.cpp:451] Starting replica recovery
> I0622 14:28:43.482193 219914240 recover.cpp:477] Replica is in EMPTY status
> I0622 14:28:43.482589 2080764672 main.cpp:488] Creating default 'local' authorizer
> I0622 14:28:43.482719 2080764672 main.cpp:545] Starting Mesos master
> I0622 14:28:43.483085 218841088 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4)@127.0.0.1:5050
> I0622 14:28:43.487284 218304512 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0622 14:28:43.487694 219914240 recover.cpp:568] Updating replica status to STARTING
> We could consider disallowing paths which end in trailing slashes.
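
The lookup problem described in the ticket can be reproduced in a few lines. This is a minimal Python sketch under stated assumptions, not the Mesos implementation: the `callbacks` map, `normalize`, `authorized_buggy`, and `authorized_fixed` are hypothetical names, and normalizing the path before the lookup is only one possible mitigation (the ticket itself suggests disallowing paths that end in a trailing slash).

```python
from typing import Callable, Dict

Authorizer = Callable[[str], bool]


def normalize(path: str) -> str:
    """Strip trailing slashes so '/master/log/' and '/master/log' share a key."""
    stripped = path.rstrip("/")
    return stripped if stripped else "/"


def authorized_buggy(callbacks: Dict[str, Authorizer], path: str, principal: str) -> bool:
    # Mirrors the reported behavior: a missing callback means "no authorization needed".
    callback = callbacks.get(path)
    return True if callback is None else callback(principal)


def authorized_fixed(callbacks: Dict[str, Authorizer], path: str, principal: str) -> bool:
    # Look up the normalized path, so the trailing-slash variant hits the same callback.
    callback = callbacks.get(normalize(path))
    return True if callback is None else callback(principal)


# Only the principal "operator" may read the log in this illustration.
callbacks: Dict[str, Authorizer] = {"/master/log": lambda principal: principal == "operator"}
```

With this map, `authorized_buggy(callbacks, "/master/log/", "foo")` returns `True` because the exact-key lookup misses, while `authorized_fixed` finds the callback for `/master/log` and denies the request.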





[jira] [Updated] (MESOS-5685) The /files/download endpoint's authorization can be compromised

2016-06-22 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5685:
-
Shepherd: Vinod Kone

> The /files/download endpoint's authorization can be compromised
> ---
>
> Key: MESOS-5685
> URL: https://issues.apache.org/jira/browse/MESOS-5685
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
>


[jira] [Assigned] (MESOS-5685) The /files/download endpoint's authorization can be compromised

2016-06-22 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-5685:


Assignee: Greg Mann

> The /files/download endpoint's authorization can be compromised
> ---
>
> Key: MESOS-5685
> URL: https://issues.apache.org/jira/browse/MESOS-5685
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>


[jira] [Updated] (MESOS-5685) The /files/download endpoint's authorization can be compromised

2016-06-22 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5685:
-
Priority: Blocker  (was: Major)

> The /files/download endpoint's authorization can be compromised
> ---
>
> Key: MESOS-5685
> URL: https://issues.apache.org/jira/browse/MESOS-5685
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
>


[jira] [Commented] (MESOS-5692) Add helper function "begin_with/end_with" to strings

2016-06-22 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345680#comment-15345680
 ] 

Gilbert Song commented on MESOS-5692:
-

Is there any special case where this would differ from using 
`strings::startsWith(s, "c")` and `strings::endsWith(s, "c")`?
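
To make the question concrete, here is a small Python sketch. It uses Python's built-in `str.startswith` and `str.endswith` as stand-ins for stout's `strings::startsWith` and `strings::endsWith` (it does not call stout), and `begins_with_char` / `ends_with_char` are hypothetical single-character helpers in the spirit of the ticket title. On these samples, including the empty string, the general checks and the single-character helpers agree.

```python
def begins_with_char(s: str, c: str) -> bool:
    """Hypothetical single-character helper: does s start with character c?"""
    return len(s) > 0 and s[0] == c


def ends_with_char(s: str, c: str) -> bool:
    """Hypothetical single-character helper: does s end with character c?"""
    return len(s) > 0 and s[-1] == c


samples = ["", "c", "cat", "arc", "dog"]

# The general prefix/suffix checks agree with the single-character helpers.
prefix_agrees = all(begins_with_char(s, "c") == s.startswith("c") for s in samples)
suffix_agrees = all(ends_with_char(s, "c") == s.endswith("c") for s in samples)
```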

> Add helper function "begin_with/end_with" to strings
> 
>
> Key: MESOS-5692
> URL: https://issues.apache.org/jira/browse/MESOS-5692
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Klaus Ma
>
> Add helper functions to check whether a string starts or ends with a special character.





[jira] [Created] (MESOS-5693) slave delay to forward status update

2016-06-22 Thread zhangfuxing (JIRA)
zhangfuxing created MESOS-5693:
--

 Summary: slave delay to forward status update
 Key: MESOS-5693
 URL: https://issues.apache.org/jira/browse/MESOS-5693
 Project: Mesos
  Issue Type: Improvement
  Components: slave
Affects Versions: 0.22.1
 Environment: debian7 
Reporter: zhangfuxing


We observe that the Mesos slave delays forwarding task status updates to the master:

I0615 14:59:10.997902  3890 slave.cpp:2531] Handling status update TASK_KILLED 
(UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task xxx.64554b80 of framework 
20150629-151659-3355508746-5060-6173-0001 from executor(1)@10.0.40.189:54304
I0615 14:59:11.001126  3895 status_update_manager.cpp:317] Received status 
update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task 
xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001
I0615 14:59:11.001174  3895 status_update_manager.hpp:346] Checkpointing UPDATE 
for status update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for 
task xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001
I0615 14:59:11.037376  3894 slave.cpp:2709] Sending acknowledgement for status 
update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task 
xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001 to 
executor(1)@10.0.40.189:54304
I0615 15:54:21.352087  3888 slave.cpp:2776] Forwarding the update TASK_KILLED 
(UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task xxx.64554b80 of framework 
20150629-151659-3355508746-5060-6173-0001 to master@10.0.1.200:5060
In this example, the task xxx.64554b80 was killed at 14:59, but the status 
update was not forwarded to the master until 15:54.
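
The size of the delay can be computed directly from the two log prefixes above. This is a small Python sketch; `glog_time` is a hypothetical helper, and it assumes both lines were written on the same day (both carry the `0615` date prefix), so only the time of day is parsed.

```python
from datetime import datetime, timedelta


def glog_time(line: str) -> datetime:
    """Parse the time of day from a glog-style prefix such as 'I0615 14:59:10.997902'."""
    time_field = line.split()[1]
    return datetime.strptime(time_field, "%H:%M:%S.%f")


handled = glog_time(
    "I0615 14:59:10.997902  3890 slave.cpp:2531] Handling status update TASK_KILLED")
forwarded = glog_time(
    "I0615 15:54:21.352087  3888 slave.cpp:2776] Forwarding the update TASK_KILLED")

delay: timedelta = forwarded - handled
print(delay)  # 0:55:10.354185
```

That is a gap of roughly 55 minutes between the slave handling the update and forwarding it to the master.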





[jira] [Updated] (MESOS-5691) SSL downgrade support will leak sockets in CLOSE_WAIT status

2016-06-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-5691:
-
Affects Version/s: 0.25.0
   0.26.0
   0.27.0
   0.28.0

> SSL downgrade support will leak sockets in CLOSE_WAIT status
> 
>
> Key: MESOS-5691
> URL: https://issues.apache.org/jira/browse/MESOS-5691
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Blocker
>  Labels: libprocess, mesosphere
> Fix For: 1.0.0
>
>
> Repro steps:
> 1) Start a master:
> {code}
> bin/mesos-master.sh --work_dir=/tmp/master
> {code}
> 2) Start an agent with SSL and downgrade enabled:
> {code}
> # Taken from http://mesos.apache.org/documentation/latest/ssl/
> openssl genrsa -des3 -f4 -passout pass:some_password -out key.pem 4096
> openssl req -new -x509 -passin pass:some_password -days 365 -key key.pem -out 
> cert.pem
> SSL_KEY_FILE=key.pem SSL_CERT_FILE=cert.pem SSL_ENABLED=true 
> SSL_SUPPORT_DOWNGRADE=true sudo -E bin/mesos-agent.sh --master=localhost:5050 
> --work_dir=/tmp/agent
> {code}
> 3) Start a framework that launches lots of executors, one after another:
> {code}
> sudo src/balloon-framework --master=localhost:5050 --task_memory=64mb 
> --task_memory_usage_limit=256mb --long_running
> {code}
> 4) Check FDs, repeatedly
> {code}
> sudo lsof -i | grep mesos | grep CLOSE_WAIT | wc -l
> {code}
> The number of sockets in {{CLOSE_WAIT}} will increase linearly with the 
> number of launched executors.
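
Step 4 of the repro can be scripted so that the trend is easier to see. This is a minimal Python sketch, assuming the `lsof -i` output has already been captured as text; `count_close_wait` and `is_growing` are hypothetical helpers, and the `example` lines are invented for illustration rather than real output.

```python
from typing import List


def count_close_wait(lsof_output: str, process_substring: str = "mesos") -> int:
    """Count lines that mention the process and CLOSE_WAIT,
    like `grep mesos | grep CLOSE_WAIT | wc -l`."""
    return sum(
        1
        for line in lsof_output.splitlines()
        if process_substring in line and "CLOSE_WAIT" in line
    )


def is_growing(samples: List[int]) -> bool:
    """True if every successive count is larger than the previous one."""
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Invented lsof-like lines, for illustration only (not real output).
example = "\n".join([
    "mesos-age 1 root 10u IPv4 TCP localhost:5051->localhost:40000 (CLOSE_WAIT)",
    "mesos-age 1 root 11u IPv4 TCP localhost:5051->localhost:40001 (ESTABLISHED)",
    "sshd      2 root  3u IPv4 TCP localhost:22->localhost:40002 (CLOSE_WAIT)",
])
```

Collecting `count_close_wait(...)` at intervals while the framework runs and passing the list to `is_growing` gives a quick check for the leak described above.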





[jira] [Created] (MESOS-5692) Add helper function "begin_with/end_with" to strings

2016-06-22 Thread Klaus Ma (JIRA)
Klaus Ma created MESOS-5692:
---

 Summary: Add helper function "begin_with/end_with" to strings
 Key: MESOS-5692
 URL: https://issues.apache.org/jira/browse/MESOS-5692
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Klaus Ma


Add helper functions to check whether a string starts or ends with a special character.





[jira] [Created] (MESOS-5691) SSL downgrade support will leak sockets in CLOSE_WAIT status

2016-06-22 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-5691:


 Summary: SSL downgrade support will leak sockets in CLOSE_WAIT 
status
 Key: MESOS-5691
 URL: https://issues.apache.org/jira/browse/MESOS-5691
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Affects Versions: 0.24.0
Reporter: Joseph Wu
Assignee: Joseph Wu
Priority: Blocker
 Fix For: 1.0.0


Repro steps:
1) Start a master:
{code}
bin/mesos-master.sh --work_dir=/tmp/master
{code}

2) Start an agent with SSL and downgrade enabled:
{code}
# Taken from http://mesos.apache.org/documentation/latest/ssl/
openssl genrsa -des3 -f4 -passout pass:some_password -out key.pem 4096
openssl req -new -x509 -passin pass:some_password -days 365 -key key.pem -out 
cert.pem

SSL_KEY_FILE=key.pem SSL_CERT_FILE=cert.pem SSL_ENABLED=true 
SSL_SUPPORT_DOWNGRADE=true sudo -E bin/mesos-agent.sh --master=localhost:5050 
--work_dir=/tmp/agent
{code}

3) Start a framework that launches lots of executors, one after another:
{code}
sudo src/balloon-framework --master=localhost:5050 --task_memory=64mb 
--task_memory_usage_limit=256mb --long_running
{code}

4) Check FDs, repeatedly
{code}
sudo lsof -i | grep mesos | grep CLOSE_WAIT | wc -l
{code}

The number of sockets in {{CLOSE_WAIT}} will increase linearly with the number 
of launched executors.





[jira] [Commented] (MESOS-5565) Add logging when Offer::Operation::Launch message has no tasks.

2016-06-22 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345514#comment-15345514
 ] 

Klaus Ma commented on MESOS-5565:
-

sure, please help on this :). Thanks very much :).

> Add logging when Offer::Operation::Launch message has no tasks.
> ---
>
> Key: MESOS-5565
> URL: https://issues.apache.org/jira/browse/MESOS-5565
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Priority: Minor
>  Labels: newbie
>
> Currently, when an {{Offer::Accept::Launch}} message has no tasks specified, 
> Mesos treats such requests as implicitly declining all offers. This can 
> be very counter-intuitive for framework developers since we do not have any 
> logging on the master around this behavior. It would be good to add some 
> logging on the master to apprise framework developers that all the offers 
> have been implicitly declined.
> {code}
> if (operation.type() == Offer::Operation::LAUNCH) {
>   if (operation.launch().task_infos().size() > 0) {
>     ++metrics->messages_launch_tasks;
>   } else {
>     // A LAUNCH with an empty task list is counted as a decline.
>     ++metrics->messages_decline_offers;
>   }
> }
> {code}
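
The requested logging could look roughly like the following. This is a Python sketch of the idea only, not the master's C++ code: `classify_launch`, the logger name, and the message wording are all hypothetical, and the returned strings simply echo the two metric names from the excerpt above.

```python
import logging
from typing import Sequence

logger = logging.getLogger("master")


def classify_launch(framework_id: str, task_infos: Sequence[str]) -> str:
    """Return the metric a LAUNCH operation counts toward, and warn when an
    empty task list makes it an implicit decline."""
    if len(task_infos) > 0:
        return "messages_launch_tasks"
    # Surface the implicit decline so framework developers can see it in the log.
    logger.warning(
        "LAUNCH operation from framework %s has no tasks; "
        "treating it as an implicit decline of the offers", framework_id)
    return "messages_decline_offers"
```

The branch structure is unchanged from the excerpt; the only addition is the warning on the empty-task path.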





[jira] [Updated] (MESOS-5565) Add logging when Offer::Operation::Launch message has no tasks.

2016-06-22 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-5565:

Assignee: (was: Klaus Ma)

> Add logging when Offer::Operation::Launch message has no tasks.
> ---
>
> Key: MESOS-5565
> URL: https://issues.apache.org/jira/browse/MESOS-5565
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Priority: Minor
>  Labels: newbie
>


[jira] [Updated] (MESOS-5676) A full redesign of the Mesos CLI

2016-06-22 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-5676:
---
Description: 
The current Mesos CLI is in serious need of an upgrade. It was created a long 
time ago and hasn't kept pace with the rest of the code base. In fact, many of 
the current Mesos CLI commands do not work out of the box. For example, using 
any of the python CLI scripts such as “mesos-cat” will result in an error 
because the proper Mesos library is not in the $PYTHONPATH by default. 

This Epic proposes a redesign of the Mesos CLI including (but not limited to) 
the following:
  * A complete rewrite of the CLI, from the ground up, with a more pluggable 
architecture, better help information, and bash-autocompletion
  * A full test suite for the CLI that is closely tied with the Mesos unit 
tests so CLI commands will be updated as Mesos features change
  * A new set of container-related commands in the vein of "docker exec", 
"docker ps", "docker top", "docker logs", etc.
  * Both a local and a remote component so you can debug locally using the CLI 
or gather information / launch cluster wide commands from the same CLI

Design doc:
https://docs.google.com/document/d/1r6Iv4Efu8v8IBrcUTjgYkvZ32WVscgYqrD07OyIglsA/edit?ts=57573bba#


  was:
The current Mesos CLI is in serious need of an upgrade. It was created a long 
time ago and hasn't kept pace with the rest of the code base. In fact, many of 
the current Mesos CLI commands do not work out of the box. For example, using 
any of the python CLI scripts such as “mesos-cat” will result in an error 
because the proper Mesos library is not in the $PYTHONPATH by default. 

This Epic proposes a redesign of the Mesos CLI including (but not limited to) 
the following:
  * A complete rewrite of the CLI, from the ground up, with a more pluggable 
architecture, better help information, and bash-autocompletion
  * A full test suite for the CLI that is closely tied with the Mesos unit 
tests so CLI commands will be updated as Mesos features change
  * A new set of container-related commands in the vein of "docker exec", 
"docker ps", "docker top", "docker logs", etc.
  * Both a local and a remote component so you can debug locally using the CLI 
or gather information / launch cluster wide commands from the same CLI

A full design doc will be posted soon.


> A full redesign of the Mesos CLI
> 
>
> Key: MESOS-5676
> URL: https://issues.apache.org/jira/browse/MESOS-5676
> Project: Mesos
>  Issue Type: Epic
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: CLI
>


[jira] [Commented] (MESOS-5653) Creating a persistent volume through the operator endpoints fail and doesn't produce meaningful logs.

2016-06-22 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345293#comment-15345293
 ] 

Greg Mann commented on MESOS-5653:
--

Closing this ticket as Won't Fix since we have MESOS-5664 to track the bug, and 
[~ctm3] and I had some luck troubleshooting this on IRC.

> Creating a persistent volume through the operator endpoints fail and doesn't 
> produce meaningful logs.
> -
>
> Key: MESOS-5653
> URL: https://issues.apache.org/jira/browse/MESOS-5653
> Project: Mesos
>  Issue Type: Bug
>  Components: master, volumes
>Affects Versions: 0.28.2
> Environment: Centos 7 - 3.10.0-327.13.1.el7.x86_64, Mesos 0.28.2
>Reporter: cliff
>Assignee: Greg Mann
>  Labels: persistent-volumes
>
> When attempting to create a persistent volume via the /create-volumes 
> operator endpoint, I get an HTTP 200 from the master, and in the logs on the 
> master I see:
> {noformat}
> http.cpp:312] HTTP POST for /master/create-volumes from "172.16.10.11:40686 
> with User-Agent='curl/7.29.0' "
> {noformat}
> The next line I see on the master is:
> {noformat}
> "master.cpp:6560] Sending checkpointed resources  to slave 
> 0ef7d2e1-8b0d-44d4-8db0-cc58ac2058af-S0 at slave(1)@172.16.10.4:5051"
> {noformat}
> Now if I look in the logs on the slave that was specified in the request to 
> create a persistent volume, I see:
> {noformat}
>  "1572 slave.cpp:2327] Updated checkpointed resources from  to   "
> {noformat}
> Notice that the "from" and "to" values are both missing. Specifically, they 
> should be the values of checkpointedResources and newCheckpointedResources, 
> from here:
> https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2582
> I am currently running only one slave for troubleshooting purposes. The 
> resource file on the slave with the disk resource looks like the following:
> #resources=file:///etc/default/mesos.resources.json
> {noformat}
> [
>{
> "name": "disk",
> "type": "SCALAR",
> "scalar": {
>   "value": 5
> }
>   },
>{
>   "name":"disk",
>   "type":"SCALAR",
>   "scalar":{
>  "value":100
>   },
>   "role":"testing",
>   "disk":{
>  "source":{
> "type":"MOUNT",
> "mount":{
>"root":"/data"
> }
>  }
>   }
>},
>{
>   "name":"cpus",
>   "type":"SCALAR",
>   "scalar":{
>  "value":16
>   },
>   "role":"testing"
>},
>{
>   "name":"mem",
>   "type":"SCALAR",
>   "scalar":{
>  "value":128000
>   },
>   "role":"testing"
>},
>{
>   "name":"ports",
>   "type":"RANGES",
>   "ranges":{
>  "range":[
> {
>"begin":31000,
>"end":32000
> }
>  ]
>   },
>   "role":"testing"
>}
> ]
> {noformat}
> When I {{curl master:5050/slaves | jq '.'}} and look under the key 
> {{reserved_resources_full}}, I see the above resources on that slave. 
> Here is my request via the operator endpoint {{/create-volumes}}. I am 
> trying to create a persistent volume on the disk of type MOUNT above, which 
> is in {{/proc/mounts}} as {{/data}}:
> {noformat}
> curl -i -d slaveId=0ee7d2e7-8b0d-44d4-8d80-cc58ac2058ae-S4 \
>   -d volumes='[
>   {
>     "name": "testvol",
>     "type": "SCALAR",
>     "scalar": { "value": 1 },
>     "role": "testing",
>     "disk": {
>       "source": {
>         "type": "MOUNT",
>         "path": { "root": "/data" }
>       },
>       "persistence": {
>         "id": "cliff"
>       },
>       "volume": {
>         "mode": "RW",
>         "container_path": "/data"
>       }
>     }
>   }
> ]' -X POST http://master:5050/master/create-volumes
> {noformat}
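
The request above could also be assembled programmatically, which makes it easier to inspect exactly what is being sent. Below is a minimal Python sketch, assuming only the endpoint path and the `slaveId` / `volumes` form fields shown in the curl command. The function name `build_create_volumes_request` and the sample values are hypothetical, and nothing here is taken from the Mesos code base.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_create_volumes_request(master, slave_id, volumes):
    """Build (but do not send) a POST to /master/create-volumes.

    `master` is a host:port string, `slave_id` the agent ID, and
    `volumes` a list of resource dicts that is serialized to JSON,
    mirroring the two -d fields of the curl command.
    """
    body = urlencode({
        "slaveId": slave_id,
        "volumes": json.dumps(volumes),
    }).encode("ascii")
    url = "http://{}/master/create-volumes".format(master)
    return Request(url, data=body, method="POST")


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    req = build_create_volumes_request(
        "master:5050", "example-agent-S0", [{"name": "disk"}])
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Printing the encoded body before sending it is a cheap way to confirm that the JSON reaching the master is the JSON that was intended.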
> 
> {noformat}
> HTTP/1.1 200 OK
> Date: Sun, 19 Jun 2016 04:38:45 GMT
> {noformat}
> If I look at the slave specified with slaveId above via:
> {noformat}
> curl - http://slave1:5051/state  
> {noformat}
> I will not see the volume created. Also, there are no errors in the INFO logs 
> on either the master or slave relating to this request. The only log entries 
> are those that I have provided. 
> The same problem/behavior seems to exist when trying to create persistent 
> volumes on dynamically reserved resources as well.
> My steps were:
> systemctl stop mesos-slave
> cd /var/mesos
> rm -rf meta
> systemctl start mesos-slave
> then I issued the following to the /reserve operator endpoint:
> {noformat}
> curl -i \
>  

[jira] [Created] (MESOS-5690) `PortMappingIsolatorTest.ROOT_NC_HostToContainerUDP` fails on Fedora 23.

2016-06-22 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5690:
---

 Summary: `PortMappingIsolatorTest.ROOT_NC_HostToContainerUDP` 
fails on Fedora 23.
 Key: MESOS-5690
 URL: https://issues.apache.org/jira/browse/MESOS-5690
 Project: Mesos
  Issue Type: Bug
  Components: isolation, network
 Environment: Fedora 23 with network isolation
Reporter: Gilbert Song


{noformat}
[20:17:53] : [Step 10/10] [ RUN  ] 
PortMappingIsolatorTest.ROOT_NC_HostToContainerUDP
[20:17:53]W: [Step 10/10] I0622 20:17:53.323252 28395 
port_mapping_tests.cpp:229] Using eth0 as the public interface
[20:17:53]W: [Step 10/10] I0622 20:17:53.323557 28395 
port_mapping_tests.cpp:237] Using lo as the loopback interface
[20:17:53]W: [Step 10/10] I0622 20:17:53.337299 28395 resources.cpp:572] 
Parsing resources as JSON failed: 
cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
[20:17:53]W: [Step 10/10] Trying semicolon-delimited string format instead
[20:17:53]W: [Step 10/10] I0622 20:17:53.338345 28395 
port_mapping.cpp:1557] Using eth0 as the public interface
[20:17:53]W: [Step 10/10] I0622 20:17:53.338675 28395 
port_mapping.cpp:1582] Using lo as the loopback interface
[20:17:53]W: [Step 10/10] I0622 20:17:53.339855 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024'
[20:17:53]W: [Step 10/10] I0622 20:17:53.339901 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128'
[20:17:53]W: [Step 10/10] I0622 20:17:53.339938 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '409616384   4194304'
[20:17:53]W: [Step 10/10] I0622 20:17:53.339972 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340010 28395 
port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340044 28395 
port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340073 28395 
port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340103 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '409687380   6291456'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340136 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340165 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340196 28395 
port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340229 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340260 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340289 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512'
[20:17:53]W: [Step 10/10] I0622 20:17:53.340327 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15'
[20:17:53]W: [Step 10/10] I0622 20:17:53.349139 28395 
linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
[20:17:53]W: [Step 10/10] I0622 20:17:53.349345 28395 resources.cpp:572] 
Parsing resources as JSON failed: ports:[31000-31499]
[20:17:53]W: [Step 10/10] Trying semicolon-delimited string format instead
[20:17:53]W: [Step 10/10] I0622 20:17:53.349858 28409 
port_mapping.cpp:2512] Using non-ephemeral ports {[31000,31500)} and ephemeral 
ports [30016,30032) for container container1 of executor ''
[20:17:53]W: [Step 10/10] I0622 20:17:53.350821 28395 
linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS | 
CLONE_NEWNET
[20:17:53]W: [Step 10/10] I0622 20:17:53.393595 28416 
port_mapping.cpp:2576] Bind mounted '/proc/15478/ns/net' to '/run/netns/15478' 
for container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.393739 28416 
port_mapping.cpp:2607] Created network namespace handle symlink 
'/var/run/mesos/netns/container1' -> '/run/netns/15478'
[20:17:53]W: [Step 10/10] I0622 20:17:53.395191 28416 
port_mapping.cpp:2667] Adding IP packet filters with ports [30016,30031] for 
container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.397608 28416 
port_mapping.cpp:2667] Adding IP packet filters with ports [31000,31007] for 
container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.402170 28416 
port_mapping.cpp:2667] Adding IP packet filters with ports [31008,31039] for 
container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.405225 28416 
port_mapping.cpp:2667] Adding IP packet filters with ports [31040,31103] for 
container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.408541 28416 
port_map
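
The packet-filter ranges in the log above ([30016,30031], [31000,31007], [31008,31039], [31040,31103]) are each a power-of-two sized block that starts at a multiple of its own size. The Python sketch below shows one way to split an inclusive port range into such blocks. It only illustrates the pattern visible in the log and is not the isolator's actual code; `split_aligned` is a hypothetical name.

```python
def split_aligned(begin, end):
    """Split the inclusive range [begin, end] into aligned blocks.

    Each block has a power-of-two size and starts at a multiple of
    that size. Blocks are chosen greedily, largest first.
    """
    blocks = []
    while begin <= end:
        remaining = end - begin + 1
        if begin == 0:
            # Zero is aligned to every power of two.
            size = 1 << end.bit_length()
        else:
            # Largest power of two that divides `begin`.
            size = begin & -begin
        # Shrink the block until it fits in what is left of the range.
        while size > remaining:
            size >>= 1
        blocks.append((begin, begin + size - 1))
        begin += size
    return blocks


if __name__ == "__main__":
    for block in split_aligned(31000, 31499):
        print(block)
```

For the non-ephemeral range 31000 to 31499, the first three blocks produced are the same three ranges that appear in the log.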

[jira] [Created] (MESOS-5689) `PortMappingIsolatorTest.ROOT_ContainerICMPExternal` fails on Fedora 23.

2016-06-22 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5689:
---

 Summary: `PortMappingIsolatorTest.ROOT_ContainerICMPExternal` 
fails on Fedora 23.
 Key: MESOS-5689
 URL: https://issues.apache.org/jira/browse/MESOS-5689
 Project: Mesos
  Issue Type: Bug
  Components: isolation, network
 Environment: Fedora 23 with network isolation
Reporter: Gilbert Song


Here is the log:
{noformat}
[20:17:53] : [Step 10/10] [ RUN  ] 
PortMappingIsolatorTest.ROOT_ContainerICMPExternal
[20:17:53]W: [Step 10/10] I0622 20:17:53.890225 28395 
port_mapping_tests.cpp:229] Using eth0 as the public interface
[20:17:53]W: [Step 10/10] I0622 20:17:53.890532 28395 
port_mapping_tests.cpp:237] Using lo as the loopback interface
[20:17:53]W: [Step 10/10] I0622 20:17:53.904742 28395 resources.cpp:572] 
Parsing resources as JSON failed: 
cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
[20:17:53]W: [Step 10/10] Trying semicolon-delimited string format instead
[20:17:53]W: [Step 10/10] I0622 20:17:53.905855 28395 
port_mapping.cpp:1557] Using eth0 as the public interface
[20:17:53]W: [Step 10/10] I0622 20:17:53.906159 28395 
port_mapping.cpp:1582] Using lo as the loopback interface
[20:17:53]W: [Step 10/10] I0622 20:17:53.907315 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907362 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907418 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '409616384   4194304'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907454 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907491 28395 
port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907524 28395 
port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907557 28395 
port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907588 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '409687380   6291456'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907618 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907649 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907680 28395 
port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907711 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907742 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907773 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512'
[20:17:53]W: [Step 10/10] I0622 20:17:53.907802 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15'
[20:17:53]W: [Step 10/10] I0622 20:17:53.916348 28395 
linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
[20:17:53]W: [Step 10/10] I0622 20:17:53.916575 28395 resources.cpp:572] 
Parsing resources as JSON failed: ports:[31000-31499]
[20:17:53]W: [Step 10/10] Trying semicolon-delimited string format instead
[20:17:53]W: [Step 10/10] I0622 20:17:53.917032 28412 
port_mapping.cpp:2512] Using non-ephemeral ports {[31000,31500)} and ephemeral 
ports [30016,30032) for container container1 of executor ''
[20:17:53]W: [Step 10/10] I0622 20:17:53.918092 28395 
linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS | 
CLONE_NEWNET
[20:17:53]W: [Step 10/10] I0622 20:17:53.951756 28410 
port_mapping.cpp:2576] Bind mounted '/proc/15611/ns/net' to '/run/netns/15611' 
for container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.951918 28410 
port_mapping.cpp:2607] Created network namespace handle symlink 
'/var/run/mesos/netns/container1' -> '/run/netns/15611'
[20:17:53]W: [Step 10/10] I0622 20:17:53.952893 28410 
port_mapping.cpp:2667] Adding IP packet filters with ports [30016,30031] for 
container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.956142 28410 
port_mapping.cpp:2667] Adding IP packet filters with ports [31000,31007] for 
container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.961453 28410 
port_mapping.cpp:2667] Adding IP packet filters with ports [31008,31039] for 
container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.965399 28410 
port_mapping.cpp:2667] Adding IP packet filters with ports [31040,31103] for 
container container1
[20:17:53]W: [Step 10/10] I0622 20:17:53.96956

[jira] [Created] (MESOS-5688) `PortMappingIsolatorTest.ROOT_DNS` fails on Fedora 23.

2016-06-22 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5688:
---

 Summary: `PortMappingIsolatorTest.ROOT_DNS` fails on Fedora 23.
 Key: MESOS-5688
 URL: https://issues.apache.org/jira/browse/MESOS-5688
 Project: Mesos
  Issue Type: Bug
  Components: isolation, network
 Environment: Fedora 23 with network isolation
Reporter: Gilbert Song


Here is the log:
{noformat}
[20:18:04] : [Step 10/10] [ RUN  ] PortMappingIsolatorTest.ROOT_DNS
[20:18:04]W: [Step 10/10] I0622 20:18:04.877822 28395 
port_mapping_tests.cpp:229] Using eth0 as the public interface
[20:18:04]W: [Step 10/10] I0622 20:18:04.878106 28395 
port_mapping_tests.cpp:237] Using lo as the loopback interface
[20:18:04]W: [Step 10/10] I0622 20:18:04.891363 28395 resources.cpp:572] 
Parsing resources as JSON failed: 
cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
[20:18:04]W: [Step 10/10] Trying semicolon-delimited string format instead
[20:18:04]W: [Step 10/10] I0622 20:18:04.892331 28395 
port_mapping.cpp:1557] Using eth0 as the public interface
[20:18:04]W: [Step 10/10] I0622 20:18:04.892638 28395 
port_mapping.cpp:1582] Using lo as the loopback interface
[20:18:04]W: [Step 10/10] I0622 20:18:04.893723 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024'
[20:18:04]W: [Step 10/10] I0622 20:18:04.893770 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128'
[20:18:04]W: [Step 10/10] I0622 20:18:04.893806 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '409616384   4194304'
[20:18:04]W: [Step 10/10] I0622 20:18:04.893838 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5'
[20:18:04]W: [Step 10/10] I0622 20:18:04.893875 28395 
port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992'
[20:18:04]W: [Step 10/10] I0622 20:18:04.893908 28395 
port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128'
[20:18:04]W: [Step 10/10] I0622 20:18:04.893937 28395 
port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992'
[20:18:04]W: [Step 10/10] I0622 20:18:04.893968 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '409687380   6291456'
[20:18:04]W: [Step 10/10] I0622 20:18:04.893999 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200'
[20:18:04]W: [Step 10/10] I0622 20:18:04.894029 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512'
[20:18:04]W: [Step 10/10] I0622 20:18:04.894060 28395 
port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000'
[20:18:04]W: [Step 10/10] I0622 20:18:04.894093 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75'
[20:18:04]W: [Step 10/10] I0622 20:18:04.894124 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9'
[20:18:04]W: [Step 10/10] I0622 20:18:04.894153 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512'
[20:18:04]W: [Step 10/10] I0622 20:18:04.894186 28395 
port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15'
[20:18:04]W: [Step 10/10] I0622 20:18:04.902745 28395 
linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
[20:18:04]W: [Step 10/10] I0622 20:18:04.902940 28395 resources.cpp:572] 
Parsing resources as JSON failed: ports:[31000-31499]
[20:18:04]W: [Step 10/10] Trying semicolon-delimited string format instead
[20:18:04]W: [Step 10/10] I0622 20:18:04.903404 28412 
port_mapping.cpp:2512] Using non-ephemeral ports {[31000,31500)} and ephemeral 
ports [30016,30032) for container container1 of executor ''
[20:18:04]W: [Step 10/10] I0622 20:18:04.904423 28395 
linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS | 
CLONE_NEWNET
[20:18:04]W: [Step 10/10] I0622 20:18:04.977530 28416 
port_mapping.cpp:2576] Bind mounted '/proc/15781/ns/net' to '/run/netns/15781' 
for container container1
[20:18:04]W: [Step 10/10] I0622 20:18:04.977715 28416 
port_mapping.cpp:2607] Created network namespace handle symlink 
'/var/run/mesos/netns/container1' -> '/run/netns/15781'
[20:18:04]W: [Step 10/10] I0622 20:18:04.978752 28416 
port_mapping.cpp:2667] Adding IP packet filters with ports [30016,30031] for 
container container1
[20:18:04]W: [Step 10/10] I0622 20:18:04.981956 28416 
port_mapping.cpp:2667] Adding IP packet filters with ports [31000,31007] for 
container container1
[20:18:04]W: [Step 10/10] I0622 20:18:04.985674 28416 
port_mapping.cpp:2667] Adding IP packet filters with ports [31008,31039] for 
container container1
[20:18:04]W: [Step 10/10] I0622 20:18:04.989276 28416 
port_mapping.cpp:2667] Adding IP packet filters with ports [31040,31103] for 
container container1
[20:18:04]W: [Step 10/10] I0622 20:18:04.993069 28416 
port_mapping.cpp:2667] Adding

[jira] [Updated] (MESOS-5687) Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified.

2016-06-22 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-5687:

Priority: Major  (was: Critical)

> Port mapping isolator may cause segfault if the agent flag 
> `egress_rate_limit_per_container` is specified.
> --
>
> Key: MESOS-5687
> URL: https://issues.apache.org/jira/browse/MESOS-5687
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network
>Affects Versions: 0.27.3, 0.28.2
> Environment: Fedora 23 with network isolation
>Reporter: Gilbert Song
>  Labels: isolation, mesosphere, networking
>
> The port mapping isolator may segfault if the agent flag 
> `egress_rate_limit_per_container` is specified and 
> `/sys/class/net/eth0/speed` is not readable. 
> This can be exposed in this test:
> {noformat}
> PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit
> {noformat}
> Here is the log:
> {noformat}
> [20:18:05] :   [Step 10/10] [ RUN  ] 
> PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.375366 28395 
> port_mapping_tests.cpp:229] Using eth0 as the public interface
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.375664 28395 
> port_mapping_tests.cpp:237] Using lo as the loopback interface
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.33 28395 resources.cpp:572] 
> Parsing resources as JSON failed: 
> cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
> [20:18:05]W:   [Step 10/10] Trying semicolon-delimited string format instead
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.389879 28395 
> port_mapping.cpp:1557] Using eth0 as the public interface
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.390173 28395 
> port_mapping.cpp:1582] Using lo as the loopback interface
> [20:18:05]W:   [Step 10/10] F0622 20:18:05.390365 28395 
> port_mapping_tests.cpp:1496] CHECK_SOME(isolator): Failed to read 
> /sys/class/net/eth0/speed: Invalid argument 
> [20:18:05]W:   [Step 10/10] *** Check failure stack trace: ***
> [20:18:05]W:   [Step 10/10] @ 0x7f11003bdd1a  
> google::LogMessage::Fail()
> [20:18:05]W:   [Step 10/10] @ 0x7f11003bdc73  
> google::LogMessage::SendToLog()
> [20:18:05]W:   [Step 10/10] @ 0x7f11003bd669  
> google::LogMessage::Flush()
> [20:18:05]W:   [Step 10/10] @ 0x7f11003c04da  
> google::LogMessageFatal::~LogMessageFatal()
> [20:18:05]W:   [Step 10/10] @   0xa62ce1  
> _CheckFatal::~_CheckFatal()
> [20:18:05]W:   [Step 10/10] @  0x199a13d  
> mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_SmallEgressLimit_Test::TestBody()
> [20:18:05]W:   [Step 10/10] @  0x1a36fbe  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [20:18:05]W:   [Step 10/10] @  0x1a3206c  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> [20:18:05]W:   [Step 10/10] @  0x1a12ab6  testing::Test::Run()
> [20:18:05]W:   [Step 10/10] @  0x1a1326e  testing::TestInfo::Run()
> [20:18:05]W:   [Step 10/10] @  0x1a138bf  testing::TestCase::Run()
> [20:18:05]W:   [Step 10/10] @  0x1a1a3fd  
> testing::internal::UnitTestImpl::RunAllTests()
> [20:18:05]W:   [Step 10/10] @  0x1a37c85  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [20:18:05]W:   [Step 10/10] @  0x1a32bac  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> [20:18:05]W:   [Step 10/10] @  0x1a190d9  testing::UnitTest::Run()
> [20:18:05]W:   [Step 10/10] @  0x1004b7f  RUN_ALL_TESTS()
> [20:18:05]W:   [Step 10/10] @  0x1004765  main
> [20:18:05]W:   [Step 10/10] @ 0x7f10f9aa4580  __libc_start_main
> [20:18:05]W:   [Step 10/10] @   0xa61339  _start
> [20:18:06]W:   [Step 10/10] 
> /mnt/teamcity/temp/agentTmp/custom_script8081387914816808529: line 3: 28395 
> Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh --verbose 
> --gtest_filter="$GTEST_FILTER"
> [20:18:06]W:   [Step 10/10] Process exited with code 134
> {noformat}





[jira] [Updated] (MESOS-5687) Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified.

2016-06-22 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5687:
--
Affects Version/s: 0.27.3
   0.28.2

> Port mapping isolator may cause segfault if the agent flag 
> `egress_rate_limit_per_container` is specified.
> --
>
> Key: MESOS-5687
> URL: https://issues.apache.org/jira/browse/MESOS-5687
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network
>Affects Versions: 0.27.3, 0.28.2
> Environment: Fedora 23 with network isolation
>Reporter: Gilbert Song
>Priority: Critical
>  Labels: isolation, mesosphere, networking
>
> The port mapping isolator may segfault if the agent flag 
> `egress_rate_limit_per_container` is specified and 
> `/sys/class/net/eth0/speed` is not readable. 
> This can be exposed in this test:
> {noformat}
> PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit
> {noformat}
> Here is the log:
> {noformat}
> [20:18:05] :   [Step 10/10] [ RUN  ] 
> PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.375366 28395 
> port_mapping_tests.cpp:229] Using eth0 as the public interface
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.375664 28395 
> port_mapping_tests.cpp:237] Using lo as the loopback interface
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.33 28395 resources.cpp:572] 
> Parsing resources as JSON failed: 
> cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
> [20:18:05]W:   [Step 10/10] Trying semicolon-delimited string format instead
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.389879 28395 
> port_mapping.cpp:1557] Using eth0 as the public interface
> [20:18:05]W:   [Step 10/10] I0622 20:18:05.390173 28395 
> port_mapping.cpp:1582] Using lo as the loopback interface
> [20:18:05]W:   [Step 10/10] F0622 20:18:05.390365 28395 
> port_mapping_tests.cpp:1496] CHECK_SOME(isolator): Failed to read 
> /sys/class/net/eth0/speed: Invalid argument 
> [20:18:05]W:   [Step 10/10] *** Check failure stack trace: ***
> [20:18:05]W:   [Step 10/10] @ 0x7f11003bdd1a  
> google::LogMessage::Fail()
> [20:18:05]W:   [Step 10/10] @ 0x7f11003bdc73  
> google::LogMessage::SendToLog()
> [20:18:05]W:   [Step 10/10] @ 0x7f11003bd669  
> google::LogMessage::Flush()
> [20:18:05]W:   [Step 10/10] @ 0x7f11003c04da  
> google::LogMessageFatal::~LogMessageFatal()
> [20:18:05]W:   [Step 10/10] @   0xa62ce1  
> _CheckFatal::~_CheckFatal()
> [20:18:05]W:   [Step 10/10] @  0x199a13d  
> mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_SmallEgressLimit_Test::TestBody()
> [20:18:05]W:   [Step 10/10] @  0x1a36fbe  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [20:18:05]W:   [Step 10/10] @  0x1a3206c  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> [20:18:05]W:   [Step 10/10] @  0x1a12ab6  testing::Test::Run()
> [20:18:05]W:   [Step 10/10] @  0x1a1326e  testing::TestInfo::Run()
> [20:18:05]W:   [Step 10/10] @  0x1a138bf  testing::TestCase::Run()
> [20:18:05]W:   [Step 10/10] @  0x1a1a3fd  
> testing::internal::UnitTestImpl::RunAllTests()
> [20:18:05]W:   [Step 10/10] @  0x1a37c85  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [20:18:05]W:   [Step 10/10] @  0x1a32bac  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> [20:18:05]W:   [Step 10/10] @  0x1a190d9  testing::UnitTest::Run()
> [20:18:05]W:   [Step 10/10] @  0x1004b7f  RUN_ALL_TESTS()
> [20:18:05]W:   [Step 10/10] @  0x1004765  main
> [20:18:05]W:   [Step 10/10] @ 0x7f10f9aa4580  __libc_start_main
> [20:18:05]W:   [Step 10/10] @   0xa61339  _start
> [20:18:06]W:   [Step 10/10] 
> /mnt/teamcity/temp/agentTmp/custom_script8081387914816808529: line 3: 28395 
> Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh --verbose 
> --gtest_filter="$GTEST_FILTER"
> [20:18:06]W:   [Step 10/10] Process exited with code 134
> {noformat}





[jira] [Created] (MESOS-5686) Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified.

2016-06-22 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5686:
---

 Summary: Port mapping isolator may cause segfault if the agent 
flag `egress_rate_limit_per_container` is specified.
 Key: MESOS-5686
 URL: https://issues.apache.org/jira/browse/MESOS-5686
 Project: Mesos
  Issue Type: Bug
  Components: isolation, network
 Environment: Fedora 23 with network isolation
Reporter: Gilbert Song
Priority: Critical


The port mapping isolator may segfault if the agent flag 
`egress_rate_limit_per_container` is specified and `/sys/class/net/eth0/speed` 
is not readable. 

This can be exposed in this test:
{noformat}
PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit
{noformat}

Here is the log:
{noformat}
[20:18:05] : [Step 10/10] [ RUN  ] 
PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit
[20:18:05]W: [Step 10/10] I0622 20:18:05.375366 28395 
port_mapping_tests.cpp:229] Using eth0 as the public interface
[20:18:05]W: [Step 10/10] I0622 20:18:05.375664 28395 
port_mapping_tests.cpp:237] Using lo as the loopback interface
[20:18:05]W: [Step 10/10] I0622 20:18:05.33 28395 resources.cpp:572] 
Parsing resources as JSON failed: 
cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
[20:18:05]W: [Step 10/10] Trying semicolon-delimited string format instead
[20:18:05]W: [Step 10/10] I0622 20:18:05.389879 28395 
port_mapping.cpp:1557] Using eth0 as the public interface
[20:18:05]W: [Step 10/10] I0622 20:18:05.390173 28395 
port_mapping.cpp:1582] Using lo as the loopback interface
[20:18:05]W: [Step 10/10] F0622 20:18:05.390365 28395 
port_mapping_tests.cpp:1496] CHECK_SOME(isolator): Failed to read 
/sys/class/net/eth0/speed: Invalid argument 
[20:18:05]W: [Step 10/10] *** Check failure stack trace: ***
[20:18:05]W: [Step 10/10] @ 0x7f11003bdd1a  
google::LogMessage::Fail()
[20:18:05]W: [Step 10/10] @ 0x7f11003bdc73  
google::LogMessage::SendToLog()
[20:18:05]W: [Step 10/10] @ 0x7f11003bd669  
google::LogMessage::Flush()
[20:18:05]W: [Step 10/10] @ 0x7f11003c04da  
google::LogMessageFatal::~LogMessageFatal()
[20:18:05]W: [Step 10/10] @   0xa62ce1  
_CheckFatal::~_CheckFatal()
[20:18:05]W: [Step 10/10] @  0x199a13d  
mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_SmallEgressLimit_Test::TestBody()
[20:18:05]W: [Step 10/10] @  0x1a36fbe  
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
[20:18:05]W: [Step 10/10] @  0x1a3206c  
testing::internal::HandleExceptionsInMethodIfSupported<>()
[20:18:05]W: [Step 10/10] @  0x1a12ab6  testing::Test::Run()
[20:18:05]W: [Step 10/10] @  0x1a1326e  testing::TestInfo::Run()
[20:18:05]W: [Step 10/10] @  0x1a138bf  testing::TestCase::Run()
[20:18:05]W: [Step 10/10] @  0x1a1a3fd  
testing::internal::UnitTestImpl::RunAllTests()
[20:18:05]W: [Step 10/10] @  0x1a37c85  
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
[20:18:05]W: [Step 10/10] @  0x1a32bac  
testing::internal::HandleExceptionsInMethodIfSupported<>()
[20:18:05]W: [Step 10/10] @  0x1a190d9  testing::UnitTest::Run()
[20:18:05]W: [Step 10/10] @  0x1004b7f  RUN_ALL_TESTS()
[20:18:05]W: [Step 10/10] @  0x1004765  main
[20:18:05]W: [Step 10/10] @ 0x7f10f9aa4580  __libc_start_main
[20:18:05]W: [Step 10/10] @   0xa61339  _start
[20:18:06]W: [Step 10/10] 
/mnt/teamcity/temp/agentTmp/custom_script8081387914816808529: line 3: 28395 
Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh --verbose 
--gtest_filter="$GTEST_FILTER"
[20:18:06]W: [Step 10/10] Process exited with code 134
{noformat}
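
The fatal line in the log is a failed read of `/sys/class/net/eth0/speed` ("Invalid argument"). As a rough illustration of treating that read as optional rather than fatal, here is a minimal Python sketch. It assumes the sysfs file holds a single integer in Mbit/s when the speed is known; the function name `read_link_speed` is hypothetical, and this is not how the isolator itself is written.

```python
from typing import Optional


def read_link_speed(path: str) -> Optional[int]:
    """Return the link speed in Mbit/s, or None if it is unavailable.

    A read error (for example EINVAL on an interface that does not
    report a speed), a non-numeric value, or a non-positive value
    are all treated as "unknown" instead of raising.
    """
    try:
        with open(path) as f:
            text = f.read().strip()
    except OSError:
        return None
    try:
        speed = int(text)
    except ValueError:
        return None
    return speed if speed > 0 else None


if __name__ == "__main__":
    speed = read_link_speed("/sys/class/net/eth0/speed")
    if speed is None:
        print("link speed unknown")
    else:
        print("link speed: {} Mbit/s".format(speed))
```

Returning `None` leaves the caller to decide whether an unknown speed should disable the rate limit, fall back to a default, or be reported as a configuration error.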





[jira] [Created] (MESOS-5687) Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified.

2016-06-22 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-5687:
---

 Summary: Port mapping isolator may cause segfault if the agent 
flag `egress_rate_limit_per_container` is specified.
 Key: MESOS-5687
 URL: https://issues.apache.org/jira/browse/MESOS-5687
 Project: Mesos
  Issue Type: Bug
  Components: isolation, network
 Environment: Fedora 23 with network isolation
Reporter: Gilbert Song
Priority: Critical


The port mapping isolator may segfault if the agent flag 
`egress_rate_limit_per_container` is specified and `/sys/class/net/eth0/speed` 
is not readable. 

This can be exposed in this test:
{noformat}
PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit
{noformat}

Here is the log:
{noformat}
[20:18:05] : [Step 10/10] [ RUN  ] 
PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit
[20:18:05]W: [Step 10/10] I0622 20:18:05.375366 28395 
port_mapping_tests.cpp:229] Using eth0 as the public interface
[20:18:05]W: [Step 10/10] I0622 20:18:05.375664 28395 
port_mapping_tests.cpp:237] Using lo as the loopback interface
[20:18:05]W: [Step 10/10] I0622 20:18:05.33 28395 resources.cpp:572] 
Parsing resources as JSON failed: 
cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
[20:18:05]W: [Step 10/10] Trying semicolon-delimited string format instead
[20:18:05]W: [Step 10/10] I0622 20:18:05.389879 28395 
port_mapping.cpp:1557] Using eth0 as the public interface
[20:18:05]W: [Step 10/10] I0622 20:18:05.390173 28395 
port_mapping.cpp:1582] Using lo as the loopback interface
[20:18:05]W: [Step 10/10] F0622 20:18:05.390365 28395 
port_mapping_tests.cpp:1496] CHECK_SOME(isolator): Failed to read 
/sys/class/net/eth0/speed: Invalid argument 
[20:18:05]W: [Step 10/10] *** Check failure stack trace: ***
[20:18:05]W: [Step 10/10] @ 0x7f11003bdd1a  
google::LogMessage::Fail()
[20:18:05]W: [Step 10/10] @ 0x7f11003bdc73  
google::LogMessage::SendToLog()
[20:18:05]W: [Step 10/10] @ 0x7f11003bd669  
google::LogMessage::Flush()
[20:18:05]W: [Step 10/10] @ 0x7f11003c04da  
google::LogMessageFatal::~LogMessageFatal()
[20:18:05]W: [Step 10/10] @   0xa62ce1  
_CheckFatal::~_CheckFatal()
[20:18:05]W: [Step 10/10] @  0x199a13d  
mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_SmallEgressLimit_Test::TestBody()
[20:18:05]W: [Step 10/10] @  0x1a36fbe  
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
[20:18:05]W: [Step 10/10] @  0x1a3206c  
testing::internal::HandleExceptionsInMethodIfSupported<>()
[20:18:05]W: [Step 10/10] @  0x1a12ab6  testing::Test::Run()
[20:18:05]W: [Step 10/10] @  0x1a1326e  testing::TestInfo::Run()
[20:18:05]W: [Step 10/10] @  0x1a138bf  testing::TestCase::Run()
[20:18:05]W: [Step 10/10] @  0x1a1a3fd  
testing::internal::UnitTestImpl::RunAllTests()
[20:18:05]W: [Step 10/10] @  0x1a37c85  
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
[20:18:05]W: [Step 10/10] @  0x1a32bac  
testing::internal::HandleExceptionsInMethodIfSupported<>()
[20:18:05]W: [Step 10/10] @  0x1a190d9  testing::UnitTest::Run()
[20:18:05]W: [Step 10/10] @  0x1004b7f  RUN_ALL_TESTS()
[20:18:05]W: [Step 10/10] @  0x1004765  main
[20:18:05]W: [Step 10/10] @ 0x7f10f9aa4580  __libc_start_main
[20:18:05]W: [Step 10/10] @   0xa61339  _start
[20:18:06]W: [Step 10/10] 
/mnt/teamcity/temp/agentTmp/custom_script8081387914816808529: line 3: 28395 
Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh --verbose 
--gtest_filter="$GTEST_FILTER"
[20:18:06]W: [Step 10/10] Process exited with code 134
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5330) Agent should backoff before connecting to the master

2016-06-22 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5330:
--
Fix Version/s: 0.28.3
   0.27.4

> Agent should backoff before connecting to the master
> 
>
> Key: MESOS-5330
> URL: https://issues.apache.org/jira/browse/MESOS-5330
> Project: Mesos
>  Issue Type: Bug
>Reporter: David Robinson
>Assignee: David Robinson
> Fix For: 0.28.3, 1.0.0, 0.27.4
>
>
> When an agent is started it starts a background task (libprocess process?) to 
> detect the leading master. When the leading master is detected (or changes) 
> the [SocketManager's link() method is called and a TCP connection to the 
> master is 
> established|https://github.com/apache/mesos/blob/a138e2246a30c4b5c9bc3f7069ad12204dcaffbc/src/slave/slave.cpp#L954].
>  The agent _then_ backs off before sending a ReRegisterSlave message via the 
> newly established connection. The agent needs to back off _before_ attempting 
> to establish a TCP connection to the master, not before sending the first 
> message over the connection.
> During scale tests at Twitter we discovered that agents can SYN flood the 
> master upon leader changes. The problem described in MESOS-5200 can then 
> occur, where ephemeral connections are used, which exacerbates the problem. 
> The end result is a lot of hosts setting up and tearing down TCP connections 
> every slave_ping_timeout seconds (15 by default), connections failing to be 
> established, and hosts being marked as unhealthy and shut down. We observed 
> ~800 passive TCP connections per second on the leading master during scale 
> tests.
> The problem can be somewhat mitigated by tuning the kernel to handle a 
> thundering herd of TCP connections, but ideally there would not be a 
> thundering herd to begin with.
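
As a rough illustration of the ordering proposed above, each agent could draw a random delay and wait for it before opening the TCP connection, so that connection attempts are spread out after a leader change. This is only a sketch: connectBackoff and its maxBackoff parameter are hypothetical names and are not taken from the Mesos code base.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Hypothetical helper (not a Mesos API): picks a random delay in
// [0, maxBackoff]. An agent would wait for this delay *before* opening a
// TCP connection to a newly detected master, rather than after linking.
std::chrono::milliseconds connectBackoff(
    std::chrono::milliseconds maxBackoff,
    std::mt19937& rng)
{
  std::uniform_int_distribution<long long> dist(0, maxBackoff.count());
  return std::chrono::milliseconds(dist(rng));
}
```

With a uniform delay, N agents reacting to the same leader change would spread their connection attempts over the whole backoff window instead of connecting at the same instant.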





[jira] [Updated] (MESOS-5685) The /files/download endpoint's authorization can be compromised

2016-06-22 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-5685:
-
Summary: The /files/download endpoint's authorization can be compromised  
(was: The /files/download endpoint authorization can be compromised)

> The /files/download endpoint's authorization can be compromised
> ---
>
> Key: MESOS-5685
> URL: https://issues.apache.org/jira/browse/MESOS-5685
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2
>Reporter: Greg Mann
>  Labels: mesosphere
>
> If a forward slash is appended to the path of a file a user wishes to 
> download via {{/files/download}}, the authorization logic for that path will 
> be bypassed and the file will be downloaded regardless of permissions. This 
> is because we store the authorization callbacks for these paths in a map 
> which is keyed by the path name, so a request to {{/master/log/}} fails to 
> find the callback which is installed for {{/master/log}}. When the master 
> fails to find the callback, it assumes authorization is not required for that 
> path and authorizes the action.
> Consider the following excerpt:
> {code}
> gmann@gmac:~/src/mesos/build⚡  http GET 
> http://127.0.0.1:5050/files/download\?path\=/master/log -a foo:bar
> HTTP/1.1 403 Forbidden
> Content-Length: 0
> Date: Wed, 22 Jun 2016 21:28:53 GMT
> gmann@gmac:~/src/mesos/build⚡  http GET 
> http://127.0.0.1:5050/files/download\?path\=/master/log/ -a foo:bar
> HTTP/1.1 200 OK
> Content-Disposition: attachment; 
> filename=mesos-master.gmac.gmann.log.INFO.20160622-142843.65615
> Content-Length: 14432
> Content-Type: application/octet-stream
> Date: Wed, 22 Jun 2016 21:28:56 GMT
> Log file created at: 2016/06/22 14:28:43
> Running on machine: gmac
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> I0622 14:28:43.476925 2080764672 logging.cpp:194] INFO level logging started!
> I0622 14:28:43.477522 2080764672 main.cpp:367] Using 'HierarchicalDRF' 
> allocator
> I0622 14:28:43.480650 2080764672 leveldb.cpp:174] Opened db in 2961us
> I0622 14:28:43.481046 2080764672 leveldb.cpp:181] Compacted db in 372us
> I0622 14:28:43.481078 2080764672 leveldb.cpp:196] Created db iterator in 13us
> I0622 14:28:43.481096 2080764672 leveldb.cpp:202] Seeked to beginning of db 
> in 9us
> I0622 14:28:43.48 2080764672 leveldb.cpp:271] Iterated through 0 keys in 
> the db in 8us
> I0622 14:28:43.481165 2080764672 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0622 14:28:43.481967 219914240 recover.cpp:451] Starting replica recovery
> I0622 14:28:43.482193 219914240 recover.cpp:477] Replica is in EMPTY status
> I0622 14:28:43.482589 2080764672 main.cpp:488] Creating default 'local' 
> authorizer
> I0622 14:28:43.482719 2080764672 main.cpp:545] Starting Mesos master
> I0622 14:28:43.483085 218841088 replica.cpp:673] Replica in EMPTY status 
> received a broadcasted recover request from (4)@127.0.0.1:5050
> I0622 14:28:43.487284 218304512 recover.cpp:197] Received a recover response 
> from a replica in EMPTY status
> I0622 14:28:43.487694 219914240 recover.cpp:568] Updating replica status to 
> STARTING
> {code}
> We could consider disallowing paths which end in trailing slashes.
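
The lookup problem described above can be sketched as follows. This is a simplified illustration, not the Mesos implementation: AuthorizationCallback, normalize and authorized are hypothetical names. It normalizes the requested path before the map lookup, and it assumes that a path without a registered callback should be denied rather than allowed.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for an authorization callback: returns true if the
// request may proceed.
using AuthorizationCallback = std::function<bool()>;

// Strips trailing slashes so that "/master/log/" and "/master/log" map to
// the same key. A lone "/" is left untouched.
std::string normalize(std::string path)
{
  while (path.size() > 1 && path.back() == '/') {
    path.pop_back();
  }
  return path;
}

// Looks up the callback for the normalized path. Assumption of this sketch:
// a path with no registered callback is denied instead of being allowed.
bool authorized(
    const std::map<std::string, AuthorizationCallback>& callbacks,
    const std::string& path)
{
  auto it = callbacks.find(normalize(path));
  if (it == callbacks.end()) {
    return false;
  }
  return it->second();
}
```

Normalizing is an alternative to rejecting paths with trailing slashes outright; either way the key used for the lookup matches the key used at registration.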





[jira] [Created] (MESOS-5685) The /files/download endpoint authorization can be compromised

2016-06-22 Thread Greg Mann (JIRA)
Greg Mann created MESOS-5685:


 Summary: The /files/download endpoint authorization can be 
compromised
 Key: MESOS-5685
 URL: https://issues.apache.org/jira/browse/MESOS-5685
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.28.2
Reporter: Greg Mann


If a forward slash is appended to the path of a file a user wishes to download 
via {{/files/download}}, the authorization logic for that path will be bypassed 
and the file will be downloaded regardless of permissions. This is because we 
store the authorization callbacks for these paths in a map which is keyed by 
the path name, so a request to {{/master/log/}} fails to find the callback 
which is installed for {{/master/log}}. When the master fails to find the 
callback, it assumes authorization is not required for that path and authorizes 
the action.

Consider the following excerpt:
{code}
gmann@gmac:~/src/mesos/build⚡  http GET 
http://127.0.0.1:5050/files/download\?path\=/master/log -a foo:bar
HTTP/1.1 403 Forbidden
Content-Length: 0
Date: Wed, 22 Jun 2016 21:28:53 GMT

gmann@gmac:~/src/mesos/build⚡  http GET 
http://127.0.0.1:5050/files/download\?path\=/master/log/ -a foo:bar
HTTP/1.1 200 OK
Content-Disposition: attachment; 
filename=mesos-master.gmac.gmann.log.INFO.20160622-142843.65615
Content-Length: 14432
Content-Type: application/octet-stream
Date: Wed, 22 Jun 2016 21:28:56 GMT

Log file created at: 2016/06/22 14:28:43
Running on machine: gmac
Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
I0622 14:28:43.476925 2080764672 logging.cpp:194] INFO level logging started!
I0622 14:28:43.477522 2080764672 main.cpp:367] Using 'HierarchicalDRF' allocator
I0622 14:28:43.480650 2080764672 leveldb.cpp:174] Opened db in 2961us
I0622 14:28:43.481046 2080764672 leveldb.cpp:181] Compacted db in 372us
I0622 14:28:43.481078 2080764672 leveldb.cpp:196] Created db iterator in 13us
I0622 14:28:43.481096 2080764672 leveldb.cpp:202] Seeked to beginning of db in 
9us
I0622 14:28:43.48 2080764672 leveldb.cpp:271] Iterated through 0 keys in 
the db in 8us
I0622 14:28:43.481165 2080764672 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0622 14:28:43.481967 219914240 recover.cpp:451] Starting replica recovery
I0622 14:28:43.482193 219914240 recover.cpp:477] Replica is in EMPTY status
I0622 14:28:43.482589 2080764672 main.cpp:488] Creating default 'local' 
authorizer
I0622 14:28:43.482719 2080764672 main.cpp:545] Starting Mesos master
I0622 14:28:43.483085 218841088 replica.cpp:673] Replica in EMPTY status 
received a broadcasted recover request from (4)@127.0.0.1:5050
I0622 14:28:43.487284 218304512 recover.cpp:197] Received a recover response 
from a replica in EMPTY status
I0622 14:28:43.487694 219914240 recover.cpp:568] Updating replica status to 
STARTING
{code}

We could consider disallowing paths which end in trailing slashes.





[jira] [Commented] (MESOS-5642) Move include/mesos/v1/master/allocator.proto to its own directory and package

2016-06-22 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345006#comment-15345006
 ] 

Vinod Kone commented on MESOS-5642:
---

I don't know of any modules for allocator. But it is best to send an email to 
the dev list about this change and see if someone objects. cc [~karya]

Also currently modules have to be recompiled for every version of mesos, so it 
might not be that big of a problem. For example, we completely changed the 
authorizer interface between 0.28.0 and 1.0.  

> Move include/mesos/v1/master/allocator.proto to its own directory and package
> -
>
> Key: MESOS-5642
> URL: https://issues.apache.org/jira/browse/MESOS-5642
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Zhitao Li
>Assignee: Zhitao Li
> Fix For: 1.0.0
>
>
> Right now, every protobuf file used in `include/mesos/v1` is in its own 
> directory and package except for the allocator.
> We should do the same for it, both for consistency and for protobuf 
> compiler friendliness (e.g. the golang compiler doesn't work well when two 
> .proto files generate messages into the same package).





[jira] [Updated] (MESOS-5684) Master captures `this` when creating authorization callback

2016-06-22 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-5684:
--
Priority: Blocker  (was: Major)

> Master captures `this` when creating authorization callback
> ---
>
> Key: MESOS-5684
> URL: https://issues.apache.org/jira/browse/MESOS-5684
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> When exposing its log file, the master currently installs an authorization 
> callback for the log file which captures the master's {{this}} pointer. Such 
> captures have previously caused bugs (MESOS-5629), and this one should be 
> fixed as well. The callback should be dispatched to the master process, and 
> it should be dispatched via the {{self()}} PID.





[jira] [Commented] (MESOS-5642) Move include/mesos/v1/master/allocator.proto to its own directory and package

2016-06-22 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344924#comment-15344924
 ] 

Zhitao Li commented on MESOS-5642:
--

(Moving conversation from patch to here).

Based on conversations in r/48092, we also want to move the non-versioned 
allocator.proto into its own directory. I tried to start a separate patch for 
it, but got stuck on how to handle the `mesos::master::allocator::Allocator` 
interface class in `include/mesos/master/allocator.hpp`. Ideally, I want to 
keep it in the new allocator directory too, but that would affect quite a few 
places, and my biggest worry is that compilation of custom allocator modules 
would be broken because the base class has moved.

I wonder whether a typedef alias is sufficient to maintain compile 
compatibility and give module maintainers some time to update. 
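
A minimal sketch of the alias idea, assuming the interface moves to a hypothetical `mesos::allocator` namespace (the actual target namespace is not decided here) and that `CustomAllocator` stands for an out-of-tree module:

```cpp
#include <type_traits>

namespace mesos {

// Hypothetical new home of the interface (assumed name).
namespace allocator {

class Allocator
{
public:
  virtual ~Allocator() {}
};

} // namespace allocator

// Old location, kept only as an alias so existing modules still compile.
namespace master {
namespace allocator {

using Allocator = ::mesos::allocator::Allocator;

} // namespace allocator
} // namespace master

} // namespace mesos

// A module written against the old name compiles unchanged.
class CustomAllocator : public mesos::master::allocator::Allocator {};

static_assert(
    std::is_same<
        mesos::master::allocator::Allocator,
        mesos::allocator::Allocator>::value,
    "the alias must name the same type");
```

One caveat with this approach: a module that forward-declares `class Allocator;` in the old namespace would no longer compile, because a class declaration conflicts with an alias of the same name.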

[~vinodkone], [~haosd...@gmail.com] and [~anandmazumdar], any thoughts on this?

> Move include/mesos/v1/master/allocator.proto to its own directory and package
> -
>
> Key: MESOS-5642
> URL: https://issues.apache.org/jira/browse/MESOS-5642
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Zhitao Li
>Assignee: Zhitao Li
> Fix For: 1.0.0
>
>
> Right now, every protobuf file used in `include/mesos/v1` is in its own 
> directory and package except for the allocator.
> We should do the same for it, both for consistency and for protobuf 
> compiler friendliness (e.g. the golang compiler doesn't work well when two 
> .proto files generate messages into the same package).





[jira] [Assigned] (MESOS-5684) Master captures `this` when creating authorization callback

2016-06-22 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-5684:


Assignee: Greg Mann

> Master captures `this` when creating authorization callback
> ---
>
> Key: MESOS-5684
> URL: https://issues.apache.org/jira/browse/MESOS-5684
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> When exposing its log file, the master currently installs an authorization 
> callback for the log file which captures the master's {{this}} pointer. Such 
> captures have previously caused bugs (MESOS-5629), and this one should be 
> fixed as well. The callback should be dispatched to the master process, and 
> it should be dispatched via the {{self()}} PID.





[jira] [Created] (MESOS-5684) Master captures `this` when creating authorization callback

2016-06-22 Thread Greg Mann (JIRA)
Greg Mann created MESOS-5684:


 Summary: Master captures `this` when creating authorization 
callback
 Key: MESOS-5684
 URL: https://issues.apache.org/jira/browse/MESOS-5684
 Project: Mesos
  Issue Type: Bug
Reporter: Greg Mann
 Fix For: 1.0.0


When exposing its log file, the master currently installs an authorization 
callback for the log file which captures the master's {{this}} pointer. Such 
captures have previously caused bugs (MESOS-5629), and this one should be fixed 
as well. The callback should be dispatched to the master process, and it should 
be dispatched via the {{self()}} PID.





[jira] [Updated] (MESOS-5673) Port mapping isolator may cause segfault if it bind mount root does not exist.

2016-06-22 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5673:
--
Fix Version/s: 1.0.0
   0.28.3

> Port mapping isolator may cause segfault if it bind mount root does not exist.
> --
>
> Key: MESOS-5673
> URL: https://issues.apache.org/jira/browse/MESOS-5673
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.28.2
> Environment: Fedora 23 with network isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere, networking, tests
> Fix For: 0.28.3, 1.0.0
>
>
> A check is needed in the port mapping isolator for its bind mount root. 
> Otherwise, a non-existent port-mapping bind mount root may cause a 
> segmentation fault in some cases. Here is the test log:
> {noformat}
> [00:57:42] :   [Step 10/10] [--] 11 tests from PortMappingIsolatorTest
> [00:57:42] :   [Step 10/10] [ RUN  ] 
> PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.723029 24841 
> port_mapping_tests.cpp:229] Using eth0 as the public interface
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.723348 24841 
> port_mapping_tests.cpp:237] Using lo as the loopback interface
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.735090 24841 resources.cpp:572] 
> Parsing resources as JSON failed: 
> cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
> [00:57:42]W:   [Step 10/10] Trying semicolon-delimited string format instead
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.736006 24841 
> port_mapping.cpp:1557] Using eth0 as the public interface
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.736331 24841 
> port_mapping.cpp:1582] Using lo as the loopback interface
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737501 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737545 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737578 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737608 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737637 24841 
> port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737666 24841 
> port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737694 24841 
> port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737720 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737746 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737772 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737798 24841 
> port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737828 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737854 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737879 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737905 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15'
> [00:57:42]W:   [Step 10/10] F0604 00:57:42.737968 24841 
> port_mapping_tests.cpp:448] CHECK_SOME(isolator): Failed to get realpath for 
> bind mount root '/var/run/netns': Not found 
> [00:57:42]W:   [Step 10/10] *** Check failure stack trace: ***
> [00:57:42]W:   [Step 10/10] @ 0x7f8bd52583d2  
> google::LogMessage::Fail()
> [00:57:42]W:   [Step 10/10] @ 0x7f8bd525832b  
> google::LogMessage::SendToLog()
> [00:57:42]W:   [Step 10/10] @ 0x7f8bd5257d21  
> google::LogMessage::Flush()
> [00:57:42]W:   [Step 10/10] @ 0x7f8bd525ab92  
> google::LogMessageFatal::~LogMessageFatal()
> [00:57:42]W:   [Step 10/10] @   0xa62171  
> _CheckFatal::~_CheckFatal()
> [00:57:42]W:   [Step 10/10] @  0x1931b17  
> mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_ContainerToContainerTCP_Test::TestBody()
> [00:57:42]W:   [Step 10/10] @  0x19e17b6  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [00:57:42]W:   [Step 10/10] @  0x19dc8

[jira] [Updated] (MESOS-5673) Port mapping isolator may cause segfault if it bind mount root does not exist.

2016-06-22 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-5673:
--
Affects Version/s: 0.28.2

> Port mapping isolator may cause segfault if it bind mount root does not exist.
> --
>
> Key: MESOS-5673
> URL: https://issues.apache.org/jira/browse/MESOS-5673
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.28.2
> Environment: Fedora 23 with network isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere, networking, tests
> Fix For: 0.28.3, 1.0.0
>
>
> A check is needed in the port mapping isolator for its bind mount root. 
> Otherwise, a non-existent port-mapping bind mount root may cause a 
> segmentation fault in some cases. Here is the test log:
> {noformat}
> [00:57:42] :   [Step 10/10] [--] 11 tests from PortMappingIsolatorTest
> [00:57:42] :   [Step 10/10] [ RUN  ] 
> PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.723029 24841 
> port_mapping_tests.cpp:229] Using eth0 as the public interface
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.723348 24841 
> port_mapping_tests.cpp:237] Using lo as the loopback interface
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.735090 24841 resources.cpp:572] 
> Parsing resources as JSON failed: 
> cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000]
> [00:57:42]W:   [Step 10/10] Trying semicolon-delimited string format instead
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.736006 24841 
> port_mapping.cpp:1557] Using eth0 as the public interface
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.736331 24841 
> port_mapping.cpp:1582] Using lo as the loopback interface
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737501 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737545 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737578 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737608 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737637 24841 
> port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737666 24841 
> port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737694 24841 
> port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737720 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737746 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737772 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737798 24841 
> port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737828 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737854 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737879 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512'
> [00:57:42]W:   [Step 10/10] I0604 00:57:42.737905 24841 
> port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15'
> [00:57:42]W:   [Step 10/10] F0604 00:57:42.737968 24841 
> port_mapping_tests.cpp:448] CHECK_SOME(isolator): Failed to get realpath for 
> bind mount root '/var/run/netns': Not found 
> [00:57:42]W:   [Step 10/10] *** Check failure stack trace: ***
> [00:57:42]W:   [Step 10/10] @ 0x7f8bd52583d2  
> google::LogMessage::Fail()
> [00:57:42]W:   [Step 10/10] @ 0x7f8bd525832b  
> google::LogMessage::SendToLog()
> [00:57:42]W:   [Step 10/10] @ 0x7f8bd5257d21  
> google::LogMessage::Flush()
> [00:57:42]W:   [Step 10/10] @ 0x7f8bd525ab92  
> google::LogMessageFatal::~LogMessageFatal()
> [00:57:42]W:   [Step 10/10] @   0xa62171  
> _CheckFatal::~_CheckFatal()
> [00:57:42]W:   [Step 10/10] @  0x1931b17  
> mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_ContainerToContainerTCP_Test::TestBody()
> [00:57:42]W:   [Step 10/10] @  0x19e17b6  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [00:57:42]W:   [Step 10/10] @  0x19dc864  
> testing::inter

[jira] [Commented] (MESOS-5565) Add logging when Offer::Operation::Launch message has no tasks.

2016-06-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344681#comment-15344681
 ] 

José Guilherme Vanz commented on MESOS-5565:


[~klaus1982] Any news? If you're working on it, I'll find something else to work on. ;)

> Add logging when Offer::Operation::Launch message has no tasks.
> ---
>
> Key: MESOS-5565
> URL: https://issues.apache.org/jira/browse/MESOS-5565
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: Klaus Ma
>Priority: Minor
>  Labels: newbie
>
> Currently, when an {{Offer::Accept::Launch}} message has no tasks specified, 
> Mesos treats the request as implicitly declining all offers. This can 
> be very counter-intuitive for framework developers since we do not have any 
> logging on the master around this behavior. It would be good to add some 
> logging on the master to inform framework developers that all the offers 
> have been implicitly declined.
> {code}
> if (operation.type() == Offer::Operation::LAUNCH) {
>   if (operation.launch().task_infos().size() > 0) {
>     ++metrics->messages_launch_tasks;
>   } else {
>     ++metrics->messages_decline_offers;
>   }
> }
> {code}
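
To make the request concrete, here is a self-contained sketch of where such a log line could go. The types below are simplified, hypothetical stand-ins for the protobuf messages and metrics in the excerpt above, and a plain std::ostream replaces the master's real logging facility.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for Offer::Operation.
struct Operation
{
  enum Type { LAUNCH, RESERVE };
  Type type;
  std::vector<std::string> taskInfos;
};

// Hypothetical stand-in for the master's metrics counters.
struct Metrics
{
  int messages_launch_tasks = 0;
  int messages_decline_offers = 0;
};

// Updates the counters as in the excerpt and additionally writes a warning
// when a LAUNCH operation carries no tasks.
void countOperation(
    const Operation& operation,
    const std::string& frameworkId,
    Metrics& metrics,
    std::ostream& log)
{
  if (operation.type != Operation::LAUNCH) {
    return;
  }

  if (!operation.taskInfos.empty()) {
    ++metrics.messages_launch_tasks;
  } else {
    ++metrics.messages_decline_offers;
    log << "WARNING: LAUNCH operation from framework " << frameworkId
        << " has no tasks; treating it as an implicit decline of the offers"
        << std::endl;
  }
}
```

Including the framework ID in the message is a design choice of this sketch, so that an operator can tell which framework sent the empty launch.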





[jira] [Commented] (MESOS-5683) Can't see the finished tasks when run the Java example framework

2016-06-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344540#comment-15344540
 ] 

Joseph Wu commented on MESOS-5683:
--

[~ZLuo], most of the example frameworks will run and exit relatively quickly.  
When a framework "completes", the associated tasks are moved to a separate 
section of the web UI: {{http://localhost:5050/#/frameworks}}

> Can't see the finished tasks when run the Java example framework
> 
>
> Key: MESOS-5683
> URL: https://issues.apache.org/jira/browse/MESOS-5683
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Zhigang Luo
>
> After following the steps in "Getting Started" and running the example 
> framework (Java), I can't see the finished tasks on the Mesos web page 
> (http://127.0.0.1:5050).





[jira] [Created] (MESOS-5683) Can't see the finished tasks when run the Java example framework

2016-06-22 Thread Zhigang Luo (JIRA)
Zhigang Luo created MESOS-5683:
--

 Summary: Can't see the finished tasks when run the Java example 
framework
 Key: MESOS-5683
 URL: https://issues.apache.org/jira/browse/MESOS-5683
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Zhigang Luo


After following the steps in "Getting Started" and running the example 
framework (Java), I can't see the finished tasks on the Mesos web page 
(http://127.0.0.1:5050).





[jira] [Commented] (MESOS-2679) Slave asked to shut down by master because 'health check timed out'

2016-06-22 Thread vincenzo.lomba...@kydea.com (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344118#comment-15344118
 ] 

vincenzo.lomba...@kydea.com commented on MESOS-2679:


Just to let you know, I had the same problem and solved it by changing the "--net" 
option in docker run from "host" to "bridge". I think it depends on having a 
mesos_slave running on the same physical server that hosts the mesos_master...


> Slave asked to shut down by master because 'health check timed out'
> ---
>
> Key: MESOS-2679
> URL: https://issues.apache.org/jira/browse/MESOS-2679
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Affects Versions: 0.22.1
>Reporter: Littlestar
>
> I run Spark 1.3.1 on Mesos 0.22.1 rc6 (linux64), and some Mesos slave nodes 
> go offline.
> slave node logs:
> I0430 15:12:12.737057 32354 slave.cpp:571] Slave asked to shut down by 
> master@192.168.1.10:5050 because 'health check timed out'
> master node logs:
> I0430 15:12:00.615777 19759 master.cpp:237] Shutting down slave 
> 20150430-141442-1214949568-5050-19747-S2 due to health check timeout
> W0430 15:12:00.616083 19751 master.cpp:3417] Shutting down slave 
> 20150430-141442-1214949568-5050-19747-S2 at slave(1)@192.168.1.15:5051 
> (hpblade05) with message 'health check timed out'
> Why does the slave go offline and not restart itself? 
> Is there any configuration to increase this timeout interval?





[jira] [Comment Edited] (MESOS-4732) Migrate rest of the endpoints to use `jsonify`

2016-06-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344035#comment-15344035
 ] 

Jay Guo edited comment on MESOS-4732 at 6/22/16 9:56 AM:
-

[~neilconway] [~mcypark]
Is this migration still active? I'm working on the v1 operator API and I 
observed that some of the existing endpoints have not been converted to use 
{{jsonify}}. I wonder whether it makes sense at all to rework those endpoints 
to use {{jsonify}}, since we are refactoring them anyway. The particular API 
I'm looking at right now is {{slave/containers}}


was (Author: guoger):
[~neilconway][~mcypark]
Is this migration still active? I'm working on the v1 operator API and I 
observed that some of the existing endpoints have not been converted to use 
{{jsonify}}. I wonder whether it makes sense at all to rework those endpoints 
to use {{jsonify}}, since we are refactoring them anyway. The particular API 
I'm looking at right now is {{slave/containers}}

> Migrate rest of the endpoints to use `jsonify`
> --
>
> Key: MESOS-4732
> URL: https://issues.apache.org/jira/browse/MESOS-4732
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>Assignee: Neil Conway
>
> As MVP, we shipped `/state` and `/state-summary` to use `jsonify`. We need to 
> follow through with the migration of the rest of the endpoints to use 
> `jsonify` as well.





[jira] [Commented] (MESOS-4732) Migrate rest of the endpoints to use `jsonify`

2016-06-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344035#comment-15344035
 ] 

Jay Guo commented on MESOS-4732:


[~neilconway][~mcypark]
Is this migration still active? I'm working on the v1 operator API and I 
observed that some of the existing endpoints have not been converted to use 
{{jsonify}}. I wonder whether it makes sense at all to rework those endpoints 
to use {{jsonify}}, since we are refactoring them anyway. The particular API 
I'm looking at right now is {{slave/containers}}

> Migrate rest of the endpoints to use `jsonify`
> --
>
> Key: MESOS-4732
> URL: https://issues.apache.org/jira/browse/MESOS-4732
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>Assignee: Neil Conway
>
> As MVP, we shipped `/state` and `/state-summary` to use `jsonify`. We need to 
> follow through with the migration of the rest of the endpoints to use 
> `jsonify` as well.





[jira] [Assigned] (MESOS-5490) Implement GET_STATE_SUMMARY Call in v1 master API.

2016-06-22 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-5490:
--

Assignee: Jay Guo

> Implement GET_STATE_SUMMARY Call in v1 master API.
> --
>
> Key: MESOS-5490
> URL: https://issues.apache.org/jira/browse/MESOS-5490
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-5310) Enable `network/cni` isolator to allow modifications and deletion of CNI config

2016-06-22 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343925#comment-15343925
 ] 

Qian Zhang commented on MESOS-5310:
---

The above patches introduce a 'cni/config' endpoint in the agent to support 
dynamically adding a new CNI network configuration. However, based on the 
discussion with Jie and Avinash, that is not in the scope of the MVP, so we 
will hold off on it and may iterate on it later.

For the MVP, we'd like to checkpoint the CNI network configuration in the 
container dir when launching the container, and use that checkpointed copy when 
detaching the container. This will make the container life cycle consistent, 
because attach and detach then see the same configuration. Here is the review 
chain:
https://reviews.apache.org/r/49069/
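
To illustrate the checkpointing idea only, here is a minimal Python sketch. It 
is not the code under review: the file name {{network.conf}}, the directory 
layout, and both function names are assumptions made up for this example.

```python
import json
from pathlib import Path

# Hypothetical checkpoint file name; the real isolator may use a different layout.
CHECKPOINT_NAME = "network.conf"


def checkpoint_config(container_dir: Path, network_config: dict) -> Path:
    """At launch: persist the exact network config used to attach the container."""
    container_dir.mkdir(parents=True, exist_ok=True)
    path = container_dir / CHECKPOINT_NAME
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(network_config))
    tmp.replace(path)  # rename into place so a crash never leaves a partial file
    return path


def load_checkpointed_config(container_dir: Path) -> dict:
    """At detach: read the checkpointed copy, not the (possibly changed) live config."""
    return json.loads((container_dir / CHECKPOINT_NAME).read_text())
```

With this shape, a config that is modified or deleted after launch does not 
affect detach, since detach only reads what was written at launch.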

> Enable `network/cni` isolator to allow modifications and deletion of CNI 
> config
> ---
>
> Key: MESOS-5310
> URL: https://issues.apache.org/jira/browse/MESOS-5310
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: linux
>Reporter: Avinash Sridharan
>Assignee: Qian Zhang
>  Labels: mesosphere
>
> Currently the `network/cni` isolator can only load the CNI configs at 
> startup. This makes the CNI networks immutable. From an operational 
> standpoint this can make deployments painful for operators. 
> To make CNI more flexible the `network/cni` isolator should be able to load 
> configs at run time. 
> The proposal is to add an endpoint to the `network/cni` isolator, to which 
> when the operator sends a PUT request the `network/cni` isolator will reload  
> CNI configs. 





[jira] [Created] (MESOS-5682) The /flags endpoints use authorization but there is a bypass to get their content

2016-06-22 Thread Alexander Rojas (JIRA)
Alexander Rojas created MESOS-5682:
--

 Summary: The /flags endpoints use authorization but there is a 
bypass to get their content
 Key: MESOS-5682
 URL: https://issues.apache.org/jira/browse/MESOS-5682
 Project: Mesos
  Issue Type: Bug
  Components: master, slave
Reporter: Alexander Rojas
Priority: Minor


The {{/flags}} endpoints use authorization on both the master and the agent. 
However, the contents of the flags are available without any authorization by 
accessing the {{/state}} endpoints on both the master and the agents.
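
A rough sketch of what this means for a client, in Python. It assumes the 
{{/state}} response body carries a top-level {{flags}} object; the sample body 
and the function name are invented for illustration and are not taken from a 
real cluster.

```python
import json


def flags_from_state(state_body: str) -> dict:
    """Return the flags embedded in a /state response body, or {} if absent."""
    state = json.loads(state_body)
    return state.get("flags", {})


# Made-up, heavily abbreviated /state body used only for this example.
SAMPLE_STATE_BODY = json.dumps({
    "hostname": "master.example.com",
    "flags": {"work_dir": "/var/lib/mesos", "authenticate_http": "true"},
})

flags = flags_from_state(SAMPLE_STATE_BODY)
```

Any client allowed to read {{/state}} would obtain the same values that 
{{/flags}} protects, which is why both endpoints need consistent authorization.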


