[jira] [Commented] (MESOS-5491) Implement GET_AGENTS Call in v1 master API.
[ https://issues.apache.org/jira/browse/MESOS-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345919#comment-15345919 ]

Jay Guo commented on MESOS-5491:
--------------------------------

[~vinodkone] The {{GET_STATE_SUMMARY}} API also needs the proto messages {{Agent}} and {{Framework}}. I think it makes more sense to make these two messages public and more inclusive. The fields required by {{GET_STATE_SUMMARY}} are:
{code}
message GetStateSummary {
  message Agent {
    optional string id = 1;
    optional string pid = 2;
    optional string hostname = 3;
    optional string registered_time = 4;
    optional string reregistered_time = 5;
    repeated Resource resources = 6;
    repeated Resource used_resources = 7;
    repeated Resource reserved_resources = 8;
    repeated Resource unreserved_resources = 9;
    repeated Attribute attributes = 10;
    optional bool active = 11;
    optional string version = 12;
  }

  message Framework {
    optional string id = 1;
    optional string name = 2;
    optional string pid = 3;
    optional string hostname = 4;
    repeated Resource used_resources = 5;
    repeated Resource offered_resources = 6;
    repeated FrameworkInfo.Capability capabilities = 7;
    optional string webui_url = 8;
    optional bool active = 9;
  }

  optional string hostname = 1;
  optional string cluster = 2;
  repeated Agent agents = 3;
  repeated Framework frameworks = 4;
}
{code}

> Implement GET_AGENTS Call in v1 master API.
> -------------------------------------------
>
>                 Key: MESOS-5491
>                 URL: https://issues.apache.org/jira/browse/MESOS-5491
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Vinod Kone
>            Assignee: zhou xing

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (MESOS-5684) Master captures `this` when creating authorization callback
[ https://issues.apache.org/jira/browse/MESOS-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345842#comment-15345842 ]

Greg Mann commented on MESOS-5684:
----------------------------------

Review here: https://reviews.apache.org/r/49132/

> Master captures `this` when creating authorization callback
> -----------------------------------------------------------
>
>                 Key: MESOS-5684
>                 URL: https://issues.apache.org/jira/browse/MESOS-5684
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Greg Mann
>            Assignee: Greg Mann
>            Priority: Blocker
>              Labels: mesosphere
>             Fix For: 1.0.0
>
> When exposing its log file, the master currently installs an authorization
> callback for the log file which captures the master's {{this}} pointer. Such
> captures have previously caused bugs (MESOS-5629), and this one should be
> fixed as well. The callback should be dispatched to the master process, and
> it should be dispatched via the {{self()}} PID.
[jira] [Updated] (MESOS-5684) Master captures `this` when creating authorization callback
[ https://issues.apache.org/jira/browse/MESOS-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann updated MESOS-5684:
-----------------------------
    Shepherd: Vinod Kone
[jira] [Commented] (MESOS-5685) The /files/download endpoint's authorization can be compromised
[ https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345719#comment-15345719 ]

Greg Mann commented on MESOS-5685:
----------------------------------

Review here: https://reviews.apache.org/r/49131/

> The /files/download endpoint's authorization can be compromised
> ---------------------------------------------------------------
>
>                 Key: MESOS-5685
>                 URL: https://issues.apache.org/jira/browse/MESOS-5685
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.28.2
>            Reporter: Greg Mann
>            Assignee: Greg Mann
>            Priority: Blocker
>              Labels: mesosphere
>
> If a forward slash is appended to the path of a file a user wishes to
> download via {{/files/download}}, the authorization logic for that path will
> be bypassed and the file will be downloaded regardless of permissions. This
> is because we store the authorization callbacks for these paths in a map
> which is keyed by the path name, so a request to {{/master/log/}} fails to
> find the callback which is installed for {{/master/log}}. When the master
> fails to find the callback, it assumes authorization is not required for that
> path and authorizes the action.
> Consider the following excerpt:
> {code}
> gmann@gmac:~/src/mesos/build⚡ http GET http://127.0.0.1:5050/files/download\?path\=/master/log -a foo:bar
> HTTP/1.1 403 Forbidden
> Content-Length: 0
> Date: Wed, 22 Jun 2016 21:28:53 GMT
>
> gmann@gmac:~/src/mesos/build⚡ http GET http://127.0.0.1:5050/files/download\?path\=/master/log/ -a foo:bar
> HTTP/1.1 200 OK
> Content-Disposition: attachment; filename=mesos-master.gmac.gmann.log.INFO.20160622-142843.65615
> Content-Length: 14432
> Content-Type: application/octet-stream
> Date: Wed, 22 Jun 2016 21:28:56 GMT
>
> Log file created at: 2016/06/22 14:28:43
> Running on machine: gmac
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> I0622 14:28:43.476925 2080764672 logging.cpp:194] INFO level logging started!
> I0622 14:28:43.477522 2080764672 main.cpp:367] Using 'HierarchicalDRF' allocator
> I0622 14:28:43.480650 2080764672 leveldb.cpp:174] Opened db in 2961us
> I0622 14:28:43.481046 2080764672 leveldb.cpp:181] Compacted db in 372us
> I0622 14:28:43.481078 2080764672 leveldb.cpp:196] Created db iterator in 13us
> I0622 14:28:43.481096 2080764672 leveldb.cpp:202] Seeked to beginning of db in 9us
> I0622 14:28:43.48 2080764672 leveldb.cpp:271] Iterated through 0 keys in the db in 8us
> I0622 14:28:43.481165 2080764672 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0622 14:28:43.481967 219914240 recover.cpp:451] Starting replica recovery
> I0622 14:28:43.482193 219914240 recover.cpp:477] Replica is in EMPTY status
> I0622 14:28:43.482589 2080764672 main.cpp:488] Creating default 'local' authorizer
> I0622 14:28:43.482719 2080764672 main.cpp:545] Starting Mesos master
> I0622 14:28:43.483085 218841088 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4)@127.0.0.1:5050
> I0622 14:28:43.487284 218304512 recover.cpp:197] Received a recover response from a replica in EMPTY status
> I0622 14:28:43.487694 219914240 recover.cpp:568] Updating replica status to STARTING
> {code}
> We could consider disallowing paths which end in trailing slashes.
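The lookup failure described in MESOS-5685 can be modeled in a few lines. The following is only a toy sketch in Python, not the Mesos code: the names `authorize_buggy`, `authorize_fixed`, `normalize` and the `callbacks` dictionary are hypothetical, and the "deny when no callback is found" policy is an assumption for illustration (the ticket itself only suggests disallowing paths that end in a trailing slash).

```python
from typing import Callable, Dict

# Hypothetical callback type: takes a principal, returns True if access is allowed.
Authorizer = Callable[[str], bool]


def authorize_buggy(callbacks: Dict[str, Authorizer], path: str, principal: str) -> bool:
    """Mirrors the reported behavior: a missing callback is treated as 'allowed'."""
    callback = callbacks.get(path)
    if callback is None:
        return True  # fail open: '/master/log/' never finds the '/master/log' entry
    return callback(principal)


def normalize(path: str) -> str:
    """Strip trailing slashes so '/master/log/' and '/master/log' share one key."""
    stripped = path.rstrip("/")
    return stripped if stripped else "/"


def authorize_fixed(callbacks: Dict[str, Authorizer], path: str, principal: str) -> bool:
    """Assumed mitigation: normalize the key first and deny when nothing is registered."""
    callback = callbacks.get(normalize(path))
    if callback is None:
        return False
    return callback(principal)


# Hypothetical registration: only 'admin' may read the master log.
callbacks: Dict[str, Authorizer] = {"/master/log": lambda principal: principal == "admin"}
```

With this setup, `authorize_buggy(callbacks, "/master/log/", "foo")` returns True even though the same principal is rejected for `/master/log`, which matches the 403 and 200 responses in the excerpt. Whether unregistered paths should be denied outright is a separate design decision from key normalization.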
[jira] [Updated] (MESOS-5685) The /files/download endpoint's authorization can be compromised
[ https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann updated MESOS-5685:
-----------------------------
    Shepherd: Vinod Kone
[jira] [Assigned] (MESOS-5685) The /files/download endpoint's authorization can be compromised
[ https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann reassigned MESOS-5685:
--------------------------------
    Assignee: Greg Mann
[jira] [Updated] (MESOS-5685) The /files/download endpoint's authorization can be compromised
[ https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Mann updated MESOS-5685:
-----------------------------
    Priority: Blocker  (was: Major)
[jira] [Commented] (MESOS-5692) Add helper function "begin_with/end_with" to strings
[ https://issues.apache.org/jira/browse/MESOS-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345680#comment-15345680 ]

Gilbert Song commented on MESOS-5692:
-------------------------------------

Is there any special case that makes this different from using `strings::startsWith(s, "c")` and `strings::endsWith(s, "c")`?

> Add helper function "begin_with/end_with" to strings
> ----------------------------------------------------
>
>                 Key: MESOS-5692
>                 URL: https://issues.apache.org/jira/browse/MESOS-5692
>             Project: Mesos
>          Issue Type: Bug
>          Components: stout
>            Reporter: Klaus Ma
>
> Add a helper function to check whether a string starts or ends with a special character.
[jira] [Created] (MESOS-5693) slave delay to forword status update
zhangfuxing created MESOS-5693:
-------------------------------

             Summary: slave delay to forword status update
                 Key: MESOS-5693
                 URL: https://issues.apache.org/jira/browse/MESOS-5693
             Project: Mesos
          Issue Type: Improvement
          Components: slave
    Affects Versions: 0.22.1
         Environment: debian7
            Reporter: zhangfuxing

We observe that the Mesos slave delays forwarding a task status update to the master:

I0615 14:59:10.997902 3890 slave.cpp:2531] Handling status update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001 from executor(1)@10.0.40.189:54304
I0615 14:59:11.001126 3895 status_update_manager.cpp:317] Received status update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001
I0615 14:59:11.001174 3895 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001
I0615 14:59:11.037376 3894 slave.cpp:2709] Sending acknowledgement for status update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001 to executor(1)@10.0.40.189:54304
I0615 15:54:21.352087 3888 slave.cpp:2776] Forwarding the update TASK_KILLED (UUID: 17e9c12f-5241-4aca-81fa-67d6830990b0) for task xxx.64554b80 of framework 20150629-151659-3355508746-5060-6173-0001 to master@10.0.1.200:5060

In this example, the task xxx.64554b80 was killed at 14:59, but the status update was not forwarded to the master until 15:54.
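The size of the delay reported in MESOS-5693 can be computed directly from the log timestamps. The sketch below is a small Python helper written for illustration only; the function names `glog_time` and `delay` are hypothetical, the two log lines are shortened copies of the first and last lines quoted in the report, and the year is an assumption because glog prefixes do not record it.

```python
from datetime import datetime, timedelta


def glog_time(line: str, year: int = 2016) -> datetime:
    """Parse the 'Lmmdd hh:mm:ss.uuuuuu' prefix of a glog line.

    glog does not log the year, so `year` is an assumed value.
    """
    severity_and_date, clock = line.split()[:2]
    stamp = f"{year}{severity_and_date[1:]} {clock}"  # drop the severity letter
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


def delay(earlier: str, later: str) -> timedelta:
    """Time elapsed between two glog lines, assuming both fall in the same year."""
    return glog_time(later) - glog_time(earlier)


# Shortened copies of the first and last log lines from the report.
handled = "I0615 14:59:10.997902 3890 slave.cpp:2531] Handling status update TASK_KILLED"
forwarded = "I0615 15:54:21.352087 3888 slave.cpp:2776] Forwarding the update TASK_KILLED"

print(delay(handled, forwarded))  # 0:55:10.354185
```

By this calculation the update sat on the slave for a little over 55 minutes between being handled and being forwarded.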
[jira] [Updated] (MESOS-5691) SSL downgrade support will leak sockets in CLOSE_WAIT status
[ https://issues.apache.org/jira/browse/MESOS-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Wu updated MESOS-5691:
-----------------------------
    Affects Version/s: 0.25.0
                       0.26.0
                       0.27.0
                       0.28.0

> SSL downgrade support will leak sockets in CLOSE_WAIT status
> ------------------------------------------------------------
>
>                 Key: MESOS-5691
>                 URL: https://issues.apache.org/jira/browse/MESOS-5691
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
>            Reporter: Joseph Wu
>            Assignee: Joseph Wu
>            Priority: Blocker
>              Labels: libprocess, mesosphere
>             Fix For: 1.0.0
[jira] [Created] (MESOS-5692) Add helper function "begin_with/end_with" to strings
Klaus Ma created MESOS-5692:
----------------------------

             Summary: Add helper function "begin_with/end_with" to strings
                 Key: MESOS-5692
                 URL: https://issues.apache.org/jira/browse/MESOS-5692
             Project: Mesos
          Issue Type: Bug
          Components: stout
            Reporter: Klaus Ma

Add a helper function to check whether a string starts or ends with a special character.
[jira] [Created] (MESOS-5691) SSL downgrade support will leak sockets in CLOSE_WAIT status
Joseph Wu created MESOS-5691:
-----------------------------

             Summary: SSL downgrade support will leak sockets in CLOSE_WAIT status
                 Key: MESOS-5691
                 URL: https://issues.apache.org/jira/browse/MESOS-5691
             Project: Mesos
          Issue Type: Bug
          Components: libprocess
    Affects Versions: 0.24.0
            Reporter: Joseph Wu
            Assignee: Joseph Wu
            Priority: Blocker
             Fix For: 1.0.0

Repro steps:

1) Start a master:
{code}
bin/mesos-master.sh --work_dir=/tmp/master
{code}

2) Start an agent with SSL and downgrade enabled:
{code}
# Taken from http://mesos.apache.org/documentation/latest/ssl/
openssl genrsa -des3 -f4 -passout pass:some_password -out key.pem 4096
openssl req -new -x509 -passin pass:some_password -days 365 -key key.pem -out cert.pem
SSL_KEY_FILE=key.pem SSL_CERT_FILE=cert.pem SSL_ENABLED=true SSL_SUPPORT_DOWNGRADE=true sudo -E bin/mesos-agent.sh --master=localhost:5050 --work_dir=/tmp/agent
{code}

3) Start a framework that launches lots of executors, one after another:
{code}
sudo src/balloon-framework --master=localhost:5050 --task_memory=64mb --task_memory_usage_limit=256mb --long_running
{code}

4) Check FDs, repeatedly:
{code}
sudo lsof -i | grep mesos | grep CLOSE_WAIT | wc -l
{code}

The number of sockets in {{CLOSE_WAIT}} will increase linearly with the number of launched executors.
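When step 4 of the MESOS-5691 repro is repeated many times, it can be convenient to do the counting in a script. The sketch below is a Python equivalent of the `grep mesos | grep CLOSE_WAIT | wc -l` part of the pipeline, operating on text that has already been captured from `lsof -i`. The function name `count_close_wait` is hypothetical, and the sample output is invented for illustration; it is not taken from a real run.

```python
def count_close_wait(lsof_output: str, needle: str = "mesos") -> int:
    """Count lines that mention both `needle` and CLOSE_WAIT, like the grep pipeline."""
    return sum(
        1
        for line in lsof_output.splitlines()
        if needle in line and "CLOSE_WAIT" in line
    )


# Invented sample in the general shape of `lsof -i` output (not real data).
SAMPLE = """\
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
mesos-age 4242 root   12u  IPv4  10001      0t0  TCP localhost:5051->localhost:40001 (CLOSE_WAIT)
mesos-age 4242 root   13u  IPv4  10002      0t0  TCP localhost:5051->localhost:40002 (ESTABLISHED)
mesos-age 4242 root   14u  IPv4  10003      0t0  TCP localhost:5051->localhost:40003 (CLOSE_WAIT)
sshd      1000 root    3u  IPv4  10004      0t0  TCP localhost:22->localhost:50000 (CLOSE_WAIT)
"""

print(count_close_wait(SAMPLE))  # 2
```

Sampling this count at intervals while the framework keeps launching executors would show whether the leak grows linearly, as the report states.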
[jira] [Commented] (MESOS-5565) Add logging when Offer::Operation::Launch message has no tasks.
[ https://issues.apache.org/jira/browse/MESOS-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345514#comment-15345514 ]

Klaus Ma commented on MESOS-5565:
---------------------------------

Sure, please help with this :). Thanks very much :).

> Add logging when Offer::Operation::Launch message has no tasks.
> ---------------------------------------------------------------
>
>                 Key: MESOS-5565
>                 URL: https://issues.apache.org/jira/browse/MESOS-5565
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Anand Mazumdar
>            Priority: Minor
>              Labels: newbie
>
> Currently, when a {{Offer::Accept::Launch}} message has no tasks specified,
> Mesos treats the request as implicitly declining all offers. This can
> be very counter-intuitive for framework developers since we do not have any
> logging on the master around this behavior. It would be good to add some
> logging on the master to apprise framework developers that all the offers
> have been implicitly declined.
> {code}
> if (operation.type() == Offer::Operation::LAUNCH) {
>   if (operation.launch().task_infos().size() > 0) {
>     ++metrics->messages_launch_tasks;
>   } else {
>     ++metrics->messages_decline_offers;
>   }
> }
> {code}
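The logging requested in MESOS-5565 would sit in the `else` branch of the excerpt quoted in the ticket. The following is a Python stand-in for that C++ logic, offered only as a sketch: the `Metrics` class, the `record_launch` function, the logger name and the wording of the warning are all hypothetical, and the actual change in the master may look quite different.

```python
import logging
from dataclasses import dataclass
from typing import Sequence

log = logging.getLogger("master_sketch")  # hypothetical logger name


@dataclass
class Metrics:
    """Stand-in for the two counters used in the quoted C++ excerpt."""
    messages_launch_tasks: int = 0
    messages_decline_offers: int = 0


def record_launch(task_infos: Sequence[object], framework_id: str, metrics: Metrics) -> None:
    """Count a LAUNCH operation and warn when it carries no tasks."""
    if len(task_infos) > 0:
        metrics.messages_launch_tasks += 1
    else:
        metrics.messages_decline_offers += 1
        # The new part: tell the framework developer what just happened.
        log.warning(
            "LAUNCH operation from framework %s has no tasks; "
            "treating it as an implicit decline of all offers",
            framework_id,
        )
```

A warning-level message is used here on the assumption that an empty launch is usually unintentional; an informational level would also be defensible if frameworks rely on this as a way to decline offers.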
[jira] [Updated] (MESOS-5565) Add logging when Offer::Operation::Launch message has no tasks.
[ https://issues.apache.org/jira/browse/MESOS-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Klaus Ma updated MESOS-5565:
---------------------------
    Assignee:     (was: Klaus Ma)
[jira] [Updated] (MESOS-5676) A full redesign of the Mesos CLI
[ https://issues.apache.org/jira/browse/MESOS-5676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Klues updated MESOS-5676:
-------------------------------
    Description:
The current Mesos CLI is in serious need of an upgrade. It was created a long time ago and hasn't kept pace with the rest of the code base. In fact, many of the current Mesos CLI commands do not work out of the box. For example, using any of the Python CLI scripts such as “mesos-cat” will result in an error because the proper Mesos library is not in the $PYTHONPATH by default.

This Epic proposes a redesign of the Mesos CLI including (but not limited to) the following:
* A complete rewrite of the CLI, from the ground up, with a more pluggable architecture, better help information, and bash-autocompletion
* A full test suite for the CLI that is closely tied with the Mesos unit tests so CLI commands will be updated as Mesos features change
* A new set of container-related commands in the vein of "docker exec", "docker ps", "docker top", "docker logs", etc.
* Both a local and a remote component so you can debug locally using the CLI or gather information / launch cluster wide commands from the same CLI

Design doc: https://docs.google.com/document/d/1r6Iv4Efu8v8IBrcUTjgYkvZ32WVscgYqrD07OyIglsA/edit?ts=57573bba#

  was:
The current Mesos CLI is in serious need of an upgrade. It was created a long time ago and hasn't kept pace with the rest of the code base. In fact, many of the current Mesos CLI commands do not work out of the box. For example, using any of the Python CLI scripts such as “mesos-cat” will result in an error because the proper Mesos library is not in the $PYTHONPATH by default.

This Epic proposes a redesign of the Mesos CLI including (but not limited to) the following:
* A complete rewrite of the CLI, from the ground up, with a more pluggable architecture, better help information, and bash-autocompletion
* A full test suite for the CLI that is closely tied with the Mesos unit tests so CLI commands will be updated as Mesos features change
* A new set of container-related commands in the vein of "docker exec", "docker ps", "docker top", "docker logs", etc.
* Both a local and a remote component so you can debug locally using the CLI or gather information / launch cluster wide commands from the same CLI

A full design doc will be posted soon.

> A full redesign of the Mesos CLI
> --------------------------------
>
>                 Key: MESOS-5676
>                 URL: https://issues.apache.org/jira/browse/MESOS-5676
>             Project: Mesos
>          Issue Type: Epic
>            Reporter: Kevin Klues
>            Assignee: Kevin Klues
>              Labels: CLI
[jira] [Commented] (MESOS-5653) Creating a persistent volume through the operator endpoints fail and doesn't produce meaningful logs.
[ https://issues.apache.org/jira/browse/MESOS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345293#comment-15345293 ] Greg Mann commented on MESOS-5653: -- Closing this ticket as Won't Fix since we have MESOS-5664 to track the bug, and [~ctm3] and I had some luck troubleshooting this on IRC. > Creating a persistent volume through the operator endpoints fail and doesn't > produce meaningful logs. > - > > Key: MESOS-5653 > URL: https://issues.apache.org/jira/browse/MESOS-5653 > Project: Mesos > Issue Type: Bug > Components: master, volumes >Affects Versions: 0.28.2 > Environment: Centos 7 - 3.10.0-327.13.1.el7.x86_64, Mesos 0.28.2 >Reporter: cliff >Assignee: Greg Mann > Labels: persistent-volumes > > When attempting to create a persistent volume via the /create-volumes > operator endpoint. I get a HTTP 200 from the master and in the logs on the > master I see: > {noformat} > http.cpp:312] HTTP POST for /master/create-volumes from "172.16.10.11:40686 > with User-Agent='curl/7.29.0' " > {noformat} > then next line I see on the master is: > {noformat} > "master.cpp:6560] Sending checkpointed resources to slave > 0ef7d2e1-8b0d-44d4-8db0-cc58ac2058af-S0 at slave(1)@172.16.10.4:5051" > {noformat} > Now if I look in the logs on the slave that was specified in the request to > create a persistent volume I see: > then on the slave I see: > {noformat} > "1572 slave.cpp:2327] Updated checkpointed resources from to " > {noformat} > Notice that from destination and a to destination are both missing > specifically, they should be the valueos of: > checkpointedResources and newCheckpointedResources, from here: > https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L2582 > I am currently running only one slave for troubleshooting purposes, the > resource file on the slave with the disk resource looks like the following: > #resources=file:///etc/default/mesos.resources.json > {noformat} > [ >{ > "name": "disk", > "type": 
"SCALAR", > "scalar": { > "value": 5 > } > }, >{ > "name":"disk", > "type":"SCALAR", > "scalar":{ > "value":100 > }, > "role":"testing", > "disk":{ > "source":{ > "type":"MOUNT", > "mount":{ >"root":"/data" > } > } > } >}, >{ > "name":"cpus", > "type":"SCALAR", > "scalar":{ > "value":16 > }, > "role":"testing" >}, >{ > "name":"mem", > "type":"SCALAR", > "scalar":{ > "value":128000 > }, > "role":"testing" >}, >{ > "name":"ports", > "type":"RANGES", > "ranges":{ > "range":[ > { >"begin":31000, >"end":32000 > } > ] > }, > "role":"testing" >} > ] > {noformat} > When I {{curl master:5050/slaves | jq '.'}} and look under the key > {{reserved_resources_full}}, I see the above resources on that slave. > Here is my request to via the operator endpoint {{/create-resources}}, I am > trying to create a persistent volume on the disk of type MOUNT above, which > is in {{/proc/mounts}} as {{/data}}: > {noformat} > curl -i -d slaveId=0ee7d2e7-8b0d-44d4-8d80-cc58ac2058ae-S4 \ > -d volumes='[ > { > "name": "testvol", > "type": "SCALAR", > "scalar": { "value": 1 }, > "role": "testing", > "disk": { > "source": { >"type" : "MOUNT", > "path" : { "root" : "/data" } > }, > "persistence": { >"id" : "cliff" > }, > "volume": { >"mode": "RW", >"container_path": "/data" > } > } > } > ]' -X POST http://master:5050/master/create-volumes > {noformat} > > {noformat} > HTTP/1.1 200 OK > Date: Sun, 19 Jun 2016 04:38:45 GMT > {noformat} > If look at the slave specified with slaveID above via: > {noformat} > curl - http://slave1:5051/state > {noformat} > I will not see the volume created. Also here are no errors in the INFO logs > on either the master or slave relating to this request. The only log entries > are those that I have provided. > The same problem/behavior seems to exist when trying creating persistent > volumes on dynamically reserved resources as well. 
> My steps were: > systemctl stop mesos-slave > cd /var/mesos > rm -rf meta > systemctl start mesos-slave > Then I issued the following to the /reserve operator endpoint: > {noformat} > curl -i \ >
[jira] [Created] (MESOS-5690) `PortMappingIsolatorTest.ROOT_NC_HostToContainerUDP` fails on Fedora 23.
Gilbert Song created MESOS-5690: --- Summary: `PortMappingIsolatorTest.ROOT_NC_HostToContainerUDP` fails on Fedora 23. Key: MESOS-5690 URL: https://issues.apache.org/jira/browse/MESOS-5690 Project: Mesos Issue Type: Bug Components: isolation, network Environment: Fedora 23 with network isolation Reporter: Gilbert Song {noformat} [20:17:53] : [Step 10/10] [ RUN ] PortMappingIsolatorTest.ROOT_NC_HostToContainerUDP [20:17:53]W: [Step 10/10] I0622 20:17:53.323252 28395 port_mapping_tests.cpp:229] Using eth0 as the public interface [20:17:53]W: [Step 10/10] I0622 20:17:53.323557 28395 port_mapping_tests.cpp:237] Using lo as the loopback interface [20:17:53]W: [Step 10/10] I0622 20:17:53.337299 28395 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] [20:17:53]W: [Step 10/10] Trying semicolon-delimited string format instead [20:17:53]W: [Step 10/10] I0622 20:17:53.338345 28395 port_mapping.cpp:1557] Using eth0 as the public interface [20:17:53]W: [Step 10/10] I0622 20:17:53.338675 28395 port_mapping.cpp:1582] Using lo as the loopback interface [20:17:53]W: [Step 10/10] I0622 20:17:53.339855 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' [20:17:53]W: [Step 10/10] I0622 20:17:53.339901 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' [20:17:53]W: [Step 10/10] I0622 20:17:53.339938 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '409616384 4194304' [20:17:53]W: [Step 10/10] I0622 20:17:53.339972 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5' [20:17:53]W: [Step 10/10] I0622 20:17:53.340010 28395 port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992' [20:17:53]W: [Step 10/10] I0622 20:17:53.340044 28395 port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128' [20:17:53]W: [Step 10/10] I0622 20:17:53.340073 28395 port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992' [20:17:53]W: 
[Step 10/10] I0622 20:17:53.340103 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '409687380 6291456' [20:17:53]W: [Step 10/10] I0622 20:17:53.340136 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' [20:17:53]W: [Step 10/10] I0622 20:17:53.340165 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' [20:17:53]W: [Step 10/10] I0622 20:17:53.340196 28395 port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000' [20:17:53]W: [Step 10/10] I0622 20:17:53.340229 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' [20:17:53]W: [Step 10/10] I0622 20:17:53.340260 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' [20:17:53]W: [Step 10/10] I0622 20:17:53.340289 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' [20:17:53]W: [Step 10/10] I0622 20:17:53.340327 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15' [20:17:53]W: [Step 10/10] I0622 20:17:53.349139 28395 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [20:17:53]W: [Step 10/10] I0622 20:17:53.349345 28395 resources.cpp:572] Parsing resources as JSON failed: ports:[31000-31499] [20:17:53]W: [Step 10/10] Trying semicolon-delimited string format instead [20:17:53]W: [Step 10/10] I0622 20:17:53.349858 28409 port_mapping.cpp:2512] Using non-ephemeral ports {[31000,31500)} and ephemeral ports [30016,30032) for container container1 of executor '' [20:17:53]W: [Step 10/10] I0622 20:17:53.350821 28395 linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS | CLONE_NEWNET [20:17:53]W: [Step 10/10] I0622 20:17:53.393595 28416 port_mapping.cpp:2576] Bind mounted '/proc/15478/ns/net' to '/run/netns/15478' for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.393739 28416 port_mapping.cpp:2607] Created network namespace handle symlink '/var/run/mesos/netns/container1' -> '/run/netns/15478' 
[20:17:53]W: [Step 10/10] I0622 20:17:53.395191 28416 port_mapping.cpp:2667] Adding IP packet filters with ports [30016,30031] for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.397608 28416 port_mapping.cpp:2667] Adding IP packet filters with ports [31000,31007] for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.402170 28416 port_mapping.cpp:2667] Adding IP packet filters with ports [31008,31039] for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.405225 28416 port_mapping.cpp:2667] Adding IP packet filters with ports [31040,31103] for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.408541 28416 port_map
[jira] [Created] (MESOS-5689) `PortMappingIsolatorTest.ROOT_ContainerICMPExternal` fails on Fedora 23.
Gilbert Song created MESOS-5689: --- Summary: `PortMappingIsolatorTest.ROOT_ContainerICMPExternal` fails on Fedora 23. Key: MESOS-5689 URL: https://issues.apache.org/jira/browse/MESOS-5689 Project: Mesos Issue Type: Bug Components: isolation, network Environment: Fedora 23 with network isolation Reporter: Gilbert Song Here is the log: {noformat} [20:17:53] : [Step 10/10] [ RUN ] PortMappingIsolatorTest.ROOT_ContainerICMPExternal [20:17:53]W: [Step 10/10] I0622 20:17:53.890225 28395 port_mapping_tests.cpp:229] Using eth0 as the public interface [20:17:53]W: [Step 10/10] I0622 20:17:53.890532 28395 port_mapping_tests.cpp:237] Using lo as the loopback interface [20:17:53]W: [Step 10/10] I0622 20:17:53.904742 28395 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] [20:17:53]W: [Step 10/10] Trying semicolon-delimited string format instead [20:17:53]W: [Step 10/10] I0622 20:17:53.905855 28395 port_mapping.cpp:1557] Using eth0 as the public interface [20:17:53]W: [Step 10/10] I0622 20:17:53.906159 28395 port_mapping.cpp:1582] Using lo as the loopback interface [20:17:53]W: [Step 10/10] I0622 20:17:53.907315 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' [20:17:53]W: [Step 10/10] I0622 20:17:53.907362 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' [20:17:53]W: [Step 10/10] I0622 20:17:53.907418 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '409616384 4194304' [20:17:53]W: [Step 10/10] I0622 20:17:53.907454 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5' [20:17:53]W: [Step 10/10] I0622 20:17:53.907491 28395 port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992' [20:17:53]W: [Step 10/10] I0622 20:17:53.907524 28395 port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128' [20:17:53]W: [Step 10/10] I0622 20:17:53.907557 28395 port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = 
'212992' [20:17:53]W: [Step 10/10] I0622 20:17:53.907588 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '409687380 6291456' [20:17:53]W: [Step 10/10] I0622 20:17:53.907618 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' [20:17:53]W: [Step 10/10] I0622 20:17:53.907649 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' [20:17:53]W: [Step 10/10] I0622 20:17:53.907680 28395 port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000' [20:17:53]W: [Step 10/10] I0622 20:17:53.907711 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' [20:17:53]W: [Step 10/10] I0622 20:17:53.907742 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' [20:17:53]W: [Step 10/10] I0622 20:17:53.907773 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' [20:17:53]W: [Step 10/10] I0622 20:17:53.907802 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15' [20:17:53]W: [Step 10/10] I0622 20:17:53.916348 28395 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [20:17:53]W: [Step 10/10] I0622 20:17:53.916575 28395 resources.cpp:572] Parsing resources as JSON failed: ports:[31000-31499] [20:17:53]W: [Step 10/10] Trying semicolon-delimited string format instead [20:17:53]W: [Step 10/10] I0622 20:17:53.917032 28412 port_mapping.cpp:2512] Using non-ephemeral ports {[31000,31500)} and ephemeral ports [30016,30032) for container container1 of executor '' [20:17:53]W: [Step 10/10] I0622 20:17:53.918092 28395 linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS | CLONE_NEWNET [20:17:53]W: [Step 10/10] I0622 20:17:53.951756 28410 port_mapping.cpp:2576] Bind mounted '/proc/15611/ns/net' to '/run/netns/15611' for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.951918 28410 port_mapping.cpp:2607] Created network namespace handle symlink '/var/run/mesos/netns/container1' 
-> '/run/netns/15611' [20:17:53]W: [Step 10/10] I0622 20:17:53.952893 28410 port_mapping.cpp:2667] Adding IP packet filters with ports [30016,30031] for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.956142 28410 port_mapping.cpp:2667] Adding IP packet filters with ports [31000,31007] for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.961453 28410 port_mapping.cpp:2667] Adding IP packet filters with ports [31008,31039] for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.965399 28410 port_mapping.cpp:2667] Adding IP packet filters with ports [31040,31103] for container container1 [20:17:53]W: [Step 10/10] I0622 20:17:53.96956
[jira] [Created] (MESOS-5688) `PortMappingIsolatorTest.ROOT_DNS` fails on Fedora 23.
Gilbert Song created MESOS-5688: --- Summary: `PortMappingIsolatorTest.ROOT_DNS` fails on Fedora 23. Key: MESOS-5688 URL: https://issues.apache.org/jira/browse/MESOS-5688 Project: Mesos Issue Type: Bug Components: isolation, network Environment: Fedora 23 with network isolation Reporter: Gilbert Song Here is the log: {noformat} [20:18:04] : [Step 10/10] [ RUN ] PortMappingIsolatorTest.ROOT_DNS [20:18:04]W: [Step 10/10] I0622 20:18:04.877822 28395 port_mapping_tests.cpp:229] Using eth0 as the public interface [20:18:04]W: [Step 10/10] I0622 20:18:04.878106 28395 port_mapping_tests.cpp:237] Using lo as the loopback interface [20:18:04]W: [Step 10/10] I0622 20:18:04.891363 28395 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] [20:18:04]W: [Step 10/10] Trying semicolon-delimited string format instead [20:18:04]W: [Step 10/10] I0622 20:18:04.892331 28395 port_mapping.cpp:1557] Using eth0 as the public interface [20:18:04]W: [Step 10/10] I0622 20:18:04.892638 28395 port_mapping.cpp:1582] Using lo as the loopback interface [20:18:04]W: [Step 10/10] I0622 20:18:04.893723 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' [20:18:04]W: [Step 10/10] I0622 20:18:04.893770 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' [20:18:04]W: [Step 10/10] I0622 20:18:04.893806 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '409616384 4194304' [20:18:04]W: [Step 10/10] I0622 20:18:04.893838 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5' [20:18:04]W: [Step 10/10] I0622 20:18:04.893875 28395 port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992' [20:18:04]W: [Step 10/10] I0622 20:18:04.893908 28395 port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128' [20:18:04]W: [Step 10/10] I0622 20:18:04.893937 28395 port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992' [20:18:04]W: [Step 10/10] I0622 
20:18:04.893968 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '409687380 6291456' [20:18:04]W: [Step 10/10] I0622 20:18:04.893999 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' [20:18:04]W: [Step 10/10] I0622 20:18:04.894029 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' [20:18:04]W: [Step 10/10] I0622 20:18:04.894060 28395 port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000' [20:18:04]W: [Step 10/10] I0622 20:18:04.894093 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' [20:18:04]W: [Step 10/10] I0622 20:18:04.894124 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' [20:18:04]W: [Step 10/10] I0622 20:18:04.894153 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' [20:18:04]W: [Step 10/10] I0622 20:18:04.894186 28395 port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15' [20:18:04]W: [Step 10/10] I0622 20:18:04.902745 28395 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher [20:18:04]W: [Step 10/10] I0622 20:18:04.902940 28395 resources.cpp:572] Parsing resources as JSON failed: ports:[31000-31499] [20:18:04]W: [Step 10/10] Trying semicolon-delimited string format instead [20:18:04]W: [Step 10/10] I0622 20:18:04.903404 28412 port_mapping.cpp:2512] Using non-ephemeral ports {[31000,31500)} and ephemeral ports [30016,30032) for container container1 of executor '' [20:18:04]W: [Step 10/10] I0622 20:18:04.904423 28395 linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS | CLONE_NEWNET [20:18:04]W: [Step 10/10] I0622 20:18:04.977530 28416 port_mapping.cpp:2576] Bind mounted '/proc/15781/ns/net' to '/run/netns/15781' for container container1 [20:18:04]W: [Step 10/10] I0622 20:18:04.977715 28416 port_mapping.cpp:2607] Created network namespace handle symlink '/var/run/mesos/netns/container1' -> '/run/netns/15781' [20:18:04]W: [Step 
10/10] I0622 20:18:04.978752 28416 port_mapping.cpp:2667] Adding IP packet filters with ports [30016,30031] for container container1 [20:18:04]W: [Step 10/10] I0622 20:18:04.981956 28416 port_mapping.cpp:2667] Adding IP packet filters with ports [31000,31007] for container container1 [20:18:04]W: [Step 10/10] I0622 20:18:04.985674 28416 port_mapping.cpp:2667] Adding IP packet filters with ports [31008,31039] for container container1 [20:18:04]W: [Step 10/10] I0622 20:18:04.989276 28416 port_mapping.cpp:2667] Adding IP packet filters with ports [31040,31103] for container container1 [20:18:04]W: [Step 10/10] I0622 20:18:04.993069 28416 port_mapping.cpp:2667] Adding
[jira] [Updated] (MESOS-5687) Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified.
[ https://issues.apache.org/jira/browse/MESOS-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gilbert Song updated MESOS-5687: Priority: Major (was: Critical) > Port mapping isolator may cause segfault if the agent flag > `egress_rate_limit_per_container` is specified. > -- > > Key: MESOS-5687 > URL: https://issues.apache.org/jira/browse/MESOS-5687 > Project: Mesos > Issue Type: Bug > Components: isolation, network >Affects Versions: 0.27.3, 0.28.2 > Environment: Fedora 23 with network isolation >Reporter: Gilbert Song > Labels: isolation, mesosphere, networking > > The port mapping isolator may segfault if the agent flag > `egress_rate_limit_per_container` is specified and > `/sys/class/net/eth0/speed` is not readable. > This can be exposed in this test: > {noformat} > PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit > {noformat} > Here is the log: > {noformat} > [20:18:05] : [Step 10/10] [ RUN ] > PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit > [20:18:05]W: [Step 10/10] I0622 20:18:05.375366 28395 > port_mapping_tests.cpp:229] Using eth0 as the public interface > [20:18:05]W: [Step 10/10] I0622 20:18:05.375664 28395 > port_mapping_tests.cpp:237] Using lo as the loopback interface > [20:18:05]W: [Step 10/10] I0622 20:18:05.33 28395 resources.cpp:572] > Parsing resources as JSON failed: > cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] > [20:18:05]W: [Step 10/10] Trying semicolon-delimited string format instead > [20:18:05]W: [Step 10/10] I0622 20:18:05.389879 28395 > port_mapping.cpp:1557] Using eth0 as the public interface > [20:18:05]W: [Step 10/10] I0622 20:18:05.390173 28395 > port_mapping.cpp:1582] Using lo as the loopback interface > [20:18:05]W: [Step 10/10] F0622 20:18:05.390365 28395 > port_mapping_tests.cpp:1496] CHECK_SOME(isolator): Failed to read > /sys/class/net/eth0/speed: Invalid argument > [20:18:05]W: [Step 10/10] *** Check failure stack trace: *** > [20:18:05]W: [Step 10/10] @ 
0x7f11003bdd1a > google::LogMessage::Fail() > [20:18:05]W: [Step 10/10] @ 0x7f11003bdc73 > google::LogMessage::SendToLog() > [20:18:05]W: [Step 10/10] @ 0x7f11003bd669 > google::LogMessage::Flush() > [20:18:05]W: [Step 10/10] @ 0x7f11003c04da > google::LogMessageFatal::~LogMessageFatal() > [20:18:05]W: [Step 10/10] @ 0xa62ce1 > _CheckFatal::~_CheckFatal() > [20:18:05]W: [Step 10/10] @ 0x199a13d > mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_SmallEgressLimit_Test::TestBody() > [20:18:05]W: [Step 10/10] @ 0x1a36fbe > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [20:18:05]W: [Step 10/10] @ 0x1a3206c > testing::internal::HandleExceptionsInMethodIfSupported<>() > [20:18:05]W: [Step 10/10] @ 0x1a12ab6 testing::Test::Run() > [20:18:05]W: [Step 10/10] @ 0x1a1326e testing::TestInfo::Run() > [20:18:05]W: [Step 10/10] @ 0x1a138bf testing::TestCase::Run() > [20:18:05]W: [Step 10/10] @ 0x1a1a3fd > testing::internal::UnitTestImpl::RunAllTests() > [20:18:05]W: [Step 10/10] @ 0x1a37c85 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [20:18:05]W: [Step 10/10] @ 0x1a32bac > testing::internal::HandleExceptionsInMethodIfSupported<>() > [20:18:05]W: [Step 10/10] @ 0x1a190d9 testing::UnitTest::Run() > [20:18:05]W: [Step 10/10] @ 0x1004b7f RUN_ALL_TESTS() > [20:18:05]W: [Step 10/10] @ 0x1004765 main > [20:18:05]W: [Step 10/10] @ 0x7f10f9aa4580 __libc_start_main > [20:18:05]W: [Step 10/10] @ 0xa61339 _start > [20:18:06]W: [Step 10/10] > /mnt/teamcity/temp/agentTmp/custom_script8081387914816808529: line 3: 28395 > Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh --verbose > --gtest_filter="$GTEST_FILTER" > [20:18:06]W: [Step 10/10] Process exited with code 134 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
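The failure mode described above (the link speed file being unreadable when an egress rate limit is requested) can be illustrated with a small defensive read. This is only a sketch under stated assumptions, not the isolator's actual code: the function name `read_link_speed_mbps` and the `sysfs_root` parameter are hypothetical, and treating an unreadable or non-positive value as "unknown" is one possible policy rather than the fix chosen for this ticket.

```python
from pathlib import Path
from typing import Optional


def read_link_speed_mbps(interface: str,
                         sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the link speed in Mbit/s, or None if it cannot be determined.

    Hypothetical helper: reading <sysfs_root>/<interface>/speed can fail
    (for example with "Invalid argument", as in the log above), so the
    error is reported to the caller as None instead of aborting.
    """
    path = Path(sysfs_root) / interface / "speed"
    try:
        value = int(path.read_text().strip())
    except (OSError, ValueError):
        # Unreadable file or non-numeric content: speed is unknown.
        return None
    # Some devices report a non-positive value when the speed is unknown.
    return value if value > 0 else None
```

A caller could then decide whether to skip validation of the egress limit against the link speed, or to refuse to start with a clear error message, when the result is `None`.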
[jira] [Updated] (MESOS-5687) Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified.
[ https://issues.apache.org/jira/browse/MESOS-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-5687: -- Affects Version/s: 0.27.3 0.28.2 > Port mapping isolator may cause segfault if the agent flag > `egress_rate_limit_per_container` is specified. > -- > > Key: MESOS-5687 > URL: https://issues.apache.org/jira/browse/MESOS-5687 > Project: Mesos > Issue Type: Bug > Components: isolation, network >Affects Versions: 0.27.3, 0.28.2 > Environment: Fedora 23 with network isolation >Reporter: Gilbert Song >Priority: Critical > Labels: isolation, mesosphere, networking > > The port mapping isolator may segfault if the agent flag > `egress_rate_limit_per_container` is specified and > `/sys/class/net/eth0/speed` is not readable. > This can be exposed in this test: > {noformat} > PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit > {noformat} > Here is the log: > {noformat} > [20:18:05] : [Step 10/10] [ RUN ] > PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit > [20:18:05]W: [Step 10/10] I0622 20:18:05.375366 28395 > port_mapping_tests.cpp:229] Using eth0 as the public interface > [20:18:05]W: [Step 10/10] I0622 20:18:05.375664 28395 > port_mapping_tests.cpp:237] Using lo as the loopback interface > [20:18:05]W: [Step 10/10] I0622 20:18:05.33 28395 resources.cpp:572] > Parsing resources as JSON failed: > cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] > [20:18:05]W: [Step 10/10] Trying semicolon-delimited string format instead > [20:18:05]W: [Step 10/10] I0622 20:18:05.389879 28395 > port_mapping.cpp:1557] Using eth0 as the public interface > [20:18:05]W: [Step 10/10] I0622 20:18:05.390173 28395 > port_mapping.cpp:1582] Using lo as the loopback interface > [20:18:05]W: [Step 10/10] F0622 20:18:05.390365 28395 > port_mapping_tests.cpp:1496] CHECK_SOME(isolator): Failed to read > /sys/class/net/eth0/speed: Invalid argument > [20:18:05]W: [Step 10/10] *** Check failure stack trace: *** > [20:18:05]W: 
[Step 10/10] @ 0x7f11003bdd1a > google::LogMessage::Fail() > [20:18:05]W: [Step 10/10] @ 0x7f11003bdc73 > google::LogMessage::SendToLog() > [20:18:05]W: [Step 10/10] @ 0x7f11003bd669 > google::LogMessage::Flush() > [20:18:05]W: [Step 10/10] @ 0x7f11003c04da > google::LogMessageFatal::~LogMessageFatal() > [20:18:05]W: [Step 10/10] @ 0xa62ce1 > _CheckFatal::~_CheckFatal() > [20:18:05]W: [Step 10/10] @ 0x199a13d > mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_SmallEgressLimit_Test::TestBody() > [20:18:05]W: [Step 10/10] @ 0x1a36fbe > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [20:18:05]W: [Step 10/10] @ 0x1a3206c > testing::internal::HandleExceptionsInMethodIfSupported<>() > [20:18:05]W: [Step 10/10] @ 0x1a12ab6 testing::Test::Run() > [20:18:05]W: [Step 10/10] @ 0x1a1326e testing::TestInfo::Run() > [20:18:05]W: [Step 10/10] @ 0x1a138bf testing::TestCase::Run() > [20:18:05]W: [Step 10/10] @ 0x1a1a3fd > testing::internal::UnitTestImpl::RunAllTests() > [20:18:05]W: [Step 10/10] @ 0x1a37c85 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [20:18:05]W: [Step 10/10] @ 0x1a32bac > testing::internal::HandleExceptionsInMethodIfSupported<>() > [20:18:05]W: [Step 10/10] @ 0x1a190d9 testing::UnitTest::Run() > [20:18:05]W: [Step 10/10] @ 0x1004b7f RUN_ALL_TESTS() > [20:18:05]W: [Step 10/10] @ 0x1004765 main > [20:18:05]W: [Step 10/10] @ 0x7f10f9aa4580 __libc_start_main > [20:18:05]W: [Step 10/10] @ 0xa61339 _start > [20:18:06]W: [Step 10/10] > /mnt/teamcity/temp/agentTmp/custom_script8081387914816808529: line 3: 28395 > Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh --verbose > --gtest_filter="$GTEST_FILTER" > [20:18:06]W: [Step 10/10] Process exited with code 134 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5686) Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified.
Gilbert Song created MESOS-5686: --- Summary: Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified. Key: MESOS-5686 URL: https://issues.apache.org/jira/browse/MESOS-5686 Project: Mesos Issue Type: Bug Components: isolation, network Environment: Fedora 23 with network isolation Reporter: Gilbert Song Priority: Critical The port mapping isolator may segfault if the agent flag `egress_rate_limit_per_container` is specified and `/sys/class/net/eth0/speed` is not readable. This can be exposed in this test: {noformat} PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit {noformat} Here is the log: {noformat} [20:18:05] : [Step 10/10] [ RUN ] PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit [20:18:05]W: [Step 10/10] I0622 20:18:05.375366 28395 port_mapping_tests.cpp:229] Using eth0 as the public interface [20:18:05]W: [Step 10/10] I0622 20:18:05.375664 28395 port_mapping_tests.cpp:237] Using lo as the loopback interface [20:18:05]W: [Step 10/10] I0622 20:18:05.33 28395 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] [20:18:05]W: [Step 10/10] Trying semicolon-delimited string format instead [20:18:05]W: [Step 10/10] I0622 20:18:05.389879 28395 port_mapping.cpp:1557] Using eth0 as the public interface [20:18:05]W: [Step 10/10] I0622 20:18:05.390173 28395 port_mapping.cpp:1582] Using lo as the loopback interface [20:18:05]W: [Step 10/10] F0622 20:18:05.390365 28395 port_mapping_tests.cpp:1496] CHECK_SOME(isolator): Failed to read /sys/class/net/eth0/speed: Invalid argument [20:18:05]W: [Step 10/10] *** Check failure stack trace: *** [20:18:05]W: [Step 10/10] @ 0x7f11003bdd1a google::LogMessage::Fail() [20:18:05]W: [Step 10/10] @ 0x7f11003bdc73 google::LogMessage::SendToLog() [20:18:05]W: [Step 10/10] @ 0x7f11003bd669 google::LogMessage::Flush() [20:18:05]W: [Step 10/10] @ 0x7f11003c04da google::LogMessageFatal::~LogMessageFatal() 
[20:18:05]W: [Step 10/10] @ 0xa62ce1 _CheckFatal::~_CheckFatal() [20:18:05]W: [Step 10/10] @ 0x199a13d mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_SmallEgressLimit_Test::TestBody() [20:18:05]W: [Step 10/10] @ 0x1a36fbe testing::internal::HandleSehExceptionsInMethodIfSupported<>() [20:18:05]W: [Step 10/10] @ 0x1a3206c testing::internal::HandleExceptionsInMethodIfSupported<>() [20:18:05]W: [Step 10/10] @ 0x1a12ab6 testing::Test::Run() [20:18:05]W: [Step 10/10] @ 0x1a1326e testing::TestInfo::Run() [20:18:05]W: [Step 10/10] @ 0x1a138bf testing::TestCase::Run() [20:18:05]W: [Step 10/10] @ 0x1a1a3fd testing::internal::UnitTestImpl::RunAllTests() [20:18:05]W: [Step 10/10] @ 0x1a37c85 testing::internal::HandleSehExceptionsInMethodIfSupported<>() [20:18:05]W: [Step 10/10] @ 0x1a32bac testing::internal::HandleExceptionsInMethodIfSupported<>() [20:18:05]W: [Step 10/10] @ 0x1a190d9 testing::UnitTest::Run() [20:18:05]W: [Step 10/10] @ 0x1004b7f RUN_ALL_TESTS() [20:18:05]W: [Step 10/10] @ 0x1004765 main [20:18:05]W: [Step 10/10] @ 0x7f10f9aa4580 __libc_start_main [20:18:05]W: [Step 10/10] @ 0xa61339 _start [20:18:06]W: [Step 10/10] /mnt/teamcity/temp/agentTmp/custom_script8081387914816808529: line 3: 28395 Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="$GTEST_FILTER" [20:18:06]W: [Step 10/10] Process exited with code 134 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5687) Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified.
Gilbert Song created MESOS-5687: --- Summary: Port mapping isolator may cause segfault if the agent flag `egress_rate_limit_per_container` is specified. Key: MESOS-5687 URL: https://issues.apache.org/jira/browse/MESOS-5687 Project: Mesos Issue Type: Bug Components: isolation, network Environment: Fedora 23 with network isolation Reporter: Gilbert Song Priority: Critical The port mapping isolator may segfault if the agent flag `egress_rate_limit_per_container` is specified and `/sys/class/net/eth0/speed` is not readable. This can be exposed in this test: {noformat} PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit {noformat} Here is the log: {noformat} [20:18:05] : [Step 10/10] [ RUN ] PortMappingIsolatorTest.ROOT_NC_SmallEgressLimit [20:18:05]W: [Step 10/10] I0622 20:18:05.375366 28395 port_mapping_tests.cpp:229] Using eth0 as the public interface [20:18:05]W: [Step 10/10] I0622 20:18:05.375664 28395 port_mapping_tests.cpp:237] Using lo as the loopback interface [20:18:05]W: [Step 10/10] I0622 20:18:05.33 28395 resources.cpp:572] Parsing resources as JSON failed: cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] [20:18:05]W: [Step 10/10] Trying semicolon-delimited string format instead [20:18:05]W: [Step 10/10] I0622 20:18:05.389879 28395 port_mapping.cpp:1557] Using eth0 as the public interface [20:18:05]W: [Step 10/10] I0622 20:18:05.390173 28395 port_mapping.cpp:1582] Using lo as the loopback interface [20:18:05]W: [Step 10/10] F0622 20:18:05.390365 28395 port_mapping_tests.cpp:1496] CHECK_SOME(isolator): Failed to read /sys/class/net/eth0/speed: Invalid argument [20:18:05]W: [Step 10/10] *** Check failure stack trace: *** [20:18:05]W: [Step 10/10] @ 0x7f11003bdd1a google::LogMessage::Fail() [20:18:05]W: [Step 10/10] @ 0x7f11003bdc73 google::LogMessage::SendToLog() [20:18:05]W: [Step 10/10] @ 0x7f11003bd669 google::LogMessage::Flush() [20:18:05]W: [Step 10/10] @ 0x7f11003c04da google::LogMessageFatal::~LogMessageFatal() 
[20:18:05]W: [Step 10/10] @ 0xa62ce1 _CheckFatal::~_CheckFatal() [20:18:05]W: [Step 10/10] @ 0x199a13d mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_SmallEgressLimit_Test::TestBody() [20:18:05]W: [Step 10/10] @ 0x1a36fbe testing::internal::HandleSehExceptionsInMethodIfSupported<>() [20:18:05]W: [Step 10/10] @ 0x1a3206c testing::internal::HandleExceptionsInMethodIfSupported<>() [20:18:05]W: [Step 10/10] @ 0x1a12ab6 testing::Test::Run() [20:18:05]W: [Step 10/10] @ 0x1a1326e testing::TestInfo::Run() [20:18:05]W: [Step 10/10] @ 0x1a138bf testing::TestCase::Run() [20:18:05]W: [Step 10/10] @ 0x1a1a3fd testing::internal::UnitTestImpl::RunAllTests() [20:18:05]W: [Step 10/10] @ 0x1a37c85 testing::internal::HandleSehExceptionsInMethodIfSupported<>() [20:18:05]W: [Step 10/10] @ 0x1a32bac testing::internal::HandleExceptionsInMethodIfSupported<>() [20:18:05]W: [Step 10/10] @ 0x1a190d9 testing::UnitTest::Run() [20:18:05]W: [Step 10/10] @ 0x1004b7f RUN_ALL_TESTS() [20:18:05]W: [Step 10/10] @ 0x1004765 main [20:18:05]W: [Step 10/10] @ 0x7f10f9aa4580 __libc_start_main [20:18:05]W: [Step 10/10] @ 0xa61339 _start [20:18:06]W: [Step 10/10] /mnt/teamcity/temp/agentTmp/custom_script8081387914816808529: line 3: 28395 Aborted (core dumped) GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="$GTEST_FILTER" [20:18:06]W: [Step 10/10] Process exited with code 134 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5330) Agent should backoff before connecting to the master
[ https://issues.apache.org/jira/browse/MESOS-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-5330: -- Fix Version/s: 0.28.3 0.27.4 > Agent should backoff before connecting to the master > > > Key: MESOS-5330 > URL: https://issues.apache.org/jira/browse/MESOS-5330 > Project: Mesos > Issue Type: Bug >Reporter: David Robinson >Assignee: David Robinson > Fix For: 0.28.3, 1.0.0, 0.27.4 > > > When an agent starts, it launches a background task (libprocess process?) to > detect the leading master. When the leading master is detected (or changes) > the [SocketManager's link() method is called and a TCP connection to the > master is > established|https://github.com/apache/mesos/blob/a138e2246a30c4b5c9bc3f7069ad12204dcaffbc/src/slave/slave.cpp#L954]. > The agent _then_ backs off before sending a ReRegisterSlave message via the > newly established connection. The agent needs to back off _before_ attempting > to establish a TCP connection to the master, not before sending the first > message over the connection. > During scale tests at Twitter we discovered that agents can SYN flood the > master upon leader changes. The problem described in MESOS-5200 can then > occur, where ephemeral connections are used, which exacerbates the problem. > The end result is a lot of hosts setting up and tearing down TCP connections > every slave_ping_timeout seconds (15 by default), connections failing to be > established, and hosts being marked as unhealthy and shut down. We observed > ~800 passive TCP connections per second on the leading master during scale > tests. > The problem can be somewhat mitigated by tuning the kernel to handle a > thundering herd of TCP connections, but ideally there would not be a > thundering herd to begin with. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
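The ordering requested in the description above (wait a random interval first, and only then open the TCP connection) can be sketched as follows. This is an illustrative sketch, not the Mesos agent's implementation: the names `backoff_delay` and `connect_after_backoff`, the `max_backoff_secs` parameter, and the uniform random delay are all assumptions made for the example.

```python
import random
import socket
import time


def backoff_delay(max_backoff_secs: float, rng: random.Random) -> float:
    """Pick a uniformly random delay in [0, max_backoff_secs)."""
    return rng.random() * max_backoff_secs


def connect_after_backoff(host, port, max_backoff_secs,
                          rng=None,
                          sleep=time.sleep,
                          connect=socket.create_connection):
    """Back off first, then establish the TCP connection.

    Sleeping before connecting spreads the connection attempts of many
    agents over the backoff window, instead of every agent sending a SYN
    at the moment a new leading master is detected. The sleep and connect
    callables are injectable so the ordering can be checked without a
    network (a hypothetical convenience for this sketch).
    """
    rng = rng or random.Random()
    sleep(backoff_delay(max_backoff_secs, rng))  # wait before any SYN is sent
    return connect((host, port))
```

With N agents and a backoff window of W seconds, the expected connection rate seen by the master right after a leader change drops from a burst of N attempts at once to roughly N / W attempts per second.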
[jira] [Updated] (MESOS-5685) The /files/download endpoint's authorization can be compromised
[ https://issues.apache.org/jira/browse/MESOS-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann updated MESOS-5685: - Summary: The /files/download endpoint's authorization can be compromised (was: The /files/download endpoint authorization can be compromised) > The /files/download endpoint's authorization can be compromised > --- > > Key: MESOS-5685 > URL: https://issues.apache.org/jira/browse/MESOS-5685 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.28.2 >Reporter: Greg Mann > Labels: mesosphere > > If a forward slash is appended to the path of a file a user wishes to > download via {{/files/download}}, the authorization logic for that path will > be bypassed and the file will be downloaded regardless of permissions. This > is because we store the authorization callbacks for these paths in a map > which is keyed by the path name, so a request to {{/master/log/}} fails to > find the callback which is installed for {{/master/log}}. When the master > fails to find the callback, it assumes authorization is not required for that > path and authorizes the action. > Consider the following excerpt: > {code} > gmann@gmac:~/src/mesos/build⚡ http GET > http://127.0.0.1:5050/files/download\?path\=/master/log -a foo:bar > HTTP/1.1 403 Forbidden > Content-Length: 0 > Date: Wed, 22 Jun 2016 21:28:53 GMT > gmann@gmac:~/src/mesos/build⚡ http GET > http://127.0.0.1:5050/files/download\?path\=/master/log/ -a foo:bar > HTTP/1.1 200 OK > Content-Disposition: attachment; > filename=mesos-master.gmac.gmann.log.INFO.20160622-142843.65615 > Content-Length: 14432 > Content-Type: application/octet-stream > Date: Wed, 22 Jun 2016 21:28:56 GMT > Log file created at: 2016/06/22 14:28:43 > Running on machine: gmac > Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg > I0622 14:28:43.476925 2080764672 logging.cpp:194] INFO level logging started! 
> I0622 14:28:43.477522 2080764672 main.cpp:367] Using 'HierarchicalDRF' > allocator > I0622 14:28:43.480650 2080764672 leveldb.cpp:174] Opened db in 2961us > I0622 14:28:43.481046 2080764672 leveldb.cpp:181] Compacted db in 372us > I0622 14:28:43.481078 2080764672 leveldb.cpp:196] Created db iterator in 13us > I0622 14:28:43.481096 2080764672 leveldb.cpp:202] Seeked to beginning of db > in 9us > I0622 14:28:43.48 2080764672 leveldb.cpp:271] Iterated through 0 keys in > the db in 8us > I0622 14:28:43.481165 2080764672 replica.cpp:779] Replica recovered with log > positions 0 -> 0 with 1 holes and 0 unlearned > I0622 14:28:43.481967 219914240 recover.cpp:451] Starting replica recovery > I0622 14:28:43.482193 219914240 recover.cpp:477] Replica is in EMPTY status > I0622 14:28:43.482589 2080764672 main.cpp:488] Creating default 'local' > authorizer > I0622 14:28:43.482719 2080764672 main.cpp:545] Starting Mesos master > I0622 14:28:43.483085 218841088 replica.cpp:673] Replica in EMPTY status > received a broadcasted recover request from (4)@127.0.0.1:5050 > I0622 14:28:43.487284 218304512 recover.cpp:197] Received a recover response > from a replica in EMPTY status > I0622 14:28:43.487694 219914240 recover.cpp:568] Updating replica status to > STARTING > {code} > We could consider disallowing paths which end in trailing slashes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5685) The /files/download endpoint authorization can be compromised
Greg Mann created MESOS-5685: Summary: The /files/download endpoint authorization can be compromised Key: MESOS-5685 URL: https://issues.apache.org/jira/browse/MESOS-5685 Project: Mesos Issue Type: Bug Affects Versions: 0.28.2 Reporter: Greg Mann If a forward slash is appended to the path of a file a user wishes to download via {{/files/download}}, the authorization logic for that path will be bypassed and the file will be downloaded regardless of permissions. This is because we store the authorization callbacks for these paths in a map which is keyed by the path name, so a request to {{/master/log/}} fails to find the callback which is installed for {{/master/log}}. When the master fails to find the callback, it assumes authorization is not required for that path and authorizes the action. Consider the following excerpt: {code} gmann@gmac:~/src/mesos/build⚡ http GET http://127.0.0.1:5050/files/download\?path\=/master/log -a foo:bar HTTP/1.1 403 Forbidden Content-Length: 0 Date: Wed, 22 Jun 2016 21:28:53 GMT gmann@gmac:~/src/mesos/build⚡ http GET http://127.0.0.1:5050/files/download\?path\=/master/log/ -a foo:bar HTTP/1.1 200 OK Content-Disposition: attachment; filename=mesos-master.gmac.gmann.log.INFO.20160622-142843.65615 Content-Length: 14432 Content-Type: application/octet-stream Date: Wed, 22 Jun 2016 21:28:56 GMT Log file created at: 2016/06/22 14:28:43 Running on machine: gmac Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg I0622 14:28:43.476925 2080764672 logging.cpp:194] INFO level logging started! 
I0622 14:28:43.477522 2080764672 main.cpp:367] Using 'HierarchicalDRF' allocator I0622 14:28:43.480650 2080764672 leveldb.cpp:174] Opened db in 2961us I0622 14:28:43.481046 2080764672 leveldb.cpp:181] Compacted db in 372us I0622 14:28:43.481078 2080764672 leveldb.cpp:196] Created db iterator in 13us I0622 14:28:43.481096 2080764672 leveldb.cpp:202] Seeked to beginning of db in 9us I0622 14:28:43.48 2080764672 leveldb.cpp:271] Iterated through 0 keys in the db in 8us I0622 14:28:43.481165 2080764672 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned I0622 14:28:43.481967 219914240 recover.cpp:451] Starting replica recovery I0622 14:28:43.482193 219914240 recover.cpp:477] Replica is in EMPTY status I0622 14:28:43.482589 2080764672 main.cpp:488] Creating default 'local' authorizer I0622 14:28:43.482719 2080764672 main.cpp:545] Starting Mesos master I0622 14:28:43.483085 218841088 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (4)@127.0.0.1:5050 I0622 14:28:43.487284 218304512 recover.cpp:197] Received a recover response from a replica in EMPTY status I0622 14:28:43.487694 219914240 recover.cpp:568] Updating replica status to STARTING {code} We could consider disallowing paths which end in trailing slashes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
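One way to picture a fix for MESOS-5685 is to normalize the requested path before it is used as the map key. The sketch below is an illustration under assumptions, not the Mesos `Files` code: `Authorizer`, `normalizePath`, and `authorized` are hypothetical names, and the callback map is reduced to a plain `std::map`. It keeps the behaviour described in the report (a path with no registered callback is treated as not requiring authorization) and only closes the trailing-slash gap.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical callback type: returns true if `principal` may read the file.
using Authorizer = std::function<bool(const std::string& principal)>;

// Strips trailing '/' characters so that "/master/log/" and "/master/log"
// map to the same key. A lone "/" is preserved.
std::string normalizePath(std::string path)
{
  while (path.size() > 1 && path.back() == '/') {
    path.pop_back();
  }
  return path;
}

// Looks up the callback under the normalized path. As in the behaviour
// described in the report, a path with no registered callback is treated
// as not requiring authorization.
bool authorized(
    const std::map<std::string, Authorizer>& authorizers,
    const std::string& requestedPath,
    const std::string& principal)
{
  auto it = authorizers.find(normalizePath(requestedPath));
  if (it == authorizers.end()) {
    return true;
  }
  return it->second(principal);
}
```

The alternative mentioned in the report, rejecting any path that ends in a slash, would replace the normalization step with an early error return.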
[jira] [Commented] (MESOS-5642) Move include/mesos/v1/master/allocator.proto to its own directory and package
[ https://issues.apache.org/jira/browse/MESOS-5642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345006#comment-15345006 ] Vinod Kone commented on MESOS-5642: --- I don't know of any modules for allocator. But it is best to send an email to the dev list about this change and see if someone objects. cc [~karya] Also, modules currently have to be recompiled for every version of mesos, so it might not be that big of a problem. For example, we completely changed the authorizer interface between 0.28.0 and 1.0. > Move include/mesos/v1/master/allocator.proto to its own directory and package > - > > Key: MESOS-5642 > URL: https://issues.apache.org/jira/browse/MESOS-5642 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Reporter: Zhitao Li >Assignee: Zhitao Li > Fix For: 1.0.0 > > > Right now, all protobufs used in `include/mesos/v1` are in their own directory > and package except for allocator. > We should do the same for it, both for consistency and for protobuf > compiler friendliness (e.g. the golang compiler doesn't work well when two .proto > files generate messages into the same package). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-5684) Master captures `this` when creating authorization callback
[ https://issues.apache.org/jira/browse/MESOS-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Mazumdar updated MESOS-5684: -- Priority: Blocker (was: Major) > Master captures `this` when creating authorization callback > --- > > Key: MESOS-5684 > URL: https://issues.apache.org/jira/browse/MESOS-5684 > Project: Mesos > Issue Type: Bug >Reporter: Greg Mann >Assignee: Greg Mann >Priority: Blocker > Labels: mesosphere > Fix For: 1.0.0 > > > When exposing its log file, the master currently installs an authorization > callback for the log file which captures the master's {{this}} pointer. Such > captures have previously caused bugs (MESOS-5629), and this one should be > fixed as well. The callback should be dispatched to the master process, and > it should be dispatched via the {{self()}} PID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5642) Move include/mesos/v1/master/allocator.proto to its own directory and package
[ https://issues.apache.org/jira/browse/MESOS-5642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344924#comment-15344924 ] Zhitao Li commented on MESOS-5642: -- (Moving conversation from patch to here). Based on conversations in r/48092, we also want to move the non-versioned allocator.proto into its own directory. I tried to start a separate patch for it, but got stuck on how to handle the `mesos::master::allocator::Allocator` interface class in `include/mesos/master/allocator.hpp`. Ideally, I want to keep it in the new allocator directory too, but that would affect quite a few places, and my biggest worry is that custom allocator module compilation would be broken because the base class is moved away. I wonder whether a typedef alias is sufficient to maintain compile compatibility and give module maintainers some time to update. [~vinodkone], [~haosd...@gmail.com] and [~anandmazumdar], any thoughts on this? > Move include/mesos/v1/master/allocator.proto to its own directory and package > - > > Key: MESOS-5642 > URL: https://issues.apache.org/jira/browse/MESOS-5642 > Project: Mesos > Issue Type: Bug > Components: HTTP API >Reporter: Zhitao Li >Assignee: Zhitao Li > Fix For: 1.0.0 > > > Right now, all protobufs used in `include/mesos/v1` are in their own directory > and package except for allocator. > We should do the same for it, both for consistency and for protobuf > compiler friendliness (e.g. the golang compiler doesn't work well when two .proto > files generate messages into the same package). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
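The typedef-alias question raised in this comment can be made concrete with a minimal sketch. The new namespace `mesos::allocator` and the single `name()` method are assumptions made for illustration; only the old name `mesos::master::allocator::Allocator` comes from the comment. The sketch shows that a module written against the old name still compiles when the old name is an alias for the relocated class.

```cpp
#include <string>
#include <type_traits>

namespace mesos {
namespace allocator {  // Assumed new location; the name is hypothetical.

// Stand-in for the relocated interface class, reduced to one method.
class Allocator
{
public:
  virtual ~Allocator() {}
  virtual std::string name() const = 0;
};

} // namespace allocator

namespace master {
namespace allocator {

// Compatibility alias: the old qualified name now refers to the new class.
using Allocator = ::mesos::allocator::Allocator;

} // namespace allocator
} // namespace master
} // namespace mesos

// A module written against the old name keeps compiling unchanged.
class CustomAllocator : public mesos::master::allocator::Allocator
{
public:
  std::string name() const override { return "custom"; }
};
```

One caveat: an alias preserves uses of the old name, but module code that forward-declares `class Allocator;` inside the old namespace would conflict with the alias and fail to compile.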
[jira] [Assigned] (MESOS-5684) Master captures `this` when creating authorization callback
[ https://issues.apache.org/jira/browse/MESOS-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Mann reassigned MESOS-5684: Assignee: Greg Mann > Master captures `this` when creating authorization callback > --- > > Key: MESOS-5684 > URL: https://issues.apache.org/jira/browse/MESOS-5684 > Project: Mesos > Issue Type: Bug >Reporter: Greg Mann >Assignee: Greg Mann > Labels: mesosphere > Fix For: 1.0.0 > > > When exposing its log file, the master currently installs an authorization > callback for the log file which captures the master's {{this}} pointer. Such > captures have previously caused bugs (MESOS-5629), and this one should be > fixed as well. The callback should be dispatched to the master process, and > it should be dispatched via the {{self()}} PID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5684) Master captures `this` when creating authorization callback
Greg Mann created MESOS-5684: Summary: Master captures `this` when creating authorization callback Key: MESOS-5684 URL: https://issues.apache.org/jira/browse/MESOS-5684 Project: Mesos Issue Type: Bug Reporter: Greg Mann Fix For: 1.0.0 When exposing its log file, the master currently installs an authorization callback for the log file which captures the master's {{this}} pointer. Such captures have previously caused bugs (MESOS-5629), and this one should be fixed as well. The callback should be dispatched to the master process, and it should be dispatched via the {{self()}} PID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
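The hazard described in MESOS-5684 is that a stored callback can outlive the object whose `this` pointer it captured. The sketch below does not use libprocess: `std::weak_ptr` is swapped in as a stand-in for dispatching via the `self()` PID, on the assumption that the property of interest is that a callback invoked after the master is gone must not dereference it. The `Master` class, `makeLogAuthorizer`, and the `"operator"` principal are hypothetical.

```cpp
#include <functional>
#include <memory>
#include <string>

class Master : public std::enable_shared_from_this<Master>
{
public:
  // Returns a callback that holds a weak handle instead of a raw `this`.
  // Must be called on an object owned by a std::shared_ptr.
  std::function<bool(const std::string&)> makeLogAuthorizer()
  {
    std::weak_ptr<Master> self = shared_from_this();
    return [self](const std::string& principal) {
      std::shared_ptr<Master> master = self.lock();
      if (!master) {
        // The master is gone: refuse instead of touching freed memory.
        return false;
      }
      return master->authorize(principal);
    };
  }

private:
  bool authorize(const std::string& principal) const
  {
    return principal == allowed;
  }

  std::string allowed = "operator";  // Hypothetical permitted principal.
};
```

In this sketch, a callback invoked after the `Master` has been destroyed returns `false`; with a raw `this` capture the same call would be undefined behaviour.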
[jira] [Updated] (MESOS-5673) Port mapping isolator may cause segfault if it bind mount root does not exist.
[ https://issues.apache.org/jira/browse/MESOS-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-5673: -- Fix Version/s: 1.0.0 0.28.3 > Port mapping isolator may cause segfault if it bind mount root does not exist. > -- > > Key: MESOS-5673 > URL: https://issues.apache.org/jira/browse/MESOS-5673 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 0.28.2 > Environment: Fedora 23 with network isolation >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: isolator, mesosphere, networking, tests > Fix For: 0.28.3, 1.0.0 > > > A check is needed for port mapping isolator for its bind mount root. > Otherwise, non-existed port-mapping bind mount root may cause segmentation > fault for some cases. Here is the test log: > {noformat} > [00:57:42] : [Step 10/10] [--] 11 tests from PortMappingIsolatorTest > [00:57:42] : [Step 10/10] [ RUN ] > PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP > [00:57:42]W: [Step 10/10] I0604 00:57:42.723029 24841 > port_mapping_tests.cpp:229] Using eth0 as the public interface > [00:57:42]W: [Step 10/10] I0604 00:57:42.723348 24841 > port_mapping_tests.cpp:237] Using lo as the loopback interface > [00:57:42]W: [Step 10/10] I0604 00:57:42.735090 24841 resources.cpp:572] > Parsing resources as JSON failed: > cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] > [00:57:42]W: [Step 10/10] Trying semicolon-delimited string format instead > [00:57:42]W: [Step 10/10] I0604 00:57:42.736006 24841 > port_mapping.cpp:1557] Using eth0 as the public interface > [00:57:42]W: [Step 10/10] I0604 00:57:42.736331 24841 > port_mapping.cpp:1582] Using lo as the loopback interface > [00:57:42]W: [Step 10/10] I0604 00:57:42.737501 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737545 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' > [00:57:42]W: [Step 
10/10] I0604 00:57:42.737578 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737608 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737637 24841 > port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737666 24841 > port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737694 24841 > port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737720 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737746 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737772 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737798 24841 > port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737828 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737854 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737879 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737905 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15' > [00:57:42]W: [Step 10/10] F0604 00:57:42.737968 24841 > port_mapping_tests.cpp:448] CHECK_SOME(isolator): Failed to get realpath for > bind mount root '/var/run/netns': Not found > [00:57:42]W: [Step 10/10] *** Check failure stack trace: *** > [00:57:42]W: [Step 10/10] @ 0x7f8bd52583d2 > google::LogMessage::Fail() > [00:57:42]W: [Step 10/10] @ 0x7f8bd525832b > 
google::LogMessage::SendToLog() > [00:57:42]W: [Step 10/10] @ 0x7f8bd5257d21 > google::LogMessage::Flush() > [00:57:42]W: [Step 10/10] @ 0x7f8bd525ab92 > google::LogMessageFatal::~LogMessageFatal() > [00:57:42]W: [Step 10/10] @ 0xa62171 > _CheckFatal::~_CheckFatal() > [00:57:42]W: [Step 10/10] @ 0x1931b17 > mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_ContainerToContainerTCP_Test::TestBody() > [00:57:42]W: [Step 10/10] @ 0x19e17b6 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [00:57:42]W: [Step 10/10] @ 0x19dc8
[jira] [Updated] (MESOS-5673) Port mapping isolator may cause segfault if it bind mount root does not exist.
[ https://issues.apache.org/jira/browse/MESOS-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jie Yu updated MESOS-5673: -- Affects Version/s: 0.28.2 > Port mapping isolator may cause segfault if it bind mount root does not exist. > -- > > Key: MESOS-5673 > URL: https://issues.apache.org/jira/browse/MESOS-5673 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 0.28.2 > Environment: Fedora 23 with network isolation >Reporter: Gilbert Song >Assignee: Gilbert Song > Labels: isolator, mesosphere, networking, tests > Fix For: 0.28.3, 1.0.0 > > > A check is needed for port mapping isolator for its bind mount root. > Otherwise, non-existed port-mapping bind mount root may cause segmentation > fault for some cases. Here is the test log: > {noformat} > [00:57:42] : [Step 10/10] [--] 11 tests from PortMappingIsolatorTest > [00:57:42] : [Step 10/10] [ RUN ] > PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP > [00:57:42]W: [Step 10/10] I0604 00:57:42.723029 24841 > port_mapping_tests.cpp:229] Using eth0 as the public interface > [00:57:42]W: [Step 10/10] I0604 00:57:42.723348 24841 > port_mapping_tests.cpp:237] Using lo as the loopback interface > [00:57:42]W: [Step 10/10] I0604 00:57:42.735090 24841 resources.cpp:572] > Parsing resources as JSON failed: > cpus:2;mem:1024;disk:1024;ephemeral_ports:[30001-30999];ports:[31000-32000] > [00:57:42]W: [Step 10/10] Trying semicolon-delimited string format instead > [00:57:42]W: [Step 10/10] I0604 00:57:42.736006 24841 > port_mapping.cpp:1557] Using eth0 as the public interface > [00:57:42]W: [Step 10/10] I0604 00:57:42.736331 24841 > port_mapping.cpp:1582] Using lo as the loopback interface > [00:57:42]W: [Step 10/10] I0604 00:57:42.737501 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh3 = '1024' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737545 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh1 = '128' > [00:57:42]W: [Step 
10/10] I0604 00:57:42.737578 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_wmem = '4096 16384 4194304' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737608 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_synack_retries = '5' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737637 24841 > port_mapping.cpp:1869] /proc/sys/net/core/rmem_max = '212992' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737666 24841 > port_mapping.cpp:1869] /proc/sys/net/core/somaxconn = '128' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737694 24841 > port_mapping.cpp:1869] /proc/sys/net/core/wmem_max = '212992' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737720 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_rmem = '4096 87380 6291456' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737746 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_time = '7200' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737772 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/neigh/default/gc_thresh2 = '512' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737798 24841 > port_mapping.cpp:1869] /proc/sys/net/core/netdev_max_backlog = '1000' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737828 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_intvl = '75' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737854 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_keepalive_probes = '9' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737879 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_max_syn_backlog = '512' > [00:57:42]W: [Step 10/10] I0604 00:57:42.737905 24841 > port_mapping.cpp:1869] /proc/sys/net/ipv4/tcp_retries2 = '15' > [00:57:42]W: [Step 10/10] F0604 00:57:42.737968 24841 > port_mapping_tests.cpp:448] CHECK_SOME(isolator): Failed to get realpath for > bind mount root '/var/run/netns': Not found > [00:57:42]W: [Step 10/10] *** Check failure stack trace: *** > [00:57:42]W: [Step 10/10] @ 0x7f8bd52583d2 > google::LogMessage::Fail() > [00:57:42]W: [Step 10/10] @ 0x7f8bd525832b > 
google::LogMessage::SendToLog() > [00:57:42]W: [Step 10/10] @ 0x7f8bd5257d21 > google::LogMessage::Flush() > [00:57:42]W: [Step 10/10] @ 0x7f8bd525ab92 > google::LogMessageFatal::~LogMessageFatal() > [00:57:42]W: [Step 10/10] @ 0xa62171 > _CheckFatal::~_CheckFatal() > [00:57:42]W: [Step 10/10] @ 0x1931b17 > mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_ContainerToContainerTCP_Test::TestBody() > [00:57:42]W: [Step 10/10] @ 0x19e17b6 > testing::internal::HandleSehExceptionsInMethodIfSupported<>() > [00:57:42]W: [Step 10/10] @ 0x19dc864 > testing::inter
[jira] [Commented] (MESOS-5565) Add logging when Offer::Operation::Launch message has no tasks.
[ https://issues.apache.org/jira/browse/MESOS-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344681#comment-15344681 ] José Guilherme Vanz commented on MESOS-5565: [~klaus1982] any news? If you're working on it, I'll find something else to work on. ;) > Add logging when Offer::Operation::Launch message has no tasks. > --- > > Key: MESOS-5565 > URL: https://issues.apache.org/jira/browse/MESOS-5565 > Project: Mesos > Issue Type: Improvement >Reporter: Anand Mazumdar >Assignee: Klaus Ma >Priority: Minor > Labels: newbie > > Currently, when an {{Offer::Operation::Launch}} message has no tasks specified, > Mesos treats such requests as implicitly declining all offers. This can > be very counter-intuitive for framework developers since we do not have any > logging on the Master around this behavior. It would be good to add some > logging on the master to apprise the framework developers that all the offers > have been implicitly declined. > {code} > if (operation.type() == Offer::Operation::LAUNCH) { > if (operation.launch().task_infos().size() > 0) { > ++metrics->messages_launch_tasks; > } else { > ++metrics->messages_decline_offers; > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
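The logging requested in MESOS-5565 could look roughly like the following sketch. It is self-contained and deliberately does not use the Mesos protobuf types: `Metrics` is a simplified stand-in, `recordLaunch` is a hypothetical helper, and the warning wording is invented for illustration. In the master, the returned text would presumably be emitted through its normal logging facility rather than returned to the caller.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Simplified stand-in for the master's metrics; not the Mesos type.
struct Metrics
{
  int messages_launch_tasks = 0;
  int messages_decline_offers = 0;
};

// Hypothetical helper: updates the counters as in the snippet quoted in the
// issue and returns a warning for an empty LAUNCH (empty string otherwise).
std::string recordLaunch(
    std::size_t taskCount,
    const std::string& frameworkId,
    Metrics* metrics)
{
  if (taskCount > 0) {
    ++metrics->messages_launch_tasks;
    return "";
  }

  ++metrics->messages_decline_offers;

  std::ostringstream out;
  out << "LAUNCH operation from framework " << frameworkId
      << " contains no tasks; treating it as an implicit decline of all"
      << " offers";
  return out.str();
}
```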
[jira] [Commented] (MESOS-5683) Can't see the finished tasks when run the Java example framework
[ https://issues.apache.org/jira/browse/MESOS-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344540#comment-15344540 ] Joseph Wu commented on MESOS-5683: -- [~ZLuo], most of the example frameworks will run and exit relatively quickly. When a framework "completes", the associated tasks are moved to a separate section of the web UI: {{http://localhost:5050/#/frameworks}} > Can't see the finished tasks when run the Java example framework > > > Key: MESOS-5683 > URL: https://issues.apache.org/jira/browse/MESOS-5683 > Project: Mesos > Issue Type: Bug > Components: master >Reporter: Zhigang Luo > > After following the steps in "Getting Started" and running the example framework (Java), > the finished tasks cannot be seen on the Mesos web page > (http://127.0.0.1:5050). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5683) Can't see the finished tasks when run the Java example framework
Zhigang Luo created MESOS-5683: -- Summary: Can't see the finished tasks when run the Java example framework Key: MESOS-5683 URL: https://issues.apache.org/jira/browse/MESOS-5683 Project: Mesos Issue Type: Bug Components: master Reporter: Zhigang Luo After following the steps in "Getting Started" and running the example framework (Java), the finished tasks cannot be seen on the Mesos web page (http://127.0.0.1:5050). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2679) Slave asked to shut down by master because 'health check timed out'
[ https://issues.apache.org/jira/browse/MESOS-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344118#comment-15344118 ] vincenzo.lomba...@kydea.com commented on MESOS-2679: Just to let you know, I had the same problem and solved it by changing the "--net" option in docker run from "host" to "bridge". I think it depends on having a mesos-slave running on the same physical server that hosts the mesos-master... > Slave asked to shut down by master because 'health check timed out' > --- > > Key: MESOS-2679 > URL: https://issues.apache.org/jira/browse/MESOS-2679 > Project: Mesos > Issue Type: Bug > Components: isolation >Affects Versions: 0.22.1 >Reporter: Littlestar > > I run spark 1.3.1 on mesos 0.22.1 rc6 (linux64), and some mesos slave nodes go > offline. > slave node logs: > I0430 15:12:12.737057 32354 slave.cpp:571] Slave asked to shut down by > master@192.168.1.10:5050 because 'health check timed out' > master node logs: > I0430 15:12:00.615777 19759 master.cpp:237] Shutting down slave > 20150430-141442-1214949568-5050-19747-S2 due to health check timeout > W0430 15:12:00.616083 19751 master.cpp:3417] Shutting down slave > 20150430-141442-1214949568-5050-19747-S2 at slave(1)@192.168.1.15:5051 > (hpblade05) with message 'health check timed out' > Why does the slave go offline and not restart itself? > Is there any configuration to increase this timeout interval? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-4732) Migrate rest of the endpoints to use `jsonify`
[ https://issues.apache.org/jira/browse/MESOS-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344035#comment-15344035 ] Jay Guo edited comment on MESOS-4732 at 6/22/16 9:56 AM: - [~neilconway] [~mcypark] Is this migration still active? I'm working on the v1 operator API and I observed that some of the existing endpoints have not been converted to use {{jsonify}}. I wonder whether it makes sense at all to rework those endpoints to use {{jsonify}}, since we are refactoring them anyway. The particular API I'm looking at right now is {{slave/containers}}. was (Author: guoger): [~neilconway][~mcypark] Is this migration is still active? I'm working on v1 operator API and I observed that some of existing endpoints are not transformed to use {{jsonify}}. I wonder whether it makes sense at all to rework those endpoints to use {{jsonify}}, since we are refactoring them anyway. The particular API I'm looking at right now is {{slave/containers}} > Migrate rest of the endpoints to use `jsonify` > -- > > Key: MESOS-4732 > URL: https://issues.apache.org/jira/browse/MESOS-4732 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Michael Park >Assignee: Neil Conway > > As MVP, we shipped `/state` and `/state-summary` to use `jsonify`. We need to > follow through with the migration of the rest of the endpoints to use > `jsonify` as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4732) Migrate rest of the endpoints to use `jsonify`
[ https://issues.apache.org/jira/browse/MESOS-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344035#comment-15344035 ] Jay Guo commented on MESOS-4732: [~neilconway] [~mcypark] Is this migration still active? I'm working on the v1 operator API and I observed that some of the existing endpoints have not been converted to use {{jsonify}}. I wonder whether it makes sense at all to rework those endpoints to use {{jsonify}}, since we are refactoring them anyway. The particular API I'm looking at right now is {{slave/containers}}. > Migrate rest of the endpoints to use `jsonify` > -- > > Key: MESOS-4732 > URL: https://issues.apache.org/jira/browse/MESOS-4732 > Project: Mesos > Issue Type: Task > Components: master >Reporter: Michael Park >Assignee: Neil Conway > > As MVP, we shipped `/state` and `/state-summary` to use `jsonify`. We need to > follow through with the migration of the rest of the endpoints to use > `jsonify` as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-5490) Implement GET_STATE_SUMMARY Call in v1 master API.
[ https://issues.apache.org/jira/browse/MESOS-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Guo reassigned MESOS-5490: -- Assignee: Jay Guo > Implement GET_STATE_SUMMARY Call in v1 master API. > -- > > Key: MESOS-5490 > URL: https://issues.apache.org/jira/browse/MESOS-5490 > Project: Mesos > Issue Type: Task >Reporter: Vinod Kone >Assignee: Jay Guo > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-5310) Enable `network/cni` isolator to allow modifications and deletion of CNI config
[ https://issues.apache.org/jira/browse/MESOS-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343925#comment-15343925 ] Qian Zhang commented on MESOS-5310: --- The above patches introduce a 'cni/config' endpoint in the agent to support dynamically adding a new CNI network configuration. However, based on the discussion with Jie and Avinash, that is not in the scope of the MVP, so we will hold off and may iterate on it later. As the MVP, we'd like to checkpoint the CNI network configuration in the container dir when launching the container, and also use it when detaching the container; this will make the container life cycle consistent. Here is the review chain: https://reviews.apache.org/r/49069/ > Enable `network/cni` isolator to allow modifications and deletion of CNI > config > --- > > Key: MESOS-5310 > URL: https://issues.apache.org/jira/browse/MESOS-5310 > Project: Mesos > Issue Type: Task > Components: containerization > Environment: linux >Reporter: Avinash Sridharan >Assignee: Qian Zhang > Labels: mesosphere > > Currently the `network/cni` isolator can only load the CNI configs at > startup. This makes the CNI networks immutable. From an operational > standpoint this can make deployments painful for operators. > To make CNI more flexible the `network/cni` isolator should be able to load > configs at run time. > The proposal is to add an endpoint to the `network/cni` isolator, to which > when the operator sends a PUT request the `network/cni` isolator will reload > CNI configs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-5682) The /flags endpoints use authorization but there is a bypass to get their content
Alexander Rojas created MESOS-5682: -- Summary: The /flags endpoints use authorization but there is a bypass to get their content Key: MESOS-5682 URL: https://issues.apache.org/jira/browse/MESOS-5682 Project: Mesos Issue Type: Bug Components: master, slave Reporter: Alexander Rojas Priority: Minor The {{/flags}} endpoints use authorization on both the master and the agent. However, the contents of the flags are available without any authorization by accessing the {{/state}} endpoints on both the master and the agent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)