[ 
https://issues.apache.org/jira/browse/MESOS-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-5340:
----------------------------------
    Summary: libevent builds may prevent new connections  (was: SSL-downgrading 
support may prevent new connections)

> libevent builds may prevent new connections
> -------------------------------------------
>
>                 Key: MESOS-5340
>                 URL: https://issues.apache.org/jira/browse/MESOS-5340
>             Project: Mesos
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.29.0, 0.28.1
>            Reporter: Till Toenshoff
>            Priority: Blocker
>              Labels: mesosphere, security, ssl
>
> When using an SSL-enabled build of Mesos in combination with SSL-downgrading 
> support, any connection that does not actually transmit data will hang the 
> affected process (e.g. the master).
> To reproduce the issue (on any platform):
> Spin up a master with SSL downgrading enabled:
> {noformat}
> $ export SSL_ENABLED=true
> $ export SSL_SUPPORT_DOWNGRADE=true
> $ export SSL_KEY_FILE=/path/to/your/foo.key
> $ export SSL_CERT_FILE=/path/to/your/foo.crt
> $ export SSL_CA_FILE=/path/to/your/ca.crt
> $ ./bin/mesos-master.sh --work_dir=/tmp/foo
> {noformat}
> Create some artificial HTTP request load so the problem is quickly visible in 
> both the master logs and the output of curl itself:
> {noformat}
> $ while true; do sleep 0.1; echo $( date +">%H:%M:%S.%3N"; \
>     curl -s -k -A "SSL Debug" http://localhost:5050/master/slaves; \
>     echo; date +"<%H:%M:%S.%3N"; echo ); done
> {noformat}
> Now create a connection to the master that does not transmit any data:
> {noformat}
> $ telnet localhost 5050
> {noformat}
> You should now see the curl requests hang: the master stops responding to 
> new connections. This persists until either some data is transmitted over 
> the above telnet connection or that connection is closed.
> This problem was first observed when running Mesos on an AWS cluster 
> with internal ELB health checks enabled for the master node. Those 
> health checks use long-lasting connections that do not transmit any 
> data and are closed after a configurable duration. In our test environment, 
> this duration was set to 60 seconds, so our master repeatedly became 
> unresponsive for 60 seconds, then got "unstuck" for a brief period 
> before getting stuck again.
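
The reproduction steps above could also be scripted without telnet. The
following is only a rough sketch, not part of the original report: it assumes
bash (for its /dev/tcp redirection) and curl, and the function names, the
MASTER_HOST/MASTER_PORT variables, and the timings are made up for
illustration. It holds one silent connection open and then issues a
time-limited request, so a hung master shows up as a timeout rather than as a
stuck shell.

```shell
#!/usr/bin/env bash
# Sketch only: the host and port defaults mirror the reproduction steps;
# all names here are hypothetical helpers, not Mesos tooling.
MASTER_HOST="${MASTER_HOST:-localhost}"
MASTER_PORT="${MASTER_PORT:-5050}"

# Map a curl exit status to a short verdict.
# 0 is success; 28 is curl's "operation timed out" status.
classify_curl_status() {
  case "$1" in
    0)  echo "responsive" ;;
    28) echo "hung" ;;
    *)  echo "error" ;;
  esac
}

# Open a connection that sends nothing and keep it open for $1 seconds.
# The subshell owns file descriptor 3, so the connection is closed when
# the subshell exits. Requires bash because of /dev/tcp.
hold_idle_connection() {
  (
    exec 3<>"/dev/tcp/${MASTER_HOST}/${MASTER_PORT}" || exit 1
    sleep "$1"
  )
}

# Issue one request that gives up after $1 seconds instead of hanging,
# then print the verdict for curl's exit status.
probe_master() {
  probe_status=0
  curl -s -k -o /dev/null --max-time "$1" \
    "http://${MASTER_HOST}:${MASTER_PORT}/master/slaves" || probe_status=$?
  classify_curl_status "${probe_status}"
}

# Example run against a local master (nothing runs unless uncommented):
#   hold_idle_connection 10 &
#   sleep 1
#   probe_master 2
```

If the behaviour matches the description, probe_master would be expected to
report "hung" while the idle connection is open and "responsive" once it has
been closed; this has not been verified against an actual master here.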



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
