[jira] [Commented] (FLINK-10063) Jepsen: Automatically restart Mesos Processes

2018-08-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/FLINK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574455#comment-16574455
 ] 

ASF GitHub Bot commented on FLINK-10063:


asfgit closed pull request #6496: [FLINK-10063][tests] Use runit to supervise 
mesos processes.
URL: https://github.com/apache/flink/pull/6496
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/flink-jepsen/docker/Dockerfile-db b/flink-jepsen/docker/Dockerfile-db
index 1555329af3f..cb60efce2e5 100644
--- a/flink-jepsen/docker/Dockerfile-db
+++ b/flink-jepsen/docker/Dockerfile-db
@@ -21,7 +21,7 @@ FROM debian:jessie
 RUN echo "deb http://http.debian.net/debian jessie-backports main" >> /etc/apt/sources.list && \
 apt-get update && \
 apt-get install -y -t jessie-backports openjdk-8-jdk && \
-apt-get install -y apt-utils bzip2 curl faketime iproute iptables iputils-ping less libzip2 logrotate man man-db net-tools ntpdate psmisc python rsyslog sudo sysvinit sysvinit-core sysvinit-utils tar unzip vim wget
+apt-get install -y apt-utils bzip2 curl faketime iproute iptables iputils-ping less libzip2 logrotate man man-db net-tools ntpdate psmisc python rsyslog runit sudo sysvinit sysvinit-core sysvinit-utils tar unzip vim wget
 
 RUN apt-get update && \
 apt-get -y install openssh-server && \
@@ -35,5 +35,12 @@ RUN mkdir -p /root/.ssh/ && \
 chmod 600 /root/.ssh/authorized_keys && \
 cat /root/id_rsa.pub >> /root/.ssh/authorized_keys
 
+COPY sshd-run /etc/sv/service/sshd/run
+RUN chmod +x /etc/sv/service/sshd/run && \
+ln -sf /etc/sv/service/sshd /etc/service
+
 EXPOSE 22
-CMD exec /usr/sbin/sshd -D
+
+# Start runit process supervisor which will bring up sshd.
+# In our tests we can use runit to supervise more processes, e.g., Mesos.
+CMD runsvdir -P /etc/service /dev/null > /dev/null
diff --git a/flink-jepsen/src/jepsen/flink/db.clj b/flink-jepsen/src/jepsen/flink/db.clj
index 9a725d7149a..becc551e2cf 100644
--- a/flink-jepsen/src/jepsen/flink/db.clj
+++ b/flink-jepsen/src/jepsen/flink/db.clj
@@ -97,7 +97,7 @@
   (if (cu/exists? log-dir) (cu/ls-full log-dir) []))
 
 (defn flink-db
-  [test]
+  []
   (reify db/DB
 (setup! [_ test node]
   (c/su
@@ -131,7 +131,7 @@
   []
   (let [zk (zk/db deb-zookeeper-package)
 hadoop (hadoop/db hadoop-dist-url)
-flink (flink-db test)]
+flink (flink-db)]
 (combined-db [hadoop zk flink])))
 
 (defn exec-flink!
@@ -192,7 +192,7 @@
   (let [zk (zk/db deb-zookeeper-package)
 hadoop (hadoop/db hadoop-dist-url)
 mesos (mesos/db deb-mesos-package deb-marathon-package)
-flink (flink-db test)]
+flink (flink-db)]
 (combined-db [hadoop zk mesos flink])))
 
 (defn submit-job-with-retry!
@@ -209,24 +209,25 @@
 (let [r (fu/retry (fn []
 (http/post
   (str (mesos/marathon-base-url test) "/v2/apps")
-  {:form-params  {:id   "flink"
-                  :cmd  (str "HADOOP_CLASSPATH=`" hadoop/install-dir "/bin/hadoop classpath` "
-                             "HADOOP_CONF_DIR=" hadoop/hadoop-conf-dir " "
-                             install-dir "/bin/mesos-appmaster.sh "
-                             "-Dmesos.master=" (zookeeper-uri
-                                                test
-                                                mesos/zk-namespace) " "
-                             "-Djobmanager.rpc.address=$(hostname -f) "
-                             "-Djobmanager.heap.mb=2048 "
-                             "-Djobmanager.rpc.port=6123 "
-                             "-Djobmanager.web.port=8081 "
-                             "-Dmesos.resourcemanager.tasks.mem=2048 "
-                             "-Dtaskmanager.heap.mb=2048 "
-                             "-Dtaskmanager.numberOfTaskSlots=2 "
-                             "-Dmesos.resourcemanager.tasks.cpus=1 "
-                             "-Drest.bind-address=$(hostname -f) ")
-                  :cpus 1.0
-                  :mem  2048}
+  {:form-params  {:id    "flink"
+                  :cmd   (str "HADOOP_CLASSPATH=`" hadoop/install-dir "/bin/hadoop classpath` "
+

[jira] [Commented] (FLINK-10063) Jepsen: Automatically restart Mesos Processes

2018-08-09 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/FLINK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574452#comment-16574452
 ] 

ASF GitHub Bot commented on FLINK-10063:


tillrohrmann commented on issue #6496: [FLINK-10063][tests] Use runit to 
supervise mesos processes.
URL: https://github.com/apache/flink/pull/6496#issuecomment-411672800
 
 
   Merging this PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Jepsen: Automatically restart Mesos Processes
> -
>
> Key: FLINK-10063
> URL: https://issues.apache.org/jira/browse/FLINK-10063
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.6.0
>Reporter: Gary Yao
>Assignee: Gary Yao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0
>
>
> Use a process supervisor to automatically restart Mesos processes. This is 
> needed because Mesos uses a "fail-fast" approach to error handling, e.g., the 
> Mesos master will exit when it discovers it has been partitioned away from 
> the Zookeeper quorum. Currently, some of the tests cannot pass because the 
> Mesos processes exit.
> *Acceptance Criteria*
> * Running tests with {{--deployment-mode mesos-session}} should not fail due 
> to reasons related to the Mesos setup.
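The restart behaviour described above can be illustrated with a deliberately simplified shell loop. This is only a sketch of the idea and not how runit works internally. The `supervise` function and its `max_restarts` bound are hypothetical. The bound exists only so the loop terminates, whereas a real supervisor keeps restarting the process.

```shell
#!/bin/sh
# supervise MAX_RESTARTS CMD [ARGS...]
# Runs CMD and starts it again each time it exits, as a supervisor would for a
# fail-fast process such as a Mesos master that lost contact with ZooKeeper.
# MAX_RESTARTS is a hypothetical bound so this sketch terminates; a real
# supervisor keeps restarting and should also throttle to avoid a busy loop.
supervise() {
  max_restarts=$1
  shift
  restarts=0
  while :; do
    "$@"
    rc=$?
    echo "supervised command exited with status $rc" >&2
    if [ "$restarts" -ge "$max_restarts" ]; then
      # Give up and report the exit status of the last run.
      return "$rc"
    fi
    restarts=$((restarts + 1))
  done
}
```

For example, `supervise 2 sh -c 'echo started; exit 1'` prints `started` three times and then returns 1, which is the exit status of the last run.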



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (FLINK-10063) Jepsen: Automatically restart Mesos Processes

2018-08-08 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/FLINK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573300#comment-16573300
 ] 

ASF GitHub Bot commented on FLINK-10063:


GJL commented on a change in pull request #6496: [FLINK-10063][tests] Use runit 
to supervise mesos processes.
URL: https://github.com/apache/flink/pull/6496#discussion_r208608118
 
 

 ##
 File path: flink-jepsen/docker/Dockerfile-db
 ##
 @@ -35,5 +35,12 @@ RUN mkdir -p /root/.ssh/ && \
 chmod 600 /root/.ssh/authorized_keys && \
 cat /root/id_rsa.pub >> /root/.ssh/authorized_keys
 
+COPY sshd-run /etc/sv/service/sshd/run
+RUN chmod +x /etc/sv/service/sshd/run && \
+ln -sf /etc/sv/service/sshd /etc/service
+
 EXPOSE 22
-CMD exec /usr/sbin/sshd -D
+
+# Start runit process supervisor which will bring up sshd.
+# In our tests we can use runit to supervise more processes, e.g., Mesos.
+CMD runsvdir -P /etc/service /dev/null > /dev/null
 
 Review comment:
   Yes, it is needed. The log argument only redirects standard error:
   
   > If the log argument is given to runsvdir, all output to standard error is redirected to this log
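The reason for the extra `> /dev/null` can be shown with plain shell redirection. This sketch assumes, as the quoted documentation states, that the log argument only covers standard error. The `noisy` function is a hypothetical stand-in for the supervised process tree.

```shell
#!/bin/sh
# Hypothetical stand-in for a process that writes to both streams.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Redirecting only standard error (all the quoted docs say the log argument
# covers) leaves standard output untouched.
only_stderr=$(noisy 2>/dev/null)

# Adding "> /dev/null" as well discards standard output too.
both=$(noisy 2>/dev/null >/dev/null)

printf 'stderr only: [%s]\nboth: [%s]\n' "$only_stderr" "$both"
```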




[jira] [Commented] (FLINK-10063) Jepsen: Automatically restart Mesos Processes

2018-08-07 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/FLINK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572756#comment-16572756
 ] 

ASF GitHub Bot commented on FLINK-10063:


GJL commented on a change in pull request #6496: [FLINK-10063][tests] Use runit 
to supervise mesos processes.
URL: https://github.com/apache/flink/pull/6496#discussion_r208471471
 
 

 ##
 File path: flink-jepsen/src/jepsen/flink/mesos.clj
 ##
 @@ -24,11 +24,35 @@
 [jepsen.os.debian :as debian]
 [jepsen.flink.zookeeper :refer [zookeeper-uri]]))
 
+;;; runit process supervisor (http://smarden.org/runit/)
+;;;
+;;; We use runit to supervise Mesos processes because Mesos uses a "fail-fast" approach to
+;;; error handling, e.g., the Mesos master will exit when it discovers it has been partitioned away
+;;; from the Zookeeper quorum.
+
+(def runit-version "2.1.2-3")
+
+(defn create-supervised-service!
+  "Registers a service with the process supervisor and starts it."
+  [service-name cmd]
+  (let [service-dir (str "/etc/sv/" service-name)
+run-script (str service-dir "/run")]
+(c/su
+  (c/exec :mkdir :-p service-dir)
+  (c/exec :echo (clojure.string/join "\n" ["#!/bin/sh" cmd]) :> run-script)
 
 Review comment:
   I'll fix this.



[jira] [Commented] (FLINK-10063) Jepsen: Automatically restart Mesos Processes

2018-08-06 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/FLINK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569829#comment-16569829
 ] 

ASF GitHub Bot commented on FLINK-10063:


cewood commented on a change in pull request #6496: [FLINK-10063][tests] Use 
runit to supervise mesos processes.
URL: https://github.com/apache/flink/pull/6496#discussion_r207793002
 
 

 ##
 File path: flink-jepsen/src/jepsen/flink/mesos.clj
 ##
 @@ -24,11 +24,35 @@
 [jepsen.os.debian :as debian]
 [jepsen.flink.zookeeper :refer [zookeeper-uri]]))
 
+;;; runit process supervisor (http://smarden.org/runit/)
+;;;
+;;; We use runit to supervise Mesos processes because Mesos uses a "fail-fast" approach to
+;;; error handling, e.g., the Mesos master will exit when it discovers it has been partitioned away
+;;; from the Zookeeper quorum.
+
+(def runit-version "2.1.2-3")
+
+(defn create-supervised-service!
+  "Registers a service with the process supervisor and starts it."
+  [service-name cmd]
+  (let [service-dir (str "/etc/sv/" service-name)
+run-script (str service-dir "/run")]
+(c/su
+  (c/exec :mkdir :-p service-dir)
+  (c/exec :echo (clojure.string/join "\n" ["#!/bin/sh" cmd]) :> run-script)
 
 Review comment:
   It's generally considered best practice for runit units to include an `exec 2>&1` line, and to prefix your command with `exec ...`. So I'd suggest updating this line accordingly: `["#!/bin/sh" "exec 2>&1" (str "exec " cmd)]`
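The following is a minimal shell sketch of the registration steps with the suggested run-script shape applied. It assumes, as in the Dockerfile diff, that a service is activated by making its run script executable and symlinking its directory into the directory scanned by `runsvdir`. The `SV_ROOT` and `SERVICE_ROOT` variables are hypothetical stand-ins for `/etc/sv` and `/etc/service`, which let the sketch run without root.

```shell
#!/bin/sh
# Hypothetical roots; the PR itself uses /etc/sv and /etc/service.
SV_ROOT=${SV_ROOT:-/etc/sv}
SERVICE_ROOT=${SERVICE_ROOT:-/etc/service}

# create_supervised_service NAME CMD
# Writes $SV_ROOT/NAME/run in the suggested shape:
#   exec 2>&1  -> standard error goes to the same stream as standard output
#   exec CMD   -> CMD replaces the shell, so runsv supervises and signals CMD itself
create_supervised_service() {
  name=$1
  cmd=$2
  service_dir="$SV_ROOT/$name"
  mkdir -p "$service_dir" || return 1
  printf '#!/bin/sh\nexec 2>&1\nexec %s\n' "$cmd" > "$service_dir/run" || return 1
  chmod +x "$service_dir/run" || return 1
  # runsvdir starts a runsv for each directory (or symlink to one) in the scanned dir.
  mkdir -p "$SERVICE_ROOT" || return 1
  ln -sfn "$service_dir" "$SERVICE_ROOT/$name"
}
```

Registering the same service twice overwrites the run script and refreshes the symlink, because `ln -sfn` replaces an existing link instead of descending into the directory it points to.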



[jira] [Commented] (FLINK-10063) Jepsen: Automatically restart Mesos Processes

2018-08-06 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/FLINK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569828#comment-16569828
 ] 

ASF GitHub Bot commented on FLINK-10063:


cewood commented on a change in pull request #6496: [FLINK-10063][tests] Use 
runit to supervise mesos processes.
URL: https://github.com/apache/flink/pull/6496#discussion_r207790888
 
 

 ##
 File path: flink-jepsen/docker/Dockerfile-db
 ##
 @@ -35,5 +35,12 @@ RUN mkdir -p /root/.ssh/ && \
 chmod 600 /root/.ssh/authorized_keys && \
 cat /root/id_rsa.pub >> /root/.ssh/authorized_keys
 
+COPY sshd-run /etc/sv/service/sshd/run
+RUN chmod +x /etc/sv/service/sshd/run && \
+ln -sf /etc/sv/service/sshd /etc/service
+
 EXPOSE 22
-CMD exec /usr/sbin/sshd -D
+
+# Start runit process supervisor which will bring up sshd.
+# In our tests we can use runit to supervise more processes, e.g., Mesos.
+CMD runsvdir -P /etc/service /dev/null > /dev/null
 
 Review comment:
   Is the extra `> /dev/null` actually required? I would have expected that the 
log argument to `/dev/null` alone would have sufficed, since it also redirects 
standard error according to the docs.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Jepsen: Automatically restart Mesos Processes
> -
>
> Key: FLINK-10063
> URL: https://issues.apache.org/jira/browse/FLINK-10063
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.6.0
>Reporter: Gary Yao
>Assignee: Gary Yao
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.1, 1.7.0
>
>
> Use a process supervisor to automatically restart Mesos processes. This is 
> needed because Mesos uses a "fail-fast" approach to error handling, e.g., the 
> Mesos master will exit when it discovers it has been partitioned away from 
> the Zookeeper quorum. Currently the some of the tests cannot pass because the 
> Mesos processes exiting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (FLINK-10063) Jepsen: Automatically restart Mesos Processes

2018-08-06 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/FLINK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569830#comment-16569830
 ] 

ASF GitHub Bot commented on FLINK-10063:


cewood commented on issue #6496: [FLINK-10063][tests] Use runit to supervise 
mesos processes.
URL: https://github.com/apache/flink/pull/6496#issuecomment-410615164
 
 
   And nice work on this, it's super tedious doing all this setup and tear down 
stuff, nice job :100: 



[jira] [Commented] (FLINK-10063) Jepsen: Automatically restart Mesos Processes

2018-08-05 Thread ASF GitHub Bot (JIRA)



[ 
https://issues.apache.org/jira/browse/FLINK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569458#comment-16569458
 ] 

ASF GitHub Bot commented on FLINK-10063:


GJL opened a new pull request #6496: [FLINK-10063][tests] Use runit to 
supervise mesos processes.
URL: https://github.com/apache/flink/pull/6496
 
 
   ## What is the purpose of the change
   
   *Use a process supervisor to automatically restart Mesos processes. This is 
needed because Mesos uses a "fail-fast" approach to error handling, e.g., the 
Mesos master will exit when it discovers it has been partitioned away from the 
Zookeeper quorum. Currently, some of the tests cannot pass because the Mesos 
processes exit.*
   
   cc: @igalshilman @cewood @tillrohrmann 
   
   ## Brief change log
   
 - *Use runit to supervise Mesos processes.*
 - *Make docker setup work.*
   
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
 - *Ran Mesos tests on docker.*
 
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (**yes** (in test 
code) / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


