> On March 30, 2017, 8:13 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 120 (patched)
> > <https://reviews.apache.org/r/58053/diff/1/?file=1680496#file1680496line120>
> >
> >     Do we have to give up eventually? (I suppose not...)

I don't think so. If we give up, I assume the scheduler is going to shut down.
If Mesos is down, a scheduler shutdown means we will elect a new leader. A new
leader (by default) has a one minute timeout to register with Mesos. If we
give up, we will just be flapping between leaders until the system heals. I
think that's pretty undesirable.
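
To illustrate the retry policy described in the review request (keep sending
`SUBSCRIBE` with truncated binary backoff, and stop only once we are subscribed
or disconnected), here is a minimal sketch. It is not the code in the patch:
the class name `SubscribeRetrier`, the delay parameters, and the
`sendSubscribe` / `shouldStop` hooks are all hypothetical names chosen for the
example.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of "retry forever with truncated binary backoff".
// None of these names come from the Aurora patch.
final class SubscribeRetrier {
  private final ScheduledExecutorService executor;
  private final long initialDelayMs;
  private final long maxDelayMs;

  SubscribeRetrier(ScheduledExecutorService executor, long initialDelayMs, long maxDelayMs) {
    if (initialDelayMs <= 0 || maxDelayMs < initialDelayMs) {
      throw new IllegalArgumentException("need 0 < initialDelayMs <= maxDelayMs");
    }
    this.executor = executor;
    this.initialDelayMs = initialDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  // Doubles the delay and truncates it at maxMs. The comparison against
  // maxMs / 2 avoids overflowing when the cap is very large.
  static long nextDelayMs(long currentMs, long maxMs) {
    return currentMs > maxMs / 2 ? maxMs : currentMs * 2;
  }

  // Sends the first attempt right away, then keeps retrying on the executor.
  // There is deliberately no attempt limit: only shouldStop ends the loop.
  void start(Runnable sendSubscribe, BooleanSupplier shouldStop) {
    attempt(sendSubscribe, shouldStop, initialDelayMs);
  }

  private void attempt(Runnable sendSubscribe, BooleanSupplier shouldStop, long delayMs) {
    if (shouldStop.getAsBoolean()) {
      return; // subscribed or disconnected, so nothing left to retry
    }
    // Assumed not to throw; an exception here would end the retry chain.
    sendSubscribe.run();
    executor.schedule(
        () -> attempt(sendSubscribe, shouldStop, nextDelayMs(delayMs, maxDelayMs)),
        delayMs,
        TimeUnit.MILLISECONDS);
  }
}
```

Because the delay is capped rather than the number of attempts, an unhealthy
master costs one cheap call per `maxDelayMs` instead of triggering a leader
election.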


> On March 30, 2017, 8:13 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 125-129 (patched)
> > <https://reviews.apache.org/r/58053/diff/1/?file=1680496#file1680496line125>
> >
> >     Do the Mesos docs say anything about simultaneous `SUBSCRIBE` calls?
> >     
> >     If the backoff time is still pretty low we might end up sending another 
> > subscribe before we have received an answer for the previous one.

From what I understand, multiple subscriptions per framework are not allowed,
and subsequent subscribe attempts will fail if a connection was already
established. The underlying driver ignores those failures, so we should be fine.


> On March 30, 2017, 8:13 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 128-130 (original), 165-167 (patched)
> > <https://reviews.apache.org/r/58053/diff/1/?file=1680496#file1680496line165>
> >
> >     You are unsetting `isSubscribed` in the `disconnected` handler. Doesn't 
> > this imply we will never run the reregistration code here?

Good catch, fixed.


> On March 30, 2017, 8:13 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java
> > Lines 137-138 (original), 174-175 (patched)
> > <https://reviews.apache.org/r/58053/diff/1/?file=1680496#file1680496line175>
> >
> >     I am wondering why we need this here for `OFFERS` but not for 
> > `RESCIND`, `INVERSE_OFFERS`, etc.

I put it here to cover the same kind of errors as the unversioned driver.
Technically we could put it everywhere. I'm not opposed if you think we should
do it.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58053/#review170580
-----------------------------------------------------------


On March 29, 2017, 4:52 p.m., Zameer Manji wrote:
> 
> 
> (Updated March 29, 2017, 4:52 p.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1911
>     https://issues.apache.org/jira/browse/AURORA-1911
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> As noted in AURORA-1911, the `V1Mesos` driver doesn't retry `SUBSCRIBE` calls
> if they fail. This means that after a leader subscribes and disconnects, it
> is possible for it to never resubscribe if the Mesos Master is
> unhealthy.
> 
> To fix this, I have moved the subscription into the dedicated
> `SchedulerExecutor`, where it continues to attempt to subscribe using truncated
> binary backoff. It only stops if we are disconnected or if we successfully
> connect.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 
> 206b11458da2b0f938f0fcab5e5d3259a88ac9ee 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 5bf1e4e8c46044cb69b266cd203b5ec2f8b9ab61 
>   src/main/java/org/apache/aurora/scheduler/mesos/SchedulerDriverModule.java 
> 10d4f1b515b91d85b283cb7c655275c22fb133f9 
>   src/main/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImpl.java 
> 67d356ab66c926a3b56860b906a453d57d6b694d 
>   src/test/java/org/apache/aurora/scheduler/mesos/VersionedMesosSchedulerImplTest.java 
> 756d0d9e30a447f9fba75c1c60f2f2f3c610399b 
> 
> 
> Diff: https://reviews.apache.org/r/58053/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>
