> On Oct. 17, 2017, 8:26 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 67-68 (patched)
> > <https://reviews.apache.org/r/62956/diff/2/?file=1854107#file1854107line67>
> >
> >     Since the Scheduler fails over every 24 hours, maybe we can let the new
> >     Scheduler retry the Slave?
> >
> >     30 days seems like a very high threshold and can sneak into a tight
> >     capacity situation without much warning. Typically in those scenarios, we
> >     manually churn the cluster to free up space. I wonder how the 30-day filter
> >     would behave in such a case. Having said that, we should make this
> >     configurable with a reasonable default (a few hours)?
>
> Jordan Ly wrote:
>     I believe that the filter only works against a specific framework ID, so
>     that a scheduler failover or deploy would receive the offers again.
>
> Jordan Ly wrote:
>     Additionally, does churning the cluster mean new offers would be
>     generated? If so, I think that they would get a new offer ID and be reissued.

The framework ID remains constant across failovers of both Aurora schedulers and Mesos masters. Otherwise we'd lose all currently running tasks during a failover.

As for the filtering, I am under the impression that it is per agent and independent of offers or offer IDs. To be safe, we should check with some Mesos developers though :)


- Stephan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62956/#review188356
-----------------------------------------------------------


On Oct. 13, 2017, 1:18 a.m., Bill Farner wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62956/
> -----------------------------------------------------------
>
> (Updated Oct. 13, 2017, 1:18 a.m.)
>
>
> Review request for Aurora, David McLaughlin and Jordan Ly.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> There's no reason for us to evaluate offers with no CPUs or memory, so reject
> them early in the offer lifecycle.
>
> This is an incremental performance optimization, but it may net significant
> improvements based on observations in some very large clusters.
>
>
> Diffs
> -----
>
>   src/main/java/org/apache/aurora/scheduler/http/Utilization.java 3c77e2983ce00f897f3d5ed106b779cd7f7f0940
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java e8334310a2a46a0ccb09ee6e4122c515892d3996
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java 1b1239753f40d7d46d91724def6c25037eb79f1c
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java d5db81b88a0369d0b26c8fbf70efab3886ad7695
>   src/main/java/org/apache/aurora/scheduler/stats/TaskStatCalculator.java b98aaaf48ae60afef19a368ee96abc897300f8fa
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 2cfdc090ff75a63111ae146c9fe7b3542e7ac83f
>   src/test/java/org/apache/aurora/scheduler/offers/Offers.java 129b4437315c6ad4ea47ca75d4ae6e28cadd7911
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java 765a527acb96997989c920be8b69dfa1113dc302
>
>
> Diff: https://reviews.apache.org/r/62956/diff/2/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Bill Farner
>
>
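
For readers following the thread, the early rejection described in the review request can be pictured roughly as below. This is only a sketch under stated assumptions, not the code from the patch: `OfferFilterSketch`, `Offer`, `isUsable`, and `usableOffers` are hypothetical names, the offer type is a simplified stand-in for a Mesos offer, and "no CPUs or memory" is read here as "missing either CPU or memory".

```java
import java.util.List;
import java.util.stream.Collectors;

public class OfferFilterSketch {
  // Hypothetical, simplified stand-in for a Mesos offer. Not Aurora's real type.
  static final class Offer {
    final String agentId;
    final double cpus;
    final double ramMb;

    Offer(String agentId, double cpus, double ramMb) {
      this.agentId = agentId;
      this.cpus = cpus;
      this.ramMb = ramMb;
    }
  }

  // Assumption: an offer is worth evaluating only if it carries both CPU and memory.
  static boolean isUsable(Offer offer) {
    return offer.cpus > 0.0 && offer.ramMb > 0.0;
  }

  // Drops unusable offers up front, before any per-task matching would run.
  static List<Offer> usableOffers(List<Offer> offers) {
    return offers.stream()
        .filter(OfferFilterSketch::isUsable)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Offer> offers = List.of(
        new Offer("agent-1", 4.0, 1024.0),
        new Offer("agent-2", 0.0, 2048.0),  // no CPU: filtered out
        new Offer("agent-3", 2.0, 0.0));    // no memory: filtered out

    for (Offer offer : usableOffers(offers)) {
      System.out.println("keeping offer from " + offer.agentId);
    }
  }
}
```

In a real scheduler the rejected offers would presumably be declined back to Mesos with a filter duration, which is the setting the discussion above is about; the sketch leaves that step out because it depends on the Mesos driver.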