> On Oct. 17, 2017, 6:26 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java
> > Lines 67-68 (patched)
> > <https://reviews.apache.org/r/62956/diff/2/?file=1854107#file1854107line67>
> >
> >     Since the Scheduler fails over every 24 hours, maybe we can let the new 
> > Scheduler retry the Slave?
> >     
> >     30 days seems like a very high threshold and can sneak into a tight 
> > capacity situation without much warning. Typically in those scenarios, we 
> > manually churn the cluster to free up space. Wonder how the 30 day filter 
> > would behave in such a case. Having said that, we should make this 
> > configurable with a reasonable default (a few hours)?
> 
> Jordan Ly wrote:
>     I believe that the filter only works against a specific framework ID, so 
> that a scheduler failover or deploy would receive the offers again.
> 
> Jordan Ly wrote:
>     Additionally, does churning the cluster mean new offers would be 
> generated? If so, I think that they would get a new offer ID and be reissued.
> 
> Stephan Erb wrote:
>     The framework ID remains constant across failovers of both Aurora 
> schedulers and Mesos masters. Otherwise we'd lose all currently running 
> tasks during a failover.
>     
>     For the filtering I am under the impression that it is per agent and 
> independent of offer or offer IDs. To be safe, we should check with some 
> Mesos developers though :)
> 
> Jordan Ly wrote:
>     You are correct, the framework ID will remain constant and the filters 
> will stay in place.
>     
>     For the filtering, I am being told that if you refuse an offer with x 
> resources, then if those resources stay the same Mesos will not offer them to 
> you again. However, if the resources increase, then Mesos will offer them to 
> the framework again.
>     
>     Could we take advantage of the reviveOffers() call to remove filters on 
> scheduler initialization?
> 
> Bill Farner wrote:
>     The deeper I dig, the more dubious I find the offer filter mechanism 
> (lack of documentation does not help).  One case I cannot address is that 
> _resources_ are filtered rather than the 'immutable offer'.  So if we filter 
> an offer with `resources = [mem=1GB]`, we will not receive a new offer for 
> the same agent with `resources = [cpus=1,mem=1GB]`.
>     
>     I am going to use the default filter to avoid any unintended consequences 
> of offer filters now or in the future.  This will result in some unnecessary 
> chatter with the master, but will still have the benefit of avoiding 
> unnecessary consideration when trying to schedule tasks.

Could we hold the offers but just never return them from 
OfferManager::getOffers if they have zero CPU, memory, or disk? The same could 
apply to all the maintenance offers. Right now we just put them at the back of 
the offer ordering and let the filter take care of them. But if we never 
returned them, we'd reduce the size of our statically banned offers map and 
avoid having the TaskAssigner process a bunch of offers that are never going 
to pass the filter.
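
To make the idea concrete, here is a rough sketch of what I mean. It assumes 
simplified stand-in types: Offer, HeldOffers, and isSchedulable are 
hypothetical names for illustration and are not the real OfferManager or 
ResourceBag APIs. It also assumes "unusable" means zero of any one of 
cpu/memory/disk, which may not be the right rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these types are Aurora's real classes.
final class OfferFilterSketch {

  // Simplified stand-in for a Mesos offer and its resource totals.
  static final class Offer {
    final String agentId;
    final double cpus;
    final long ramMb;
    final long diskMb;

    Offer(String agentId, double cpus, long ramMb, long diskMb) {
      this.agentId = agentId;
      this.cpus = cpus;
      this.ramMb = ramMb;
      this.diskMb = diskMb;
    }

    // Assumption: an offer cannot host a task if any of the three is zero.
    boolean isSchedulable() {
      return cpus > 0 && ramMb > 0 && diskMb > 0;
    }
  }

  // Holds every offer it is given, but only exposes the usable ones.
  static final class HeldOffers {
    private final List<Offer> held = new ArrayList<>();

    void add(Offer offer) {
      // The offer is kept (not declined), so no Mesos filter is involved.
      held.add(offer);
    }

    int heldCount() {
      return held.size();
    }

    // Callers such as a task assigner never see the unusable offers.
    List<Offer> getOffers() {
      return held.stream()
          .filter(Offer::isSchedulable)
          .collect(Collectors.toList());
    }
  }
}
```

With something like this, the unusable offers stay held by the scheduler, but 
they never reach the scheduling loop or the banned offers map.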


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62956/#review188356
-----------------------------------------------------------


On Oct. 18, 2017, 8:06 p.m., Bill Farner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62956/
> -----------------------------------------------------------
> 
> (Updated Oct. 18, 2017, 8:06 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> There's no reason for us to evaluate offers with no CPUs or memory, so reject 
> them early in the offer lifecycle.
> 
> This is an incremental performance optimization, but it may net significant 
> improvements based on observations in some very large clusters.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/http/Utilization.java 
> 3c77e2983ce00f897f3d5ed106b779cd7f7f0940 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> e8334310a2a46a0ccb09ee6e4122c515892d3996 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java 
> 1b1239753f40d7d46d91724def6c25037eb79f1c 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java 
> d5db81b88a0369d0b26c8fbf70efab3886ad7695 
>   src/main/java/org/apache/aurora/scheduler/stats/TaskStatCalculator.java 
> b98aaaf48ae60afef19a368ee96abc897300f8fa 
>   src/test/java/org/apache/aurora/scheduler/offers/OfferManagerImplTest.java 
> 2cfdc090ff75a63111ae146c9fe7b3542e7ac83f 
>   src/test/java/org/apache/aurora/scheduler/offers/Offers.java 
> 129b4437315c6ad4ea47ca75d4ae6e28cadd7911 
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java 
> 765a527acb96997989c920be8b69dfa1113dc302 
> 
> 
> Diff: https://reviews.apache.org/r/62956/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Bill Farner
> 
>
