[ 
https://issues.apache.org/jira/browse/AURORA-1802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Erb updated AURORA-1802:
--------------------------------
    Fix Version/s: 0.17.0

> AttributeAggregate slows down scheduling of jobs with many instances
> --------------------------------------------------------------------
>
>                 Key: AURORA-1802
>                 URL: https://issues.apache.org/jira/browse/AURORA-1802
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Stephan Erb
>             Fix For: 0.17.0
>
>
> The current implementation of 
> [{{AttributeAggregate}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java]
>  slows down scheduling of jobs with many instances. This is currently not 
> visible in our job scheduling benchmark results because it only affects the 
> benchmark setup time, not the measured part.
> {{AttributeAggregate}} relies on {{Suppliers.memoize}} to ensure that it is 
> only computed once and only when necessary. This has probably been done 
> because the factory 
> [{{AttributeAggregate.getJobActiveState}}|https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java#L56-L91]
>  is slow. 
> After some recent changes to schedule multiple task instances per scheduling 
> round, the aggregate is now computed in every scheduling round via the call 
> [{{resourceRequest.getJobState().updateAttributeAggregate(...)}} 
> |https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java#L173]
>  in {{TaskAssigner}}. This means the expensive factory is called once per 
> scheduling round.
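The interaction between memoization and a per-round update can be sketched in plain Java. This is a minimal illustration, not Aurora's code: {{memoize}} is a hand-rolled stand-in for Guava's {{Suppliers.memoize}}, and {{factoryCalls}} is a hypothetical model that assumes a fresh aggregate is created for every scheduling round and that the update forces its evaluation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class MemoizeSketch {
  /** Stand-in for Guava's Suppliers.memoize: computes on first get(), then caches. */
  static <T> Supplier<T> memoize(Supplier<T> delegate) {
    return new Supplier<T>() {
      private boolean computed;
      private T value;

      @Override
      public synchronized T get() {
        if (!computed) {
          value = delegate.get();
          computed = true;
        }
        return value;
      }
    };
  }

  /**
   * Hypothetical model: counts how often the expensive factory runs over a
   * number of scheduling rounds. Assumes one fresh aggregate per round.
   */
  static int factoryCalls(int rounds, boolean forceEachRound) {
    AtomicInteger calls = new AtomicInteger();
    for (int round = 0; round < rounds; round++) {
      Supplier<Integer> aggregate = memoize(calls::incrementAndGet);
      if (forceEachRound) {
        aggregate.get(); // an update forces the expensive computation
        aggregate.get(); // a second read in the same round is served from cache
      }
    }
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println("lazy, never read:   " + factoryCalls(5, false));
    System.out.println("forced every round: " + factoryCalls(5, true));
  }
}
```

Under these assumptions, memoization only saves repeated reads within one round. As soon as every round forces the value, the factory runs once per round regardless of the supplier.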
> h3. Potential improvements
> * The current factory implementation performs one {{fetchTasks}} query 
> followed by {{n}} distinct {{getHostAttributes}} queries. This could be 
> reduced to a single SQL query.
> * The aggregate makes heavy use of {{ImmutableMultiset}} even though the 
> aggregate itself is no longer immutable. There is room for improvement here.
> * The aggregate uses suppliers for lazy instantiation even though its 
> current usage is no longer lazy. We can either make the implementation 
> eager, or ensure that the expensive part is only run when absolutely 
> necessary.
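The first improvement in the list can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: {{CountingStore}}, {{getOne}} and {{getMany}} are hypothetical names and not Aurora's storage API, a batched lookup stands in for the single SQL query, and a plain mutable map of counts stands in for the multiset.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class BatchLookupSketch {
  /** Hypothetical in-memory store that counts lookups; not Aurora's storage API. */
  static final class CountingStore {
    private final Map<String, String> rackByHost;
    int queries = 0;

    CountingStore(Map<String, String> rackByHost) {
      this.rackByHost = new HashMap<>(rackByHost);
    }

    /** One round trip per call, mirroring the separate per-host lookups. */
    String getOne(String host) {
      queries++;
      return rackByHost.getOrDefault(host, "unknown");
    }

    /** One round trip for all hosts, standing in for a single joined query. */
    Map<String, String> getMany(Set<String> hosts) {
      queries++;
      Map<String, String> result = new HashMap<>();
      for (String host : hosts) {
        result.put(host, rackByHost.getOrDefault(host, "unknown"));
      }
      return result;
    }
  }

  /** Counts tasks per rack value with one lookup per task. */
  static Map<String, Integer> aggregatePerTask(CountingStore store, List<String> taskHosts) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String host : taskHosts) {
      counts.merge(store.getOne(host), 1, Integer::sum);
    }
    return counts;
  }

  /** Counts tasks per rack value with a single batched lookup. */
  static Map<String, Integer> aggregateBatched(CountingStore store, List<String> taskHosts) {
    Map<String, String> racks = store.getMany(new HashSet<>(taskHosts));
    Map<String, Integer> counts = new TreeMap<>();
    for (String host : taskHosts) {
      counts.merge(racks.get(host), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, String> racks = new HashMap<>();
    racks.put("h1", "r1");
    racks.put("h2", "r1");
    racks.put("h3", "r2");
    List<String> taskHosts = Arrays.asList("h1", "h2", "h3", "h1");

    CountingStore perTask = new CountingStore(racks);
    CountingStore batched = new CountingStore(racks);
    System.out.println(aggregatePerTask(perTask, taskHosts) + " queries=" + perTask.queries);
    System.out.println(aggregateBatched(batched, taskHosts) + " queries=" + batched.queries);
  }
}
```

Both variants produce the same counts. Only the number of round trips differs, which is the cost that grows with the number of instances in a job.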
> h3. Proof of concept
> * 4 mins 23.407 secs -- total runtime of {{./gradlew jmh 
> -Pbenchmarks='SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark'}}
>  without modifications.
> * 2 mins 40.308 secs -- total runtime of {{./gradlew jmh 
> -Pbenchmarks='SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark'}}
>  with [{{resourceRequest.getJobState().updateAttributeAggregate(...)}} 
> |https://github.com/apache/aurora/blob/f559e930659e25b3d7cacb7b845ebda50d18d66a/src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java#L173]
>  commented out. This works because the call is not necessary when only a 
> single instance is scheduled per scheduling round, as is the case in the 
> benchmarks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
