[ 
https://issues.apache.org/jira/browse/AURORA-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181745#comment-16181745
 ] 

Bill Farner commented on AURORA-1949:
-------------------------------------

I recently changed this code to address a similar issue.  I intended to 
faithfully leave the algorithm untouched even though i was suspicious of it, 
but may have also introduced a bug.  Your intuition to rethink the approach may 
be the way to go.

> PreemptionVictimFilterImpl comparator violates transitivity causing exceptions
> ------------------------------------------------------------------------------
>
>                 Key: AURORA-1949
>                 URL: https://issues.apache.org/jira/browse/AURORA-1949
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Jordan Ly
>            Priority: Critical
>
> The PreemptionVictimFilterImpl uses a comparator to sort ResourceBags in 
> order to preempt the biggest tasks first when searching for a victim. 
> However, the current implementation causes an exception which causes the 
> Scheduler to fail:
> {noformat}
> SEVERE: Service PreemptorService [FAILED] has failed in the RUNNING state.
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>         at java.util.TimSort.mergeLo(TimSort.java:777)
>         at java.util.TimSort.mergeAt(TimSort.java:514)
>         at java.util.TimSort.mergeCollapse(TimSort.java:441)
>         at java.util.TimSort.sort(TimSort.java:245)
>         at java.util.Arrays.sort(Arrays.java:1438)
>         at 
> com.google.common.collect.Ordering.immutableSortedCopy(Ordering.java:882)
>         at 
> org.apache.aurora.scheduler.preemptor.PreemptionVictimFilter$PreemptionVictimFilterImpl.filterPreemptionVictims(PreemptionVictimFilter.java:210)
>         at 
> org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0(PendingTaskProcessor.java:178)
>         at 
> org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147)
>         at 
> org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
>         at 
> org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
>         at 
> org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:562)
>         at 
> org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113)
>         at 
> org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:135)
>         at 
> org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
>         at 
> org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:205)
>         at 
> com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188)
>         at 
> com.google.common.util.concurrent.Callables$4.run(Callables.java:122)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Looking at the code, it seems it violates transitivity:
> {code:java}
>     @VisibleForTesting
>     static final Ordering<ResourceBag> ORDER = new Ordering<ResourceBag>() {
>       @Override
>       public int compare(ResourceBag left, ResourceBag right) {
>         Set<ResourceType> types = ImmutableSet.<ResourceType>builder()
>             .addAll(left.streamResourceVectors().map(e -> 
> e.getKey()).iterator())
>             .addAll(right.streamResourceVectors().map(e -> 
> e.getKey()).iterator())
>             .build();
>         boolean allZero = true;
>         boolean allGreaterOrEqual = true;
>         boolean allLessOrEqual = true;
>         for (ResourceType type : types) {
>           int compare = left.valueOf(type).compareTo(right.valueOf(type));
>           if (compare != 0) {
>             allZero = false;
>           }
>           if (compare < 0) {
>             allGreaterOrEqual = false;
>           }
>           if (compare > 0) {
>             allLessOrEqual = false;
>           }
>         }
>         if (allZero) {
>           return 0;
>         }
>         if (allGreaterOrEqual) {
>           return 1;
>         }
>         if (allLessOrEqual) {
>           return -1;
>         }
>         return 0;
>       }
>     };
> {code}
> The example below illustrates the error:
> {noformat}
> Resource:    X Y Z
> Bag A:       2 0 2
> Bag B:       1 2 1
> Bag C:       2 2 1
> {noformat}
> We can see that A = B, B < C, and C = A which would cause an exception.
> There are a couple of routes we can take to solve this. What we really want 
> is to be able to define partial ordering (comparator does total ordering) so 
> we can do a topological sort. Additionally, we can give priority to 
> particular resources (Dominant Resource Fairness).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to