[ https://issues.apache.org/jira/browse/AURORA-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181745#comment-16181745 ]
Bill Farner commented on AURORA-1949: ------------------------------------- I recently changed this code to address a similar issue. I intended to faithfully leave the algorithm untouched even though i was suspicious of it, but may have also introduced a bug. Your intuition to rethink the approach may be the way to go. > PreemptionVictimFilterImpl comparator violates transitivity causing exceptions > ------------------------------------------------------------------------------ > > Key: AURORA-1949 > URL: https://issues.apache.org/jira/browse/AURORA-1949 > Project: Aurora > Issue Type: Bug > Components: Scheduler > Reporter: Jordan Ly > Priority: Critical > > The PreemptionVictimFilterImpl uses a comparator to sort ResourceBags in > order to preempt the biggest tasks first when searching for a victim. > However, the current implementation causes an exception which causes the > Scheduler to fail: > {noformat} > SEVERE: Service PreemptorService [FAILED] has failed in the RUNNING state. > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeLo(TimSort.java:777) > at java.util.TimSort.mergeAt(TimSort.java:514) > at java.util.TimSort.mergeCollapse(TimSort.java:441) > at java.util.TimSort.sort(TimSort.java:245) > at java.util.Arrays.sort(Arrays.java:1438) > at > com.google.common.collect.Ordering.immutableSortedCopy(Ordering.java:882) > at > org.apache.aurora.scheduler.preemptor.PreemptionVictimFilter$PreemptionVictimFilterImpl.filterPreemptionVictims(PreemptionVictimFilter.java:210) > at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0(PendingTaskProcessor.java:178) > at > org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147) > at > org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) > at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > at > org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:562) > at > org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113) > at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:135) > at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > at > org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:205) > at > com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188) > at > com.google.common.util.concurrent.Callables$4.run(Callables.java:122) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > Looking at the code, it seems it violates transitivity: > {code:java} > @VisibleForTesting > static final Ordering<ResourceBag> ORDER = new Ordering<ResourceBag>() { > @Override > public int compare(ResourceBag left, ResourceBag right) { > Set<ResourceType> types = ImmutableSet.<ResourceType>builder() > .addAll(left.streamResourceVectors().map(e -> > e.getKey()).iterator()) > .addAll(right.streamResourceVectors().map(e -> > e.getKey()).iterator()) > .build(); > boolean allZero = true; > boolean allGreaterOrEqual = true; > boolean allLessOrEqual = true; > for (ResourceType type : types) { > int compare = left.valueOf(type).compareTo(right.valueOf(type)); > if (compare != 0) { > allZero = false; > } > if (compare < 0) { > allGreaterOrEqual = false; > } > if (compare > 0) { > allLessOrEqual = false; > } > } > if (allZero) { > return 0; > } > if (allGreaterOrEqual) { > return 1; > } > if (allLessOrEqual) { > return -1; > } > return 0; > } > }; > {code} > The example below illustrates the error: > {noformat} > Resource: X Y Z > Bag A: 2 0 2 > Bag B: 1 2 1 > Bag C: 2 2 1 > {noformat} > We can see that A = B, B < C, and C = A which would cause an exception. > There are a couple of routes we can take to solve this. What we really want > is to be able to define partial ordering (comparator does total ordering) so > we can do a topological sort. Additionally, we can give priority to > particular resources (Dominant Resource Fairness). -- This message was sent by Atlassian JIRA (v6.4.14#64029)