[ https://issues.apache.org/jira/browse/AURORA-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519805#comment-16519805 ]
Stephan Erb commented on AURORA-1949: ------------------------------------- Recent version of the stacktrace we have been bumping into: {code:java} Jun 20, 2018 2:07:26 PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed SEVERE: Service PreemptorService [FAILED] has failed in the RUNNING state. java.lang.IllegalArgumentException: Comparison method violates its general contract! at java.util.TimSort.mergeHi(TimSort.java:899) at java.util.TimSort.mergeAt(TimSort.java:516) at java.util.TimSort.mergeForceCollapse(TimSort.java:457) at java.util.TimSort.sort(TimSort.java:254) at java.util.Arrays.sort(Arrays.java:1512) at java.util.ArrayList.sort(ArrayList.java:1462) at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:387) at java.util.stream.Sink$ChainedReference.end(Sink.java:258) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.apache.aurora.scheduler.preemptor.PreemptionVictimFilter$PreemptionVictimFilterImpl.filterPreemptionVictims(PreemptionVictimFilter.java:194) at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0(PendingTaskProcessor.java:185) at org.apache.aurora.scheduler.storage.mem.MemStorage.read(MemStorage.java:90) at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) at org.apache.aurora.scheduler.storage.durability.DurableStorage.read(DurableStorage.java:232) at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:125) at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:141) at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) at org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:184) at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188) at com.google.common.util.concurrent.Callables$4.run(Callables.java:122) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) E0620 14:07:26.743 [PreemptorService RUNNING, GuavaUtils$LifecycleShutdownListener] Service: PreemptorService [FAILED] failed unexpectedly. Triggering shutdown. {code} > PreemptionVictimFilterImpl comparator violates transitivity causing exceptions > ------------------------------------------------------------------------------ > > Key: AURORA-1949 > URL: https://issues.apache.org/jira/browse/AURORA-1949 > Project: Aurora > Issue Type: Bug > Components: Scheduler > Reporter: Jordan Ly > Assignee: Jordan Ly > Priority: Critical > > The PreemptionVictimFilterImpl uses a comparator to sort ResourceBags in > order to preempt the biggest tasks first when searching for a victim. > However, the current implementation causes an exception which causes the > Scheduler to fail: > {noformat} > SEVERE: Service PreemptorService [FAILED] has failed in the RUNNING state. > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeLo(TimSort.java:777) > at java.util.TimSort.mergeAt(TimSort.java:514) > at java.util.TimSort.mergeCollapse(TimSort.java:441) > at java.util.TimSort.sort(TimSort.java:245) > at java.util.Arrays.sort(Arrays.java:1438) > at > com.google.common.collect.Ordering.immutableSortedCopy(Ordering.java:882) > at > org.apache.aurora.scheduler.preemptor.PreemptionVictimFilter$PreemptionVictimFilterImpl.filterPreemptionVictims(PreemptionVictimFilter.java:210) > at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0(PendingTaskProcessor.java:178) > at > org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147) > at > org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) > at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > at > org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:562) > at > org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113) > at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:135) > at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > at > org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:205) > at > com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188) > at > com.google.common.util.concurrent.Callables$4.run(Callables.java:122) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > Looking at the code, it seems it violates transitivity: > {code:java} > @VisibleForTesting > static final Ordering<ResourceBag> ORDER = new Ordering<ResourceBag>() { > @Override > public int compare(ResourceBag left, ResourceBag right) { > Set<ResourceType> types = ImmutableSet.<ResourceType>builder() > .addAll(left.streamResourceVectors().map(e -> > e.getKey()).iterator()) > .addAll(right.streamResourceVectors().map(e -> > e.getKey()).iterator()) > .build(); > boolean allZero = true; > boolean allGreaterOrEqual = true; > boolean allLessOrEqual = true; > for (ResourceType type : types) { > int compare = left.valueOf(type).compareTo(right.valueOf(type)); > if (compare != 0) { > allZero = false; > } > if (compare < 0) { > allGreaterOrEqual = false; > } > if (compare > 0) { > allLessOrEqual = false; > } > } > if (allZero) { > return 0; > } > if (allGreaterOrEqual) { > return 1; > } > if (allLessOrEqual) { > return -1; > } > return 0; > } > }; > {code} > The example below illustrates the error: > {noformat} > Resource: X Y Z > Bag A: 2 0 2 > Bag B: 1 2 1 > Bag C: 2 2 1 > {noformat} > We can see that A = B, B < C, and C = A which would cause an exception. > There are a couple of routes we can take to solve this. What we really want > is to be able to define partial ordering (comparator does total ordering) so > we can do a topological sort. Additionally, we can give priority to > particular resources (Dominant Resource Fairness). -- This message was sent by Atlassian JIRA (v7.6.3#76005)