[ https://issues.apache.org/jira/browse/AURORA-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213179#comment-16213179 ]
Mohit Jaggi commented on AURORA-1952: ------------------------------------- [~jordanly] Is there a test in https://reviews.apache.org/r/63157/ which covers this bug? > race condition in offers by agent id map (and potentially others) > caused(probably) a crash > ------------------------------------------------------------------------------------------ > > Key: AURORA-1952 > URL: https://issues.apache.org/jira/browse/AURORA-1952 > Project: Aurora > Issue Type: Bug > Components: Scheduler > Affects Versions: 0.18.0 > Environment: nothing special > Reporter: Mohit Jaggi > Assignee: Mohit Jaggi > Fix For: 0.18.0 > > > Crashed here > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java#L145 > due to duplicates in map. Most likely a concurrency issue. [~wfarner] pointed > out the following code: > I'm looking at this chunk here, where a concurrent map would not help. > {code:java} > Optional<HostOffer> sameSlave = > hostOffers.get(offer.getOffer().getAgentId()); > if (sameSlave.isPresent()) { > // If there are existing offers for the slave, decline all of them so > the master can > // compact all of those offers into a single offer and send them back. > LOG.info("Returning offers for " + > offer.getOffer().getAgentId().getValue() > + " for compaction."); > decline(offer.getOffer().getId()); > removeAndDecline(sameSlave.get().getOffer().getId()); > } else { > hostOffers.add(offer); > {code} > --------- logs ---------- > {code:java} > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: Sep 28, 2017 6:09:00 PM > com.google.common.util.concurrent.ServiceManager$ServiceListener failed > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: SEVERE: Service > PreemptorService [FAILED] has failed in the RUNNING state. > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: > java.lang.IllegalArgumentException: Multiple entries with same key: > 1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102=HostOffer{offer=id { > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: value: > "f7b84805-a0c5-4405-be77-f7f1b7110405-O56597202" > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: } > ... > ... > ep 28 18:09:00 machine1163 aurora-scheduler[14266]: , > hostAttributes=IHostAttributes{host=compute606-dca1.prod.uber.internal, > attributes=[IAttribute{name=host, values=[compute606-dca1]}, > IAttribute{name=rack, values=[as13]}, IAttribute{name=pod, values=[d]}, > IAttribute{name=dedicated, values=[infra/cassandra]}], mode=NONE, > slaveId=1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102}}. To index multiple values > under a key, use Multimaps.index. > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > com.google.common.collect.Maps.uniqueIndex(Maps.java:1251) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > com.google.common.collect.Maps.uniqueIndex(Maps.java:1208) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0(PendingTaskProcessor.java:146) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:562) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:135) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:161) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > com.google.common.util.concurrent.Callables$4.run(Callables.java:122) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at > java.lang.Thread.run(Thread.java:748) > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: E0928 18:09:00.316 > [PreemptorService RUNNING, GuavaUtils$LifecycleShutdownListener:55] Service: > PreemptorService [FAILED] failed unexpectedly. Triggering shutdown. > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 > [qtp1000734462-3068369, Slf4jRequestLog:60] 10.187.28.19 - - > [28/Sep/2017:18:09:00 +0000] "POST //10.188.43.6:8082/api HTTP/1.1" 200 95 > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 > [PreemptorService RUNNING, Lifecycle:84] Shutting down application > Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 > [PreemptorService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77] > Executing 4 shutdown commands. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)