[ https://issues.apache.org/jira/browse/AURORA-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111055#comment-15111055 ]
Bill Farner commented on AURORA-1580: ------------------------------------- {quote} Is there a reason they were done as a cascading delete + re-insert to begin with? Was it just for the sake of not having to diff the task to see what had changed? {quote} That's effectively it. The storage API is non-specific about what is being changed in the {{IScheduledTask}} object tree, essentially a PUT. While we could implement that, i think specific mutation verbs (of which there are few) is much more straightforward. > java.util.NoSuchElementException from Tasks.getLatestEvent with DbTaskStore > --------------------------------------------------------------------------- > > Key: AURORA-1580 > URL: https://issues.apache.org/jira/browse/AURORA-1580 > Project: Aurora > Issue Type: Bug > Components: Scheduler > Reporter: Zameer Manji > Assignee: Zameer Manji > > I have discovered the following exception from a scheduler that is running > off master with the beta task store enabled. > {noformat} > E0113 22:51:55.941 [AsyncProcessor-2, AsyncUtil:123] > java.util.concurrent.ExecutionException: java.util.NoSuchElementException > java.util.concurrent.ExecutionException: java.util.NoSuchElementException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > ~[na:1.8.0_66-Tw8r9b1] > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > ~[na:1.8.0_66-Tw8r9b1] > at > org.apache.aurora.scheduler.base.AsyncUtil.evaluateResult(AsyncUtil.java:118) > [aurora-110.jar:na] > at > org.apache.aurora.scheduler.base.AsyncUtil.access$000(AsyncUtil.java:32) > [aurora-110.jar:na] > at > org.apache.aurora.scheduler.base.AsyncUtil$1.afterExecute(AsyncUtil.java:59) > [aurora-110.jar:na] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150) > [na:1.8.0_66-Tw8r9b1] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_66-Tw8r9b1] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66-Tw8r9b1] > Caused by: java.util.NoSuchElementException: null > at com.google.common.collect.Iterables.getLast(Iterables.java:784) > ~[guava-19.0.jar:na] > at > org.apache.aurora.scheduler.base.Tasks.getLatestEvent(Tasks.java:149) > ~[aurora-110.jar:na] > at org.apache.aurora.scheduler.base.Tasks$1.apply(Tasks.java:156) > ~[aurora-110.jar:na] > at org.apache.aurora.scheduler.base.Tasks$1.apply(Tasks.java:153) > ~[aurora-110.jar:na] > at > com.google.common.collect.ByFunctionOrdering.compare(ByFunctionOrdering.java:45) > ~[guava-19.0.jar:na] > at java.util.TimSort.binarySort(TimSort.java:296) > ~[na:1.8.0_66-Tw8r9b1] > at java.util.TimSort.sort(TimSort.java:239) ~[na:1.8.0_66-Tw8r9b1] > at java.util.Arrays.sort(Arrays.java:1438) ~[na:1.8.0_66-Tw8r9b1] > at com.google.common.collect.Ordering.sortedCopy(Ordering.java:860) > ~[guava-19.0.jar:na] > at > org.apache.aurora.scheduler.pruning.TaskHistoryPruner.lambda$registerInactiveTask$20(TaskHistoryPruner.java:156) > ~[aurora-110.jar:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_66-Tw8r9b1] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_66-Tw8r9b1] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > ~[na:1.8.0_66-Tw8r9b1] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > ~[na:1.8.0_66-Tw8r9b1] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_66-Tw8r9b1] > ... 2 common frames omitted > {noformat} > Similar exception occurs within the preemptor and causes the scheduler to > crash: > {noformat} > E0113 01:43:06.242 THREAD5037 > com.google.common.util.concurrent.ServiceManager$ServiceListener.failed: > Service PreemptorService [FAILED] has failed in the RUNNING state. > java.util.NoSuchElementException > at com.google.common.collect.Iterables.getLast(Iterables.java:784) > at > org.apache.aurora.scheduler.base.Tasks.getLatestEvent(Tasks.java:149) > at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor$3.apply(PendingTaskProcessor.java:237) > at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor$3.apply(PendingTaskProcessor.java:234) > at > com.google.common.base.Predicates$AndPredicate.apply(Predicates.java:374) > at > com.google.common.collect.Iterators$7.computeNext(Iterators.java:675) > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > at > com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:43) > at com.google.common.collect.Iterators.addAll(Iterators.java:364) > at com.google.common.collect.Iterables.addAll(Iterables.java:352) > at com.google.common.collect.HashMultiset.create(HashMultiset.java:66) > at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.fetchIdlePendingGroups(PendingTaskProcessor.java:183) > at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$140(PendingTaskProcessor.java:144) > at > org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:138) > at > org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) > at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > at > org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:570) > at > org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113) > at > org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:119) > at > org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:145) > at > com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189) > at > com.google.common.util.concurrent.Callables$3.run(Callables.java:100) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Background: > I discovered this while running a load test on a small test cluster. This > exception kept on occurring, preventing any tasks from being pruned from the > dbtask store. This exception also kept on killing the preemptor, causing a > new scheduler to be elected. Eventually the number of tasks stored for this > job reached 8k+ and there was a general slowdown observed across the entire > system. After a certain point, the scheduler wasn't able to register within > the 1minute registration timeout, causing the entire cluster to stop working > until I raised the timeout to 5 minutes, and then killed the job preventing > more tasks from being created. -- This message was sent by Atlassian JIRA (v6.3.4#6332)