[jira] [Commented] (AURORA-1603) Scheduler fails to start after rollback
[ https://issues.apache.org/jira/browse/AURORA-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131580#comment-15131580 ]

Maxim Khutornenko commented on AURORA-1603:
---

https://reviews.apache.org/r/43172/

> Scheduler fails to start after rollback
> ---
>
>                 Key: AURORA-1603
>                 URL: https://issues.apache.org/jira/browse/AURORA-1603
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>            Assignee: Maxim Khutornenko
>            Priority: Critical
>
> We had to roll back the scheduler due to duplicate instances in the UI, and
> when we tried to restart on the older version
> (8d3fb2413306387bc533b1b800bbc97149f96b26), we got the following error, which
> prevented the scheduler from loading the snapshot:
> {noformat}
> To index multiple values under a key, use Multimaps.index.
>     at com.google.common.collect.Maps.uniqueIndex(Maps.java:1215) ~[guava-19.0.jar:na]
>     at com.google.common.collect.Maps.uniqueIndex(Maps.java:1173) ~[guava-19.0.jar:na]
>     at org.apache.aurora.scheduler.storage.db.TaskConfigManager.getConfigRow(TaskConfigManager.java:46) ~[aurora-113.jar:na]
>     at org.apache.aurora.scheduler.storage.db.TaskConfigManager.insert(TaskConfigManager.java:57) ~[aurora-113.jar:na]
>     at org.apache.aurora.scheduler.storage.db.DbJobUpdateStore.saveJobUpdate(DbJobUpdateStore.java:125) ~[aurora-113.jar:na]
>     at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) ~[commons-113.jar:na]
>     at org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl$7.restoreFromSnapshot(SnapshotStoreImpl.java:208) ~[aurora-113.jar:na]
>     at org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.lambda$applySnapshot$238(SnapshotStoreImpl.java:278) ~[aurora-113.jar:na]
>     at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:137) ~[aurora-113.jar:na]
>     at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:132) ~[aurora-113.jar:na]
>     at org.apache.aurora.scheduler.storage.db.DbStorage.transactionedWrite(DbStorage.java:146) ~[aurora-113.jar:na]
>     at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) ~[mybatis-guice-3.7.jar:3.7]
>     at org.apache.aurora.scheduler.storage.db.DbStorage.lambda$write$203(DbStorage.java:160) ~[aurora-113.jar:na]
>     at org.apache.aurora.scheduler.async.GatingDelayExecutor.closeDuring(GatingDelayExecutor.java:62) ~[aurora-113.jar:na]
>     at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:158) ~[aurora-113.jar:na]
>     at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) ~[commons-113.jar:na]
>     at org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(SnapshotStoreImpl.java:274) ~[aurora-113.jar:na]
>     at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) ~[commons-113.jar:na]
>     at org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(SnapshotStoreImpl.java:63) ~[aurora-113.jar:na]
>     at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) ~[commons-113.jar:na]
>     ...
> {noformat}
> We blamed that on fee5943a95c4f08e148dc5f1366486a8c23d5773 and reverted it in
> https://reviews.apache.org/r/42922/. I have not yet been able to reproduce it in
> unit tests. It needs further investigation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
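The top frames of the trace show Guava's `Maps.uniqueIndex`, called from `TaskConfigManager.getConfigRow`, throwing an `IllegalArgumentException` because two stored values mapped to the same key; the message points to `Multimaps.index`, which groups duplicates instead of rejecting them. The sketch below reproduces that contract with the JDK alone, as an illustration rather than Aurora code: rows are plain strings, and the key function (the text before `:`) is a hypothetical stand-in for whatever identity `getConfigRow` actually indexes on. `Collectors.toMap` without a merge function plays the role of `uniqueIndex` (it throws `IllegalStateException` on a duplicate key), and `Collectors.groupingBy` plays the role of `Multimaps.index`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Stdlib-only illustration of the failure mode in the stack trace.
// Rows are plain strings standing in for stored task configs; the key
// function is a hypothetical stand-in, not Aurora's real row identity.
class UniqueIndexDemo {

  // Same contract as Guava's Maps.uniqueIndex: each key must map to exactly
  // one value, otherwise building the index fails outright.
  static <K, V> Map<K, V> uniqueIndex(List<V> values, Function<V, K> keyFn) {
    return values.stream().collect(Collectors.toMap(keyFn, Function.identity()));
  }

  // Same contract as Guava's Multimaps.index: duplicates are kept, grouped
  // under a single key.
  static <K, V> Map<K, List<V>> multiIndex(List<V> values, Function<V, K> keyFn) {
    return values.stream().collect(Collectors.groupingBy(keyFn));
  }

  public static void main(String[] args) {
    // Two stored rows describe the same config (same key before ':').
    List<String> rows = Arrays.asList("cfgA:row1", "cfgA:row2", "cfgB:row3");
    Function<String, String> key = r -> r.substring(0, r.indexOf(':'));

    System.out.println(multiIndex(rows, key));
    try {
      uniqueIndex(rows, key);
    } catch (IllegalStateException e) {
      // The JDK reports "Duplicate key ..." where Guava appends
      // "To index multiple values under a key, use Multimaps.index."
      System.out.println("unique index failed: " + e.getMessage());
    }
  }
}
```

Switching the strict index to a grouping one would only hide the symptom; the comments that follow differ on whether duplicates should be tolerated and collapsed later, or prevented at the thrift (backfill) level.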
[jira] [Commented] (AURORA-1603) Scheduler fails to start after rollback
[ https://issues.apache.org/jira/browse/AURORA-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15128776#comment-15128776 ]

Maxim Khutornenko commented on AURORA-1603:
---

Reversal RB: https://reviews.apache.org/r/43104/
[jira] [Commented] (AURORA-1603) Scheduler fails to start after rollback
[ https://issues.apache.org/jira/browse/AURORA-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127270#comment-15127270 ]

Maxim Khutornenko commented on AURORA-1603:
---

I feel that allowing invalid data in the DB storage is inherently more error-prone than maintaining a version-agnostic invariant at the thrift (backfill) level. We may cover one identified scenario but miss others. It's also fairly nonstandard for us to handle thrift version compatibility at this level. I think we should stick with the backfill approach, which has proven to work well in practice, rather than attempt to fix duplicate data in a custom fashion.
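One reading of the "version-agnostic invariant at the thrift (backfill) level" argued for above: every record is rewritten into a single canonical form on its way into storage, so rows written by different scheduler versions compare equal and a unique index never sees two rows for one logical config. A minimal sketch, assuming a hypothetical `ConfigRecord` with one field (`newField`) that a newer version populates and an older version leaves unset; neither the class nor the default value comes from Aurora's actual thrift schema.

```java
import java.util.Objects;

// Hypothetical record: "newField" stands in for any field that one schema
// version populates and another may leave unset. Not Aurora's thrift struct.
final class ConfigRecord {
  final String job;
  final String newField; // null when written by an older version

  ConfigRecord(String job, String newField) {
    this.job = job;
    this.newField = newField;
  }

  // Value equality: this is what a unique index keyed on the whole config sees.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ConfigRecord)) {
      return false;
    }
    ConfigRecord other = (ConfigRecord) o;
    return job.equals(other.job) && Objects.equals(newField, other.newField);
  }

  @Override
  public int hashCode() {
    return Objects.hash(job, newField);
  }
}

class Backfill {
  // Hypothetical default that the newer version would have written.
  static final String DEFAULT_VALUE = "default";

  // Backfill: normalize every record to one canonical form before it is
  // stored or compared, regardless of which version produced it.
  static ConfigRecord backfill(ConfigRecord r) {
    return r.newField == null ? new ConfigRecord(r.job, DEFAULT_VALUE) : r;
  }

  public static void main(String[] args) {
    ConfigRecord fromOldVersion = new ConfigRecord("web", null);
    ConfigRecord fromNewVersion = new ConfigRecord("web", DEFAULT_VALUE);
    System.out.println(fromOldVersion.equals(fromNewVersion)); // false
    System.out.println(backfill(fromOldVersion).equals(backfill(fromNewVersion))); // true
  }
}
```

The point of doing this at the record level is that the storage layer can keep its strict uniqueness check; the invariant holds for every caller rather than for one identified code path.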
[jira] [Commented] (AURORA-1603) Scheduler fails to start after rollback
[ https://issues.apache.org/jira/browse/AURORA-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126653#comment-15126653 ]

Bill Farner commented on AURORA-1603:
---

One option worth considering: allow duplicates in this routine, and add a cleanup operation that collapses them. This sidesteps tricky schema evolution logic entirely and, AFAICT, has little negative impact.
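The alternative proposed above (tolerate duplicates on insert, then collapse them in a separate cleanup) might look roughly like this. It is a sketch under stated assumptions: rows are hypothetical `(id, configKey)` pairs rather than Aurora's real tables, and the lowest id per key is arbitrarily chosen as the survivor. The pass only computes which redundant ids should be redirected to which survivor; repointing references and deleting the redundant rows is left to the caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical cleanup pass: the insert path tolerates duplicate rows, and
// this step decides how to collapse them afterwards. The Row shape is an
// assumption for illustration, not Aurora's schema.
class DuplicateCollapser {
  static final class Row {
    final long id;
    final String configKey;

    Row(long id, String configKey) {
      this.id = id;
      this.configKey = configKey;
    }
  }

  // Maps each redundant row id to the surviving (lowest-id) row with the same
  // config key. Rows that are already unique produce no entry.
  static Map<Long, Long> collapse(List<Row> rows) {
    Map<String, Long> survivorByKey = new HashMap<>();
    for (Row r : rows) {
      survivorByKey.merge(r.configKey, r.id, Long::min);
    }
    Map<Long, Long> redirects = new TreeMap<>();
    for (Row r : rows) {
      long survivor = survivorByKey.get(r.configKey);
      if (r.id != survivor) {
        redirects.put(r.id, survivor);
      }
    }
    return redirects;
  }

  public static void main(String[] args) {
    List<Row> rows = new ArrayList<>();
    rows.add(new Row(3, "cfgA"));
    rows.add(new Row(1, "cfgA")); // duplicate of row 3
    rows.add(new Row(2, "cfgB"));
    System.out.println(collapse(rows)); // {3=1}
  }
}
```

Choosing the survivor deterministically keeps the cleanup idempotent: once redundant rows are removed, each key has a single row and a second run produces no further redirects.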