The River QA tests definitely do not exercise any RetryTask retries. This is a serious issue, because tasks can get reordered during retries in ways that would not be permitted if every RetryTask succeeded on its first attempt.

I am thinking about ways of forcing retries for test coverage. I have two main strategies in mind:

1. Add a configuration parameter that forces RetryTask to skip the payload code on the first attempt for some percentage of tasks, and to treat those tasks as having failed once.

2. Keep the tests as they are, but add a high priority workload that will cause random periods of extreme load for minutes at a time, so that operations may time out naturally. This may shake loose other concurrency bugs, if they exist.
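To make strategy 1 concrete, here is a rough sketch of the idea in isolation. It does not touch the real RetryTask class. ForcedRetrySketch, FaultInjectingAttempt, runWithRetries and the failFraction parameter are all hypothetical names invented for illustration, and the retry loop is a simplified stand-in for whatever RetryTask actually does. The fraction would presumably come from the test configuration.

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper illustrating forced first-attempt failures. Not River code. */
public class ForcedRetrySketch {

    /**
     * Wraps one task's payload. With probability failFraction, the first call
     * reports failure without running the payload; every later call runs it.
     * Assumes a given task's attempts are never made concurrently.
     */
    static final class FaultInjectingAttempt implements BooleanSupplier {
        private final BooleanSupplier payload;
        private boolean failNext;

        FaultInjectingAttempt(BooleanSupplier payload, double failFraction, Random rng) {
            this.payload = payload;
            // Decide once, at creation, whether this task gets a forced failure.
            this.failNext = rng.nextDouble() < failFraction;
        }

        @Override
        public boolean getAsBoolean() {
            if (failNext) {
                failNext = false;
                return false; // treated as a failed attempt; payload not run
            }
            return payload.getAsBoolean();
        }
    }

    /** Simplified stand-in for a retry loop: returns the attempt number that succeeded, or -1. */
    static int runWithRetries(BooleanSupplier attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so a failing test run can be reproduced
        int tasks = 1000;
        int retried = 0;
        for (int t = 0; t < tasks; t++) {
            BooleanSupplier attempt = new FaultInjectingAttempt(() -> true, 0.25, rng);
            if (runWithRetries(attempt, 3) == 2) {
                retried++;
            }
        }
        System.out.println(retried + " of " + tasks + " tasks were forced to retry");
    }
}
```

Seeding the random number generator is a deliberate choice in this sketch: if a forced retry exposes a reordering bug, the same seed should force the same set of tasks to retry on the next run, although thread scheduling may still differ.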

Any other ideas? Preferences? Comments?

Patricia
