On 29.05.2014 19:47, Paolo Bonzini wrote:
> On 29/05/2014 19:31, Peter Lieven wrote:
>> +static const unsigned iscsi_retry_times[] = {4, 16, 64, 256, 1024, 4096};
>
> This is a pretty fast growth.
I know. I wanted to end up with a total retry period of approx. 10 seconds
before failing. As we allow only 5 retries, smaller growth would result in
less total time.

>> +static void iscsi_retry_timer_expired(void *opaque)
>> +{
>> +    struct IscsiTask *iTask = opaque;
>> +    timer_del(iTask->retry_timer);
>
> This timer_del is not necessary. Also, since you are introducing a new
> usage I would prefer if you used aio_timer_init instead.

Okay, I didn't know that aio_timer_new was deprecated. Is timer_del
unnecessary because we delete the timer in the callback and can therefore
be sure that it is not armed?

>> +    timer_free(iTask->retry_timer);
>> +    if (iTask->co) {
>> +        qemu_coroutine_enter(iTask->co, NULL);
>> +    }
>> +}
>> +
>> +#define RANDRANGE(M, N) (M + rand() / (RAND_MAX / (N - M + 1) + 1))
>
> A minor nit: you should probably use "((double)rand() / RAND_MAX)" and
> then cast it back to integer at the end. You should also make it a static
> inline function for readability, or surround the arguments with
> parentheses.
>
> If you care, it probably would be better to have the random numbers be
> exponentially distributed, even more so since the ranges form a geometric
> progression. So for example the first retry would be on average 8 ms
> rather than 10 ms. The difference for the last retry is half a second!

I hope we never get into the situation that we need more than one retry. I
just wanted to give the storage a chance to calm down; a static value would
be either too high for the first retry or too low for the later retries, if
they are ever reached. If you want I can include this one here, which seems
to work:

static inline unsigned geo_rand_range(double m, double n)
{
    double r = (double)rand() / RAND_MAX;
    return exp(log(m) + r * (log(n) - log(m)));
}

> You can do that by taking the (natural) logarithm of N and M, applying
> randrange on the logarithms, and then applying exp on the result.
>
> Otherwise looks good, but how did you test the patch?
> I'd rather have the growth in iscsi_retry_times backed by actual data,
> but I understand that's hard to gather.

I don't know how to gather that, actually. In my use case I see a single
BUSY once every few hours on a group running new storage firmware with a
few hundred targets. For testing this I randomly created BUSY conditions
and overwrote the status.

Peter

> Paolo
>
>>  static void
>>  iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
>>                      void *command_data, void *opaque)
>> @@ -156,20 +172,40 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
>>      iTask->do_retry = 0;
>>      iTask->task = task;
>>
>> -    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
>> -        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
>> -        error_report("iSCSI CheckCondition: %s", iscsi_get_error(iscsi));
>> -        iTask->do_retry = 1;
>> -        goto out;
>> -    }
>> -
>>      if (status != SCSI_STATUS_GOOD) {
>> +        if (iTask->retries++ < ISCSI_CMD_RETRIES) {
>> +            if (status == SCSI_STATUS_CHECK_CONDITION
>> +                && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
>> +                error_report("iSCSI CheckCondition: %s",
>> +                             iscsi_get_error(iscsi));
>> +                iTask->do_retry = 1;
>> +                goto out;
>> +            }
>> +            if (status == SCSI_STATUS_BUSY) {
>> +                unsigned retry_time =
>> +                    RANDRANGE(iscsi_retry_times[iTask->retries - 1],
>> +                              iscsi_retry_times[iTask->retries]);
>> +                error_report("iSCSI Busy (retry #%u in %u ms): %s",
>> +                             iTask->retries, retry_time,
>> +                             iscsi_get_error(iscsi));
>> +                iTask->retry_timer = aio_timer_new(iTask->iscsilun->aio_context,
>> +                                                   QEMU_CLOCK_REALTIME,
>> +                                                   SCALE_MS,
>> +                                                   iscsi_retry_timer_expired,
>> +                                                   iTask);
>> +                timer_mod(iTask->retry_timer,
>> +                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + retry_time);