On 8/10/2018 7:43 PM, Kevin Wolf wrote: > Am 08.10.2018 um 18:04 hat Anton Nefedov geschrieben: >> >> >> On 8/10/2018 6:46 PM, Kevin Wolf wrote: >>> Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben: >>>> >>>> >>>> On 8/10/2018 6:03 PM, Kevin Wolf wrote: >>>>> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben: >>>>>> On 4/10/2018 6:33 PM, Kevin Wolf wrote: >>>>>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben: >>>>>>>> Signed-off-by: Anton Nefedov <anton.nefe...@virtuozzo.com> >>>>>>>> Reviewed-by: Alberto Garcia <be...@igalia.com> >>>>>>>> --- >>>>>>>> hw/ide/core.c | 12 ++++++++++++ >>>>>>>> 1 file changed, 12 insertions(+) >>>>>>>> >>>>>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c >>>>>>>> index 2c62efc..352429b 100644 >>>>>>>> --- a/hw/ide/core.c >>>>>>>> +++ b/hw/ide/core.c >>>>>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int >>>>>>>> ret) >>>>>>>> TrimAIOCB *iocb = opaque; >>>>>>>> IDEState *s = iocb->s; >>>>>>>> >>>>>>>> + if (iocb->i >= 0) { >>>>>>>> + if (ret >= 0) { >>>>>>>> + block_acct_done(blk_get_stats(s->blk), &s->acct); >>>>>>>> + } else { >>>>>>>> + block_acct_failed(blk_get_stats(s->blk), &s->acct); >>>>>>>> + } >>>>>>>> + } >>>>>>>> + >>>>>>>> if (ret >= 0) { >>>>>>>> while (iocb->j < iocb->qiov->niov) { >>>>>>>> int j = iocb->j; >>>>>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int >>>>>>>> ret) >>>>>>>> goto done; >>>>>>>> } >>>>>>>> >>>>>>>> + block_acct_start(blk_get_stats(s->blk), &s->acct, >>>>>>>> + count << BDRV_SECTOR_BITS, >>>>>>>> BLOCK_ACCT_UNMAP); >>>>>>>> + >>>>>>>> /* Got an entry! Submit and exit. */ >>>>>>>> iocb->aiocb = blk_aio_pdiscard(s->blk, >>>>>>>> sector << >>>>>>>> BDRV_SECTOR_BITS, >>>>>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret) >>>>>>>> } >>>>>>>> >>>>>>>> if (ret == -EINVAL) { >>>>>>>> + block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP); >>>>>>> >>>>>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but >>>>>>> also for reads and writes, and each of them could return -EINVAL. >>>>>>> >>>>>> >>>>>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :( >>>>>> >>>>>>> Also, -EINVAL doesn't necessarily mean that the guest driver did >>>>>>> something wrong, it could also be the result of a host problem. >>>>>>> Therefore, it isn't right to call block_acct_invalid() here - especially >>>>>>> since the request may already have been accounted for as either done or >>>>>>> failed in ide_issue_trim_cb(). >>>>>>> >>>>>> >>>>>> Couldn't be accounted done with such retcode; >>>>>> and it seems I shouldnt do block_acct_failed() there anyway - or it's >>>>>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error() >>>>>> >>>>>> But if EINVAL (from further layers) should not be accounted as an >>>>>> invalid op, then it should be accounted failed instead, the thing that >>>>>> current code does not do. >>>>>> (and which brings us back to possible double-accounting if we account >>>>>> invalid in ide_issue_trim_cb() ) >>>>> >>>>> Yes, commit caeadbc8ba4 was already wrong in assuming that there is >>>>> only one possible source for -EINVAL. >>>>> >>>>>>> Instead, I think it would be better to immediately account for invalid >>>>>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we >>>>>>> know for sure that indeed !ide_sect_range_ok() is the cause for the >>>>>>> -EINVAL return code. >>>>>>> >>>>>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave >>>>>> acct_failed there, and filter off TRIM commands in the common >>>>>> accounting. >>>>> >>>>> blk_aio_discard() can fail with -EINVAL, too, so getting this error code >>>>> from a TRIM command doesn't mean anything. It can still have multiple >>>>> possible sources. >>>>> >>>> >>>> I meant that common ide_dma_cb() should account EINVAL (along with other >>>> errors) as failed, unless it's TRIM, which means it's already >>>> accounted (either invalid or failed) >>> >>> Oh, you would already account for failure in ide_issue_trim_cb(), too, >>> but only if it's EINVAL? That feels like it would complicate the code >>> quite a bit. >>> >> >> No, no :) ide_issue_trim_cb does the proper accounting (failed/invalid) >> for TRIM. >> Then common path (ide_dma_cb()) does not account TRIM operations at all >> regardless of the error code. No need to check for TRIM specifically if >> we have BLOCK_ACCT_NONE. >> >>> And actually, didn't commit caeadbc8ba4 break werror=stop for requests >>> returning -EINVAL because we don't call ide_handle_rw_error() any more? >>> >> >> Yes. >> >> Read/write ignore werror=stop for invalid range case (not for EINVAL). >> I wonder if it's crucial to ignore it for TRIM too, otherwise we could >> just remove this chunk >> >> if (ret == -EINVAL) { >> ide_dma_error(s); >> return; >> } > > Ah, right, I forgot about this. > > It is actually desirable to avoid stopping for invalid requests because > we should only stop for host errors. An invalid request is a guest error > and stopping the VM will do nothing to fix it. (Resuming the VM would > immediately fail again, so the VM would be locked in paused state.) > > Kevin >
I can't come up with a perfect solution of this. It's hard to reach TRIM sector-ranges from common ide_dma_cb() (which only has QEMUSGList) and it's hard to accurately report range invalidity to ide_dma_cb() with a single retval. I see the following options for invalid range trim: 1. distinguish range invalidity in ide_dma_cb() with some unused errcode instead of -EINVAL - does one exist? 2. use some new global flag on IDEState - every callback (ide_dma_cb, pmac_ide_transfer_cb) must check and clear that 3. parse SGList and check range invalidity separately in ide_dma_cb() - somewhat too intrusive change or put up with it: 4. ignore (account as invalid but do not abort) 5. treat as a host error i.e. call ide_handle_rw_error() and stop VM if werror=stop 6. keep -EINVAL (as done now) and potentially ignore configured error action for the host returned -EINVAL What do you think? /Anton