Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-16 Thread Peter Lieven
On 14.10.2015 at 20:21, John Snow wrote:
>
> On 10/14/2015 02:19 PM, Peter Lieven wrote:
>> On 08.10.2015 at 18:44, John Snow wrote:
>>> On 10/08/2015 08:06 AM, Peter Lieven wrote:
 Hi all,

 short summary from my side. The whole thing seems to get complicated,
 let me explain why:

 1) During review I found that the code in ide_atapi_cmd_reply_end can't
 work correctly if the byte_count_limit is not a divisor or a multiple of
 cd_sector_size. The reason is that as soon as we load the next sector we
 start at io_buffer offset 0, overwriting whatever is left in there for
 transfer. We also reset the io_buffer_index to 0, which means that if we
 continue with the elementary transfer we always transfer a whole sector
 (of corrupt data) regardless of whether we are allowed to transfer that
 much data. Before we consider fixing this, I wonder whether it is legal
 at all to have an unaligned byte_count_limit. It obviously has never
 caused trouble in practice, so maybe it's not happening in real life.

>>> I had overlooked that part. Good catch. I do suspect that in practice
>>> nobody will be asking for bizarre values.
>>>
>>> There's no rule against an unaligned byte_count_limit as far as I have
>>> read, but I suspect nobody would have a reason to use it in practice.
>>>
 2) I found that whatever cool optimization I put in to buffer multiple
 sectors at once I end
 up with code that breaks migration because older versions would either
 not fill the io_buffer
 as expected or we introduce variables that older versions do not
 understand. This will
 lead to problems if we migrate in the middle of a transfer.

>>> Ech. This sounds like a bit of a problem. I'll need to think about this
>>> one...
>>>
 3) My current plan to get this patch to a useful state would be to use
 my initial patch and just
 change the code to use a sync request if we need to buffer additional
 sectors in an elementary
 transfer. I found that in real-world operating systems the
 byte_count_limit seems to be equal to the cd_sector_size. After all,
 it's just a PIO transfer; an operating system will likely switch to DMA
 as soon as the kernel is loaded.

 Thanks,
 Peter

>>> It sounds like that might be "good enough" for now, and won't make
>>> behavior *worse* than it currently is. You can adjust the test I had
>>> checked in to not use a "tricky" value and we can amend support for this
>>> later if desired.
>> Have you had a chance to look at the series with the "good enough" fix?
>>
>> Thanks,
>> Peter
>>
> Will do so Friday, thanks!

Thank you,
Peter




Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-14 Thread John Snow


On 10/14/2015 02:19 PM, Peter Lieven wrote:
> On 08.10.2015 at 18:44, John Snow wrote:
>>
>> On 10/08/2015 08:06 AM, Peter Lieven wrote:
>>> Hi all,
>>>
>>> short summary from my side. The whole thing seems to get complicated,
>>> let me explain why:
>>>
>>> 1) During review I found that the code in ide_atapi_cmd_reply_end can't
>>> work correctly if the byte_count_limit is not a divisor or a multiple of
>>> cd_sector_size. The reason is that as soon as we load the next sector we
>>> start at io_buffer offset 0, overwriting whatever is left in there for
>>> transfer. We also reset the io_buffer_index to 0, which means that if we
>>> continue with the elementary transfer we always transfer a whole sector
>>> (of corrupt data) regardless of whether we are allowed to transfer that
>>> much data. Before we consider fixing this, I wonder whether it is legal
>>> at all to have an unaligned byte_count_limit. It obviously has never
>>> caused trouble in practice, so maybe it's not happening in real life.
>>>
>> I had overlooked that part. Good catch. I do suspect that in practice
>> nobody will be asking for bizarre values.
>>
>> There's no rule against an unaligned byte_count_limit as far as I have
>> read, but I suspect nobody would have a reason to use it in practice.
>>
>>> 2) I found that whatever cool optimization I put in to buffer multiple
>>> sectors at once I end
>>> up with code that breaks migration because older versions would either
>>> not fill the io_buffer
>>> as expected or we introduce variables that older versions do not
>>> understand. This will
>>> lead to problems if we migrate in the middle of a transfer.
>>>
>> Ech. This sounds like a bit of a problem. I'll need to think about this
>> one...
>>
>>> 3) My current plan to get this patch to a useful state would be to use
>>> my initial patch and just
>>> change the code to use a sync request if we need to buffer additional
>>> sectors in an elementary
>>> transfer. I found that in real-world operating systems the
>>> byte_count_limit seems to be equal to the cd_sector_size. After all,
>>> it's just a PIO transfer; an operating system will likely switch to DMA
>>> as soon as the kernel is loaded.
>>>
>>> Thanks,
>>> Peter
>>>
>> It sounds like that might be "good enough" for now, and won't make
>> behavior *worse* than it currently is. You can adjust the test I had
>> checked in to not use a "tricky" value and we can amend support for this
>> later if desired.
> 
> Have you had a chance to look at the series with the "good enough" fix?
> 
> Thanks,
> Peter
> 

Will do so Friday, thanks!

--js



Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-14 Thread Peter Lieven
On 08.10.2015 at 18:44, John Snow wrote:
>
> On 10/08/2015 08:06 AM, Peter Lieven wrote:
>> Hi all,
>>
>> short summary from my side. The whole thing seems to get complicated,
>> let me explain why:
>>
>> 1) During review I found that the code in ide_atapi_cmd_reply_end can't
>> work correctly if the byte_count_limit is not a divisor or a multiple of
>> cd_sector_size. The reason is that as soon as we load the next sector we
>> start at io_buffer offset 0, overwriting whatever is left in there for
>> transfer. We also reset the io_buffer_index to 0, which means that if we
>> continue with the elementary transfer we always transfer a whole sector
>> (of corrupt data) regardless of whether we are allowed to transfer that
>> much data. Before we consider fixing this, I wonder whether it is legal
>> at all to have an unaligned byte_count_limit. It obviously has never
>> caused trouble in practice, so maybe it's not happening in real life.
>>
> I had overlooked that part. Good catch. I do suspect that in practice
> nobody will be asking for bizarre values.
>
> There's no rule against an unaligned byte_count_limit as far as I have
> read, but I suspect nobody would have a reason to use it in practice.
>
>> 2) I found that whatever cool optimization I put in to buffer multiple
>> sectors at once I end
>> up with code that breaks migration because older versions would either
>> not fill the io_buffer
>> as expected or we introduce variables that older versions do not
>> understand. This will
>> lead to problems if we migrate in the middle of a transfer.
>>
> Ech. This sounds like a bit of a problem. I'll need to think about this
> one...
>
>> 3) My current plan to get this patch to a useful state would be to use
>> my initial patch and just
>> change the code to use a sync request if we need to buffer additional
>> sectors in an elementary
>> transfer. I found that in real-world operating systems the
>> byte_count_limit seems to be equal to the cd_sector_size. After all,
>> it's just a PIO transfer; an operating system will likely switch to DMA
>> as soon as the kernel is loaded.
>>
>> Thanks,
>> Peter
>>
> It sounds like that might be "good enough" for now, and won't make
> behavior *worse* than it currently is. You can adjust the test I had
> checked in to not use a "tricky" value and we can amend support for this
> later if desired.

Have you had a chance to look at the series with the "good enough" fix?

Thanks,
Peter




Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-09 Thread Kevin Wolf
On 08.10.2015 at 18:44, John Snow wrote:
> On 10/08/2015 08:06 AM, Peter Lieven wrote:
> > Hi all,
> > 
> > short summary from my side. The whole thing seems to get complicated,
> > let me explain why:
> > 
> > 1) During review I found that the code in ide_atapi_cmd_reply_end can't
> > work correctly if the byte_count_limit is not a divisor or a multiple of
> > cd_sector_size. The reason is that as soon as we load the next sector we
> > start at io_buffer offset 0, overwriting whatever is left in there for
> > transfer. We also reset the io_buffer_index to 0, which means that if we
> > continue with the elementary transfer we always transfer a whole sector
> > (of corrupt data) regardless of whether we are allowed to transfer that
> > much data. Before we consider fixing this, I wonder whether it is legal
> > at all to have an unaligned byte_count_limit. It obviously has never
> > caused trouble in practice, so maybe it's not happening in real life.
> > 
> 
> I had overlooked that part. Good catch. I do suspect that in practice
> nobody will be asking for bizarre values.
> 
> There's no rule against an unaligned byte_count_limit as far as I have
> > read, but I suspect nobody would have a reason to use it in practice.

If we know that our code is technically wrong, "nobody uses it" is not a
good reason not to fix it. Because sooner or later someone will use it.

> > 2) I found that whatever cool optimization I put in to buffer multiple
> > sectors at once I end
> > up with code that breaks migration because older versions would either
> > not fill the io_buffer
> > as expected or we introduce variables that older versions do not
> > understand. This will
> > lead to problems if we migrate in the middle of a transfer.
> > 
> 
> Ech. This sounds like a bit of a problem. I'll need to think about this
> one...

Sounds like a textbook example for subsections to me?

Kevin
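For context on Kevin's suggestion: a migration subsection is transmitted only when its needed callback returns true, so streams migrated to or from older versions stay unchanged whenever no mid-transfer state exists. A rough, non-compilable sketch of the shape such a subsection could take (the names and the field below are hypothetical, not from any actual patch):

```c
/* Hypothetical subsection: only included in the migration stream when a
 * multi-sector ATAPI PIO transfer is actually in flight, so migration
 * to/from older QEMU versions is unaffected in the common case. */
static bool atapi_rebuffer_needed(void *opaque)
{
    IDEState *s = opaque;
    return s->io_buffer_total > 0;          /* hypothetical field */
}

static const VMStateDescription vmstate_atapi_rebuffer = {
    .name = "ide_drive/atapi_rebuffer",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = atapi_rebuffer_needed,
    .fields = (VMStateField[]) {
        VMSTATE_INT32(io_buffer_total, IDEState),   /* hypothetical */
        VMSTATE_END_OF_LIST()
    }
};
```

An older destination that never receives the subsection simply keeps its defaults, which matches the "not migrating mid-transfer" case.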



Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-08 Thread Peter Lieven

Hi all,

short summary from my side. The whole thing seems to get complicated, let me 
explain why:

1) During review I found that the code in ide_atapi_cmd_reply_end can't work
correctly if the byte_count_limit is not a divisor or a multiple of
cd_sector_size. The reason is that as soon as we load the next sector we start
at io_buffer offset 0, overwriting whatever is left in there for transfer. We
also reset the io_buffer_index to 0, which means that if we continue with the
elementary transfer we always transfer a whole sector (of corrupt data)
regardless of whether we are allowed to transfer that much data. Before we
consider fixing this, I wonder whether it is legal at all to have an unaligned
byte_count_limit. It obviously has never caused trouble in practice, so maybe
it's not happening in real life.

2) I found that with whatever cool optimization I put in to buffer multiple
sectors at once, I end up with code that breaks migration, because older
versions would either not fill the io_buffer as expected or we introduce
variables that older versions do not understand. This will lead to problems
if we migrate in the middle of a transfer.

3) My current plan to get this patch to a useful state would be to use my
initial patch and just change the code to use a sync request if we need to
buffer additional sectors in an elementary transfer. I found that in
real-world operating systems the byte_count_limit seems to be equal to the
cd_sector_size. After all, it's just a PIO transfer; an operating system
will likely switch to DMA as soon as the kernel is loaded.

Thanks,
Peter




Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-08 Thread John Snow


On 10/08/2015 08:06 AM, Peter Lieven wrote:
> Hi all,
> 
> short summary from my side. The whole thing seems to get complicated,
> let me explain why:
> 
> 1) During review I found that the code in ide_atapi_cmd_reply_end can't
> work correctly if the byte_count_limit is not a divisor or a multiple of
> cd_sector_size. The reason is that as soon as we load the next sector we
> start at io_buffer offset 0, overwriting whatever is left in there for
> transfer. We also reset the io_buffer_index to 0, which means that if we
> continue with the elementary transfer we always transfer a whole sector
> (of corrupt data) regardless of whether we are allowed to transfer that
> much data. Before we consider fixing this, I wonder whether it is legal
> at all to have an unaligned byte_count_limit. It obviously has never
> caused trouble in practice, so maybe it's not happening in real life.
> 

I had overlooked that part. Good catch. I do suspect that in practice
nobody will be asking for bizarre values.

There's no rule against an unaligned byte_count_limit as far as I have
read, but I suspect nobody would have a reason to use it in practice.

> 2) I found that whatever cool optimization I put in to buffer multiple
> sectors at once I end
> up with code that breaks migration because older versions would either
> not fill the io_buffer
> as expected or we introduce variables that older versions do not
> understand. This will
> lead to problems if we migrate in the middle of a transfer.
> 

Ech. This sounds like a bit of a problem. I'll need to think about this
one...

> 3) My current plan to get this patch to a useful state would be to use
> my initial patch and just
> change the code to use a sync request if we need to buffer additional
> sectors in an elementary
> transfer. I found that in real-world operating systems the
> byte_count_limit seems to be equal to the cd_sector_size. After all,
> it's just a PIO transfer; an operating system will likely switch to DMA
> as soon as the kernel is loaded.
> 
> Thanks,
> Peter
> 

It sounds like that might be "good enough" for now, and won't make
behavior *worse* than it currently is. You can adjust the test I had
checked in to not use a "tricky" value and we can amend support for this
later if desired.



Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-07 Thread Kevin Wolf
On 06.10.2015 at 17:54, John Snow wrote:
> 
> 
> On 10/06/2015 04:57 AM, Kevin Wolf wrote:
> > On 05.10.2015 at 23:15, John Snow wrote:
> >>
> >>
> >> On 09/21/2015 08:25 AM, Peter Lieven wrote:
> >>> PIO read requests on the ATAPI interface used to be sync blk requests.
> >>> This has two significant drawbacks. First, the main loop hangs until an
> >>> I/O request is completed, and secondly, if the I/O request does not
> >>> complete (e.g. due to unresponsive storage) QEMU hangs completely.
> >>>
> >>> Signed-off-by: Peter Lieven 
> >>> ---
> >>>  hw/ide/atapi.c | 69 
> >>> --
> >>>  1 file changed, 43 insertions(+), 26 deletions(-)
> >>>
> >>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> >>> index 747f466..9257e1c 100644
> >>> --- a/hw/ide/atapi.c
> >>> +++ b/hw/ide/atapi.c
> >>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
> >>>  memset(buf, 0, 288);
> >>>  }
> >>>  
> >>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int 
> >>> sector_size)
> >>> +static void cd_read_sector_cb(void *opaque, int ret)
> >>>  {
> >>> -int ret;
> >>> +IDEState *s = opaque;
> >>>  
> >>> -switch(sector_size) {
> >>> -case 2048:
> >>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> >>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> >>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>> -break;
> >>> -case 2352:
> >>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> >>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> >>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>> -if (ret < 0)
> >>> -return ret;
> >>> -cd_data_to_raw(buf, lba);
> >>> -break;
> >>> -default:
> >>> -ret = -EIO;
> >>> -break;
> >>> +block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>> +
> >>> +if (ret < 0) {
> >>> +ide_atapi_io_error(s, ret);
> >>> +return;
> >>> +}
> >>> +
> >>> +if (s->cd_sector_size == 2352) {
> >>> +cd_data_to_raw(s->io_buffer, s->lba);
> >>>  }
> >>> -return ret;
> >>> +
> >>> +s->lba++;
> >>> +s->io_buffer_index = 0;
> >>> +s->status &= ~BUSY_STAT;
> >>> +
> >>> +ide_atapi_cmd_reply_end(s);
> >>> +}
> >>> +
> >>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int 
> >>> sector_size)
> >>> +{
> >>> +if (sector_size != 2048 && sector_size != 2352) {
> >>> +return -EINVAL;
> >>> +}
> >>> +
> >>> +s->iov.iov_base = buf;
> >>> +if (sector_size == 2352) {
> >>> +buf += 4;
> >>> +}
> > 
> > This doesn't look quite right, buf is never read after this.
> > 
> > Also, why +=4 when it was originally buf + 16?
> > 
> >>> +
> >>> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> >>> +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> >>> +
> >>> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> >>> +  cd_read_sector_cb, s) == NULL) {
> >>> +return -EIO;
> >>> +}
> >>> +
> >>> +block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> >>> +s->status |= BUSY_STAT;
> >>> +return 0;
> >>>  }
> >>>  
> >>
> >> We discussed this off-list a bit, but for upstream synchronization:
> >>
> >> Unfortunately, I believe making cd_read_sector here non-blocking makes
> >> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
> >> s->end_transfer_func() nonblocking, which functions like ide_data_readw
> >> are not prepared to cope with.
> > 
> > I don't think that's a problem as long as BSY is set while the
> > asynchronous command is running and DRQ is cleared. The latter will
> > protect ide_data_readw(). ide_sector_read() does essentially the same
> > thing.
> > 
> > Or maybe I'm just missing what you're trying to say.
> > 
> 
> It will be correct from a code standpoint, but I don't think the guest
> *expects* DRQ to become re-set before byte_count_limit is exhausted.

Oh, I misunderstood what you're after. Yes, I think you're right. The
guest most probably uses string I/O instructions like 'rep insw' in
order to transfer the whole block, i.e. it doesn't even check the status
register in between and will simply transfer invalid data (zeros) while
DRQ isn't set.

> In the synchronous version of the code, DRQ flickers while we rebuffer
> s->io_buffer, but since it's synchronous, the guest *never sees this*.

Thanks, the part I missed is that the current code would be wrong if it
weren't synchronous.

> The guest does not necessarily have any reason or motivation to check if
> DRQ is still set after 2048 bytes -- is that recommended in the spec?
> 
> ("Warning! The drive may decide to rebuffer 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-07 Thread Peter Lieven
On 07.10.2015 at 18:42, John Snow wrote:
>
> On 10/06/2015 04:46 AM, Peter Lieven wrote:
>> On 05.10.2015 at 23:15, John Snow wrote:
>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
 PIO read requests on the ATAPI interface used to be sync blk requests.
 This has two significant drawbacks. First, the main loop hangs until an
 I/O request is completed, and secondly, if the I/O request does not
 complete (e.g. due to unresponsive storage) QEMU hangs completely.

 Signed-off-by: Peter Lieven 
 ---
   hw/ide/atapi.c | 69
 --
   1 file changed, 43 insertions(+), 26 deletions(-)

 diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
 index 747f466..9257e1c 100644
 --- a/hw/ide/atapi.c
 +++ b/hw/ide/atapi.c
 @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
   memset(buf, 0, 288);
   }
   -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
 sector_size)
 +static void cd_read_sector_cb(void *opaque, int ret)
   {
 -int ret;
 +IDEState *s = opaque;
   -switch(sector_size) {
 -case 2048:
 -block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
 -block_acct_done(blk_get_stats(s->blk), &s->acct);
 -break;
 -case 2352:
 -block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
 -block_acct_done(blk_get_stats(s->blk), &s->acct);
 -if (ret < 0)
 -return ret;
 -cd_data_to_raw(buf, lba);
 -break;
 -default:
 -ret = -EIO;
 -break;
 +block_acct_done(blk_get_stats(s->blk), &s->acct);
 +
 +if (ret < 0) {
 +ide_atapi_io_error(s, ret);
 +return;
 +}
 +
 +if (s->cd_sector_size == 2352) {
 +cd_data_to_raw(s->io_buffer, s->lba);
   }
 -return ret;
 +
 +s->lba++;
 +s->io_buffer_index = 0;
 +s->status &= ~BUSY_STAT;
 +
 +ide_atapi_cmd_reply_end(s);
 +}
 +
 +static int cd_read_sector(IDEState *s, int lba, void *buf, int
 sector_size)
 +{
 +if (sector_size != 2048 && sector_size != 2352) {
 +return -EINVAL;
 +}
 +
 +s->iov.iov_base = buf;
 +if (sector_size == 2352) {
 +buf += 4;
 +}
 +
 +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
 +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
 +
 +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
 +  cd_read_sector_cb, s) == NULL) {
 +return -EIO;
 +}
 +
 +block_acct_start(blk_get_stats(s->blk), &s->acct,
 + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 +s->status |= BUSY_STAT;
 +return 0;
   }
   
>>> We discussed this off-list a bit, but for upstream synchronization:
>>>
>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>>> are not prepared to cope with.
>>>
>>> My suggestion is to buffer an entire DRQ block of data at once
>>> (byte_count_limit) to avoid the problem.
>> Hi John,
>>
>> first of all thank you for the detailed analysis.
>>
>> Is the following what you had in mind? For PIO reads > 1 sector
>> it is a great improvement for the NFS backend:
>>
>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>> index ab45495..ec2ba89 100644
>> --- a/hw/ide/atapi.c
>> +++ b/hw/ide/atapi.c
>> @@ -117,37 +117,40 @@ static void cd_read_sector_cb(void *opaque, int ret)
>>  }
>>
>>  if (s->cd_sector_size == 2352) {
>> -cd_data_to_raw(s->io_buffer, s->lba);
>> +int nb_sectors = s->packet_transfer_size / 2352;
>> +while (nb_sectors--) {
>> +memmove(s->io_buffer + nb_sectors * 2352 + 4,
>> +s->io_buffer + nb_sectors * 2048, 2048);
>> +cd_data_to_raw(s->io_buffer + nb_sectors * 2352,
>> +   s->lba + nb_sectors);
>> +}
>>  }
> Is this going to be correct in cases where the number of sectors we are
> copying is less than the total request size? We might need to bookmark
> how many sectors/bytes we're copying this go-around. Perhaps by looking
> at lcyl/hcyl.

It needs some adjustments, like the whole copying logic. We need
a read and a write offset.

And, of course, it should read +16 and not +4 in the memmove
line as Kevin pointed out.

My current plan is to rebuffer if not all data 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-07 Thread John Snow


On 10/06/2015 04:46 AM, Peter Lieven wrote:
> On 05.10.2015 at 23:15, John Snow wrote:
>>
>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>> This has two significant drawbacks. First, the main loop hangs until an
>>> I/O request is completed, and secondly, if the I/O request does not
>>> complete (e.g. due to unresponsive storage) QEMU hangs completely.
>>>
>>> Signed-off-by: Peter Lieven 
>>> ---
>>>   hw/ide/atapi.c | 69
>>> --
>>>   1 file changed, 43 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>> index 747f466..9257e1c 100644
>>> --- a/hw/ide/atapi.c
>>> +++ b/hw/ide/atapi.c
>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>   memset(buf, 0, 288);
>>>   }
>>>   -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>>> sector_size)
>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>   {
>>> -int ret;
>>> +IDEState *s = opaque;
>>>   -switch(sector_size) {
>>> -case 2048:
>>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -break;
>>> -case 2352:
>>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -if (ret < 0)
>>> -return ret;
>>> -cd_data_to_raw(buf, lba);
>>> -break;
>>> -default:
>>> -ret = -EIO;
>>> -break;
>>> +block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> +
>>> +if (ret < 0) {
>>> +ide_atapi_io_error(s, ret);
>>> +return;
>>> +}
>>> +
>>> +if (s->cd_sector_size == 2352) {
>>> +cd_data_to_raw(s->io_buffer, s->lba);
>>>   }
>>> -return ret;
>>> +
>>> +s->lba++;
>>> +s->io_buffer_index = 0;
>>> +s->status &= ~BUSY_STAT;
>>> +
>>> +ide_atapi_cmd_reply_end(s);
>>> +}
>>> +
>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>>> sector_size)
>>> +{
>>> +if (sector_size != 2048 && sector_size != 2352) {
>>> +return -EINVAL;
>>> +}
>>> +
>>> +s->iov.iov_base = buf;
>>> +if (sector_size == 2352) {
>>> +buf += 4;
>>> +}
>>> +
>>> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>> +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>> +
>>> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>> +  cd_read_sector_cb, s) == NULL) {
>>> +return -EIO;
>>> +}
>>> +
>>> +block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> +s->status |= BUSY_STAT;
>>> +return 0;
>>>   }
>>>   
>> We discussed this off-list a bit, but for upstream synchronization:
>>
>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>> are not prepared to cope with.
>>
>> My suggestion is to buffer an entire DRQ block of data at once
>> (byte_count_limit) to avoid the problem.
> 
> Hi John,
> 
> first of all thank you for the detailed analysis.
> 
> Is the following what you had in mind? For PIO reads > 1 sector
> it is a great improvement for the NFS backend:
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index ab45495..ec2ba89 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -117,37 +117,40 @@ static void cd_read_sector_cb(void *opaque, int ret)
>  }
> 
>  if (s->cd_sector_size == 2352) {
> -cd_data_to_raw(s->io_buffer, s->lba);
> +int nb_sectors = s->packet_transfer_size / 2352;
> +while (nb_sectors--) {
> +memmove(s->io_buffer + nb_sectors * 2352 + 4,
> +s->io_buffer + nb_sectors * 2048, 2048);
> +cd_data_to_raw(s->io_buffer + nb_sectors * 2352,
> +   s->lba + nb_sectors);
> +}
>  }

Is this going to be correct in cases where the number of sectors we are
copying is less than the total request size? We might need to bookmark
how many sectors/bytes we're copying this go-around. Perhaps by looking
at lcyl/hcyl.

> 
> -s->lba++;
> +s->lba = -1;
>  s->io_buffer_index = 0;
>  s->status &= ~BUSY_STAT;
> 
>  ide_atapi_cmd_reply_end(s);
>  }
> 

Well, I might not name it cd_read_sector and cd_read_sector_cb anymore.
Perhaps cd_read_sectors[_cb].

> -static int cd_read_sector(IDEState *s, int lba, void *buf, int
> sector_size)
> +static int cd_read_sector(IDEState *s, int lba, void 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread John Snow


On 10/06/2015 02:31 PM, Peter Lieven wrote:
> On 06.10.2015 at 19:56, John Snow wrote:
>>
>> On 10/06/2015 01:12 PM, Peter Lieven wrote:
 On 06.10.2015 at 19:07, John Snow wrote:



> On 10/06/2015 05:20 AM, Peter Lieven wrote:
>> On 06.10.2015 at 10:57, Kevin Wolf wrote:
>> On 05.10.2015 at 23:15, John Snow wrote:
>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
 PIO read requests on the ATAPI interface used to be sync blk requests.
 This has two significant drawbacks. First, the main loop hangs until an
 I/O request is completed, and secondly, if the I/O request does not
 complete (e.g. due to unresponsive storage) QEMU hangs completely.
Maybe you can have a look at the other patches of this series as well?
Then I can
respin the whole series.



 Signed-off-by: Peter Lieven 
 ---
  hw/ide/atapi.c | 69
 --
  1 file changed, 43 insertions(+), 26 deletions(-)

 diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
 index 747f466..9257e1c 100644
 --- a/hw/ide/atapi.c
 +++ b/hw/ide/atapi.c
 @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
  memset(buf, 0, 288);
  }
  -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
 sector_size)
 +static void cd_read_sector_cb(void *opaque, int ret)
  {
 -int ret;
 +IDEState *s = opaque;
  -switch(sector_size) {
 -case 2048:
 -block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
 -block_acct_done(blk_get_stats(s->blk), &s->acct);
 -break;
 -case 2352:
 -block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
 -block_acct_done(blk_get_stats(s->blk), &s->acct);
 -if (ret < 0)
 -return ret;
 -cd_data_to_raw(buf, lba);
 -break;
 -default:
 -ret = -EIO;
 -break;
 +block_acct_done(blk_get_stats(s->blk), &s->acct);
 +
 +if (ret < 0) {
 +ide_atapi_io_error(s, ret);
 +return;
 +}
 +
 +if (s->cd_sector_size == 2352) {
 +cd_data_to_raw(s->io_buffer, s->lba);
  }
 -return ret;
 +
 +s->lba++;
 +s->io_buffer_index = 0;
 +s->status &= ~BUSY_STAT;
 +
 +ide_atapi_cmd_reply_end(s);
 +}
 +
 +static int cd_read_sector(IDEState *s, int lba, void *buf, int
 sector_size)
 +{
 +if (sector_size != 2048 && sector_size != 2352) {
 +return -EINVAL;
 +}
 +
 +s->iov.iov_base = buf;
 +if (sector_size == 2352) {
 +buf += 4;
 +}
>> This doesn't look quite right, buf is never read after this.
>>
>> Also, why +=4 when it was originally buf + 16?
> You are right. I mixed that up.
>
 +
 +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
 +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
 +
 +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
 +  cd_read_sector_cb, s) == NULL) {
 +return -EIO;
 +}
 +
 +block_acct_start(blk_get_stats(s->blk), &s->acct,
 + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 +s->status |= BUSY_STAT;
 +return 0;
  }
>>> We discussed this off-list a bit, but for upstream synchronization:
>>>
>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>>> are not prepared to cope with.
>> I don't think that's a problem as long as BSY is set while the
>> asynchronous command is running and DRQ is cleared. The latter will
>> protect ide_data_readw(). ide_sector_read() does essentially the same
>> thing.
> I was thinking the same. Without BSY it's not working at all.
>
>> Or maybe I'm just missing what you're trying to say.
>>
>>> My suggestion is to buffer an entire DRQ block of data at once
>>> (byte_count_limit) to avoid the problem.
>> No matter whether there is a problem or not, buffering more 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread John Snow


On 10/06/2015 01:12 PM, Peter Lieven wrote:
> 
>> On 06.10.2015 at 19:07, John Snow wrote:
>>
>>
>>
>>> On 10/06/2015 05:20 AM, Peter Lieven wrote:
 On 06.10.2015 at 10:57, Kevin Wolf wrote:
 On 05.10.2015 at 23:15, John Snow wrote:
>
> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>> PIO read requests on the ATAPI interface used to be sync blk requests.
>> This has two significant drawbacks. First, the main loop hangs until an
>> I/O request is completed, and secondly, if the I/O request does not
>> complete (e.g. due to unresponsive storage) QEMU hangs completely.
>>
>> Signed-off-by: Peter Lieven 
>> ---
>>  hw/ide/atapi.c | 69
>> --
>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>
>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>> index 747f466..9257e1c 100644
>> --- a/hw/ide/atapi.c
>> +++ b/hw/ide/atapi.c
>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>  memset(buf, 0, 288);
>>  }
>>  -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>> sector_size)
>> +static void cd_read_sector_cb(void *opaque, int ret)
>>  {
>> -int ret;
>> +IDEState *s = opaque;
>>  -switch(sector_size) {
>> -case 2048:
>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
>> -break;
>> -case 2352:
>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
>> -if (ret < 0)
>> -return ret;
>> -cd_data_to_raw(buf, lba);
>> -break;
>> -default:
>> -ret = -EIO;
>> -break;
>> +block_acct_done(blk_get_stats(s->blk), &s->acct);
>> +
>> +if (ret < 0) {
>> +ide_atapi_io_error(s, ret);
>> +return;
>> +}
>> +
>> +if (s->cd_sector_size == 2352) {
>> +cd_data_to_raw(s->io_buffer, s->lba);
>>  }
>> -return ret;
>> +
>> +s->lba++;
>> +s->io_buffer_index = 0;
>> +s->status &= ~BUSY_STAT;
>> +
>> +ide_atapi_cmd_reply_end(s);
>> +}
>> +
>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>> sector_size)
>> +{
>> +if (sector_size != 2048 && sector_size != 2352) {
>> +return -EINVAL;
>> +}
>> +
>> +s->iov.iov_base = buf;
>> +if (sector_size == 2352) {
>> +buf += 4;
>> +}
 This doesn't look quite right, buf is never read after this.

 Also, why +=4 when it was originally buf + 16?
>>>
>>> You are right. I mixed that up.
>>>

>> +
>> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>> +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>> +
>> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>> +  cd_read_sector_cb, s) == NULL) {
>> +return -EIO;
>> +}
>> +
>> +block_acct_start(blk_get_stats(s->blk), &s->acct,
>> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>> +s->status |= BUSY_STAT;
>> +return 0;
>>  }
> We discussed this off-list a bit, but for upstream synchronization:
>
> Unfortunately, I believe making cd_read_sector here non-blocking makes
> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
> s->end_transfer_func() nonblocking, which functions like ide_data_readw
> are not prepared to cope with.
 I don't think that's a problem as long as BSY is set while the
 asynchronous command is running and DRQ is cleared. The latter will
 protect ide_data_readw(). ide_sector_read() does essentially the same
 thing.
>>>
>>> I was thinking the same. Without the BSY it's not working at all.
>>>

 Or maybe I'm just missing what you're trying to say.

> My suggestion is to buffer an entire DRQ block of data at once
> (byte_count_limit) to avoid the problem.
 No matter whether there is a problem or not, buffering more data at once
 (and therefore doing less requests) is better for performance anyway.
>>>
>>> It's possible to do only one read in the backend and read the whole
>>> request into the IO buffer. I'll send a follow-up.
>>
>> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
>> and the READ10 cdb can request up to 128MiB! For performance, it might
>> be nice to always buffer something like:
>>
>> MIN(128K, nb_sectors * sector_size)

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread Peter Lieven

Am 05.10.2015 um 23:15 schrieb John Snow:


On 09/21/2015 08:25 AM, Peter Lieven wrote:

PIO read requests on the ATAPI interface used to be sync blk requests.
This has two significant drawbacks. First the main loop hangs until an
I/O request is completed and secondly if the I/O request does not
complete (e.g. due to an unresponsive storage) Qemu hangs completely.

Signed-off-by: Peter Lieven 
---
  hw/ide/atapi.c | 69 --
  1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..9257e1c 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
  memset(buf, 0, 288);
  }
  
-static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)

+static void cd_read_sector_cb(void *opaque, int ret)
  {
-int ret;
+IDEState *s = opaque;
  
-switch(sector_size) {

-case 2048:
-block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
-block_acct_done(blk_get_stats(s->blk), &s->acct);
-break;
-case 2352:
-block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
-block_acct_done(blk_get_stats(s->blk), &s->acct);
-if (ret < 0)
-return ret;
-cd_data_to_raw(buf, lba);
-break;
-default:
-ret = -EIO;
-break;
+block_acct_done(blk_get_stats(s->blk), &s->acct);
+
+if (ret < 0) {
+ide_atapi_io_error(s, ret);
+return;
+}
+
+if (s->cd_sector_size == 2352) {
+cd_data_to_raw(s->io_buffer, s->lba);
  }
-return ret;
+
+s->lba++;
+s->io_buffer_index = 0;
+s->status &= ~BUSY_STAT;
+
+ide_atapi_cmd_reply_end(s);
+}
+
+static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
+{
+if (sector_size != 2048 && sector_size != 2352) {
+return -EINVAL;
+}
+
+s->iov.iov_base = buf;
+if (sector_size == 2352) {
+buf += 4;
+}
+
+s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+  cd_read_sector_cb, s) == NULL) {
+return -EIO;
+}
+
+block_acct_start(blk_get_stats(s->blk), &s->acct,
+ 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+s->status |= BUSY_STAT;
+return 0;
  }
  

We discussed this off-list a bit, but for upstream synchronization:

Unfortunately, I believe making cd_read_sector here non-blocking makes
ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
s->end_transfer_func() nonblocking, which functions like ide_data_readw
are not prepared to cope with.

My suggestion is to buffer an entire DRQ block of data at once
(byte_count_limit) to avoid the problem.


Hi John,

first of all thank you for the detailed analysis.

Is the following what you have in mind? For PIO reads > 1 sector
it is a great improvement for the NFS backend:

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index ab45495..ec2ba89 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -117,37 +117,40 @@ static void cd_read_sector_cb(void *opaque, int ret)
 }

 if (s->cd_sector_size == 2352) {
-cd_data_to_raw(s->io_buffer, s->lba);
+int nb_sectors = s->packet_transfer_size / 2352;
+while (nb_sectors--) {
+memmove(s->io_buffer + nb_sectors * 2352 + 4,
+s->io_buffer + nb_sectors * 2048, 2048);
+cd_data_to_raw(s->io_buffer + nb_sectors * 2352,
+   s->lba + nb_sectors);
+}
 }

-s->lba++;
+s->lba = -1;
 s->io_buffer_index = 0;
 s->status &= ~BUSY_STAT;

 ide_atapi_cmd_reply_end(s);
 }

-static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
+static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size, 
int nb_sectors)
 {
 if (sector_size != 2048 && sector_size != 2352) {
 return -EINVAL;
 }

 s->iov.iov_base = buf;
-if (sector_size == 2352) {
-buf += 4;
-}
-
-s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+s->iov.iov_len = nb_sectors * 2048;
 qemu_iovec_init_external(&s->qiov, &s->iov, 1);

-if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
-  cd_read_sector_cb, s) == NULL) {
+if (ide_readv_cancelable(s, (int64_t)lba << 2,
+ &s->qiov, nb_sectors * 4,
+ cd_read_sector_cb, s) == NULL) {
 return -EIO;
 }

 block_acct_start(blk_get_stats(s->blk), &s->acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+ nb_sectors * 2048, BLOCK_ACCT_READ);

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread Kevin Wolf
Am 05.10.2015 um 23:15 hat John Snow geschrieben:
> 
> 
> On 09/21/2015 08:25 AM, Peter Lieven wrote:
> > PIO read requests on the ATAPI interface used to be sync blk requests.
> > This has two significant drawbacks. First the main loop hangs until an
> > I/O request is completed and secondly if the I/O request does not
> > complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> > 
> > Signed-off-by: Peter Lieven 
> > ---
> >  hw/ide/atapi.c | 69 
> > --
> >  1 file changed, 43 insertions(+), 26 deletions(-)
> > 
> > diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> > index 747f466..9257e1c 100644
> > --- a/hw/ide/atapi.c
> > +++ b/hw/ide/atapi.c
> > @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
> >  memset(buf, 0, 288);
> >  }
> >  
> > -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int 
> > sector_size)
> > +static void cd_read_sector_cb(void *opaque, int ret)
> >  {
> > -int ret;
> > +IDEState *s = opaque;
> >  
> > -switch(sector_size) {
> > -case 2048:
> > -block_acct_start(blk_get_stats(s->blk), &s->acct,
> > - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> > -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> > -block_acct_done(blk_get_stats(s->blk), &s->acct);
> > -break;
> > -case 2352:
> > -block_acct_start(blk_get_stats(s->blk), &s->acct,
> > - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> > -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> > -block_acct_done(blk_get_stats(s->blk), &s->acct);
> > -if (ret < 0)
> > -return ret;
> > -cd_data_to_raw(buf, lba);
> > -break;
> > -default:
> > -ret = -EIO;
> > -break;
> > +block_acct_done(blk_get_stats(s->blk), &s->acct);
> > +
> > +if (ret < 0) {
> > +ide_atapi_io_error(s, ret);
> > +return;
> > +}
> > +
> > +if (s->cd_sector_size == 2352) {
> > +cd_data_to_raw(s->io_buffer, s->lba);
> >  }
> > -return ret;
> > +
> > +s->lba++;
> > +s->io_buffer_index = 0;
> > +s->status &= ~BUSY_STAT;
> > +
> > +ide_atapi_cmd_reply_end(s);
> > +}
> > +
> > +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> > +{
> > +if (sector_size != 2048 && sector_size != 2352) {
> > +return -EINVAL;
> > +}
> > +
> > +s->iov.iov_base = buf;
> > +if (sector_size == 2352) {
> > +buf += 4;
> > +}

This doesn't look quite right, buf is never read after this.

Also, why +=4 when it was originally buf + 16?
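[For context on the +4 vs. +16 question: a raw Mode 1 CD sector is 2352 bytes, with 12 sync bytes and a 4-byte header in front of the 2048-byte payload, so the user data starts at offset 16. A minimal standalone sketch of that layout, mirroring what cd_data_to_raw() produces — this is not the QEMU code itself, just an illustration:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a raw 2352-byte Mode 1 frame as cd_data_to_raw() lays it out:
 *   0..11    sync pattern (0x00, ten 0xFF bytes, 0x00)
 *   12..14   MSF address (minute/second/frame, includes 150-sector lead-in)
 *   15       mode byte (0x01 = Mode 1 data)
 *   16..2063 user data  (hence the original "buf + 16")
 *   2064..   288 bytes of EDC/ECC, left zeroed here
 */
static void sketch_data_to_raw(uint8_t *frame, int lba, const uint8_t *data)
{
    frame[0] = 0x00;
    memset(frame + 1, 0xff, 10);
    frame[11] = 0x00;
    lba += 150;                        /* MSF includes the 2-second lead-in */
    frame[12] = (uint8_t)(lba / 75 / 60);
    frame[13] = (uint8_t)(lba / 75 % 60);
    frame[14] = (uint8_t)(lba % 75);
    frame[15] = 0x01;                  /* Mode 1 */
    memcpy(frame + 16, data, 2048);    /* payload at offset 16, not 4 */
    memset(frame + 2064, 0, 288);      /* ECC not computed */
}
```

[So advancing the iov base by 4 instead of 16 would misplace the payload inside the raw frame.]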

> > +
> > +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> > +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> > +
> > +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> > +  cd_read_sector_cb, s) == NULL) {
> > +return -EIO;
> > +}
> > +
> > +block_acct_start(blk_get_stats(s->blk), &s->acct,
> > + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> > +s->status |= BUSY_STAT;
> > +return 0;
> >  }
> >  
> 
> We discussed this off-list a bit, but for upstream synchronization:
> 
> Unfortunately, I believe making cd_read_sector here non-blocking makes
> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
> s->end_transfer_func() nonblocking, which functions like ide_data_readw
> are not prepared to cope with.

I don't think that's a problem as long as BSY is set while the
asynchronous command is running and DRQ is cleared. The latter will
protect ide_data_readw(). ide_sector_read() does essentially the same
thing.

Or maybe I'm just missing what you're trying to say.
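[The BSY/DRQ gating described above can be illustrated with a toy model — hypothetical and heavily simplified, not the real ide_data_readw() from hw/ide/core.c: while the asynchronous read is in flight, BSY is set and DRQ is clear, so data-port accesses are ignored and the guest cannot drain a half-filled io_buffer.]

```c
#include <assert.h>
#include <stdint.h>

enum { DRQ_STAT = 0x08, BUSY_STAT = 0x80 };  /* subset of the IDE status bits */

/* Toy device state: just a status byte and one word of transfer data. */
struct toy_ide {
    uint8_t  status;
    uint16_t data;
};

/* Hypothetical stand-in for ide_data_readw(): PIO data reads are only
 * honoured while DRQ is set, which is exactly what guards the window in
 * which the async completion has not yet refilled io_buffer. */
static uint16_t toy_data_readw(const struct toy_ide *s)
{
    if (!(s->status & DRQ_STAT)) {
        return 0;       /* access ignored: transfer not ready */
    }
    return s->data;
}
```

[While cd_read_sector() keeps BUSY_STAT set and DRQ_STAT clear, every such read is a no-op until the callback restores DRQ.]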

> My suggestion is to buffer an entire DRQ block of data at once
> (byte_count_limit) to avoid the problem.

No matter whether there is a problem or not, buffering more data at once
(and therefore doing less requests) is better for performance anyway.

Kevin

> >  void ide_atapi_cmd_ok(IDEState *s)
> > @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
> >  ret = cd_read_sector(s, s->lba, s->io_buffer, 
> > s->cd_sector_size);
> >  if (ret < 0) {
> >  ide_atapi_io_error(s, ret);
> > -return;
> >  }
> > -s->lba++;
> > -s->io_buffer_index = 0;
> > +return;
> >  }
> >  if (s->elementary_transfer_size > 0) {
> >  /* there are some data left to transmit in this elementary
> > @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int 
> > lba, int nb_sectors,
> >  s->io_buffer_index = sector_size;
> >  s->cd_sector_size = sector_size;
> >  
> > -s->status = READY_STAT | SEEK_STAT;
> >  ide_atapi_cmd_reply_end(s);
> >  }
> >  
> > 



Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread Peter Lieven

Am 06.10.2015 um 10:57 schrieb Kevin Wolf:

Am 05.10.2015 um 23:15 hat John Snow geschrieben:


On 09/21/2015 08:25 AM, Peter Lieven wrote:

PIO read requests on the ATAPI interface used to be sync blk requests.
This has two significant drawbacks. First the main loop hangs until an
I/O request is completed and secondly if the I/O request does not
complete (e.g. due to an unresponsive storage) Qemu hangs completely.

Signed-off-by: Peter Lieven 
---
  hw/ide/atapi.c | 69 --
  1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..9257e1c 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
  memset(buf, 0, 288);
  }
  
-static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)

+static void cd_read_sector_cb(void *opaque, int ret)
  {
-int ret;
+IDEState *s = opaque;
  
-switch(sector_size) {

-case 2048:
-block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
-block_acct_done(blk_get_stats(s->blk), &s->acct);
-break;
-case 2352:
-block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
-block_acct_done(blk_get_stats(s->blk), &s->acct);
-if (ret < 0)
-return ret;
-cd_data_to_raw(buf, lba);
-break;
-default:
-ret = -EIO;
-break;
+block_acct_done(blk_get_stats(s->blk), &s->acct);
+
+if (ret < 0) {
+ide_atapi_io_error(s, ret);
+return;
+}
+
+if (s->cd_sector_size == 2352) {
+cd_data_to_raw(s->io_buffer, s->lba);
  }
-return ret;
+
+s->lba++;
+s->io_buffer_index = 0;
+s->status &= ~BUSY_STAT;
+
+ide_atapi_cmd_reply_end(s);
+}
+
+static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
+{
+if (sector_size != 2048 && sector_size != 2352) {
+return -EINVAL;
+}
+
+s->iov.iov_base = buf;
+if (sector_size == 2352) {
+buf += 4;
+}

This doesn't look quite right, buf is never read after this.

Also, why +=4 when it was originally buf + 16?


You are right. I mixed that up.




+
+s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+  cd_read_sector_cb, s) == NULL) {
+return -EIO;
+}
+
+block_acct_start(blk_get_stats(s->blk), &s->acct,
+ 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+s->status |= BUSY_STAT;
+return 0;
  }
  

We discussed this off-list a bit, but for upstream synchronization:

Unfortunately, I believe making cd_read_sector here non-blocking makes
ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
s->end_transfer_func() nonblocking, which functions like ide_data_readw
are not prepared to cope with.

I don't think that's a problem as long as BSY is set while the
asynchronous command is running and DRQ is cleared. The latter will
protect ide_data_readw(). ide_sector_read() does essentially the same
thing.


 I was thinking the same. Without the BSY it's not working at all.



Or maybe I'm just missing what you're trying to say.


My suggestion is to buffer an entire DRQ block of data at once
(byte_count_limit) to avoid the problem.

No matter whether there is a problem or not, buffering more data at once
(and therefore doing less requests) is better for performance anyway.


It's possible to do only one read in the backend and read the whole
request into the IO buffer. I'll send a follow-up.

Do you perhaps have a pointer to the test tool that John mentioned?

Peter




Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread Peter Lieven
Am 06.10.2015 um 19:56 schrieb John Snow:
>
> On 10/06/2015 01:12 PM, Peter Lieven wrote:
>>> Am 06.10.2015 um 19:07 schrieb John Snow :
>>>
>>>
>>>
 On 10/06/2015 05:20 AM, Peter Lieven wrote:
> Am 06.10.2015 um 10:57 schrieb Kevin Wolf:
> Am 05.10.2015 um 23:15 hat John Snow geschrieben:
>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>> This has two significant drawbacks. First the main loop hangs until an
>>> I/O request is completed and secondly if the I/O request does not
>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>
>>> Signed-off-by: Peter Lieven 
>>> ---
>>>  hw/ide/atapi.c | 69
>>> --
>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>> index 747f466..9257e1c 100644
>>> --- a/hw/ide/atapi.c
>>> +++ b/hw/ide/atapi.c
>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>  memset(buf, 0, 288);
>>>  }
>>>  -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
>>> sector_size)
>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>  {
>>> -int ret;
>>> +IDEState *s = opaque;
>>>  -switch(sector_size) {
>>> -case 2048:
>>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -break;
>>> -case 2352:
>>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -if (ret < 0)
>>> -return ret;
>>> -cd_data_to_raw(buf, lba);
>>> -break;
>>> -default:
>>> -ret = -EIO;
>>> -break;
>>> +block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> +
>>> +if (ret < 0) {
>>> +ide_atapi_io_error(s, ret);
>>> +return;
>>> +}
>>> +
>>> +if (s->cd_sector_size == 2352) {
>>> +cd_data_to_raw(s->io_buffer, s->lba);
>>>  }
>>> -return ret;
>>> +
>>> +s->lba++;
>>> +s->io_buffer_index = 0;
>>> +s->status &= ~BUSY_STAT;
>>> +
>>> +ide_atapi_cmd_reply_end(s);
>>> +}
>>> +
>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
>>> sector_size)
>>> +{
>>> +if (sector_size != 2048 && sector_size != 2352) {
>>> +return -EINVAL;
>>> +}
>>> +
>>> +s->iov.iov_base = buf;
>>> +if (sector_size == 2352) {
>>> +buf += 4;
>>> +}
> This doesn't look quite right, buf is never read after this.
>
> Also, why +=4 when it was originally buf + 16?
 You are right. I mixed that up.

>>> +
>>> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>> +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>> +
>>> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>> +  cd_read_sector_cb, s) == NULL) {
>>> +return -EIO;
>>> +}
>>> +
>>> +block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> +s->status |= BUSY_STAT;
>>> +return 0;
>>>  }
>> We discussed this off-list a bit, but for upstream synchronization:
>>
>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>> are not prepared to cope with.
> I don't think that's a problem as long as BSY is set while the
> asynchronous command is running and DRQ is cleared. The latter will
> protect ide_data_readw(). ide_sector_read() does essentially the same
> thing.
 I was thinking the same. Without the BSY it's not working at all.

> Or maybe I'm just missing what you're trying to say.
>
>> My suggestion is to buffer an entire DRQ block of data at once
>> (byte_count_limit) to avoid the problem.
> No matter whether there is a problem or not, buffering more data at once
> (and therefore doing less requests) is better for performance anyway.
 It's possible to do only one read in the backend and read the whole
 request into the IO buffer. I'll send a follow-up.
>>> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
>>> and the READ10 cdb can request up to 128MiB! For performance, it might
>>> be nice to always buffer something like:
>>>
>>> MIN(128K, nb_sectors * sector_size)

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread Peter Lieven

Am 06.10.2015 um 10:46 schrieb Peter Lieven:

Am 05.10.2015 um 23:15 schrieb John Snow:


On 09/21/2015 08:25 AM, Peter Lieven wrote:

PIO read requests on the ATAPI interface used to be sync blk requests.
This has two significant drawbacks. First the main loop hangs until an
I/O request is completed and secondly if the I/O request does not
complete (e.g. due to an unresponsive storage) Qemu hangs completely.

Signed-off-by: Peter Lieven 
---
  hw/ide/atapi.c | 69 --
  1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 747f466..9257e1c 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
  memset(buf, 0, 288);
  }
  -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int 
sector_size)
+static void cd_read_sector_cb(void *opaque, int ret)
  {
-int ret;
+IDEState *s = opaque;
  -switch(sector_size) {
-case 2048:
-block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
-block_acct_done(blk_get_stats(s->blk), &s->acct);
-break;
-case 2352:
-block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
-block_acct_done(blk_get_stats(s->blk), &s->acct);
-if (ret < 0)
-return ret;
-cd_data_to_raw(buf, lba);
-break;
-default:
-ret = -EIO;
-break;
+block_acct_done(blk_get_stats(s->blk), &s->acct);
+
+if (ret < 0) {
+ide_atapi_io_error(s, ret);
+return;
+}
+
+if (s->cd_sector_size == 2352) {
+cd_data_to_raw(s->io_buffer, s->lba);
  }
-return ret;
+
+s->lba++;
+s->io_buffer_index = 0;
+s->status &= ~BUSY_STAT;
+
+ide_atapi_cmd_reply_end(s);
+}
+
+static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
+{
+if (sector_size != 2048 && sector_size != 2352) {
+return -EINVAL;
+}
+
+s->iov.iov_base = buf;
+if (sector_size == 2352) {
+buf += 4;
+}
+
+s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
+  cd_read_sector_cb, s) == NULL) {
+return -EIO;
+}
+
+block_acct_start(blk_get_stats(s->blk), &s->acct,
+ 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+s->status |= BUSY_STAT;
+return 0;
  }

We discussed this off-list a bit, but for upstream synchronization:

Unfortunately, I believe making cd_read_sector here non-blocking makes
ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
s->end_transfer_func() nonblocking, which functions like ide_data_readw
are not prepared to cope with.

My suggestion is to buffer an entire DRQ block of data at once
(byte_count_limit) to avoid the problem.


Hi John,

first of all thank you for the detailed analysis.

Is the following what you have in mind? For PIO reads > 1 sector
it is a great improvement for the NFS backend:

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index ab45495..ec2ba89 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -117,37 +117,40 @@ static void cd_read_sector_cb(void *opaque, int ret)
 }

 if (s->cd_sector_size == 2352) {
-cd_data_to_raw(s->io_buffer, s->lba);
+int nb_sectors = s->packet_transfer_size / 2352;
+while (nb_sectors--) {
+memmove(s->io_buffer + nb_sectors * 2352 + 4,
+s->io_buffer + nb_sectors * 2048, 2048);
+cd_data_to_raw(s->io_buffer + nb_sectors * 2352,
+   s->lba + nb_sectors);
+}
 }

-s->lba++;
+s->lba = -1;
 s->io_buffer_index = 0;
 s->status &= ~BUSY_STAT;

 ide_atapi_cmd_reply_end(s);
 }

-static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
+static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size, 
int nb_sectors)
 {
 if (sector_size != 2048 && sector_size != 2352) {
 return -EINVAL;
 }

 s->iov.iov_base = buf;
-if (sector_size == 2352) {
-buf += 4;
-}
-
-s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+s->iov.iov_len = nb_sectors * 2048;
 qemu_iovec_init_external(&s->qiov, &s->iov, 1);

-if (ide_readv_cancelable(s, (int64_t)lba << 2, &s->qiov, 4,
-  cd_read_sector_cb, s) == NULL) {
+if (ide_readv_cancelable(s, (int64_t)lba << 2,
+ &s->qiov, nb_sectors * 4,
+ cd_read_sector_cb, s) == NULL) {
 return -EIO;
 }

 block_acct_start(blk_get_stats(s->blk), &s->acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+ nb_sectors * 2048, BLOCK_ACCT_READ);
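[One detail worth noting in the follow-up diff above: expanding packed 2048-byte payloads into 2352-byte raw frames inside the same io_buffer only works because the loop runs from the last sector downwards, so each memmove() lands in space whose packed contents were already copied. A standalone sketch of that backwards expansion — illustrative only; it uses the +16 payload offset of a raw frame rather than the thread's mixed-up +4, and omits writing sync/header bytes:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Spread nb_sectors packed 2048-byte payloads (at the front of io_buffer)
 * out to 2352-byte raw frame slots, payload at offset 16 in each frame.
 * Iterating backwards is essential: every destination lies at or beyond
 * its source, so copying from the end first never overwrites a payload
 * that has not been moved yet. */
static void expand_in_place(uint8_t *io_buffer, int nb_sectors)
{
    while (nb_sectors--) {
        memmove(io_buffer + (size_t)nb_sectors * 2352 + 16,
                io_buffer + (size_t)nb_sectors * 2048, 2048);
        /* a real version would now fill in sync/header bytes at
         * io_buffer + nb_sectors * 2352 (cf. cd_data_to_raw) */
    }
}
```

[Running the loop forwards instead would clobber the still-packed later sectors before they are moved.]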

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread Laszlo Ersek
On 09/21/15 14:25, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has to siginificant drawbacks. First the main loop hangs util an

Trivial comments:
- s/to/two/
- s/siginificant/significant/

Thanks
Laszlo

> I/O request is completed and secondly if the I/O request does not
> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> 
> Signed-off-by: Peter Lieven 
> ---
>  hw/ide/atapi.c | 69 
> --
>  1 file changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..9257e1c 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>  memset(buf, 0, 288);
>  }
>  
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int 
> sector_size)
> +static void cd_read_sector_cb(void *opaque, int ret)
>  {
> -int ret;
> +IDEState *s = opaque;
>  
> -switch(sector_size) {
> -case 2048:
> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> -break;
> -case 2352:
> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> -if (ret < 0)
> -return ret;
> -cd_data_to_raw(buf, lba);
> -break;
> -default:
> -ret = -EIO;
> -break;
> +block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +if (ret < 0) {
> +ide_atapi_io_error(s, ret);
> +return;
> +}
> +
> +if (s->cd_sector_size == 2352) {
> +cd_data_to_raw(s->io_buffer, s->lba);
>  }
> -return ret;
> +
> +s->lba++;
> +s->io_buffer_index = 0;
> +s->status &= ~BUSY_STAT;
> +
> +ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> +{
> +if (sector_size != 2048 && sector_size != 2352) {
> +return -EINVAL;
> +}
> +
> +s->iov.iov_base = buf;
> +if (sector_size == 2352) {
> +buf += 4;
> +}
> +
> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> +  cd_read_sector_cb, s) == NULL) {
> +return -EIO;
> +}
> +
> +block_acct_start(blk_get_stats(s->blk), &s->acct,
> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +s->status |= BUSY_STAT;
> +return 0;
>  }
>  
>  void ide_atapi_cmd_ok(IDEState *s)
> @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>  ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>  if (ret < 0) {
>  ide_atapi_io_error(s, ret);
> -return;
>  }
> -s->lba++;
> -s->io_buffer_index = 0;
> +return;
>  }
>  if (s->elementary_transfer_size > 0) {
>  /* there are some data left to transmit in this elementary
> @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, 
> int nb_sectors,
>  s->io_buffer_index = sector_size;
>  s->cd_sector_size = sector_size;
>  
> -s->status = READY_STAT | SEEK_STAT;
>  ide_atapi_cmd_reply_end(s);
>  }
>  
> 




Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread Peter Lieven

> Am 06.10.2015 um 19:07 schrieb John Snow :
> 
> 
> 
>> On 10/06/2015 05:20 AM, Peter Lieven wrote:
>>> Am 06.10.2015 um 10:57 schrieb Kevin Wolf:
>>> Am 05.10.2015 um 23:15 hat John Snow geschrieben:
 
 On 09/21/2015 08:25 AM, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has two significant drawbacks. First the main loop hangs until an
> I/O request is completed and secondly if the I/O request does not
> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> 
> Signed-off-by: Peter Lieven 
> ---
>  hw/ide/atapi.c | 69
> --
>  1 file changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..9257e1c 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>  memset(buf, 0, 288);
>  }
>  -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
> sector_size)
> +static void cd_read_sector_cb(void *opaque, int ret)
>  {
> -int ret;
> +IDEState *s = opaque;
>  -switch(sector_size) {
> -case 2048:
> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> -break;
> -case 2352:
> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> -if (ret < 0)
> -return ret;
> -cd_data_to_raw(buf, lba);
> -break;
> -default:
> -ret = -EIO;
> -break;
> +block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +if (ret < 0) {
> +ide_atapi_io_error(s, ret);
> +return;
> +}
> +
> +if (s->cd_sector_size == 2352) {
> +cd_data_to_raw(s->io_buffer, s->lba);
>  }
> -return ret;
> +
> +s->lba++;
> +s->io_buffer_index = 0;
> +s->status &= ~BUSY_STAT;
> +
> +ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int
> sector_size)
> +{
> +if (sector_size != 2048 && sector_size != 2352) {
> +return -EINVAL;
> +}
> +
> +s->iov.iov_base = buf;
> +if (sector_size == 2352) {
> +buf += 4;
> +}
>>> This doesn't look quite right, buf is never read after this.
>>> 
>>> Also, why +=4 when it was originally buf + 16?
>> 
>> You are right. I mixed that up.
>> 
>>> 
> +
> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> +  cd_read_sector_cb, s) == NULL) {
> +return -EIO;
> +}
> +
> +block_acct_start(blk_get_stats(s->blk), &s->acct,
> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +s->status |= BUSY_STAT;
> +return 0;
>  }
 We discussed this off-list a bit, but for upstream synchronization:
 
 Unfortunately, I believe making cd_read_sector here non-blocking makes
 ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
 s->end_transfer_func() nonblocking, which functions like ide_data_readw
 are not prepared to cope with.
>>> I don't think that's a problem as long as BSY is set while the
>>> asynchronous command is running and DRQ is cleared. The latter will
>>> protect ide_data_readw(). ide_sector_read() does essentially the same
>>> thing.
>> 
>> I was thinking the same. Without the BSY it's not working at all.
>> 
>>> 
>>> Or maybe I'm just missing what you're trying to say.
>>> 
 My suggestion is to buffer an entire DRQ block of data at once
 (byte_count_limit) to avoid the problem.
>>> No matter whether there is a problem or not, buffering more data at once
>>> (and therefore doing less requests) is better for performance anyway.
>> 
>> It's possible to do only one read in the backend and read the whole
>> request into the IO buffer. I'll send a follow-up.
> 
> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
> and the READ10 cdb can request up to 128MiB! For performance, it might
> be nice to always buffer something like:
> 
> MIN(128K, nb_sectors * sector_size)

isn't nb_sectors limited to CD_MAX_SECTORS (32)?

Peter


> 
> and then as the guest drains the DRQ block of size 
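[The sizing rule John sketches is just a clamp of the remaining transfer against the io_buffer capacity, refilled as the guest drains each DRQ block. A minimal sketch, with a hypothetical IO_BUFFER_CAP constant standing in for the 128K buffer mentioned above:]

```c
#include <assert.h>
#include <stdint.h>

#define IO_BUFFER_CAP (128 * 1024)   /* assumption: the 128K noted above */

/* MIN(128K, nb_sectors * sector_size): how many bytes to prefetch per
 * backend read, so a huge READ(10) never overruns the fixed io_buffer.
 * The 64-bit intermediate avoids overflow for large transfer lengths. */
static uint32_t prefetch_bytes(uint32_t nb_sectors, uint32_t sector_size)
{
    uint64_t want = (uint64_t)nb_sectors * sector_size;
    return want < IO_BUFFER_CAP ? (uint32_t)want : IO_BUFFER_CAP;
}
```

[If nb_sectors is already capped elsewhere (e.g. at CD_MAX_SECTORS, as Peter asks above), the clamp simply becomes a no-op for those requests.]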

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread John Snow


On 10/06/2015 05:20 AM, Peter Lieven wrote:
> On 06.10.2015 at 10:57, Kevin Wolf wrote:
>> On 05.10.2015 at 23:15, John Snow wrote:
>>>
>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
 PIO read requests on the ATAPI interface used to be sync blk requests.
 This has two significant drawbacks. First, the main loop hangs until an
 I/O request is completed and secondly if the I/O request does not
 complete (e.g. due to an unresponsive storage) Qemu hangs completely.

 Signed-off-by: Peter Lieven 
 ---
  hw/ide/atapi.c | 69 +++++++++++++++++++++++++++++++++++++++++++--------------------------
   1 file changed, 43 insertions(+), 26 deletions(-)

 diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
 index 747f466..9257e1c 100644
 --- a/hw/ide/atapi.c
 +++ b/hw/ide/atapi.c
 @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
   memset(buf, 0, 288);
   }
   -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int
 sector_size)
 +static void cd_read_sector_cb(void *opaque, int ret)
   {
 -int ret;
 +IDEState *s = opaque;
   -switch(sector_size) {
 -case 2048:
 -block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
 -block_acct_done(blk_get_stats(s->blk), &s->acct);
 -break;
 -case 2352:
 -block_acct_start(blk_get_stats(s->blk), &s->acct,
 - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
 -block_acct_done(blk_get_stats(s->blk), &s->acct);
 -if (ret < 0)
 -return ret;
 -cd_data_to_raw(buf, lba);
 -break;
 -default:
 -ret = -EIO;
 -break;
 +block_acct_done(blk_get_stats(s->blk), &s->acct);
 +
 +if (ret < 0) {
 +ide_atapi_io_error(s, ret);
 +return;
 +}
 +
 +if (s->cd_sector_size == 2352) {
 +cd_data_to_raw(s->io_buffer, s->lba);
   }
 -return ret;
 +
 +s->lba++;
 +s->io_buffer_index = 0;
 +s->status &= ~BUSY_STAT;
 +
 +ide_atapi_cmd_reply_end(s);
 +}
 +
 +static int cd_read_sector(IDEState *s, int lba, void *buf, int
 sector_size)
 +{
 +if (sector_size != 2048 && sector_size != 2352) {
 +return -EINVAL;
 +}
 +
 +s->iov.iov_base = buf;
 +if (sector_size == 2352) {
 +buf += 4;
 +}
>> This doesn't look quite right, buf is never read after this.
>>
>> Also, why +=4 when it was originally buf + 16?
> 
> You are right. I mixed that up.
> 
>>
 +
 +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
 +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
 +
 +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
 +  cd_read_sector_cb, s) == NULL) {
 +return -EIO;
 +}
 +
 +block_acct_start(blk_get_stats(s->blk), &s->acct,
 + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
 +s->status |= BUSY_STAT;
 +return 0;
   }
   
>>> We discussed this off-list a bit, but for upstream synchronization:
>>>
>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>>> are not prepared to cope with.
>> I don't think that's a problem as long as BSY is set while the
>> asynchronous command is running and DRQ is cleared. The latter will
>> protect ide_data_readw(). ide_sector_read() does essentially the same
>> thing.
> 
>  I was thinking the same. Without the BSY it's not working at all.
> 
>>
>> Or maybe I'm just missing what you're trying to say.
>>
>>> My suggestion is to buffer an entire DRQ block of data at once
>>> (byte_count_limit) to avoid the problem.
>> No matter whether there is a problem or not, buffering more data at once
>> (and therefore doing less requests) is better for performance anyway.
> 
> It's possible to do only one read in the backend and read the whole
> request into the IO buffer. I'll send a follow-up.
> 

Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
and the READ10 cdb can request up to 128MiB! For performance, it might
be nice to always buffer something like:

MIN(128K, nb_sectors * sector_size)

and then as the guest drains the DRQ block of size byte_count_limit
which can only be at largest 0xFFFE (we can fit in at least two of these
per io_buffer refill) we can just shift the data_ptr and data_end
pointers to utilize io_buffer like a ring buffer.

Because the guest can at most fetch 0xfffe bytes at a 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-06 Thread John Snow


On 10/06/2015 04:57 AM, Kevin Wolf wrote:
> On 05.10.2015 at 23:15, John Snow wrote:
>>
>>
>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>> This has two significant drawbacks. First, the main loop hangs until an
>>> I/O request is completed and secondly if the I/O request does not
>>> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
>>>
>>> Signed-off-by: Peter Lieven 
>>> ---
>>>  hw/ide/atapi.c | 69 +++++++++++++++++++++++++++++++++++++++++++--------------------------
>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>> index 747f466..9257e1c 100644
>>> --- a/hw/ide/atapi.c
>>> +++ b/hw/ide/atapi.c
>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>  memset(buf, 0, 288);
>>>  }
>>>  
>>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int 
>>> sector_size)
>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>  {
>>> -int ret;
>>> +IDEState *s = opaque;
>>>  
>>> -switch(sector_size) {
>>> -case 2048:
>>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -break;
>>> -case 2352:
>>> -block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>> -block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> -if (ret < 0)
>>> -return ret;
>>> -cd_data_to_raw(buf, lba);
>>> -break;
>>> -default:
>>> -ret = -EIO;
>>> -break;
>>> +block_acct_done(blk_get_stats(s->blk), &s->acct);
>>> +
>>> +if (ret < 0) {
>>> +ide_atapi_io_error(s, ret);
>>> +return;
>>> +}
>>> +
>>> +if (s->cd_sector_size == 2352) {
>>> +cd_data_to_raw(s->io_buffer, s->lba);
>>>  }
>>> -return ret;
>>> +
>>> +s->lba++;
>>> +s->io_buffer_index = 0;
>>> +s->status &= ~BUSY_STAT;
>>> +
>>> +ide_atapi_cmd_reply_end(s);
>>> +}
>>> +
>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
>>> +{
>>> +if (sector_size != 2048 && sector_size != 2352) {
>>> +return -EINVAL;
>>> +}
>>> +
>>> +s->iov.iov_base = buf;
>>> +if (sector_size == 2352) {
>>> +buf += 4;
>>> +}
> 
> This doesn't look quite right, buf is never read after this.
> 
> Also, why +=4 when it was originally buf + 16?
> 
>>> +
>>> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>> +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>> +
>>> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>> +  cd_read_sector_cb, s) == NULL) {
>>> +return -EIO;
>>> +}
>>> +
>>> +block_acct_start(blk_get_stats(s->blk), &s->acct,
>>> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>> +s->status |= BUSY_STAT;
>>> +return 0;
>>>  }
>>>  
>>
>> We discussed this off-list a bit, but for upstream synchronization:
>>
>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>> s->end_transfer_func() nonblocking, which functions like ide_data_readw
>> are not prepared to cope with.
> 
> I don't think that's a problem as long as BSY is set while the
> asynchronous command is running and DRQ is cleared. The latter will
> protect ide_data_readw(). ide_sector_read() does essentially the same
> thing.
> 
> Or maybe I'm just missing what you're trying to say.
> 

It will be correct from a code standpoint, but I don't think the guest
*expects* DRQ to become re-set before byte_count_limit is exhausted.

In the synchronous version of the code, DRQ flickers while we rebuffer
s->io_buffer, but since it's synchronous, the guest *never sees this*.

The guest does not necessarily have any reason or motivation to check if
DRQ is still set after 2048 bytes -- is that recommended in the spec?

("Warning! The drive may decide to rebuffer arbitrarily while you read,
so check if DRQ is still set before each read to the data register!" ??)

>> My suggestion is to buffer an entire DRQ block of data at once
>> (byte_count_limit) to avoid the problem.
> 
> No matter whether there is a problem or not, buffering more data at once
> (and therefore doing less requests) is better for performance anyway.
> 
> Kevin
> 
>>>  void ide_atapi_cmd_ok(IDEState *s)
>>> @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>>>  ret = cd_read_sector(s, s->lba, s->io_buffer, 
>>> s->cd_sector_size);
>>>  if (ret < 0) {
>>>  ide_atapi_io_error(s, ret);
>>> -return;
>>>  }
>>> 

Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-05 Thread John Snow


On 09/21/2015 08:25 AM, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has two significant drawbacks. First, the main loop hangs until an
> I/O request is completed and secondly if the I/O request does not
> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> 
> Signed-off-by: Peter Lieven 
> ---
>  hw/ide/atapi.c | 69 +++++++++++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..9257e1c 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>  memset(buf, 0, 288);
>  }
>  
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int 
> sector_size)
> +static void cd_read_sector_cb(void *opaque, int ret)
>  {
> -int ret;
> +IDEState *s = opaque;
>  
> -switch(sector_size) {
> -case 2048:
> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> -break;
> -case 2352:
> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> -if (ret < 0)
> -return ret;
> -cd_data_to_raw(buf, lba);
> -break;
> -default:
> -ret = -EIO;
> -break;
> +block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +if (ret < 0) {
> +ide_atapi_io_error(s, ret);
> +return;
> +}
> +
> +if (s->cd_sector_size == 2352) {
> +cd_data_to_raw(s->io_buffer, s->lba);
>  }
> -return ret;
> +
> +s->lba++;
> +s->io_buffer_index = 0;
> +s->status &= ~BUSY_STAT;
> +
> +ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> +{
> +if (sector_size != 2048 && sector_size != 2352) {
> +return -EINVAL;
> +}
> +
> +s->iov.iov_base = buf;
> +if (sector_size == 2352) {
> +buf += 4;
> +}
> +
> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> +  cd_read_sector_cb, s) == NULL) {
> +return -EIO;
> +}
> +
> +block_acct_start(blk_get_stats(s->blk), &s->acct,
> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +s->status |= BUSY_STAT;
> +return 0;
>  }
>  

We discussed this off-list a bit, but for upstream synchronization:

Unfortunately, I believe making cd_read_sector here non-blocking makes
ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
s->end_transfer_func() nonblocking, which functions like ide_data_readw
are not prepared to cope with.

My suggestion is to buffer an entire DRQ block of data at once
(byte_count_limit) to avoid the problem.

Thanks,
--js

>  void ide_atapi_cmd_ok(IDEState *s)
> @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>  ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>  if (ret < 0) {
>  ide_atapi_io_error(s, ret);
> -return;
>  }
> -s->lba++;
> -s->io_buffer_index = 0;
> +return;
>  }
>  if (s->elementary_transfer_size > 0) {
>  /* there are some data left to transmit in this elementary
> @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, 
> int nb_sectors,
>  s->io_buffer_index = sector_size;
>  s->cd_sector_size = sector_size;
>  
> -s->status = READY_STAT | SEEK_STAT;
>  ide_atapi_cmd_reply_end(s);
>  }
>  
> 



Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async

2015-10-02 Thread John Snow


On 09/21/2015 08:25 AM, Peter Lieven wrote:
> PIO read requests on the ATAPI interface used to be sync blk requests.
> This has two significant drawbacks. First, the main loop hangs until an
> I/O request is completed and secondly if the I/O request does not
> complete (e.g. due to an unresponsive storage) Qemu hangs completely.
> 
> Signed-off-by: Peter Lieven 
> ---
>  hw/ide/atapi.c | 69 +++++++++++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 43 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
> index 747f466..9257e1c 100644
> --- a/hw/ide/atapi.c
> +++ b/hw/ide/atapi.c
> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>  memset(buf, 0, 288);
>  }
>  
> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int 
> sector_size)
> +static void cd_read_sector_cb(void *opaque, int ret)
>  {
> -int ret;
> +IDEState *s = opaque;
>  
> -switch(sector_size) {
> -case 2048:
> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> -break;
> -case 2352:
> -block_acct_start(blk_get_stats(s->blk), &s->acct,
> - 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> -ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
> -block_acct_done(blk_get_stats(s->blk), &s->acct);
> -if (ret < 0)
> -return ret;
> -cd_data_to_raw(buf, lba);
> -break;
> -default:
> -ret = -EIO;
> -break;
> +block_acct_done(blk_get_stats(s->blk), &s->acct);
> +
> +if (ret < 0) {
> +ide_atapi_io_error(s, ret);
> +return;
> +}
> +
> +if (s->cd_sector_size == 2352) {
> +cd_data_to_raw(s->io_buffer, s->lba);
>  }
> -return ret;
> +
> +s->lba++;
> +s->io_buffer_index = 0;
> +s->status &= ~BUSY_STAT;
> +
> +ide_atapi_cmd_reply_end(s);
> +}
> +
> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
> +{
> +if (sector_size != 2048 && sector_size != 2352) {
> +return -EINVAL;
> +}
> +
> +s->iov.iov_base = buf;
> +if (sector_size == 2352) {
> +buf += 4;
> +}
> +
> +s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
> +qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
> +  cd_read_sector_cb, s) == NULL) {
> +return -EIO;
> +}
> +
> +block_acct_start(blk_get_stats(s->blk), &s->acct,
> + 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
> +s->status |= BUSY_STAT;
> +return 0;
>  }
>  
>  void ide_atapi_cmd_ok(IDEState *s)
> @@ -190,10 +210,8 @@ void ide_atapi_cmd_reply_end(IDEState *s)
>  ret = cd_read_sector(s, s->lba, s->io_buffer, s->cd_sector_size);
>  if (ret < 0) {
>  ide_atapi_io_error(s, ret);
> -return;
>  }
> -s->lba++;
> -s->io_buffer_index = 0;
> +return;
>  }
>  if (s->elementary_transfer_size > 0) {
>  /* there are some data left to transmit in this elementary
> @@ -275,7 +293,6 @@ static void ide_atapi_cmd_read_pio(IDEState *s, int lba, 
> int nb_sectors,
>  s->io_buffer_index = sector_size;
>  s->cd_sector_size = sector_size;
>  
> -s->status = READY_STAT | SEEK_STAT;
>  ide_atapi_cmd_reply_end(s);
>  }
>  
> 

For me, this hangs the pio_large test in tests/ide-test, path:
/x86_64/ide/cdrom/pio_large

...I'll debug this over the weekend.

I think I checked this test in after you wrote this patch series, with a
mind of being able to test ATAPI for Alexander's ATAPI-SCSI bridge, but
it seems useful here :)

-- 
—js