[Qemu-devel] [RFC] block io lost in the guest , possible related to qemu?

2013-10-25 Thread Jack Wang
Hi Experts,

We've seen guest block I/O getting lost in a VM. Any response will be helpful.

Environment:
guest OS: Ubuntu 13.04
running a busy database workload with XFS on a disk exported with virtio-blk

The exported vdb has a very high number of in-flight I/Os, over 300. Some time
later, many I/O processes are in D state; it looks like a lot of requests are
lost in the storage stack below.

We're using qemu-kvm 1.0, host kernel 3.4.51.

In the qemu git log for virtio-blk.c I found the commit below. I wonder whether
the workload could generate some unknown requests to qemu that get lost in
virtio_blk_handle_read? I ran some fio tests myself, but I can't generate the
so-called unknown request type.

Any response will be helpful.

Jack


commit 9e72c45033770b81b536ac6091e91807247cc25a
Author: Alexey Zaytsev 
Date:   Thu Dec 13 09:03:43 2012 +0200

virtio-blk: Return UNSUPP for unknown request types

Currently, all unknown requests are treated as VIRTIO_BLK_T_IN

Signed-off-by: Alexey Zaytsev 
Signed-off-by: Stefan Hajnoczi 

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 92c745a..df57b35 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -398,10 +398,14 @@ static void virtio_blk_handle_request(VirtIOBlockReq *req,
         qemu_iovec_init_external(&req->qiov, &req->elem.out_sg[1],
                                  req->elem.out_num - 1);
         virtio_blk_handle_write(req, mrb);
-    } else {
+    } else if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
+        /* VIRTIO_BLK_T_IN is 0, so we can't just & it. */
         qemu_iovec_init_external(&req->qiov, &req->elem.in_sg[0],
                                  req->elem.in_num - 1);
         virtio_blk_handle_read(req);
+    } else {
+        virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
+        g_free(req);
     }
 }
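
To make the effect of the commit concrete, here is a minimal standalone sketch
of the dispatch before and after the change. It is a simplified model, not
QEMU's code: the enum and both classify_*() functions are hypothetical names
made up for illustration, and the branches QEMU checks before the write case
are left out. The three type values are the ones from the virtio block
specification.

```c
#include <assert.h>
#include <stdint.h>

/* Request type values from the virtio block device specification. */
#define VIRTIO_BLK_T_IN      0u
#define VIRTIO_BLK_T_OUT     1u
#define VIRTIO_BLK_T_BARRIER 0x80000000u

/* Hypothetical labels for this sketch only; QEMU has no such enum. */
enum req_action { REQ_WRITE, REQ_READ, REQ_UNSUPP };

/* Before the commit: anything that is not a write is handled as a read. */
static enum req_action classify_before(uint32_t type)
{
    if (type & VIRTIO_BLK_T_OUT) {
        return REQ_WRITE;
    }
    return REQ_READ;
}

/* After the commit: only a plain or barrier read is handled as a read. */
static enum req_action classify_after(uint32_t type)
{
    if (type & VIRTIO_BLK_T_OUT) {
        return REQ_WRITE;
    }
    /* `type & VIRTIO_BLK_T_IN` is always 0, so compare for equality. */
    if (type == VIRTIO_BLK_T_IN || type == VIRTIO_BLK_T_BARRIER) {
        return REQ_READ;
    }
    return REQ_UNSUPP;
}

/* Small self-check; 0x40 stands in for an unknown request type. */
void classify_self_check(void)
{
    assert(classify_before(0x40u) == REQ_READ);
    assert(classify_after(0x40u) == REQ_UNSUPP);
    assert(classify_after(VIRTIO_BLK_T_IN) == REQ_READ);
}
```

In this model an unknown type is completed with an error status in both
versions of the code path that follows; neither version drops the request
without completing it.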



Re: [Qemu-devel] [RFC] block io lost in the guest , possible related to qemu?

2013-10-28 Thread Jack Wang
Hello Kevin & Stefan

Any comments or wild guess about the bug?

Regards,
Jack





Re: [Qemu-devel] [RFC] block io lost in the guest , possible related to qemu?

2013-10-28 Thread Jack Wang
On 10/28/2013 10:54 AM, Alexey Zaytsev wrote:
> Hey.
> 
> I very much doubt this commit could be causing the problem, as qemu
> would never set wrong request type in the first place. You can easily
> check by either reverting it, or adding a printk() before
> virtio_blk_req_complete(VIRTIO_BLK_S_UNSUPP).

Hi Alexey,

Thanks for your input.
According to my test results, yes, as you said, virtio-blk never
generates a wrong request type. So the commit is only a small cosmetic
extra check :(

As there's nothing abnormal on the host server or the storage, there must
be some hidden bug somewhere, damn it.

Regards,
Jack







Re: [Qemu-devel] [RFC] block io lost in the guest , possible related to qemu?

2013-10-28 Thread Alexey Zaytsev
Hey.

I very much doubt this commit could be causing the problem, as qemu
would never set a wrong request type in the first place. You can easily
check by either reverting it, or adding a printk() before
virtio_blk_req_complete(VIRTIO_BLK_S_UNSUPP).
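
A rough sketch of what such a debug message could look like. QEMU runs as a
userspace process, so the message would go out through stdio rather than
printk(). The counter and the helper note_unsupp_request() are hypothetical
names for this sketch and do not exist in QEMU.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counter for this sketch; not a QEMU variable. */
static unsigned long unsupp_seen;

/*
 * Formats one diagnostic line for an unsupported request type into buf
 * and returns snprintf()'s result. Hypothetical helper, not a QEMU API.
 */
static int note_unsupp_request(char *buf, size_t len, uint32_t type)
{
    assert(buf != NULL && len > 0);
    unsupp_seen++;
    return snprintf(buf, len,
                    "virtio-blk: unsupported request type 0x%08" PRIx32
                    " (seen %lu so far)\n", type, unsupp_seen);
}
```

The formatted line could then be written with fputs(buf, stderr) just before
the request is completed with VIRTIO_BLK_S_UNSUPP. If the message never shows
up while the guest hangs, that branch is not being taken.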

On Mon, Oct 28, 2013 at 10:15 AM, Jack Wang  wrote:
> Hello Kevin & Stefan
>
> Any comments or wild guess about the bug?
>
> Regards,
> Jack



Re: [Qemu-devel] [RFC] block io lost in the guest , possible related to qemu?

2013-10-30 Thread Stefan Hajnoczi
On Fri, Oct 25, 2013 at 05:01:54PM +0200, Jack Wang wrote:
> We've seen guest block io lost in a VM.any response will be helpful
> 
> environment is:
> guest os: Ubuntu 1304
> running busy database workload with xfs on a disk export with virtio-blk
> 
> the exported vdb has very high infight io over 300. Some times later a
> lot io process in D state, looks a lot requests is lost in below storage
> stack.

Is the image file on a local file system or are you using a network
storage system (e.g. NFS, Gluster, Ceph, Sheepdog)?

If you run "vmstat 5" inside the guest, do you see "bi"/"bo" block I/O
activity?  If that number is very low or zero then there may be a
starvation problem.  If that number is reasonable then the workload is
simply bottlenecked on disk I/O.

virtio-blk only has 128 descriptors available so it's not possible to
have 300 requests pending at the virtio-blk layer.
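
The bound can be written down as simple arithmetic. This is only a sketch:
the ring size of 128 is the figure above, while the descriptors-per-request
values are assumptions about the guest driver (3 for a header, one data
buffer and a status byte when each is a separate descriptor, 1 when indirect
descriptors are in use). max_inflight() is a hypothetical helper name.

```c
#include <assert.h>

/*
 * Upper bound on the number of requests that can sit in a virtqueue at
 * once, given the ring size and the descriptors each request consumes.
 * Hypothetical helper for illustration.
 */
static unsigned max_inflight(unsigned ring_size, unsigned descs_per_req)
{
    assert(descs_per_req > 0);
    return ring_size / descs_per_req;
}
```

Under these assumptions, max_inflight(128, 3) gives 42 and
max_inflight(128, 1) gives 128, so in both cases the virtqueue holds far
fewer than 300 requests.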

If you suspect QEMU, try building qemu.git/master from source in case
the bug has already been fixed.

If you want to trace I/O requests, you might find this blog post on
writing trace analysis scripts useful:
http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html

Stefan



Re: [Qemu-devel] [RFC] block io lost in the guest , possible related to qemu?

2013-10-30 Thread Jack Wang
On 10/30/2013 10:50 AM, Stefan Hajnoczi wrote:
> On Fri, Oct 25, 2013 at 05:01:54PM +0200, Jack Wang wrote:
>> We've seen guest block io lost in a VM.any response will be helpful
>>
>> environment is:
>> guest os: Ubuntu 1304
>> running busy database workload with xfs on a disk export with virtio-blk
>>
>> the exported vdb has very high infight io over 300. Some times later a
>> lot io process in D state, looks a lot requests is lost in below storage
>> stack.
> 
> Is the image file on a local file system or are you using a network
> storage system (e.g. NFS, Gluster, Ceph, Sheepdog)?
> 
> If you run "vmstat 5" inside the guest, do you see "bi"/"bo" block I/O
> activity?  If that number is very low or zero then there may be a
> starvation problem.  If that number is reasonable then the workload is
> simply bottlenecked on disk I/O.
> 
> virtio-blk only has 128 descriptors available so it's not possible to
> have 300 requests pending at the virtio-blk layer.
> 
> If you suspect QEMU, try building qemu.git/master from source in case
> the bug has already been fixed.
> 
> If you want to trace I/O requests, you might find this blog post on
> writing trace analysis scripts useful:
> http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html
> 
> Stefan
> 
Thanks Stefan for your valuable input.

The image is on a device exported over InfiniBand srp/srpt.
I will follow your suggestions and investigate further.

The 300 in-flight I/Os I mentioned come from /proc/diskstats, Field 9 --
# of I/Os currently in progress.
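
For reference, a minimal sketch of pulling that field out of one
/proc/diskstats line. Each line starts with the major number, minor number
and device name, followed by the numbered statistics fields, so Field 9 is
the twelfth whitespace-separated token. diskstats_inflight() is a
hypothetical helper name, and any sample line fed to it is made up, not
captured from the affected guest.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parses one /proc/diskstats line. On success, stores the device name
 * (at most 31 characters) in name and Field 9 (I/Os currently in
 * progress) in *inflight, then returns 0. Returns -1 if the line is too
 * short or malformed. Hypothetical helper for illustration.
 */
static int diskstats_inflight(const char *line, char name[32],
                              unsigned long *inflight)
{
    assert(line != NULL && name != NULL && inflight != NULL);
    /* Skip major and minor, read the name, skip Fields 1-8, read Field 9. */
    int n = sscanf(line,
                   "%*s %*s %31s %*s %*s %*s %*s %*s %*s %*s %*s %lu",
                   name, inflight);
    return n == 2 ? 0 : -1;
}
```

Calling this on every line of /proc/diskstats and printing the value for vdb
at a fixed interval gives a quick view of whether the in-flight count ever
drains.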

Jack