Re: [libvirt] [PATCH] virsh: improve waiting for block job readiness

2016-01-05 Thread Peter Krempa
On Tue, Jan 05, 2016 at 11:19:44 +1100, Michael Chapman wrote:
> On Mon, 4 Jan 2016, Peter Krempa wrote:
> > On Thu, Dec 31, 2015 at 16:34:49 +1100, Michael Chapman wrote:
> >> Wait for a block job event after the job has reached 100% only if
> >> exactly one of the BLOCK_JOB and BLOCK_JOB_2 callbacks were successfully
> >> registered.
> >>
> >> If neither callback was registered, then no event will ever be received.
> >> If both were registered, then any user-supplied path is guaranteed to
> >> match one of the events.
> >>
> >> Signed-off-by: Michael Chapman 
> >> ---
> >>

[...]

> >
> > The new statement is not true. If the user provides a filesystem path
> > different from what is in the definition but matching the same volume [1]
> > the callback will still return the path present in the configuration and
> > thus might not ever match and the job would hang forever.
> >
> > [1] e.g. /path/to/image and /path/to/../to/image are equivalent but
> > won't be equal using strcmp. The problem is even more prominent if you
> > mix in some symlinks.
> 
> As far I can tell the block job won't even be able to be identified if the 
> user specifies a different path than the one in the domain XML.
> 
> At present, the only implementation of the virGetBlockJobInfo API 
> is qemuDomainGetBlockJobInfo. The disk definition is found in the call 
> chain:
> 
>qemuDomainGetBlockJobInfo
>  -> virDomainDiskByName
>-> virDomainDiskIndexByName
> 
> and virDomainDiskIndexByName only does STREQ tests to match the supplied 
> path against the  or  elements.

When responding I didn't look at the code and remembered that we used
the path supplied by the user verbatim in some cases. This one is not
one of them though, so ...

> 
> So at present, the disk path supplied by the user of virGetBlockJobInfo 
> has to be *exactly* the source path or the target name, and these are 
> precisely the two strings used in the two events.

... You are right. qemuBlockJobEventProcess looks up the strings in the
definition by the disk alias and fires both events one with disk->path
and the second one with disk->dst.

> 
> The only safe way to allow different-but-equivalent paths to match the one 
> disk definition would be record the device+inode numbers in the disk 
> definition. We can't simply follow symlinks to resolve the path, since the 

That becomes even harder with various remote and network filesystems.

> symlinks could have changed since the domain was started -- e.g. 
> /path/to/../to/image may now be equivalent to /path/to/image, but 
> /path/to/image may not be the same as what it was when the domain was 
> started.
> 
> So on that basis I think we have to settle for requiring the 
> virGetBlockJobInfo path to match the disk definition exactly.
> 
> >> +if (data->cb_id2 < 0 || data->cb_id2 < 0) {
> >> +if (result == 0)
> >> +break;
> >>
> >> -/* for two-phase jobs we will try to wait in the synchronized 
> >> phase
> >> - * for event arrival since 100% completion doesn't necessarily 
> >> mean that
> >> - * the block job has finished and can be terminated with success 
> >> */
> >> -if (info.end == info.cur && --retries == 0) {
> >> -ret = VIR_DOMAIN_BLOCK_JOB_READY;
> >> -goto cleanup;
> >> +/* only wait in the synchronized phase for event arrival if
> >> + * either callback was registered */
> >> +if (info.end == info.cur &&
> >> +((data->cb_id2 < 0 && data->cb_id2 < 0) || --retries == 
> >> 0)) {
> >> +ret = VIR_DOMAIN_BLOCK_JOB_READY;
> >> +goto cleanup;
> >> +}
> >>  }
> >
> > NACK to this approach, if there was a way how this ugly stuff could be
> > avoided or deleted I'd already do so.

I take back, I've remembered the code incorrectly.
qemuDomainBlockJobAbort passes the user provided string to
qemuDiskPathToAlias which uses virDomainDiskIndexByName.
virDomainDiskIndexByName uses the same data as is present in the two
events combined.

> >
> > Peter
> 
> OK. At any rate, I do think 2.5 seconds is not enough. On a hypervisor 
> with a large amount of memory I was able to generate block jobs that would 
> take 10-15 seconds to fully flush out to disk.

Actually I've witnessed some failures when attempting to pivot to a new
image after a blockcopy operation which made virsh to attempt the pivot
after that so this actually might fix it.

If you might post this again with the bug in the event checking fixed
I'll review it then.

Thanks and sorry for the noise.

Peter


signature.asc
Description: Digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] virsh: improve waiting for block job readiness

2016-01-04 Thread Michael Chapman

On Mon, 4 Jan 2016, Andrea Bolognani wrote:

On Thu, 2015-12-31 at 16:34 +1100, Michael Chapman wrote:

Wait for a block job event after the job has reached 100% only if
exactly one of the BLOCK_JOB and BLOCK_JOB_2 callbacks were successfully
registered.
 
If neither callback was registered, then no event will ever be received.
If both were registered, then any user-supplied path is guaranteed to
match one of the events.
 
Signed-off-by: Michael Chapman 
---
 
I have found that even a 2.5 second timeout isn't always sufficiently
long for QEMU to flush a disk at the end of a block job.
 
I hope I've understood the code properly here, because as far as I can
tell the comment I'm removing in this commit isn't right. The path the
user supplies *has* to be either the  or  exactly in order for the disk to be matched, and these are
precisely the two strings used by the two events.
 
  tools/virsh-domain.c | 24 +---
  1 file changed, 13 insertions(+), 11 deletions(-)
 
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index edbbc34..60de9ba 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -1942,18 +1942,20 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
  goto cleanup;
  }
  
-/* since virsh can't guarantee that the path provided by the user will
- * later be matched in the event we will need to keep the fallback
- * approach and claim success if the block job finishes or vanishes. */
-if (result == 0)
-break;
+/* if either event could not be registered we can't guarantee that the
+ * path provided by the user will be matched, so keep the fallback
+ * approach and claim success if the block job finishes or vanishes */
+if (data->cb_id2 < 0 || data->cb_id2 < 0) {


I'm not going to comment on the rest of the patch, but the condition
above doesn't look like it would make any sense written that way...


Oops, good catch. I had meant to use cb_id and cb_id2 respectively in the 
two conditions (and in the following test as well).



+if (result == 0)
+break;
  
-/* for two-phase jobs we will try to wait in the synchronized phase
- * for event arrival since 100% completion doesn't necessarily mean 
that
- * the block job has finished and can be terminated with success */
-if (info.end == info.cur && --retries == 0) {
-ret = VIR_DOMAIN_BLOCK_JOB_READY;
-goto cleanup;
+/* only wait in the synchronized phase for event arrival if
+ * either callback was registered */
+if (info.end == info.cur &&
+((data->cb_id2 < 0 && data->cb_id2 < 0) || --retries == 0)) {


... and neither does this one.

Am I missing something?

Cheers.


+ret = VIR_DOMAIN_BLOCK_JOB_READY;
+goto cleanup;
+}
  }
  
  if (data->verbose)

-- 
Andrea Bolognani
Software Engineer - Virtualization Team


- Michael--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] virsh: improve waiting for block job readiness

2016-01-04 Thread Michael Chapman

On Mon, 4 Jan 2016, Peter Krempa wrote:

On Thu, Dec 31, 2015 at 16:34:49 +1100, Michael Chapman wrote:

Wait for a block job event after the job has reached 100% only if
exactly one of the BLOCK_JOB and BLOCK_JOB_2 callbacks were successfully
registered.

If neither callback was registered, then no event will ever be received.
If both were registered, then any user-supplied path is guaranteed to
match one of the events.

Signed-off-by: Michael Chapman 
---

I have found that even a 2.5 second timeout isn't always sufficiently
long for QEMU to flush a disk at the end of a block job.

I hope I've understood the code properly here, because as far as I can
tell the comment I'm removing in this commit isn't right. The path the
user supplies *has* to be either the  or  exactly in order for the disk to be matched, and these are
precisely the two strings used by the two events.

 tools/virsh-domain.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)


In addition to Andrea's review:



diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index edbbc34..60de9ba 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -1942,18 +1942,20 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
 goto cleanup;
 }

-/* since virsh can't guarantee that the path provided by the user will
- * later be matched in the event we will need to keep the fallback
- * approach and claim success if the block job finishes or vanishes. */
-if (result == 0)
-break;
+/* if either event could not be registered we can't guarantee that the
+ * path provided by the user will be matched, so keep the fallback
+ * approach and claim success if the block job finishes or vanishes */


The new statement is not true. If the user provides a filesystem path
different from what is in the definition but matching the same volume [1]
the callback will still return the path present in the configuration and
thus might not ever match and the job would hang forever.

[1] e.g. /path/to/image and /path/to/../to/image are equivalent but
won't be equal using strcmp. The problem is even more prominent if you
mix in some symlinks.


As far I can tell the block job won't even be able to be identified if the 
user specifies a different path than the one in the domain XML.


At present, the only implementation of the virGetBlockJobInfo API 
is qemuDomainGetBlockJobInfo. The disk definition is found in the call 
chain:


  qemuDomainGetBlockJobInfo
-> virDomainDiskByName
  -> virDomainDiskIndexByName

and virDomainDiskIndexByName only does STREQ tests to match the supplied 
path against the  or  elements.


So at present, the disk path supplied by the user of virGetBlockJobInfo 
has to be *exactly* the source path or the target name, and these are 
precisely the two strings used in the two events.


The only safe way to allow different-but-equivalent paths to match the one 
disk definition would be record the device+inode numbers in the disk 
definition. We can't simply follow symlinks to resolve the path, since the 
symlinks could have changed since the domain was started -- e.g. 
/path/to/../to/image may now be equivalent to /path/to/image, but 
/path/to/image may not be the same as what it was when the domain was 
started.


So on that basis I think we have to settle for requiring the 
virGetBlockJobInfo path to match the disk definition exactly.



+if (data->cb_id2 < 0 || data->cb_id2 < 0) {
+if (result == 0)
+break;

-/* for two-phase jobs we will try to wait in the synchronized phase
- * for event arrival since 100% completion doesn't necessarily mean 
that
- * the block job has finished and can be terminated with success */
-if (info.end == info.cur && --retries == 0) {
-ret = VIR_DOMAIN_BLOCK_JOB_READY;
-goto cleanup;
+/* only wait in the synchronized phase for event arrival if
+ * either callback was registered */
+if (info.end == info.cur &&
+((data->cb_id2 < 0 && data->cb_id2 < 0) || --retries == 0)) {
+ret = VIR_DOMAIN_BLOCK_JOB_READY;
+goto cleanup;
+}
 }


NACK to this approach, if there was a way how this ugly stuff could be
avoided or deleted I'd already do so.

Peter


OK. At any rate, I do think 2.5 seconds is not enough. On a hypervisor 
with a large amount of memory I was able to generate block jobs that would 
take 10-15 seconds to fully flush out to disk.


- Michael

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] virsh: improve waiting for block job readiness

2016-01-04 Thread Andrea Bolognani
On Thu, 2015-12-31 at 16:34 +1100, Michael Chapman wrote:
> Wait for a block job event after the job has reached 100% only if
> exactly one of the BLOCK_JOB and BLOCK_JOB_2 callbacks were successfully
> registered.
> 
> If neither callback was registered, then no event will ever be received.
> If both were registered, then any user-supplied path is guaranteed to
> match one of the events.
> 
> Signed-off-by: Michael Chapman 
> ---
> 
> I have found that even a 2.5 second timeout isn't always sufficiently
> long for QEMU to flush a disk at the end of a block job.
> 
> I hope I've understood the code properly here, because as far as I can
> tell the comment I'm removing in this commit isn't right. The path the
> user supplies *has* to be either the  or  dev='name'/> exactly in order for the disk to be matched, and these are
> precisely the two strings used by the two events.
> 
>  tools/virsh-domain.c | 24 +---
>  1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
> index edbbc34..60de9ba 100644
> --- a/tools/virsh-domain.c
> +++ b/tools/virsh-domain.c
> @@ -1942,18 +1942,20 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
>  goto cleanup;
>  }
>  
> -/* since virsh can't guarantee that the path provided by the user 
> will
> - * later be matched in the event we will need to keep the fallback
> - * approach and claim success if the block job finishes or vanishes. 
> */
> -if (result == 0)
> -break;
> +/* if either event could not be registered we can't guarantee that 
> the
> + * path provided by the user will be matched, so keep the fallback
> + * approach and claim success if the block job finishes or vanishes 
> */
> +if (data->cb_id2 < 0 || data->cb_id2 < 0) {

I'm not going to comment on the rest of the patch, but the condition
above doesn't look like it would make any sense written that way...

> +if (result == 0)
> +break;
>  
> -/* for two-phase jobs we will try to wait in the synchronized phase
> - * for event arrival since 100% completion doesn't necessarily mean 
> that
> - * the block job has finished and can be terminated with success */
> -if (info.end == info.cur && --retries == 0) {
> -ret = VIR_DOMAIN_BLOCK_JOB_READY;
> -goto cleanup;
> +/* only wait in the synchronized phase for event arrival if
> + * either callback was registered */
> +if (info.end == info.cur &&
> +((data->cb_id2 < 0 && data->cb_id2 < 0) || --retries == 0)) {

... and neither does this one.

Am I missing something?

Cheers.

> +ret = VIR_DOMAIN_BLOCK_JOB_READY;
> +goto cleanup;
> +}
>  }
>  
>  if (data->verbose)
-- 
Andrea Bolognani
Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] virsh: improve waiting for block job readiness

2016-01-04 Thread Peter Krempa
On Thu, Dec 31, 2015 at 16:34:49 +1100, Michael Chapman wrote:
> Wait for a block job event after the job has reached 100% only if
> exactly one of the BLOCK_JOB and BLOCK_JOB_2 callbacks were successfully
> registered.
> 
> If neither callback was registered, then no event will ever be received.
> If both were registered, then any user-supplied path is guaranteed to
> match one of the events.
> 
> Signed-off-by: Michael Chapman 
> ---
> 
> I have found that even a 2.5 second timeout isn't always sufficiently
> long for QEMU to flush a disk at the end of a block job.
> 
> I hope I've understood the code properly here, because as far as I can
> tell the comment I'm removing in this commit isn't right. The path the
> user supplies *has* to be either the  or  dev='name'/> exactly in order for the disk to be matched, and these are
> precisely the two strings used by the two events.
> 
>  tools/virsh-domain.c | 24 +---
>  1 file changed, 13 insertions(+), 11 deletions(-)

In addition to Andrea's review:

> 
> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
> index edbbc34..60de9ba 100644
> --- a/tools/virsh-domain.c
> +++ b/tools/virsh-domain.c
> @@ -1942,18 +1942,20 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
>  goto cleanup;
>  }
>  
> -/* since virsh can't guarantee that the path provided by the user 
> will
> - * later be matched in the event we will need to keep the fallback
> - * approach and claim success if the block job finishes or vanishes. 
> */
> -if (result == 0)
> -break;
> +/* if either event could not be registered we can't guarantee that 
> the
> + * path provided by the user will be matched, so keep the fallback
> + * approach and claim success if the block job finishes or vanishes 
> */

The new statement is not true. If the user provides a filesystem path
different from what is in the definition but matching the same volume [1]
the callback will still return the path present in the configuration and
thus might not ever match and the job would hang forever.

[1] e.g. /path/to/image and /path/to/../to/image are equivalent but
won't be equal using strcmp. The problem is even more prominent if you
mix in some symlinks.

> +if (data->cb_id2 < 0 || data->cb_id2 < 0) {
> +if (result == 0)
> +break;
>  
> -/* for two-phase jobs we will try to wait in the synchronized phase
> - * for event arrival since 100% completion doesn't necessarily mean 
> that
> - * the block job has finished and can be terminated with success */
> -if (info.end == info.cur && --retries == 0) {
> -ret = VIR_DOMAIN_BLOCK_JOB_READY;
> -goto cleanup;
> +/* only wait in the synchronized phase for event arrival if
> + * either callback was registered */
> +if (info.end == info.cur &&
> +((data->cb_id2 < 0 && data->cb_id2 < 0) || --retries == 0)) {
> +ret = VIR_DOMAIN_BLOCK_JOB_READY;
> +goto cleanup;
> +}
>  }

NACK to this approach, if there was a way how this ugly stuff could be
avoided or deleted I'd already do so.

Peter


signature.asc
Description: Digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list