Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-16 Thread Soren Hansen
On Sun, Aug 15, 2010 at 11:08:43PM +0200, Soren Hansen wrote:
 Unless multiple threads are reading from the fd (?), I'm quite
 confident in my data. I dumped the whole monitor_response buffer right
 before and right after the recvfrom call along with the return value
 of recvfrom.  I'll rerun the tests tomorrow and show you the results.

I'm running this on another kernel right now and I'm not seeing the
problem. I'll try again with the kernel I used a couple of days ago.

 I'm not sure if I pointed this out in my initial e-mail, but even if I
 didn't have this recvfrom problem, the check seems odd to my eye. Is
 the monitor really going to send a max sized datagram each time? I
 would have expected it to only send a datagram the size of the header
 + the length of the data.

My hunch here seems correct though.

I've applied this diff:
diff --git a/src/uml/uml_driver.c b/src/uml/uml_driver.c
index 9cf669f..c289ad3 100644
--- a/src/uml/uml_driver.c
+++ b/src/uml/uml_driver.c
@@ -729,8 +733,13 @@ static int umlMonitorCommand(const struct uml_driver 
*driver,
 do {
 ssize_t nbytes;
 addrlen = sizeof(addr);
+VIR_DEBUG(res.error: %d, res.extra: %d, res.length: %d, res.data: 
%.*s,
+  res.error, res.extra, res.length, res.length, 
res.data);
 nbytes = recvfrom(priv-monitor, res, sizeof res, 0,
   (struct sockaddr *)addr, addrlen);
+VIR_DEBUG(nbytes: %d, nbytes);
+VIR_DEBUG(res.error: %d, res.extra: %d, res.length: %d, res.data: 
%.*s,
+  res.error, res.extra, res.length, res.length, 
res.data);
 if (nbytes  0) {
 if (errno == EAGAIN || errno == EINTR)
 continue;


And I get this output:
11:28:14.602: debug : umlMonitorCommand:702 : Run command 'config con0'
11:28:14.602: debug : umlMonitorCommand:737 : res.error: 5, res.extra: 0, 
res.length: 4096, res.data: 
11:28:14.602: debug : umlMonitorCommand:740 : nbytes: 28
11:28:14.602: debug : umlMonitorCommand:742 : res.error: 0, res.extra: 0, 
res.length: 16, res.data: pts:/dev/pts/31
11:28:14.602: debug : umlMonitorCommand:771 : Command reply is 'pts:/dev/pts/31'

So the size of the response datagram isn't sizeof(res) as the check in
uml_driver.c expects, but rather sizeof(res.error) + sizeof(res.extra) +
sizeof(res.length) + res.length.

-- 
Soren Hansen so...@linux2go.dk
Systems Architect, The Rackspace Cloud
Ubuntu Developer

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-16 Thread Soren Hansen
On Mon, Aug 16, 2010 at 11:37:07AM +0200, Soren Hansen wrote:
 I'm running this on another kernel right now and I'm not seeing the
 problem. I'll try again with the kernel I used a couple of days ago.

Ok, found the other kernel. Same diff as in my previous e-mail, same
action. These are my results:
09:41:01.134: debug : umlMonitorCommand:698 : Run command 'config con0'
09:41:01.134: debug : umlMonitorCommand:733 : res.error: 6, res.extra: 0, 
res.length: 4096, res.data:
09:41:01.134: debug : umlMonitorCommand:736 : nbytes: 0
09:41:01.134: debug : umlMonitorCommand:738 : res.error: 0, res.extra: 0, 
res.length: 4, res.data: pts
09:41:01.134: debug : umlMonitorCommand:761 : Command reply is 'pts'
09:41:01.144: debug : umlMonitorCommand:698 : Run command 'config con0'
09:41:01.144: debug : umlMonitorCommand:733 : res.error: 0, res.extra: 0, 
res.length: 4, res.data: pts
09:41:01.144: debug : umlMonitorCommand:736 : nbytes: 0
09:41:01.144: debug : umlMonitorCommand:738 : res.error: 0, res.extra: 0, 
res.length: 4, res.data: pts
09:41:01.144: debug : umlMonitorCommand:761 : Command reply is 'pts'
09:41:01.154: debug : umlMonitorCommand:698 : Run command 'config con0'
09:41:01.154: debug : umlMonitorCommand:733 : res.error: 0, res.extra: 0, 
res.length: 4, res.data: pts
09:41:01.155: debug : umlMonitorCommand:736 : nbytes: 0
09:41:01.155: debug : umlMonitorCommand:738 : res.error: 0, res.extra: 0, 
res.length: 4, res.data: pts
09:41:01.155: debug : umlMonitorCommand:761 : Command reply is 'pts'
09:41:01.165: debug : umlMonitorCommand:698 : Run command 'config con0'
09:41:01.165: debug : umlMonitorCommand:733 : res.error: 0, res.extra: 0, 
res.length: 4, res.data: pts
09:41:01.169: debug : umlMonitorCommand:736 : nbytes: 0
09:41:01.169: debug : umlMonitorCommand:738 : res.error: 0, res.extra: 0, 
res.length: 4, res.data: pts
09:41:01.169: debug : umlMonitorCommand:761 : Command reply is 'pts'
09:41:01.179: debug : umlMonitorCommand:698 : Run command 'config con0'
09:41:01.179: debug : umlMonitorCommand:733 : res.error: 0, res.extra: 0, 
res.length: 4, res.data: pts
09:41:01.179: debug : umlMonitorCommand:736 : nbytes: 0
09:41:01.179: debug : umlMonitorCommand:738 : res.error: 0, res.extra: 0, 
res.length: 15, res.data: pts:/dev/pts/7
09:41:01.179: debug : umlMonitorCommand:761 : Command reply is 'pts:/dev/pts/7'

This one's a 2.6.34.1 kernel. The one where I didn't see the problem is
a 2.6.32.something-or-the-other kernel. Mindboggling.


-- 
Soren Hansen so...@linux2go.dk
Systems Architect, The Rackspace Cloud
Ubuntu Developer

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-16 Thread Eric Blake
On 08/16/2010 03:37 AM, Soren Hansen wrote:
 
 And I get this output:
 11:28:14.602: debug : umlMonitorCommand:702 : Run command 'config con0'
 11:28:14.602: debug : umlMonitorCommand:737 : res.error: 5, res.extra: 0, 
 res.length: 4096, res.data: 
 11:28:14.602: debug : umlMonitorCommand:740 : nbytes: 28
 11:28:14.602: debug : umlMonitorCommand:742 : res.error: 0, res.extra: 0, 
 res.length: 16, res.data: pts:/dev/pts/31
 11:28:14.602: debug : umlMonitorCommand:771 : Command reply is 
 'pts:/dev/pts/31'
 
 So the size of the response datagram isn't sizeof(res) as the check in
 uml_driver.c expects, but rather sizeof(res.error) + sizeof(res.extra) +
 sizeof(res.length) + res.length.

I agree with this analysis.  In other words, the check should be more
like this (two conditions - did we get enough bytes to even have a valid
res.length, and did we get enough bytes to match with what res.length
stated):

if (nbytes  offsetof(struct monitor_request, data) ||
nbytes  res.length + offsetof(struct monitor_request, data))
incomplete reply

instead of the two current but incorrect checks:

if (nbytes  sizeof ret) /* incorrectly true except when
res.length==MONITOR_BUFLEN */
  incomplete reply;
if (sizeof res.data  res.length) /* res.length of MONITOR_BUFLEN+1 does
not trigger this check, but leads to a buffer overflow in subsequent
memmove */
  invalid length;

But before I write such a patch, I'm going to look in more details at
your other reply.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-16 Thread Eric Blake
On 08/16/2010 03:54 AM, Soren Hansen wrote:
 On Mon, Aug 16, 2010 at 11:37:07AM +0200, Soren Hansen wrote:
 I'm running this on another kernel right now and I'm not seeing the
 problem. I'll try again with the kernel I used a couple of days ago.
 
 Ok, found the other kernel. Same diff as in my previous e-mail, same
 action. These are my results:
 09:41:01.134: debug : umlMonitorCommand:698 : Run command 'config con0'
 09:41:01.134: debug : umlMonitorCommand:733 : res.error: 6, res.extra: 0, 
 res.length: 4096, res.data:
 09:41:01.134: debug : umlMonitorCommand:736 : nbytes: 0
 09:41:01.134: debug : umlMonitorCommand:738 : res.error: 0, res.extra: 0, 
 res.length: 4, res.data: pts

 
 This one's a 2.6.34.1 kernel. The one where I didn't see the problem is
 a 2.6.32.something-or-the-other kernel. Mindboggling.

Oh my.  This really does look like a kernel bug.  Can you confirm it
with an strace?  Have you reported this regression to the right kernel
folks?

I guess it would help if we could write a simpler test program to
isolate whether this recvfrom bug exists in a bare minimum number of
syscalls.  Meanwhile, I have no idea how to work around a buggy recvfrom
that doesn't return the correct number of bytes.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-16 Thread Daniel P. Berrange
On Mon, Aug 16, 2010 at 10:08:37AM -0600, Eric Blake wrote:
 On 08/16/2010 03:54 AM, Soren Hansen wrote:
  On Mon, Aug 16, 2010 at 11:37:07AM +0200, Soren Hansen wrote:
  I'm running this on another kernel right now and I'm not seeing the
  problem. I'll try again with the kernel I used a couple of days ago.
  
  Ok, found the other kernel. Same diff as in my previous e-mail, same
  action. These are my results:
  09:41:01.134: debug : umlMonitorCommand:698 : Run command 'config con0'
  09:41:01.134: debug : umlMonitorCommand:733 : res.error: 6, res.extra: 0, 
  res.length: 4096, res.data:
  09:41:01.134: debug : umlMonitorCommand:736 : nbytes: 0
  09:41:01.134: debug : umlMonitorCommand:738 : res.error: 0, res.extra: 0, 
  res.length: 4, res.data: pts
 
  
  This one's a 2.6.34.1 kernel. The one where I didn't see the problem is
  a 2.6.32.something-or-the-other kernel. Mindboggling.
 
 Oh my.  This really does look like a kernel bug.  Can you confirm it
 with an strace?  Have you reported this regression to the right kernel
 folks?
 
 I guess it would help if we could write a simpler test program to
 isolate whether this recvfrom bug exists in a bare minimum number of
 syscalls.  Meanwhile, I have no idea how to work around a buggy recvfrom
 that doesn't return the correct number of bytes.

If there is an identifiable kernel bug then that should be reported
and fixed ASAP. IMHO we shouldn't try to workaround such serious 
brokenness as this is showing.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-16 Thread Soren Hansen
On 16-08-2010 18:08, Eric Blake wrote:
 On 08/16/2010 03:54 AM, Soren Hansen wrote:
 09:41:01.134: debug : umlMonitorCommand:698 : Run command 'config con0'
 09:41:01.134: debug : umlMonitorCommand:733 : res.error: 6, res.extra: 0, 
 res.length: 4096, res.data:
 09:41:01.134: debug : umlMonitorCommand:736 : nbytes: 0
 09:41:01.134: debug : umlMonitorCommand:738 : res.error: 0, res.extra: 0, 
 res.length: 4, res.data: pts
 This one's a 2.6.34.1 kernel. The one where I didn't see the problem is
 a 2.6.32.something-or-the-other kernel. Mindboggling.
 Oh my.  This really does look like a kernel bug.  Can you confirm it
 with an strace?

Annoyingly, no. It doesn't happen in the primary libvirt thread, so I
have to pass -f to strace, and uml doesn't let you PTRACE it, so I can't
trigger the problem.

 Have you reported this regression to the right kernel folks?

It doesn't happen in 2.6.35, so I seems isolated to that particular
kernel. It's really very, very odd.

 I guess it would help if we could write a simpler test program to
 isolate whether this recvfrom bug exists in a bare minimum number of
 syscalls.  Meanwhile, I have no idea how to work around a buggy recvfrom
 that doesn't return the correct number of bytes.

No, you're right. Whatever we do to work around this is going to suck
somehow.

-- 
Soren Hansen
Ubuntu Developer
http://www.ubuntu.com/



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-16 Thread Eric Blake
On 08/16/2010 03:37 AM, Soren Hansen wrote:
  nbytes = recvfrom(priv-monitor, res, sizeof res, 0,
(struct sockaddr *)addr, addrlen);
 +VIR_DEBUG(nbytes: %d, nbytes);

User error, not a kernel bug.  Phew.

%d and ssize_t are not always compatible sizes.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-16 Thread Soren Hansen
On 16-08-2010 18:08, Eric Blake wrote:
 Ok, found the other kernel. Same diff as in my previous e-mail, same
 action. These are my results:
 09:41:01.134: debug : umlMonitorCommand:698 : Run command 'config con0'
 09:41:01.134: debug : umlMonitorCommand:733 : res.error: 6, res.extra: 0, 
 res.length: 4096, res.data:
 09:41:01.134: debug : umlMonitorCommand:736 : nbytes: 0
 09:41:01.134: debug : umlMonitorCommand:738 : res.error: 0, res.extra: 0, 
 res.length: 4, res.data: pts

*blush*

Ok, we can officially stop chasing this now. While working on some of
the other patches I've sent, I apparantly managed to copy a bit of code
from one place to another and by luck it was syntactically correct, so
the compiler didn't complain. The offending *two characters* were:


 nbytes = recvfrom(priv-monitor, res, sizeof res, 0,
-  (struct sockaddr *)addr, addrlen);
+  (struct sockaddr *)addr, addrlen)0;
 if (nbytes  0) {

Henceforth, nbytes was 0.

Sorry about the wasted brain cycles on this.

-- 
Soren Hansen
Ubuntu Developer
http://www.ubuntu.com/



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-16 Thread Soren Hansen
On 16-08-2010 18:04, Eric Blake wrote:
 So the size of the response datagram isn't sizeof(res) as the check in
 uml_driver.c expects, but rather sizeof(res.error) + sizeof(res.extra) +
 sizeof(res.length) + res.length.
 I agree with this analysis.  In other words, the check should be more
 like this (two conditions - did we get enough bytes to even have a valid
 res.length, and did we get enough bytes to match with what res.length
 stated):
 
 if (nbytes  offsetof(struct monitor_request, data) ||
 nbytes  res.length + offsetof(struct monitor_request, data))
 incomplete reply

Yup, this looks good.

 But before I write such a patch, I'm going to look in more details at
 your other reply.

Let's just forget all about that one, shall we? Please? :)

-- 
Soren Hansen
Ubuntu Developer
http://www.ubuntu.com/



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-15 Thread Soren Hansen
On Sat, Aug 14, 2010 at 11:11:40AM -0600, Eric Blake wrote:
 First of all, for me, the call to recvfrom returns 0, even though the
 buffer actually is populated with the response from the monitor.
 I'm not sure about this one.  The recvfrom is called in a loop, and
 POSIX says that recvfrom shall return the size if a message was read, or
 0 if no messages were available but the other end of the connection did
 an orderly shutdown.

Yes, it does. That's part of why it took me so long to figure out. I really
didn't expect data to come back when recvfrom returned 0 :)

 I think we need a bit more understanding of why you are getting a 0 return,
 and whether it only happens after a successful iteration of the loop (in
 which case the contents of res are leftover from the prior iteration).

Unless multiple threads are reading from the fd (?), I'm quite confident
in my data. I dumped the whole monitor_response buffer right before and
right after the recvfrom call along with the return value of recvfrom.
I'll rerun the tests tomorrow and show you the results.

If my results are accurate and it is indeed a kernel bug (well, or
eglibc or whatnot), I believe it's in the spirit of libvirt to accept
that one of the other parts of the equation is doing something silly and
work around it. :) I'm happy to provide more evidence, though.

I'm not sure if I pointed this out in my initial e-mail, but even if I
didn't have this recvfrom problem, the check seems odd to my eye. Is the
monitor really going to send a max sized datagram each time? I would
have expected it to only send a datagram the size of the header + the
length of the data.

-- 
Soren Hansen so...@linux2go.dk
Systems Architect, The Rackspace Cloud
Ubuntu Developer

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-14 Thread Eric Blake
On 08/12/2010 03:00 AM, Soren Hansen wrote:
 The current check for the size of the response from the uml monitor is
 problematic for a couple of reasons:
 
 First of all, for me, the call to recvfrom returns 0, even though the
 buffer actually is populated with the response from the monitor.

I'm not sure about this one.  The recvfrom is called in a loop, and
POSIX says that recvfrom shall return the size if a message was read, or
0 if no messages were available but the other end of the connection did
an orderly shutdown.  I think we need a bit more understanding of why
you are getting a 0 return, and whether it only happens after a
successful iteration of the loop (in which case the contents of res are
leftover from the prior iteration).

 
 At any rate, the only way I can get anything useful to happen here is to
 completely remove the check, so this patch does just that.

In other words, maybe the better thing to do is add an earlier check:
if (nbytes == 0) break;
but I'm not sure of that idea, either, without more understanding of the
root cause.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 3/3] Remove wrong check for uml monitor response size

2010-08-12 Thread Soren Hansen
The current check for the size of the response from the uml monitor is
problematic for a couple of reasons:

First of all, for me, the call to recvfrom returns 0, even though the
buffer actually is populated with the response from the monitor.

Second, it seems to me that (assuming recvfrom actually returned the
number of bytes read) it should be compared against the size of the
response header plus the actual data length rather than the max size of
the datagram.

At any rate, the only way I can get anything useful to happen here is to
completely remove the check, so this patch does just that.

Signed-off-by: Soren Hansen so...@linux2go.dk
---
 src/uml/uml_driver.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/src/uml/uml_driver.c b/src/uml/uml_driver.c
index 04493ba..9cf669f 100644
--- a/src/uml/uml_driver.c
+++ b/src/uml/uml_driver.c
@@ -737,10 +737,6 @@ static int umlMonitorCommand(const struct uml_driver 
*driver,
 virReportSystemError(errno, _(cannot read reply %s), cmd);
 goto error;
 }
-if (nbytes  sizeof res) {
-virReportSystemError(0, _(incomplete reply %s), cmd);
-goto error;
-}
 if (sizeof res.data  res.length) {
 virReportSystemError(0, _(invalid length in reply %s), cmd);
 goto error;
-- 
1.7.0.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list