Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/27/15 10:11 AM, Markus Armbruster wrote: [...]

Eduardo, I did try this approach. It takes 2 line changes in exec.c: comment the unlink out, and make sure MAP_SHARED is used when -mem-path and -mem-prealloc are given. It works beautifully, and libvmi accesses are fast. However, the VM is slowed down to a crawl, obviously, because each RAM access by the VM triggers a page fault on the mmapped file. I don't think having a crawling VM is desirable, so this approach goes out the door.

Uh, I don't understand why "each RAM access by the VM triggers a page fault". Can you show us the patch you used?

Sorry, too brief of an explanation. Every time the guest flips a byte in physical RAM, I think that triggers a page write to the mmapped file. My understanding is that, with MAP_SHARED, each write to RAM triggers a file write, hence the slowness. These are the simple changes I made, to test it - as a proof of concept.

Ah, that actually makes sense. Thanks!

[...] However, when the guest RAM mmap'ed file resides on a RAMdisk on the host, the guest OS responsiveness is more than acceptable. Perhaps this is a viable approach. It might require a minimum of changes to the QEMU source and maybe 1 extra command line argument.
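The MAP_SHARED behavior being debated here - two mappings of the same file observing each other's writes - can be sketched in a few lines of Python. This is a stand-alone illustration, not QEMU code; the file is a fabricated one-page stand-in for the guest RAM backing file.

```python
import mmap
import tempfile

# Stand-in for the guest-RAM backing file that -mem-path leaves behind
# (e.g. /tmp/maps/qemu_back_mem.pc.ram.XXXXXX); path is fabricated here.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"\0" * mmap.PAGESIZE)
tmp.flush()

# "QEMU side": a read-write MAP_SHARED mapping backing guest physical RAM.
qemu_side = open(tmp.name, "r+b")
guest_ram = mmap.mmap(qemu_side.fileno(), mmap.PAGESIZE,
                      mmap.MAP_SHARED, mmap.PROT_READ | mmap.PROT_WRITE)

# "libvmi side": an independent read-only mapping of the same file.
vmi_side = open(tmp.name, "rb")
introspect = mmap.mmap(vmi_side.fileno(), mmap.PAGESIZE,
                       mmap.MAP_SHARED, mmap.PROT_READ)

# A write by the "guest" is immediately visible to the observer, because
# MAP_SHARED makes both mappings windows onto the same page-cache pages.
guest_ram[0:4] = b"\x53\xff\x00\xf0"
print(introspect[0:4].hex())
```

Note that dirty pages are written back to the file asynchronously by the kernel, which is consistent with the observation above that backing the file with tmpfs removes most of the penalty.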
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/27/15 9:00 AM, Markus Armbruster wrote: Valerio Aimale writes: On 10/26/15 11:52 AM, Eduardo Habkost wrote:

I was trying to advocate the use of a shared mmap'ed region. The sharing would be two-way (RW for both) between the QEMU virtualizer and the libvmi process. I envision that there could be a QEMU command line argument, such as "--mmap-guest-memory " Understand that Eric feels strongly the libvmi client should own the file name - I have not forgotten that. When that command line argument is given, as part of the guest initialization, QEMU creates a file of size equal to the size of the guest memory containing all zeros, mmaps that file to the guest memory with PROT_READ|PROT_WRITE and MAP_FILE|MAP_SHARED, then starts the guest.

This is basically what memory-backend-file (and the legacy -mem-path option) already does today, but it unlinks the file just after opening it. We can change it to accept a full filename and/or an option to make it not unlink the file after opening it. I don't remember if memory-backend-file is usable without -numa, but we could make it possible somehow.

Eduardo, I did try this approach. It takes 2 line changes in exec.c: comment the unlink out, and make sure MAP_SHARED is used when -mem-path and -mem-prealloc are given. It works beautifully, and libvmi accesses are fast. However, the VM is slowed down to a crawl, obviously, because each RAM access by the VM triggers a page fault on the mmapped file. I don't think having a crawling VM is desirable, so this approach goes out the door.

Uh, I don't understand why "each RAM access by the VM triggers a page fault". Can you show us the patch you used?

Sorry, too brief of an explanation. Every time the guest flips a byte in physical RAM, I think that triggers a page write to the mmapped file. My understanding is that, with MAP_SHARED, each write to RAM triggers a file write, hence the slowness. These are the simple changes I made, to test it - as a proof of concept.
in exec.c of qemu-2.4.0.1, change

---
fd = mkstemp(filename);
if (fd < 0) {
    error_setg_errno(errp, errno,
                     "unable to create backing store for hugepages");
    g_free(filename);
    goto error;
}
unlink(filename);
g_free(filename);

memory = (memory+hpagesize-1) & ~(hpagesize-1);

/*
 * ftruncate is not supported by hugetlbfs in older
 * hosts, so don't bother bailing out on errors.
 * If anything goes wrong with it under other filesystems,
 * mmap will fail.
 */
if (ftruncate(fd, memory)) {
    perror("ftruncate");
}

area = mmap(0, memory, PROT_READ | PROT_WRITE,
            (block->flags & RAM_SHARED ? MAP_SHARED : MAP_PRIVATE),
            fd, 0);
---

to

---
fd = mkstemp(filename);
if (fd < 0) {
    error_setg_errno(errp, errno,
                     "unable to create backing store for hugepages");
    g_free(filename);
    goto error;
}
/* unlink(filename); */ /* Valerio's change to persist guest RAM mmapped file */
g_free(filename);

memory = (memory+hpagesize-1) & ~(hpagesize-1);

/*
 * ftruncate is not supported by hugetlbfs in older
 * hosts, so don't bother bailing out on errors.
 * If anything goes wrong with it under other filesystems,
 * mmap will fail.
 */
if (ftruncate(fd, memory)) {
    perror("ftruncate");
}

area = mmap(0, memory, PROT_READ | PROT_WRITE,
            MAP_FILE | MAP_SHARED, /* Valerio's change to persist guest RAM mmapped file */
            fd, 0);
---

then, recompile qemu. Launch a VM as

/usr/local/bin/qemu-system-x86_64 -name Windows10 -S -machine pc-i440fx-2.4,accel=kvm,usb=off [...] -mem-prealloc -mem-path /tmp/maps

# I know -mem-path is deprecated, but I used it to speed up the proof of concept.

With the above command, I have the following file

$ ls -l /tmp/maps/
-rw--- 1 libvirt-qemu kvm 2147483648 Oct 27 08:31 qemu_back_mem.pc.ram.fP4sKH

which is a mmap of the Win VM physical RAM

$ hexdump -C /tmp/maps/qemu_back_mem.val.pc.ram.fP4sKH
53 ff 00 f0 53 ff 00 f0 c3 e2 00 f0 53 ff 00 f0 |S...S...S...|
[...]
0760 24 02 c3 49 6e 76 61 6c 69 64 20 70 61 72 74 69 |$..Invalid parti|
0770 74 69 6f 6e 20 74 61 62 6c 65 00 45 72 72 6f 72 |tion table.Error|
0780 20 6c 6f 61 64 69 6e 67 20 6f 70 65 72 61 74 69 | loading operati|
0790 6e 67 20 73 79 73 74 65 6d 00 4d 69 73 73 69 6e |ng system.Missin|
07a0 67 20 6f 70 65 72 61 74 69 6e 67 20 73 79 73 74 |g operating syst|
07b0 65 6d 00 00 00 63 7b 9a 73 d8 99 ce 00 00 80 20 |em...c{.s.. |
[...]

I did not try mmap'ing to a file on a RAMdisk. Without physical disk I/O, the VM might run faster.
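A libvmi-style reader could consume such a persistent backing file by treating guest physical addresses as plain file offsets. The sketch below assumes guest physical address 0 maps to file offset 0, which holds for the simple -mem-path case shown above but not in general; the file and its contents are fabricated so the example is self-contained.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE

# Fabricate one page of "guest RAM"; a real client would instead open
# the qemu_back_mem.* file left under the -mem-path directory.
path = tempfile.mkstemp()[1]
blob = b"\0" * 0x1B0 + b"Invalid partition table\0"
with open(path, "wb") as f:
    f.write(blob + b"\0" * (PAGE - len(blob)))

def read_phys(path, paddr, length):
    """Read `length` bytes at guest physical address `paddr` directly
    from the file-backed guest RAM mapping (offset == physical address
    is an assumption that holds only for the simple -mem-path layout)."""
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
        try:
            return bytes(m[paddr:paddr + length])
        finally:
            m.close()

print(read_phys(path, 0x1B0, 7))
```

No socket round trip and no encoding is involved, which is why accesses through such a mapping are fast compared to the xp/pmemsave paths discussed below.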
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/27/15 9:18 AM, Valerio Aimale wrote: I did not try mmap'ing to a file on a RAMdisk. Without physical disk I/O, the VM might run faster.

I did try with the file on a ramdisk:

$ sudo mount -o size=3G -t tmpfs none /ramdisk
$ /usr/local/bin/qemu-system-x86_64 -name Windows10 -S -machine pc-i440fx-2.4,accel=kvm,usb=off [...] -mem-prealloc -mem-path /ramdisk

With that, the speed of the VM is acceptable. It will not take your breath away, but it is reasonable.
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/26/15 11:52 AM, Eduardo Habkost wrote: I was trying to advocate the use of a shared mmap'ed region. The sharing would be two-way (RW for both) between the QEMU virtualizer and the libvmi process. I envision that there could be a QEMU command line argument, such as "--mmap-guest-memory " Understand that Eric feels strongly the libvmi client should own the file name - I have not forgotten that. When that command line argument is given, as part of the guest initialization, QEMU creates a file of size equal to the size of the guest memory containing all zeros, mmaps that file to the guest memory with PROT_READ|PROT_WRITE and MAP_FILE|MAP_SHARED, then starts the guest. This is basically what memory-backend-file (and the legacy -mem-path option) already does today, but it unlinks the file just after opening it. We can change it to accept a full filename and/or an option to make it not unlink the file after opening it. I don't remember if memory-backend-file is usable without -numa, but we could make it possible somehow. Eduardo, I did try this approach. It takes 2 line changes in exec.c: comment the unlink out, and make sure MAP_SHARED is used when -mem-path and -mem-prealloc are given. It works beautifully, and libvmi accesses are fast. However, the VM is slowed down to a crawl, obviously, because each RAM access by the VM triggers a page fault on the mmapped file. I don't think having a crawling VM is desirable, so this approach goes out the door.

I think we're back at estimating the speed of other approaches as discussed previously:

- via UNIX socket as per existing patch
- via xp, parsing the human-readable xp output
- via an xp-like command that returns memory content baseXX-encoded in a json string
- via shared memory as per existing code and patch

Any other?
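The space overhead of the baseXX-in-JSON approach mentioned above is easy to quantify; this sketch measures it for a 4 KiB "guest page" (the page contents are arbitrary sample data).

```python
import base64

# Size overhead of encoding a 4 KiB guest page for transport inside a
# JSON string, as an xp-like QMP command would have to do.
page = bytes(range(256)) * 16          # 4096 bytes of sample memory
b64 = base64.b64encode(page)
b85 = base64.b85encode(page)

print(len(page), len(b64), len(b85))   # 4096 5464 5120
print(f"base64 overhead: {len(b64) / len(page) - 1:.0%}")   # 33%
print(f"base85 overhead: {len(b85) / len(page) - 1:.0%}")   # 25%

# Round-trip check: the encoding is lossless, only larger.
assert base64.b64decode(b64) == page
assert base64.b85decode(b85) == page
```

On top of the size overhead comes the CPU cost of encoding/decoding and JSON framing, which is the time overhead debated later in the thread.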
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/26/15 3:09 AM, Markus Armbruster wrote: [...]

Eduardo, I think it would be a common rule of politeness not to pass any judgement on a person that you don't know, but for some texts in a mailing list. I think I understand how mmap() works, and very well. Participating in this discussion has been a struggle for me. For the good of the libvmi users, I have been trying to ignore the judgements, the comments and so on. But, alas, I throw my hands up in the air, and I surrender.

I'm sorry we exceeded your tolerance for frustration. This mailing list can be tough. We try to be welcoming (believe it or not), but we too often fail (okay, that part is easily believable). To be honest, I had difficulties understanding your explanation, and ended up guessing. I figure Eduardo did the same, and guessed incorrectly. There but for the grace of God go I.

Well, I did scribble my C sample excerpt too fast. Participating in mailing lists is not part of my job description - I was short on time, I admit to that. However, there is a big difference between saying "I do not understand your explanation, please try again" and saying "you're confused about mmap()"

I was trying to advocate the use of a shared mmap'ed region. The sharing would be two-way (RW for both) between the QEMU virtualizer and the libvmi process. I envision that there could be a QEMU command line argument, such as "--mmap-guest-memory " Understand that Eric feels strongly the libvmi client should own the file name - I have not forgotten that. When that command line argument is given, as part of the guest initialization, QEMU creates a file of size equal to the size of the guest memory containing all zeros, mmaps that file to the guest memory with PROT_READ|PROT_WRITE and MAP_FILE|MAP_SHARED, then starts the guest. And, if at all possible, makes the filename queryable via qmp and/or hmp, so that the filename of the mmap would not need to be maintained in two different places, leading to maintenance nightmares.
Shared mmapped regions can be used for inter-process communication; here's a quick and dirty example:

p1.c
---
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>

void handle_signal(int signal);

/* sorry, for ease of development I need these to be global */
int fh;
char *p;

void handle_signal(int signal) {
    const char *signal_name;
    sigset_t pending;

    switch (signal) {
    case SIGHUP:
        signal_name = "SIGHUP";
        fprintf(stdout, "Process 1 -- Map now contains: %s\n", p);
        munmap(p, sysconf(_SC_PAGE_SIZE));
        close(fh);
        exit(0);
        break;
    default:
        fprintf(stderr, "Caught wrong signal: %d\n", signal);
        return;
    }
}

void main(int argc, char **argv) {
    struct sigaction sa;

    sa.sa_handler = &handle_signal;
    sa.sa_flags = SA_RESTART;
    sigfillset(&sa.sa_mask);

    if (sigaction(SIGHUP, &sa, NULL) == -1) {
        perror("Error: cannot handle SIGHUP");
        exit(1);
    }

    if ((fh = open("shared.map", O_RDWR | O_CREAT, S_IRWXU))) {
        p = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ|PROT_WRITE,
                 MAP_FILE|MAP_SHARED, fh, (off_t) 0);
        if (p == MAP_FAILED) {
            printf("poop, didn't map: %s\n", strerror(errno));
            close(fh);
            exit(1);
        }
        p[0] = 0xcc;
        fprintf(stdout, "Process 1 -- Writing to map: All your bases are belong to us.\n");
        sprintf((char*) p, "All your bases are belong to us.");
        while (1);
    }
}
---
p2.c
---
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>

void main(int argc, char **argv) {
    int fh;
    void *p;
    int pid;

    pid = atoi(argv[1]);
    sleep(1);

    if ((fh = open("shared.map", O_RDWR, S_IRWXU))) {
        p = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ|PROT_WRITE,
                 MAP_SHARED|MAP_FILE, fh, (off_t) 0);
        printf("Process 2 -- Map now contains: %s\n", (char*)p);
        printf("Process 2 -- Writing to map: All your bases *NOW* belong to us.\n");
        fflush(stdout);
        sprintf((char*) p, "All your bases *NOW* belong to us.");
        kill(pid, SIGHUP);
        sleep(3);
        munmap(p, sysconf(_SC_PAGE_SIZE));
        close(fh);
    }
}
---

if I run both, in bash, as:

rm -f shared.map ; \
gcc -o p1 p1.c ; \
gcc -o p2 p2.c ; \
for i in ` seq 1 `getconf PAGESIZE``; do echo -e -n "\0" > shared.map ; done ; \
./p1 & ./p2 $!

I get the following output:

$ rm -f shared.map ; \
> gcc -o p1 p1.c ; \
> gcc -o p2 p2.c ; \
> for i in ` seq 1 `getconf PAGESIZE``; do echo -e -n "\0" > shared.ma
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/23/15 12:55 PM, Eduardo Habkost wrote: On Thu, Oct 22, 2015 at 03:51:28PM -0600, Valerio Aimale wrote: On 10/22/15 3:47 PM, Eduardo Habkost wrote: On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote: On 10/22/15 1:12 PM, Eduardo Habkost wrote: On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote: Valerio Aimale writes: [...]

There's also a similar patch, floating around the internet, that uses shared memory, instead of sockets, as inter-process communication between libvmi and QEMU. I've never used that. By the time you built a working IPC mechanism on top of shared memory, you're often no better off than with AF_LOCAL sockets.

Crazy idea: can we allocate guest memory in a way that supports sharing it with another process? Eduardo, can -mem-path do such wild things?

It can't today, but just because it creates a temporary file inside mem-path and unlinks it immediately after opening a file descriptor. We could make memory-backend-file also accept a full filename as argument, or add a mechanism to let QEMU send the open file descriptor to a QMP client.

Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest memory footprint work, augmented by Eric's suggestion of having the qmp client pass the filename?

The code below doesn't make sense to me.

Ok. What I am trying to do is to create a mmapped() memory area of the guest physical memory that can be shared between QEMU and an external process, such that the external process can read arbitrary locations of the qemu guest physical memory. In short, I'm using mmap MAP_SHARED to share the guest memory area with a process that is external to QEMU. Does it make better sense now?

I think you are confused about what mmap() does. It will create a new mapping into the process address space, containing the data from an existing file, not the other way around.
Eduardo, I think it would be a common rule of politeness not to pass any judgement on a person that you don't know, but for some texts in a mailing list. I think I understand how mmap() works, and very well. Participating in this discussion has been a struggle for me. For the good of the libvmi users, I have been trying to ignore the judgements, the comments and so on. But, alas, I throw my hands up in the air, and I surrender. I think libvmi can live, as it has for the past years, by patching the QEMU source tree on an as-needed basis, and keeping the patch in the libvmi source tree, without disturbing any further the QEMU community.
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/23/15 8:56 AM, Eric Blake wrote: On 10/23/2015 08:44 AM, Valerio Aimale wrote:

Libvmi's dependence on virsh is so strict that libvmi does not even know if the QEMU VM has an open qmp unix socket or inet socket to send commands through. Thus, libvmi sends qmp commands (to query registers, as an example) via

virsh qemu-monitor-command Windows10B '{"execute": "human-monitor-command", "arguments": {"command-line": "info registers"}}'

This is an unsupported back door of libvirt; you should really also consider enhancing libvirt to add a formal API to expose this information so that you don't have to resort to the monitor back door. But that's a topic for the libvirt list.

so that libvmi can find the rendezvous unix/inet address and send commands through that. As of now, each qmp command requires a popen() that forks virsh, which compounds the slowness.

No, don't blame virsh for your slowness. Write your own C program that links against libvirt.so, and which holds and reuses a persistent connection, using the same libvirt APIs as would be used by virsh. All the overhead of spawning a shell to spawn virsh to open a fresh connection for each command will go away. Any solution that uses popen() to virsh is _screaming_ to be rewritten to use libvirt.so natively.

Eric, good points. Libvmi was written quite some time ago; that libvirt API might not even have been there. When we agree on an access method to the guest mem, I will rewrite those legacy parts. I'm not the author of libvmi; I'm just trying to make it better when accessing QEMU/KVM VMs.
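Eric's suggestion - link against libvirt.so and hold one persistent connection (e.g. virConnectOpen() plus the libvirt-qemu virDomainQemuMonitorCommand() API) - needs a libvirt host to demonstrate, but the cost pattern it removes can be shown self-contained: one fork/exec per command versus one reused pipe. The `cat` process below is only a stand-in for any persistent connection; the request strings are made up.

```python
import subprocess
import time

REQUESTS = [f"req-{i}" for i in range(50)]

# Anti-pattern: one fork/exec per command, which is what calling
# popen("virsh ...") for every qmp command amounts to.
t0 = time.monotonic()
out_forked = [
    subprocess.run(["cat"], input=r, capture_output=True, text=True).stdout
    for r in REQUESTS
]
forked = time.monotonic() - t0

# The pattern Eric suggests: open one connection, reuse it for every
# command instead of paying connection setup each time.
t0 = time.monotonic()
with subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE, text=True) as proc:
    out_pipe, _ = proc.communicate("".join(r + "\n" for r in REQUESTS))
persistent = time.monotonic() - t0

print(f"50 fork/exec round trips: {forked * 1000:.1f} ms")
print(f"one persistent pipe:      {persistent * 1000:.1f} ms")
```

The fork/exec variant additionally pays libvirt connection setup and teardown on every command when virsh is the child, so the real-world gap is larger than this stand-in shows.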
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/23/15 2:18 AM, Daniel P. Berrange wrote: [...] It can't today, but just because it creates a temporary file inside mem-path and unlinks it immediately after opening a file descriptor. We could make memory-backend-file also accept a full filename as argument, or add a mechanism to let QEMU send the open file descriptor to a QMP client. Valerio, would a command line option to share guest memory suffice, or does it have to be a monitor command? If the latter, why? IIUC, libvmi wants to be able to connect to arbitrary pre-existing running KVM instances on the host. As such I think it cannot assume anything about the way they have been started, so requiring they be booted with a special command line arg looks impractical for this scenario. Regards, Daniel

Daniel, you are correct. libvmi knows of QEMU/KVM VMs only via libvirt/virsh. The scenario would work if libvmi was able to query, via qmp command, the path of the shared memory map.
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/23/15 12:35 AM, Markus Armbruster wrote: Eduardo Habkost writes: On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote: Valerio Aimale writes: [...] There's also a similar patch, floating around the internet, that uses shared memory, instead of sockets, as inter-process communication between libvmi and QEMU. I've never used that. By the time you built a working IPC mechanism on top of shared memory, you're often no better off than with AF_LOCAL sockets. Crazy idea: can we allocate guest memory in a way that supports sharing it with another process? Eduardo, can -mem-path do such wild things? It can't today, but just because it creates a temporary file inside mem-path and unlinks it immediately after opening a file descriptor. We could make memory-backend-file also accept a full filename as argument, or add a mechanism to let QEMU send the open file descriptor to a QMP client. Valerio, would a command line option to share guest memory suffice, or does it have to be a monitor command? If the latter, why?

As Daniel points out later in the thread, libvmi knows about QEMU VMs only via libvirt/virsh. Thus, a command line option to share guest memory would work only if there was an info qmp command, info-guestmaps, that would return something like

{ 'enabled' : 'true', 'filename' : '/path/to/the/guest/memory/map' }

so that libvmi can find the map. Libvmi's dependence on virsh is so strict that libvmi does not even know if the QEMU VM has an open qmp unix socket or inet socket to send commands through.
Thus, libvmi sends qmp commands (to query registers, as an example) via

virsh qemu-monitor-command Windows10B '{"execute": "human-monitor-command", "arguments": {"command-line": "info registers"}}'

It would be nice if there was a qmp command to query whether there are qmp/hmp sockets open, info-sockets, that would return something like

{ 'unix' : { 'enabled': 'true', 'address': '/path/to/socket/' }, 'inet' : { 'enabled': 'true', 'address': '1.2.3.4:5678' } }

so that libvmi can find the rendezvous unix/inet address and send commands through that. As of now, each qmp command requires a popen() that forks virsh, which compounds the slowness.
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/22/15 3:47 PM, Eduardo Habkost wrote: On Thu, Oct 22, 2015 at 01:57:13PM -0600, Valerio Aimale wrote: On 10/22/15 1:12 PM, Eduardo Habkost wrote: On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote: Valerio Aimale writes: [...] There's also a similar patch, floating around the internet, that uses shared memory, instead of sockets, as inter-process communication between libvmi and QEMU. I've never used that. By the time you built a working IPC mechanism on top of shared memory, you're often no better off than with AF_LOCAL sockets. Crazy idea: can we allocate guest memory in a way that supports sharing it with another process? Eduardo, can -mem-path do such wild things? It can't today, but just because it creates a temporary file inside mem-path and unlinks it immediately after opening a file descriptor. We could make memory-backend-file also accept a full filename as argument, or add a mechanism to let QEMU send the open file descriptor to a QMP client. Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest memory footprint work, augmented by Eric's suggestion of having the qmp client pass the filename? The code below doesn't make sense to me. Ok. What I am trying to do is to create a mmapped() memory area of the guest physical memory that can be shared between QEMU and an external process, such that the external process can read arbitrary locations of the qemu guest physical memory. In short, I'm using mmap MAP_SHARED to share the guest memory area with a process that is external to QEMU. Does it make better sense now?
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/22/15 2:03 PM, Eric Blake wrote: On 10/22/2015 01:57 PM, Valerio Aimale wrote: pmemmap would return the following json { 'success' : 'true', 'map_filename' : '/tmp/QEM_mmap_1234567' } In general, it is better if the client controls the filename, and not qemu. This is because things like libvirt like to run qemu in a highly-constrained environment, where the caller can pass in a file descriptor that qemu cannot itself open(). So returning a filename is pointless if the filename was already provided by the caller. Eric, I absolutely and positively agree with you. I was just brainstorming. Consider my pseudo-C code as the mailing list analog of somebody scribbling on a white board and trying to explain an idea. I agree with you the file name should come from the qmp client.
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/22/15 1:12 PM, Eduardo Habkost wrote: On Wed, Oct 21, 2015 at 12:54:23PM +0200, Markus Armbruster wrote: Valerio Aimale writes: [...] There's also a similar patch, floating around the internet, that uses shared memory, instead of sockets, as inter-process communication between libvmi and QEMU. I've never used that. By the time you built a working IPC mechanism on top of shared memory, you're often no better off than with AF_LOCAL sockets. Crazy idea: can we allocate guest memory in a way that supports sharing it with another process? Eduardo, can -mem-path do such wild things? It can't today, but just because it creates a temporary file inside mem-path and unlinks it immediately after opening a file descriptor. We could make memory-backend-file also accept a full filename as argument, or add a mechanism to let QEMU send the open file descriptor to a QMP client. Eduardo, would my "artisanal" idea of creating an mmap'ed image of the guest memory footprint work, augmented by Eric's suggestion of having the qmp client pass the filename?

qmp_pmemmap( [...] )
{
    char *template = "/tmp/QEM_mmap_XXX";
    int mmap_fd;
    uint8_t *local_memspace = malloc((size_t) 8589934592 /* assuming VM with 8GB RAM */);

    cpu_physical_memory_rw((hwaddr) 0, local_memspace,
                           (hwaddr) 8589934592 /* assuming VM with 8GB RAM */,
                           0 /* no write for now, will discuss write later */);

    mmap_fd = mkstemp("/tmp/QEUM_mmap_XXX");

    mmap((void *) local_memspace, (size_t) 8589934592,
         PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON,
         mmap_fd, (off_t) 0);

    /* etc */
}

pmemmap would return the following json

{ 'success' : 'true', 'map_filename' : '/tmp/QEM_mmap_1234567' }
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/21/15 4:54 AM, Markus Armbruster wrote: Valerio Aimale writes: On 10/19/15 1:52 AM, Markus Armbruster wrote: Valerio Aimale writes: On 10/16/15 2:15 AM, Markus Armbruster wrote: vale...@aimale.com writes:

All- I've produced a patch for the current QEMU HEAD, for libvmi to introspect QEMU/KVM VMs. Libvmi has patches for the old qemu-kvm fork, inside its source tree: https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch This patch adds a hmp and a qmp command, "pmemaccess". When the command is invoked with a string argument (a filename), it will open a UNIX socket and spawn a listening thread. The client writes binary commands to the socket, in the form of a C structure:

struct request {
    uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
    uint64_t address;  // address to read from OR write to
    uint64_t length;   // number of bytes to read OR write
};

The client receives as a response either (length+1) bytes, if it is a read operation, or 1 byte if it is a write operation. The last byte of a read operation response indicates success (1 success, 0 failure). The single byte returned for a write operation indicates the same (1 success, 0 failure).

So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of garbage followed by the "it failed" byte?

Markus, that appears to be the case. However, I did not write the communication protocol between libvmi and qemu. I'm assuming that the person that wrote the protocol did not want to bother with overcomplicating things. https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c I'm thinking he assumed reads would be small in size and the price of reading garbage was less than the price of writing a more complicated protocol. I can see his point; confronted with the same problem, I might have done the same.

All right, the interface is designed for *small* memory blocks then. Makes me wonder why he needs a separate binary protocol on a separate socket.
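A pmemaccess client would put such a request on the socket roughly as sketched below. This assumes a packed little-endian layout (17 bytes); note the C struct as declared would normally carry 7 bytes of padding after `type` (24 bytes total), so a real client must match whatever layout the patch actually uses.

```python
import struct

# One request on the pmemaccess socket: type, address, length.
# '<BQQ' assumes packed little-endian (17 bytes); the C struct as
# declared would be padded to 24 bytes, so verify against the patch.
REQ = struct.Struct("<BQQ")
TYPE_QUIT, TYPE_READ, TYPE_WRITE = 0, 1, 2

def read_request(address, length):
    return REQ.pack(TYPE_READ, address, length)

msg = read_request(0x1000, 4096)
print(len(msg), msg.hex())

# The reply to a read is length+1 bytes: the data, then a status byte
# (1 = success, 0 = failure) - hence the "garbage plus failure byte"
# behavior Markus points out for failed large reads.
reply = b"\x53\xff\x00\xf0" + b"\x01"
data, ok = reply[:-1], reply[-1] == 1
```

The fixed trailing status byte is what makes the protocol cheap to implement but wasteful on failed large reads.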
Small blocks could be done just fine in QMP. The problem is speed. If one's analyzing the memory space of a running process (physical and paged), libvmi will make a large number of small and mid-sized reads. If one uses xp, or pmemsave, the overhead is quite significant. xp has overhead due to encoding, and pmemsave has overhead due to file open/write (server), file open/read/close/unlink (client). Others have gone through the problem before me. It appears that pmemsave and xp are significantly slower than reading memory using a socket via pmemaccess. That they're slower isn't surprising, but I'd expect the cost of encoding a small block to be insignificant compared to the cost of the network roundtrips. As block size increases, the space overhead of encoding will eventually bite. But for that usage, the binary protocol appears ill-suited, unless the client can pretty reliably avoid read failure. I haven't examined its failure modes, yet. The following data is not mine, but it shows the time, in milliseconds, required to resolve the content of a paged memory address via socket (pmemaccess), pmemsave and xp http://cl.ly/image/322a3s0h1V05 Again, I did not produce those data points; they come from an old libvmi thread. 90ms is a very long time. What exactly was measured? I think it might be conceivable that there could be a QMP command that returns the content of an arbitrarily sized memory region as a base64 or a base85 json string. It would still have both time- (due to encoding/decoding) and space- (base64 has 33% and base85 would be 25%) overhead, + json encoding/decoding overhead. It might still be the case that a socket would outperform such a command as well, speed-wise. I don't think it would be any faster than xp. A special-purpose binary protocol over a dedicated socket will always do less than a QMP solution (ignoring foolishness like transmitting crap on read error the client is then expected to throw away).
The question is whether the difference in work translates to a worthwhile difference in performance. The larger question is actually whether we have an existing interface that can serve libvmi's needs. We've discussed monitor commands like xp, pmemsave, pmemread. There's another existing interface: the GDB stub. Have you considered it? There's also a similar patch, floating around the internet, that uses shared memory, instead of sockets, as inter-process communication between libvmi and QEMU. I've never used that. By the time you built a working IPC mechanism on top of shared memory, you're often no better off than with AF_LOCAL sockets. Crazy idea: can we allocate guest memory in a way that supports sharing it with another process? Eduardo, can -mem-path do such wild things? Markus, your suggestion led to a lightbulb going off in my head. What if there was a qmp command, say 'pmem
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/22/15 5:50 AM, Markus Armbruster wrote: Valerio Aimale writes: On 10/21/15 4:54 AM, Markus Armbruster wrote: Valerio Aimale writes: On 10/19/15 1:52 AM, Markus Armbruster wrote: Valerio Aimale writes: On 10/16/15 2:15 AM, Markus Armbruster wrote: vale...@aimale.com writes:

All- I've produced a patch for the current QEMU HEAD, for libvmi to introspect QEMU/KVM VMs. Libvmi has patches for the old qemu-kvm fork, inside its source tree: https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch This patch adds a hmp and a qmp command, "pmemaccess". When the command is invoked with a string argument (a filename), it will open a UNIX socket and spawn a listening thread. The client writes binary commands to the socket, in the form of a C structure:

struct request {
    uint8_t type;      // 0 quit, 1 read, 2 write, ... rest reserved
    uint64_t address;  // address to read from OR write to
    uint64_t length;   // number of bytes to read OR write
};

The client receives as a response either (length+1) bytes, if it is a read operation, or 1 byte if it is a write operation. The last byte of a read operation response indicates success (1 success, 0 failure). The single byte returned for a write operation indicates the same (1 success, 0 failure).

So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of garbage followed by the "it failed" byte?

Markus, that appears to be the case. However, I did not write the communication protocol between libvmi and qemu. I'm assuming that the person that wrote the protocol did not want to bother with overcomplicating things. https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c I'm thinking he assumed reads would be small in size and the price of reading garbage was less than the price of writing a more complicated protocol. I can see his point; confronted with the same problem, I might have done the same.

All right, the interface is designed for *small* memory blocks then.
Makes me wonder why he needs a separate binary protocol on a separate socket. Small blocks could be done just fine in QMP. The problem is speed. If one's analyzing the memory space of a running process (physical and paged), libvmi will make a large number of small and mid-sized reads. If one uses xp, or pmemsave, the overhead is quite significant. xp has overhead due to encoding, and pmemsave has overhead due to file open/write (server), file open/read/close/unlink (client). Others have gone through the problem before me. It appears that pmemsave and xp are significantly slower than reading memory using a socket via pmemaccess. That they're slower isn't surprising, but I'd expect the cost of encoding a small block to be insignificant compared to the cost of the network roundtrips. As block size increases, the space overhead of encoding will eventually bite. But for that usage, the binary protocol appears ill-suited, unless the client can pretty reliably avoid read failure. I haven't examined its failure modes, yet. The following data is not mine, but it shows the time, in milliseconds, required to resolve the content of a paged memory address via socket (pmemaccess), pmemsave and xp http://cl.ly/image/322a3s0h1V05 Again, I did not produce those data points; they come from an old libvmi thread. 90ms is a very long time. What exactly was measured? That is a fair question to ask. Unfortunately, I extracted that data plot from an old thread in some libvmi mailing list. I do not have the data and code that produced it. Sifting through the thread, I can see the code was never published. I will take it upon myself to produce code that compares timing - in a fair fashion - of libvmi doing an atomic operation and a larger-scale operation (like listing running processes) via gdb, pmemaccess/socket, pmemsave, xp, and hopefully, a version of xp that returns byte streams of memory regions base64 or base85 encoded in json strings. I'll publish results and code.
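For the gdb leg of that planned benchmark: Markus's GDB-stub suggestion would use the Remote Serial Protocol over the socket QEMU opens with e.g. `-gdb tcp::1234`. This sketch only builds the memory-read packet framing (checksum = modulo-256 sum of the payload bytes, as two hex digits) and does not talk to a live QEMU; the address and length are made up.

```python
# Frame a GDB Remote Serial Protocol memory-read request, as a client
# of QEMU's gdbstub would send it. Framing only; no connection is made.
def rsp_packet(payload: str) -> bytes:
    csum = sum(payload.encode()) % 256          # two-digit hex checksum
    return f"${payload}#{csum:02x}".encode()

def read_mem_packet(addr: int, length: int) -> bytes:
    # 'm addr,length' reads guest memory; both fields are hex.
    return rsp_packet(f"m{addr:x},{length:x}")

print(read_mem_packet(0x10000, 0x100))
# The stub replies with a packet whose payload is the memory content
# hex-encoded, two characters per byte - i.e. 100% space overhead,
# worse than the base64/base85 options discussed above.
```

The hex reply encoding is one reason the gdbstub path is unlikely to win the benchmark on large reads, though it requires no QEMU changes at all.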
However, given workload and life happening, it will be some time before I complete that task.

No problem. I'd like to have your use case addressed, but there's no need for haste. Thanks, Markus. Appreciate your help. [...]

Also, the pmemsave command's QAPI should be changed to be usable with 64-bit VMs in qapi-schema.json, from

---
{ 'command': 'pmemsave', 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
---

to

---
{ 'command': 'pmemsave', 'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
---

In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's confusing. I think it's confusing for the HMP parser too. If you have a VM w
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/21/15 4:54 AM, Markus Armbruster wrote: Valerio Aimale writes: On 10/19/15 1:52 AM, Markus Armbruster wrote: Valerio Aimale writes: On 10/16/15 2:15 AM, Markus Armbruster wrote: vale...@aimale.com writes:

All- I've produced a patch for the current QEMU HEAD, for libvmi to introspect QEMU/KVM VMs. Libvmi has patches for the old qemu-kvm fork, inside its source tree: https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch

This patch adds an hmp and a qmp command, "pmemaccess". When the command is invoked with a string argument (a filename), it will open a UNIX socket and spawn a listening thread. The client writes binary commands to the socket, in the form of a C structure:

struct request {
    uint8_t  type;     // 0 quit, 1 read, 2 write, ... rest reserved
    uint64_t address;  // address to read from OR write to
    uint64_t length;   // number of bytes to read OR write
};

The client receives as a response either (length+1) bytes, if it is a read operation, or 1 byte if it is a write operation. The last byte of a read operation response indicates success (1 success, 0 failure). The single byte returned for a write operation indicates the same (1 success, 0 failure).

So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of garbage followed by the "it failed" byte?

Markus, that appears to be the case. However, I did not write the communication protocol between libvmi and qemu. I'm assuming that the person who wrote the protocol did not want to bother with overcomplicating things. https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c I'm thinking he assumed reads would be small in size and the price of reading garbage was less than the price of writing a more complicated protocol. I can see his point; confronted with the same problem, I might have done the same.

All right, the interface is designed for *small* memory blocks then. Makes me wonder why he needs a separate binary protocol on a separate socket.
Small blocks could be done just fine in QMP.

The problem is speed. If one's analyzing the memory space of a running process (physical and paged), libvmi will make a large number of small and mid-sized reads. If one uses xp or pmemsave, the overhead is quite significant. xp has overhead due to encoding, and pmemsave has overhead due to file open/write (server) and file open/read/close/unlink (client). Others have gone through the problem before me. It appears that pmemsave and xp are significantly slower than reading memory using a socket via pmemaccess.

That they're slower isn't surprising, but I'd expect the cost of encoding a small block to be insignificant compared to the cost of the network roundtrips. As block size increases, the space overhead of encoding will eventually bite. But for that usage, the binary protocol appears ill-suited, unless the client can pretty reliably avoid read failure. I haven't examined its failure modes yet.

The following data is not mine, but it shows the time, in milliseconds, required to resolve the content of a paged memory address via socket (pmemaccess), pmemsave and xp: http://cl.ly/image/322a3s0h1V05 Again, I did not produce those data points; they come from an old libvmi thread.

90ms is a very long time. What exactly was measured?

That is a fair question to ask. Unfortunately, I extracted that data plot from an old thread on some libvmi mailing list. I do not have the data and code that produced it. Sifting through the thread, I can see the code was never published. I will take it upon myself to produce code that compares timing - in a fair fashion - of libvmi doing an atomic operation and a larger-scale operation (like listing running processes) via gdb, pmemaccess/socket, pmemsave, xp, and, hopefully, a version of xp that returns byte streams of memory regions base64 or base85 encoded in JSON strings. I'll publish results and code. However, given workload and life happening, it will be some time before I complete that task.
I think it might be conceivable that there could be a QMP command that returns the content of an arbitrarily sized memory region as a base64 or base85 JSON string. It would still have both time overhead (due to encoding/decoding) and space overhead (base64 adds 33% and base85 would add 25%), plus JSON encoding/decoding overhead. It might still be the case that the socket would outperform such a command as well, speed-wise. I don't think it would be any faster than xp.

A special-purpose binary protocol over a dedicated socket will always do less than a QMP solution (ignoring foolishness like transmitting crap on read error the client is then expected to throw away). The question is whether the difference in work translates to a worthwhile difference in performance. The larger question is actually whether we have an existing interface that can serve libvmi's needs. We've d
Re: [Qemu-devel] [PATCH] QEMU patch for libvmi to introspect QEMU/kvm virtual machines. Usually this patch is distributed with libvmi, but it might be more useful to have it in the QEMU source perman
Eric, thanks for your comments. I'm going to take the liberty to top-post some notes.

On grammar awkwardness, indentation, documentation, and coding style: I agree with you. Mea culpa; I take full responsibility. I was too eager to submit the patch. I'll be less eager in the future. If and when we decide that this patch belongs in the QEMU source tree, I will clean up the grammar, documentation and code. However, as per the discussion with Markus, that is still up in the air, so I'll hold off on those for now. Below are discussions of two issues only, endianness and fprintf.

Valerio

On 10/19/15 3:33 PM, Eric Blake wrote: On 10/15/2015 05:44 PM, vale...@aimale.com wrote: From: Valerio Aimale

Long subject line, and no message body. Remember, you want the subject line to be a one-line short summary of 'what', then the commit body message for 'why', as in:

qmp: add command for libvmi memory introspection

In the past, libvmi was relying on an out-of-tree patch to qemu that provides a new QMP command pmemaccess. It is now time to make this command part of qemu. pmemaccess is used to create a side-channel communication path that can more effectively be used to query lots of small memory chunks without the overhead of one QMP command per chunk. ...

---

You are missing a Signed-off-by: tag. Without that, we cannot take your patch.
But at least we can still review it:

 Makefile.target  |   2 +-
 hmp-commands.hx  |  14
 hmp.c            |   9 +++
 hmp.h            |   1 +
 memory-access.c  | 206 +++
 memory-access.h  |  21 ++
 qapi-schema.json |  28
 qmp-commands.hx  |  23 +++
 8 files changed, 303 insertions(+), 1 deletion(-)
 create mode 100644 memory-access.c
 create mode 100644 memory-access.h

diff --git a/Makefile.target b/Makefile.target
index 962d004..940ab51 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -131,7 +131,7 @@ endif #CONFIG_BSD_USER
 #
 # System emulator target
 ifdef CONFIG_SOFTMMU
-obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
+obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o memory-access.o

This line is now over 80 columns; please wrap.

 obj-y += qtest.o bootdevice.o

In fact, you could have just appended it into this line instead.

+++ b/hmp-commands.hx
@@ -807,6 +807,20 @@
 save to disk physical memory dump starting at @var{addr} of size @var{size}.
 ETEXI
 {
+.name = "pmemaccess",
+.args_type = "path:s",
+.params = "file",
+.help = "open A UNIX Socket access to physical memory at 'path'",

s/A/a/ Awkward grammar; better might be: open a Unix socket at 'path' for use in accessing physical memory. Please also document where the user can find the protocol that will be used across the side-channel socket thus opened.

+++ b/memory-access.c
@@ -0,0 +1,206 @@
+/*
+ * Access guest physical memory via a domain socket.
+ *
+ * Copyright (C) 2011 Sandia National Laboratories
+ * Original Author: Bryan D. Payne (bdpa...@acm.org)
+ *
+ * Refurbished for modern QEMU by Valerio Aimale (vale...@aimale.com), in 2015
+ */

I would expect at least something under docs/ in addition to this file (the protocol spoken over the socket should be well-documented, and not just by reading the source code). Compare with docs/qmp-spec.txt.

+struct request{
+uint8_t type; // 0 quit, 1 read, 2 write, ...
rest reserved
+uint64_t address; // address to read from OR write to
+uint64_t length; // number of bytes to read OR write

Any particular endianness constraints to worry about?

That is a very interesting and insightful comment that required some thinking. As I see it right now, the issue of endianness can be partitioned into 3 separate problems:

1) Endianness concordance between libvmi and the QEMU host. As this patch uses a UNIX socket for inter-process communication, it implicitly assumes that libvmi and the QEMU host will run on the same machine, and thus will have the same architecture, so no endianness correction is needed. If the patch were to use an inet socket, then hton and ntoh conversions would be required. That could be easily arranged by using this very useful header, https://gist.github.com/panzi/6856583 , which provides architecture- and platform-independent implementations of the htobe64() and be64toh() functions required to convert the two 64-bit members of the struct request. Of course, there is the very interesting and intriguing scenario of somebody tunneling a UNIX socket from one host to another via an inet socket with socat. However, as libvmi owns socket creation, that would be not impossible, but not that easy either.

2) Endianness concordance between the QEMU host and QEMU guest. As pmemaccess ju
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/19/15 1:52 AM, Markus Armbruster wrote: Valerio Aimale writes: On 10/16/15 2:15 AM, Markus Armbruster wrote: vale...@aimale.com writes:

All- I've produced a patch for the current QEMU HEAD, for libvmi to introspect QEMU/KVM VMs. Libvmi has patches for the old qemu-kvm fork, inside its source tree: https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch

This patch adds an hmp and a qmp command, "pmemaccess". When the command is invoked with a string argument (a filename), it will open a UNIX socket and spawn a listening thread. The client writes binary commands to the socket, in the form of a C structure:

struct request {
    uint8_t  type;     // 0 quit, 1 read, 2 write, ... rest reserved
    uint64_t address;  // address to read from OR write to
    uint64_t length;   // number of bytes to read OR write
};

The client receives as a response either (length+1) bytes, if it is a read operation, or 1 byte if it is a write operation. The last byte of a read operation response indicates success (1 success, 0 failure). The single byte returned for a write operation indicates the same (1 success, 0 failure).

So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of garbage followed by the "it failed" byte?

Markus, that appears to be the case. However, I did not write the communication protocol between libvmi and qemu. I'm assuming that the person who wrote the protocol did not want to bother with overcomplicating things. https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c I'm thinking he assumed reads would be small in size and the price of reading garbage was less than the price of writing a more complicated protocol. I can see his point; confronted with the same problem, I might have done the same.

All right, the interface is designed for *small* memory blocks then. Makes me wonder why he needs a separate binary protocol on a separate socket. Small blocks could be done just fine in QMP.

The problem is speed.
If one's analyzing the memory space of a running process (physical and paged), libvmi will make a large number of small and mid-sized reads. If one uses xp or pmemsave, the overhead is quite significant. xp has overhead due to encoding, and pmemsave has overhead due to file open/write (server) and file open/read/close/unlink (client). Others have gone through the problem before me. It appears that pmemsave and xp are significantly slower than reading memory using a socket via pmemaccess. The following data is not mine, but it shows the time, in milliseconds, required to resolve the content of a paged memory address via socket (pmemaccess), pmemsave and xp: http://cl.ly/image/322a3s0h1V05 Again, I did not produce those data points; they come from an old libvmi thread.

I think it might be conceivable that there could be a QMP command that returns the content of an arbitrarily sized memory region as a base64 or base85 JSON string. It would still have both time overhead (due to encoding/decoding) and space overhead (base64 adds 33% and base85 would add 25%), plus JSON encoding/decoding overhead. It might still be the case that the socket would outperform such a command as well, speed-wise. I don't think it would be any faster than xp.

There's also a similar patch, floating around the internet, that uses shared memory, instead of sockets, as inter-process communication between libvmi and QEMU. I've never used that. The socket API was written by the libvmi author and it works with the current libvmi version. The libvmi client-side implementation is at: https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c

As many use kvm VMs for introspection, malware and security analysis, it might be worth thinking about making pmemaccess a permanent hmp/qmp command, as opposed to having to produce a patch at each QEMU point release.

Related existing commands: memsave, pmemsave, dump-guest-memory. Can you explain why these won't do for your use case?
For people who do security analysis there are two use cases, static and dynamic analysis. With memsave, pmemsave and dump-guest-memory one can do static analysis, i.e. snapshotting a VM and seeing what was happening at that point in time. Dynamic analysis requires being able to 'introspect' a VM while it's running. If you take a snapshot of two people exchanging a glass of water, and you happen to take it at the very moment both persons have their hands on the glass, it's hard to tell who passed the glass to whom. If you have a movie of the same scene, it's obvious who's the giver and who's the receiver. Same use case.

I understand the need for introspecting a running guest. What exactly makes the existing commands unsuitable for that?

Speed. See the discussion above. More to the point, there's a host of C and Python frameworks to dynamically analyze VMs: volatility, r
Re: [Qemu-devel] QEMU patch to allow VM introspection via libvmi
On 10/16/15 2:15 AM, Markus Armbruster wrote: vale...@aimale.com writes:

All- I've produced a patch for the current QEMU HEAD, for libvmi to introspect QEMU/KVM VMs. Libvmi has patches for the old qemu-kvm fork, inside its source tree: https://github.com/libvmi/libvmi/tree/master/tools/qemu-kvm-patch

This patch adds an hmp and a qmp command, "pmemaccess". When the command is invoked with a string argument (a filename), it will open a UNIX socket and spawn a listening thread. The client writes binary commands to the socket, in the form of a C structure:

struct request {
    uint8_t  type;     // 0 quit, 1 read, 2 write, ... rest reserved
    uint64_t address;  // address to read from OR write to
    uint64_t length;   // number of bytes to read OR write
};

The client receives as a response either (length+1) bytes, if it is a read operation, or 1 byte if it is a write operation. The last byte of a read operation response indicates success (1 success, 0 failure). The single byte returned for a write operation indicates the same (1 success, 0 failure).

So, if you ask to read 1 MiB, and it fails, you get back 1 MiB of garbage followed by the "it failed" byte?

Markus, that appears to be the case. However, I did not write the communication protocol between libvmi and qemu. I'm assuming that the person who wrote the protocol did not want to bother with overcomplicating things. https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c I'm thinking he assumed reads would be small in size and the price of reading garbage was less than the price of writing a more complicated protocol. I can see his point; confronted with the same problem, I might have done the same.

The socket API was written by the libvmi author and it works with the current libvmi version.
The libvmi client-side implementation is at: https://github.com/libvmi/libvmi/blob/master/libvmi/driver/kvm/kvm.c

As many use kvm VMs for introspection, malware and security analysis, it might be worth thinking about making pmemaccess a permanent hmp/qmp command, as opposed to having to produce a patch at each QEMU point release.

Related existing commands: memsave, pmemsave, dump-guest-memory. Can you explain why these won't do for your use case?

For people who do security analysis there are two use cases, static and dynamic analysis. With memsave, pmemsave and dump-guest-memory one can do static analysis, i.e. snapshotting a VM and seeing what was happening at that point in time. Dynamic analysis requires being able to 'introspect' a VM while it's running. If you take a snapshot of two people exchanging a glass of water, and you happen to take it at the very moment both persons have their hands on the glass, it's hard to tell who passed the glass to whom. If you have a movie of the same scene, it's obvious who's the giver and who's the receiver. Same use case.

More to the point, there's a host of C and Python frameworks to dynamically analyze VMs: volatility, rekal, "drakvuf", etc. They all build on top of libvmi. I did not want to reinvent the wheel. Mind you, 99.9% of people who do dynamic VM analysis use Xen. They contend that Xen has better introspection support. In my case, I did not want to bother with dedicating a full server to be a Xen domain 0. I just wanted to do a quick test by standing up a QEMU/kvm VM in an otherwise-purposed server.

Also, the pmemsave command's QAPI should be changed to be usable with 64-bit VMs in qapi-schema.json, from

---
{ 'command': 'pmemsave', 'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
---

to

---
{ 'command': 'pmemsave', 'data': {'val': 'int64', 'size': 'int64', 'filename': 'str'} }
---

In the QAPI schema, 'int' is actually an alias for 'int64'. Yes, that's confusing.
I think it's confusing for the HMP parser too. If you have a VM with 8 GB of RAM and want to snapshot the whole physical memory, via HMP over telnet this is what happens:

$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
QEMU 2.4.0.1 monitor - type 'help' for more information
(qemu) help pmemsave
pmemsave addr size file -- save to disk physical memory dump starting at 'addr' of size 'size'
(qemu) pmemsave 0 8589934591 "/tmp/memorydump"
'pmemsave' has failed: integer is for 32-bit values
Try "help pmemsave" for more information
(qemu) quit

With the changes I suggested, the command succeeds:

$ telnet localhost 1234
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
QEMU 2.4.0.1 monitor - type 'help' for more information
(qemu) help pmemsave
pmemsave addr size file -- save to disk physical memory dump starting at 'addr' of size 'size'
(qemu) pmemsave 0 8589934591 "/tmp/memorydump"
(qemu) quit

However, I just noticed that the dump is only about 4 GB in size, so there might be more changes needed to snapshot all physical memory of a 64-bit VM. I did not investigate any further. ls -l /tmp/memorydum