[Qemu-devel] Current State of Block Filter
As I understand it, a block filter is currently implemented as a special block device with `is_filter` set to true. Is this a correct characterization of the current incarnation? If so, I was wondering: is it possible to insert a block filter layer on top of an existing block device while QEMU is executing (via QMP commands)? It seems possible to add block-filter-managed block devices, but I don't see a way of attaching a block filter to an already-existing block device. -- Wolf
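For readers unfamiliar with the idea, the insertion being asked about can be sketched outside QEMU. This is a toy Python model, not QEMU's API; every name here (`BlockFilter`, `insert_filter`, and so on) is made up for illustration of how a pass-through node with `is_filter` set could be spliced between a guest and its existing device at runtime:

```python
class BlockDevice:
    """A trivial in-memory 'device': a dict of sector -> bytes."""
    is_filter = False

    def __init__(self):
        self.sectors = {}

    def write(self, sector, data):
        self.sectors[sector] = data

    def read(self, sector):
        return self.sectors.get(sector, b"\x00" * 512)


class BlockFilter:
    """Pass-through node that intercepts writes before forwarding them."""
    is_filter = True

    def __init__(self, child, on_write):
        self.child = child        # the device this filter now sits on top of
        self.on_write = on_write  # introspection callback

    def write(self, sector, data):
        self.on_write(sector, data)   # tap the write stream
        self.child.write(sector, data)  # then forward unchanged

    def read(self, sector):
        return self.child.read(sector)


def insert_filter(parent_attr, owner, on_write):
    """Splice a filter between 'owner' (e.g. a guest) and its device."""
    filt = BlockFilter(getattr(owner, parent_attr), on_write)
    setattr(owner, parent_attr, filt)
    return filt


class Guest:
    def __init__(self, disk):
        self.disk = disk


taps = []
guest = Guest(BlockDevice())
insert_filter("disk", guest, lambda s, d: taps.append(s))
guest.disk.write(7, b"x" * 512)
assert taps == [7]                       # the filter saw the write...
assert guest.disk.read(7) == b"x" * 512  # ...and still forwarded it
```

The point of the sketch is the `insert_filter` step: the guest's reference to its device is re-pointed at the filter, which is exactly the kind of runtime re-parenting the question asks whether QMP can perform.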
Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
On Fri, Oct 11, 2013 at 11:38 AM, Eric Blake ebl...@redhat.com wrote:
> On 10/11/2013 09:18 AM, Wolfgang Richter wrote:
>> Idea: Introduce a mode for drive-backup that duplicates writes to
>> another target, not CoW. It is useful for introspecting (my use case),
>> and for keeping a remote block device in sync with writes (helps with
>> migration or backup). This is based off of v1.6.0 code.
>
> Best to rebase it against latest qemu.git.

Done.

>> +++ b/qapi-schema.json
>> @@ -1311,12 +1311,14 @@
>>  #
>>  # @full: copies data from all images to the destination
>>  #
>> -# @none: only copy data written from now on
>> +# @none: only copy-on-write data written from now on
>> +#
>> +# @stream: copy every new write to target
>
> Add the designation '(since 1.7)' to make it obvious when this mode was
> introduced.

Done. Is it better to place the updated patch in this thread or start a new one?

>>  #
>>  # Since: 1.3
>>  ##
>>  { 'enum': 'MirrorSyncMode',
>> -  'data': ['top', 'full', 'none'] }
>> +  'data': ['top', 'full', 'none', 'stream'] }
>
> MirrorSyncMode is used by multiple commands; your summary mentions how it
> would affect 'drive-backup', but what happens to 'drive-mirror'? For that
> matter, why isn't 'drive-mirror' with mode 'none' doing what you already
> want?

Okay, I think my impression might be wrong, but I thought 'drive-mirror' would become deprecated with the new 'drive-backup' command and code. If we look at what they do (current documentation and code), 'drive-backup' AFAIK behaves the same for all modes of 'drive-mirror' _except_ mode 'none', with _better_ consistency guarantees. That is, 'drive-backup' clearly provides a point-in-time snapshot, whereas 'drive-mirror' may create a point-in-time snapshot, but it cannot guarantee that.

In addition, 'drive-backup's code is cleaner, simpler, and easier to work with (in my opinion) than 'drive-mirror's code. This is because of the new hooks in block.c for tracked requests etc., so that the job can insert code to be run on every write in a clean manner (I think).
I think that it would be less confusing to subsume 'drive-mirror' into 'drive-backup' so that we have a single command with clear consistency guarantees; it would also prevent overloading (and more confusion) of the meaning of the 'MirrorSyncMode's. Perhaps a better naming scheme for the modes would then be:

  full   - as before (same for both commands AFAIK)
  top    - as before (same for both commands AFAIK)
  none   - if we only have drive-backup, rename this to 'overlay' as it
           creates a low-overhead CoW overlay point-in-time snapshot
  stream - either keep my name 'stream' to do what 'none' does for
           drive-mirror, or leave this as the 'none' mode with the same
           drive-mirror semantics

Thus, I think, with a single extra mode, drive-backup can subsume drive-mirror. This reduces the number of commands, the documentation, and the code (all duplicating each other in some manner). -- Wolf
Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
On Sat, Oct 12, 2013 at 1:47 AM, Fam Zheng f...@redhat.com wrote:
> While mirroring write is a good idea, doing it with drive-backup is
> probably not. The function of this command is to 'backup' the image with
> existing data, instead of new data. With your 'stream' mode, this
> semantic is changed.

I'm not so sure. I think that it would be better to switch between semantics with 'modes' rather than 'commands' to reduce documentation, duplicate code, and the burden on users to remember different commands. Thus, many of the _modes_ of 'drive-backup' might provide you with point-in-time snapshots of a block device, but some of them might just mirror writes for backup purposes other than a point-in-time snapshot.

> IMO this feature is best implemented as a block filter, which is
> currently being discussed and not ready yet. A second option may be
> doing it with another command (e.g. block-mirror, or a new one?)

That may be true, as I haven't followed block filters very closely yet, but it seemed simple enough with the nice drive-backup code to easily implement. Perhaps this 'mode' could be refactored in the future to use block filters. -- Wolf
[Qemu-devel] [RFC PATCH] drive-backup 'stream' mode
Idea: Introduce a mode for drive-backup that duplicates writes to another target, not CoW. It is useful for introspecting (my use case), and for keeping a remote block device in sync with writes (helps with migration or backup).

Issue with current modes: All of the current modes are well-designed to support point-in-time snapshots, but none of them handle keeping another drive up-to-date as new writes continuously occur. The 'none' mode documentation is a bit ambiguous in this regard, but what it actually implements is a very low overhead CoW snapshot.

Patch: Fixes the ambiguity in the 'none' mode documentation and introduces a new mode, 'stream', which duplicates writes without reading any data from the original disk. I put the logic for copying the write into a new coroutine called 'backup_do_stream' as it needs almost nothing from the original 'backup_do_cow' function (no bitmap, no reads from a block device, etc.). The other major change is that tracked requests also carry a handle to the QIOV involved in the write (and it is passed along).

This is based off of v1.6.0 code.
diff --git a/block.c b/block.c
index 01b66d8..159f825 100644
--- a/block.c
+++ b/block.c
@@ -1872,12 +1872,14 @@ static void tracked_request_end(BdrvTrackedRequest *req)
 static void tracked_request_begin(BdrvTrackedRequest *req,
                                   BlockDriverState *bs,
                                   int64_t sector_num,
-                                  int nb_sectors, bool is_write)
+                                  int nb_sectors, bool is_write,
+                                  QEMUIOVector *qiov)
 {
     *req = (BdrvTrackedRequest){
         .bs = bs,
         .sector_num = sector_num,
         .nb_sectors = nb_sectors,
+        .qiov = qiov,
         .is_write = is_write,
         .co = qemu_coroutine_self(),
     };
@@ -2528,7 +2530,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         wait_for_overlapping_requests(bs, sector_num, nb_sectors);
     }

-    tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
+    tracked_request_begin(&req, bs, sector_num, nb_sectors, false, NULL);

     if (flags & BDRV_REQ_COPY_ON_READ) {
         int pnum;
@@ -2634,7 +2636,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         wait_for_overlapping_requests(bs, sector_num, nb_sectors);
     }

-    tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
+    tracked_request_begin(&req, bs, sector_num, nb_sectors, true, qiov);

     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req);

diff --git a/block/backup.c b/block/backup.c
index 6ae8a05..686a53f 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -84,6 +84,37 @@ static void cow_request_end(CowRequest *req)
     qemu_co_queue_restart_all(&req->wait_queue);
 }

+static int coroutine_fn backup_do_stream(BlockDriverState *bs,
+                                         int64_t sector_num, int nb_sectors,
+                                         QEMUIOVector *qiov)
+{
+    BackupBlockJob *job = (BackupBlockJob *)bs->job;
+    CowRequest cow_request;
+    int ret = 0;
+    int64_t start = sector_num, end = sector_num + nb_sectors;
+
+    qemu_co_rwlock_rdlock(&job->flush_rwlock);
+
+    wait_for_overlapping_requests(job, start, end);
+    cow_request_begin(&cow_request, job, start, end);
+
+    ret = bdrv_co_writev(job->target, sector_num, nb_sectors, qiov);
+
+    /* Publish progress, guest I/O counts as progress too.  Note that the
+     * offset field is an opaque progress value, it is not a disk offset.
+     */
+    job->sectors_read += nb_sectors;
+    job->common.offset += nb_sectors * BDRV_SECTOR_SIZE;
+
+    cow_request_end(&cow_request);
+
+    qemu_co_rwlock_unlock(&job->flush_rwlock);
+
+    return ret;
+}
+
 static int coroutine_fn backup_do_cow(BlockDriverState *bs,
                                       int64_t sector_num, int nb_sectors,
                                       bool *error_is_read)
@@ -181,7 +212,12 @@ static int coroutine_fn backup_before_write_notify(
 {
     BdrvTrackedRequest *req = opaque;
+    BackupBlockJob *job = (BackupBlockJob *)req->bs->job;

-    return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
+    if (job->sync_mode == MIRROR_SYNC_MODE_STREAM) {
+        return backup_do_stream(req->bs, req->sector_num, req->nb_sectors,
+                                req->qiov);
+    } else {
+        return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
+    }
 }

 static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -248,7 +284,8 @@ static void coroutine_fn backup_run(void *opaque)

     bdrv_add_before_write_notifier(bs, &before_write);

-    if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
+    if (job->sync_mode == MIRROR_SYNC_MODE_NONE ||
+        job->sync_mode == MIRROR_SYNC_MODE_STREAM) {
         while (!block_job_is_cancelled(&job->common)) {
             /* Yield until the job is cancelled.  We just let our before_write
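To make the semantic difference between the existing 'none' mode (copy original data before the overwrite) and the proposed 'stream' mode (duplicate every new write) concrete, here is a toy Python model of the two before-write behaviors. It is illustrative only and shares no code or names with the patch:

```python
SECTORS = 4

def run_job(mode, writes):
    """Apply guest writes to a toy source disk under one backup mode."""
    source = [b"old"] * SECTORS   # contents at the moment the job starts
    target = [None] * SECTORS     # what the backup target ends up holding
    copied = set()                # CoW bitmap: sectors already saved once

    for sector, data in writes:
        if mode == "none":
            # CoW: save the PRE-write data, but only the first time a
            # sector is touched -- this preserves the point-in-time view.
            if sector not in copied:
                target[sector] = source[sector]
                copied.add(sector)
        elif mode == "stream":
            # Proposed mode: duplicate the NEW write to the target.
            target[sector] = data
        source[sector] = data     # the guest write itself proceeds
    return source, target


# 'none' yields a point-in-time snapshot: target keeps the original data.
_, t = run_job("none", [(1, b"new1"), (1, b"new2")])
assert t[1] == b"old"

# 'stream' keeps the target in sync with every new guest write instead.
_, t = run_job("stream", [(1, b"new1"), (1, b"new2")])
assert t[1] == b"new2"
```

Note how 'stream' never reads the source at all, which is why the patch above can skip the bitmap and the read path entirely.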
Re: [Qemu-devel] drive-backup locks VM if target has issues?
On Mon, Sep 30, 2013 at 3:41 AM, Paolo Bonzini pbonz...@redhat.com wrote:
> On 30/09/2013 00:46, Wolfgang Richter wrote:
> All writes to the drive-backup source have to first copy the pre-write
> data to the target. Thus, drive-backup usually works best if you are
> using werror=stop on the source. That said, I would have expected the
> job to be cancelled instead. Looks like there are bugs in the handling
> of on_target_error.

Yes, that makes sense and was what I thought as well: it should have been canceled or ended in some bad state. Instead my VM saw drive write errors and remounted root read-only. Not an issue for real work for me, it just meant my benchmark couldn't run.

>> My overall goal is to drop the extra write traffic as early as possible
>> to measure overhead of the drive-backup command in a few different
>> scenarios, thus I was hoping /dev/null would help here.
>
> I think you need a null backend instead that drops writes at the QEMU
> level. Perhaps /dev/zero helps too.

Yeah, /dev/zero has the same issue. I could make a null backend, or just make my NBD server drop all the writes. There will be extra overhead from TCP, but it'll be good enough for me to measure (NBD is what I am using as a target eventually anyway). -- Wolf
[Qemu-devel] drive-backup locks VM if target has issues?
I wanted to explore overhead with the new drive-backup command, and I noticed that if I set the target to something like '/dev/null' the guest VM starts having I/O errors and loses write access to its root file system. Here is the qmp-shell command I'm using:

  drive-backup sync=none device=virtio0 target=/dev/null format=raw mode=existing

I have a guest running with a single virtio root disk (ext4, Ubuntu guest). After that command, the guest sees write errors to its root block device (virtio0). I didn't trace syscalls or dig deeper yet, but was wondering if you had an idea why '/dev/null' as a target in a block job would cause the origin device to lock up/fail. My overall goal is to drop the extra write traffic as early as possible to measure the overhead of the drive-backup command in a few different scenarios, thus I was hoping /dev/null would help here. -- Wolf
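A target that simply discards writes, as suggested in the reply above, is easy to model. This is a hypothetical sketch, not an actual QEMU backend; the assumption it encodes (that a target whose advertised size is zero, like /dev/null opened as a raw image, rejects every write as out of range, while a properly sized null target accepts and drops them) is one plausible reading of the failure described here:

```python
class NullBackend:
    """Toy target: accepts any in-range write and discards the data."""

    SECTOR_SIZE = 512

    def __init__(self, nsectors):
        # Advertise a real size -- the key difference from /dev/null,
        # which (assumption) looks like a zero-length raw image to QEMU.
        self.nsectors = nsectors

    def write(self, sector, data):
        if not 0 <= sector < self.nsectors:
            # A zero-sized target would land here for every write,
            # and those errors propagate back into the block job.
            raise OSError("write out of range")
        return len(data)  # pretend success, keep nothing

    def read(self, sector):
        return b"\x00" * self.SECTOR_SIZE


null = NullBackend(nsectors=1024)
assert null.write(10, b"x" * 512) == 512
assert null.read(10) == b"\x00" * 512
```

With a backend like this as the backup target, the source device keeps its normal size and error behavior, so the benchmark measures only the job's bookkeeping overhead.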
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 15, 2013 at 7:54 AM, Paolo Bonzini pbonz...@redhat.com wrote:
>> But does this really cover all use cases a real synchronous active
>> mirror would provide? I understood that Wolf wants to get every single
>> guest request exposed e.g. on an NBD connection.
>
> He can use throttling to limit the guest's I/O speed to the size of the
> asynchronous mirror's buffer.

Throttling is fine for me, and is actually what I do today (it is the highest source of overhead for a system that wants to see everything), just with the tracing framework. -- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Thu, May 16, 2013 at 9:44 AM, Richard W.M. Jones rjo...@redhat.com wrote:
> Ideally I'd like to issue some QMP commands which would set up the
> point-in-time snapshot, and then connect to this snapshot over (eg) NBD,
> then when I'm done, send some more QMP commands to tear down the
> snapshot.

This is actually interesting. Does the QEMU NBD server support multiple readers? Essentially, if you're RWMJ (not me), and you're keeping a full mirror, it's clear that the mirror write stream goes to an NBD server, but is it possible to attach a reader to that same NBD server and read things back (read-only)? I know it's possible to name the volumes you attach to, so I think conceptually with the NBD protocol this should work.

> I think this document would be better with one or more examples showing
> how this would be used.

I think the thread now has me looking at making the mirror command 'active' :-) rather than adding a new QMP command. -- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 12:11 PM, Paolo Bonzini pbonz...@redhat.com wrote:
>> Essentially, if you're RWMJ (not me), and you're keeping a full mirror,
>> it's clear that the mirror write stream goes to an NBD server, but is
>> it possible to attach a reader to that same NBD server and read things
>> back (read-only)?
>
> Yes, it can be done with both qemu-nbd and the QEMU NBD server commands.

Then this means that, if there were an active mirror (or snapshot being created), it would be easy to attach an NBD client as a reader to it even as it is being synchronized (perhaps dangerous?). -- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 12:42 PM, Richard W.M. Jones rjo...@redhat.com wrote:
> Run up to two extra guestfish instances, with the same result. The
> fourth guestfish instance hangs at the 'run' command until one of the
> first three is told to exit.

And you're interested in being notified when a snapshot is safe to read from? Or is it valuable to try reading immediately? -- Wolf
[Qemu-devel] June 3rd Workshop in Pittsburgh, PA, USA
I am in charge of a workshop happening at CMU with 21 guests currently registered. It will be on using QEMU/KVM, coding inside those codebases, using libvirt, and possibly OpenStack. We will have several talks during the day on how people have used QEMU + KVM in their own research, tips and tricks, best practices they've come across, and any stumbling blocks encountered. At the end of the workshop we will have tutorial sessions on just using QEMU/KVM (possibly in conjunction with libvirt) and also benchmarking with these systems etc. If you're in the Pittsburgh area, and would like to attend, please feel free to contact me. Breakfast and lunch would be included, and currently registration is free. -- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 3:26 PM, Richard W.M. Jones rjo...@redhat.com wrote:
> On Wed, May 22, 2013 at 02:32:37PM -0400, Wolfgang Richter wrote:
>> And you're interested in being notified when a snapshot is safe to read
>> from? Or is it valuable to try reading immediately?
>
> I'm not sure I understand the question. I assumed (maybe wrongly) that
> if we had an NBD address (ie. Unix socket or IP:port) then we'd just
> connect to that and go.

I meant: is there interest in reading from a disk that isn't fully synchronized (yet) with the original disk (it might have old blocks)? Or would you only want to connect once a complete snapshot is available (synchronized completely to some point-in-time)? -- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Tue, May 14, 2013 at 4:40 AM, Stefan Hajnoczi stefa...@redhat.com wrote:
> QEMU is accumulating many different approaches to snapshots and
> mirroring. They all have their pros and cons, so it's not possible to
> support only one approach for all use cases.
>
> The suggested approach is writing a BlockDriver which mirrors I/O to two
> BlockDriverStates. There has been discussion around breaking BlockDriver
> into smaller interfaces, including a BlockFilter for intercepting I/O,
> but this has not been implemented. blkverify is an example of a
> BlockDriver that manages two child BlockDriverStates and may be a good
> starting point.

BlockFilter sounds interesting. The main reason I proposed 'block-trace' is that it is almost identical to what I currently have implemented with the tracing framework---I just didn't have a nice QMP command. -- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Tue, May 14, 2013 at 4:50 AM, Kevin Wolf kw...@redhat.com wrote:
> Or, to translate it into our existing terminology, drive-mirror
> implements a passive mirror; you're proposing an active one (which we do
> want to have). With an active mirror, we'll want to have another choice:
> the mirror can be synchronous (guest writes only complete after the
> mirrored write has completed) or asynchronous (completion is based only
> on the original image). It should be easy enough to support both once an
> active mirror exists.

Yes! Active mirroring is precisely what is needed to implement block-level introspection.

> You're leaving out the most interesting section: How should block-trace
> be implemented?

Noted, although maybe folding it into 'drive-mirror' as an 'active' option might be best, now that Paolo has spoken up.

> The other question is how to implement it internally. I don't think
> adding specific code for each new block job into bdrv_co_do_writev() is
> acceptable. We really need a generic way to intercept I/O operations.
> The keyword from earlier discussions is block filters. Essentially the
> idea is that the block job temporarily adds a BlockDriverState on top of
> the format driver and becomes able to implement all callbacks it likes
> to intercept. The bad news is that the infrastructure isn't there yet to
> actually make this happen in a sane way.

Yeah, I'd also really love block filters and probably would have used them originally instead of the tracing subsystem if they existed. They would make implementing all kinds of block-level features much, much easier. -- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Tue, May 14, 2013 at 6:04 AM, Paolo Bonzini pbonz...@redhat.com wrote:
> On 14/05/2013 10:50, Kevin Wolf wrote:
>> Or, to translate it into our existing terminology, drive-mirror
>> implements a passive mirror; you're proposing an active one (which we
>> do want to have). With an active mirror, we'll want to have another
>> choice: the mirror can be synchronous (guest writes only complete after
>> the mirrored write has completed) or asynchronous (completion is based
>> only on the original image). It should be easy enough to support both
>> once an active mirror exists.
>
> Right, I'm waiting for Stefan's block-backup to give me the right hooks
> for the active mirror. The bulk phase will always be passive, but an
> active-asynchronous mirror has some interesting properties and it makes
> sense to implement it.

Do you mean you'd model the 'active' mode after 'block-backup', or actually call functions provided by 'block-backup'? If I knew more about what you had in mind, I wouldn't mind trying to add this 'active' mode to 'drive-mirror' and test it with my use case. I want to avoid duplicate work, so if you want to implement it yourself I can defer this. -- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Tue, May 14, 2013 at 12:45 PM, Paolo Bonzini pbonz...@redhat.com wrote:
> No, I'll just reuse the same hooks within block/mirror.c (almost... it
> looks like I need after_write too, not just before_write :( that's a
> pity). Basically:
>
> 1) before the write, if there is space in the job's buffers, allocate a
>    MirrorOp and a data buffer for the write. Also record whether the
>    block was dirty before;
>
> 2) after the write, do nothing if there was no room to allocate the data
>    buffer. Else clear the block from the dirty bitmap. If the block was
>    dirty, read the whole cluster from the source as in passive
>    mirroring. If it wasn't, copy the data from guest memory to the
>    preallocated buffer and write it to the destination;
>
>> If I knew more about what you had in mind, I wouldn't mind trying to
>> add this 'active' mode to 'drive-mirror' and test it with my use case.
>> I want to avoid duplicate work, so if you want to implement it yourself
>> I can defer this.
>
> Also the other way round. If you want to give it a shot based on the
> above spec, just tell me.

Talked with my group here as well. I think I'd like to give it a shot based on the above spec rather than refactor my code into a new command. This way it will hopefully reduce duplicated effort and provide extra testing for the active mirroring code. I'll take a pass through the mirror code to make sure I understand it better than I currently do. Would you like to coordinate off-list until we have a patch? -- Wolf
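The two-hook spec in the message above can be simulated in a few lines. This is a toy Python model with made-up names, not QEMU code: it captures only the decision structure (buffer available: mirror straight from guest memory; buffer full: fall back to the dirty bitmap; block already dirty: the whole cluster must be re-read from the source, represented here by a placeholder string):

```python
class ActiveAsyncMirror:
    """Toy model of an active-asynchronous mirror with a bounded buffer."""

    def __init__(self, buf_slots):
        self.free_slots = buf_slots
        self.dirty = set()    # blocks left for the passive (scanning) pass
        self.target = {}      # what has reached the destination
        self.pending = {}     # in-flight ops: block -> (data, was_dirty) or None

    def before_write(self, block, data):
        if self.free_slots > 0:
            # 1) Room in the job's buffers: allocate an op and a data
            #    buffer, and record whether the block was dirty before.
            self.free_slots -= 1
            self.pending[block] = (data, block in self.dirty)
        else:
            self.pending[block] = None  # no room: active path is skipped

    def after_write(self, block):
        op = self.pending.pop(block)
        if op is None:
            # 2) No buffer was allocated: leave the block dirty so the
            #    passive pass copies it later.
            self.dirty.add(block)
            return
        data, was_dirty = op
        self.dirty.discard(block)
        if was_dirty:
            # Stale data may precede this write: re-read the whole
            # cluster from the source, as in passive mirroring.
            self.target[block] = "<cluster re-read from source>"
        else:
            self.target[block] = data  # copy straight from guest memory
        self.free_slots += 1


m = ActiveAsyncMirror(buf_slots=1)
m.before_write(3, b"abc"); m.after_write(3)
assert m.target[3] == b"abc" and 3 not in m.dirty

m.before_write(4, b"d")
m.before_write(5, b"e")   # buffer exhausted -> falls back to dirty bitmap
m.after_write(4); m.after_write(5)
assert m.target[4] == b"d"
assert 5 in m.dirty and 5 not in m.target
```

The interesting property for introspection is visible in the last assertion: under buffer pressure the scheme degrades gracefully to passive mirroring instead of stalling the guest, at the cost of losing the exact write for that block.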
[Qemu-devel] drive-mirror sync points
Paolo/anyone who knows - Are drive-mirror sync points (NBD flush commands) reflecting guest write barriers? Are guest write barriers respected by drive-mirror? If so, that would make drive-mirror much more palatable for disk introspection work (a drop-in usable feature of QEMU!). -- Wolf
[Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
I'm working on a new patch series which will add a new QMP command, block-trace, which turns on tracing of writes for a specified block device and sends the stream unmodified to another block device. The 'trace' is meant to be precise, meaning that writes are not lost, which differentiates this command from others. It can be turned on and off depending on when it is needed.

How is this different from block-backup or drive-mirror?

block-backup is designed to create point-in-time snapshots, not to clone the entire write stream of a VM to a particular device. It implements copy-on-write to create a snapshot. Thus, whenever a write occurs, block-backup is designed to send the original data and not the contents of the new write.

drive-mirror is designed to mirror a disk to another location. It operates by periodically scanning a dirty bitmap and cloning blocks when dirtied. This is efficient as it allows for batching of writes, but it does not maintain the order in which guest writes occurred, and it can miss intermediate writes when they go to the same location on disk.

How can block-trace be used?

(1) Disk introspection - systems which analyze the writes going to a disk for introspection require a perfect clone of the write stream to an original disk to stay in sync with updates to guest file systems.

(2) Replicated block device - two block devices could be maintained as exact copies of each other up to a point in the disk write stream that has successfully been written to the destination block device.

-- Wolf
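The drive-mirror limitation described above (a periodically scanned dirty bitmap batches writes and so loses intermediate ones) is easy to demonstrate with a toy model. All names here are illustrative, not QEMU code:

```python
def guest_writes(writes, disk, dirty, trace):
    """Apply guest writes, recording them both ways at once."""
    for sector, data in writes:
        disk[sector] = data
        dirty.add(sector)             # drive-mirror style: mark, copy later
        trace.append((sector, data))  # block-trace style: forward every write


disk, dirty, trace = {}, set(), []

# Two writes to the SAME sector land between two bitmap scans.
guest_writes([(9, b"v1"), (9, b"v2")], disk, dirty, trace)

# The periodic scan copies whatever the sector holds NOW.
mirrored = [(s, disk[s]) for s in sorted(dirty)]

assert mirrored == [(9, b"v2")]           # intermediate b"v1" was lost
assert trace == [(9, b"v1"), (9, b"v2")]  # the trace kept both, in order
```

For replication the batching behavior is fine (the destination converges); for introspection it is not, because the analysis never sees the transient contents of sector 9.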
Re: [Qemu-devel] drive-mirror sync points
On May 13, 2013, at 5:46 PM, Richard W.M. Jones rjo...@redhat.com wrote:
> On Mon, May 13, 2013 at 01:50:00PM -0400, Wolfgang Richter wrote:
>> Paolo/anyone who knows - Are drive-mirror sync points (NBD flush
>> commands) reflecting guest write barriers? Are guest write barriers
>> respected by drive-mirror? If so, that would make drive-mirror much
>> more palatable for disk introspection work (a drop-in usable feature
>> of QEMU!).
>
> I'm also interested in this question. Further extensions to this (*not*
> drive-mirror on its own AIUI) which stefanha is working on should allow
> libguestfs to perform point-in-time snapshots of images, which will mean
> that we can do complex and long-running inspection operations on live
> guests.

And I'm trying to do complex, long-running inspection on live guests without needing point-in-time snapshots :-)
Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU
On Wed, Apr 24, 2013 at 4:37 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
>> Has there been any performance analysis of drive-mirror (impact on
>> executing guest)?
>
> It slows down guest I/O for a couple of reasons:
>
> 1. Writes now require a read from the original device followed by a
>    write to the target device. Only after this completes is the write
>    allowed to proceed.
>
> 2. Overlapping read/write requests are serialized to maintain
>    consistency between the guest's I/Os and the block-backup I/Os.

Makes sense; #2 is what I want/need (I don't care about the original data).

> But on second thought, I don't think block-backup fits the bill. You
> don't care about the original data, you care about what new data the
> guest is writing.

Precisely. I crawl and index the original data before we start getting the live stream of new data/writes.

> I think what you really want is a tap block driver which mirrors writes
> to a target device (typically an NBD volume). You can model this on
> blkverify or check out Benoit Canet's quorum patches.

Something like this, or live replication via drive-mirror which implements #2. -- Wolf
Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU
On Wed, Apr 24, 2013 at 4:39 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
> On Tue, Apr 23, 2013 at 03:11:26PM -0400, Wolfgang Richter wrote:
>> Also, as another thought, I think I can actually use the bitmap to
>> implement an optimization. In my code, I already use a bitmap to
>> determine which sectors I want to introspect (ignoring portions of the
>> disk greatly reduces required bandwidth and overhead; swap space, for
>> example, isn't generally interesting unless you can interpret memory
>> as well). So I think I can adapt my code here as well.
>
> Cool. By the way, do you actually care about the data being written or
> just which sectors were touched?

Excellent question; my example wasn't clear. I do want the data, _especially_ for sectors containing file system metadata, because I interpret metadata (for NTFS and ext4 currently) to figure out new sectors associated with a file, file creations, and file deletions. But if there is a system that is write-heavy, I'm OK with dropping data writes to regular files (not file system metadata). In that case, people interested in, say, monitoring web server logs would lose the data stream from their log files, but the introspection system as a whole maintains its view of the file system space. -- Wolf
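The bitmap optimization described in this exchange (forward only writes to sectors worth introspecting, drop the rest) can be sketched as follows. This is an illustrative model, not QEMU code, and the "interesting" sectors are made-up stand-ins for file-system metadata regions:

```python
class SelectiveTap:
    """Forward only writes to sectors marked interesting by a bitmap."""

    def __init__(self, interesting):
        self.interesting = interesting  # set of sector numbers to watch
        self.channel = []               # what introspection actually receives
        self.dropped = 0                # bandwidth saved

    def on_guest_write(self, sector, data):
        if sector in self.interesting:
            # Metadata sector: keep the full write contents.
            self.channel.append((sector, data))
        else:
            # Plain file data / swap: drop it, keeping only a count.
            self.dropped += 1


# Pretend sectors 100-101 hold inode tables (hypothetical numbers).
tap = SelectiveTap(interesting={100, 101})
tap.on_guest_write(100, b"inode update")
tap.on_guest_write(5000, b"bulk file data")

assert tap.channel == [(100, b"inode update")]
assert tap.dropped == 1
```

A real bitmap would be per-sector bits rather than a Python set, but the filtering decision on the write path is the same.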
Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU
On Wed, Apr 24, 2013 at 5:24 AM, Paolo Bonzini pbonz...@redhat.com wrote:
> On 24/04/2013 10:37, Stefan Hajnoczi wrote:
>> Has there been any performance analysis of drive-mirror (impact on
>> executing guest)?
>
> What Stefan wrote is about block-backup. drive-mirror has a limited
> impact on guest performance, but it doesn't pass the writes through to
> the channel. Instead, it uses a dirty bitmap that it periodically scans
> to copy new data to the destination.

This was my take on drive-mirror from reading the wiki. I was excited about the 'live replication' functionality.

>> It slows down guest I/O for a couple of reasons:
>> 1. Writes now require a read from the original device followed by a
>>    write to the target device. Only after this completes is the write
>>    allowed to proceed.
>> 2. Overlapping read/write requests are serialized to maintain
>>    consistency between the guest's I/Os and the block-backup I/Os.
>>
>> But on second thought, I don't think block-backup fits the bill. You
>> don't care about the original data, you care about what new data the
>> guest is writing.
>
> Right. However, when block-backup gets in, I will try to change
> drive-mirror to use an active method. I don't have a timeframe for
> this, though.

This sounds more ideal for what I want (a more 'active' drive-mirror). -- Wolf
Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU
On Wed, Apr 24, 2013 at 5:26 AM, Paolo Bonzini pbonz...@redhat.com wrote:
> On 23/04/2013 20:31, Wolfgang Richter wrote:
>> Well, my main application is in exposing a cloud-inotify service by
>> interpreting sector writes in real time and publishing the updates as
>> file system manipulations. By using introspection we don't need agents
>> running inside the guest. Example: guest writes to sector 5786907; I
>> reverse-map that sector and notice it belongs to '/etc/passwd' within
>> that guest; I immediately emit a message (currently using Redis
>> pub-sub functionality) to any interested subscribers that
>> '/etc/passwd' changed within this guest running on a certain host
>> within the datacenter.
>
> If you are okay with writes being bundled and you are able to handle
> reordered writes within a small timeframe (usually 0.1-1s), then you
> can use drive-mirror with an NBD destination.

In the purest form, where missing updates is not acceptable, I'm not OK with it. But I think that introspection can still _mostly_ work given these relaxed constraints. Reordered writes can be difficult to stomach, though: imagine that a file inode update goes through before its data writes. Imagine that the inode update simply extends the file size, with the last data block write coming soon after. We might incorrectly report bytes (and their contents) as belonging to this file before we see the final data block write if the data block is currently cached. -- Wolf
Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU
On Wed, Apr 24, 2013 at 12:15 PM, Paolo Bonzini pbonz...@redhat.com wrote:
> On 24/04/2013 18:12, Wolfgang Richter wrote:
>> Reordered writes can be difficult to stomach, though: imagine that a
>> file inode update goes through before its data writes. Imagine that the
>> inode update simply extends the file size, with the last data block
>> write coming soon after. We might incorrectly report bytes (and their
>> contents) as belonging to this file before we see the final data block
>> write if the data block is currently cached.
>
> Yes, it's difficult. In case it helps, sync points are marked by a flush
> command in the NBD protocol. At this point, the disk image is guaranteed
> to match the source. You can make the SLICE_TIME shorter in
> block/mirror.c to ensure that writes are more promptly replicated to the
> destination, but in general it is not a problem. QEMU can sync 10 times
> a second or more (with a worst case of 1-1.5 seconds) during a kernel
> compile (don't remember the details, but something like make -j8).

Yes, I was thinking, as a stop-gap solution, of just using this short term until something with stronger guarantees could be put in place. I think it's coming down to deciding between:

(1) A new device like 'blkverify' that doesn't actually verify, but just clones operations
(2) Creating an active version of drive-mirror with stronger guarantees (presumably turned on with an option)

-- Wolf
Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU
On Wed, Apr 24, 2013 at 1:37 AM, Stefan Hajnoczi stefa...@gmail.com wrote:
> I think what you really want is a tap block driver which mirrors writes
> to a target device (typically an NBD volume). You can model this on
> blkverify or check out Benoit Canet's quorum patches.
>
> Stefan

An interesting thought: what we're basically talking about now is a RAID 1 block device exposed by QEMU (no OS support needed). I think (?) it could have wider applicability than just introspection, and it could someday be extended to other forms of RAID. I think I'll implement such a block device; unsure of what to call it (blkraid?).
[Qemu-devel] Adding Disk-Level Introspection to QEMU
I'm interested in adding introspection of disk writes to QEMU for various applications and research potential. What I mean by introspection of disk writes is that, when enabled, each write passing through QEMU to backing storage would also be copied to an introspection channel for further analysis. I currently have an implementation piggy-backing on the tracing subsystem, but adding binary trace events breaks various assumptions about that subsystem (for example, the stderr backend would no longer be readable when tracing disk writes). I'd really like to someday have introspection in the QEMU mainline, and thus I'm wondering: (1) Should the tracing subsystem be extended to include binary events? or (2) Should a separate introspection subsystem be implemented? I suppose we should keep in mind that introspection could include memory, network, etc. if others wanted that in the future (although I am not working on that). -- Wolf
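To illustrate why binary trace events break text-oriented backends like stderr, here is a sketch of what such an event might look like. The record layout (a fixed header of sector and payload length, followed by raw payload bytes) is an assumption for illustration, not an existing QEMU trace format:

```python
import struct

# Illustrative sketch of a binary introspection record, as opposed to the
# printf-style tracepoints the tracing subsystem expects. The layout
# (64-bit sector, 32-bit length, raw payload) is an assumption.

HDR = struct.Struct("<QI")  # little-endian: u64 sector, u32 payload length

def encode_write(sector, payload):
    # Raw payload bytes make the stream unreadable on a text backend,
    # which is the incompatibility described above.
    return HDR.pack(sector, len(payload)) + payload

def decode_write(buf):
    sector, length = HDR.unpack_from(buf)
    payload = bytes(buf[HDR.size:HDR.size + length])
    return sector, payload

rec = encode_write(5786907, b"new /etc/passwd contents")
assert decode_write(rec) == (5786907, b"new /etc/passwd contents")
```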
Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU
On Tue, Apr 23, 2013 at 2:21 PM, Stefan Hajnoczi stefa...@gmail.com wrote: The tracing subsystem is geared towards tracepoint instrumentation rather than binary dumps. Can you share some specific applications? Well, my main application is exposing a cloud-inotify service by interpreting sector writes in real time and publishing the updates as file system manipulations. By using introspection we don't need agents running inside the guest. Example: the guest writes to sector 5786907; I reverse-map that sector and notice it belongs to '/etc/passwd' within that guest; I immediately emit a message (currently using Redis pub-sub functionality) to any interested subscribers that '/etc/passwd' changed within this guest running on a certain host within the datacenter. Other applications of VMI that I've seen are usually security-related: detecting rootkits invisible to the guest, etc., because once the guest is compromised, agents running inside it cannot be trusted. Eric's suggestion to use NBD makes sense to me. The block-backup code can be extended fairly easily using sync mode=none (do not perform a background copy of the entire disk) and by disabling the bitmap (essentially tap mode). This makes a lot of sense to me as well. I'm glad there's a built-in mode that doesn't copy the whole disk. I suppose I will have to customize the patch to disable the bitmap? Is there any chance we could also expose that as an option to users? As in, let them decide the granularity of their snapshots, and their policies regarding snapshots, in a streaming mode? -- Wolf
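The cloud-inotify path described above can be sketched in a few lines. The reverse map and publish channel here are stand-ins I've invented for illustration; in the real system the map would come from parsing filesystem metadata, and publish would be Redis pub-sub:

```python
# Illustrative sketch of the cloud-inotify flow: reverse-map a written
# sector to a file and publish a change event. The map contents and the
# channel name are assumptions standing in for real filesystem metadata
# and a Redis pub-sub connection.

reverse_map = {5786907: "/etc/passwd"}  # sector -> file path
events = []

def publish(channel, message):
    # Stand-in for something like redis_conn.publish(channel, message)
    events.append((channel, message))

def on_sector_write(sector):
    path = reverse_map.get(sector)
    if path is not None:
        publish("guest42:fs-changes", path)

on_sector_write(5786907)   # mapped sector: subscribers are notified
on_sector_write(123)       # unmapped sector: ignored
assert events == [("guest42:fs-changes", "/etc/passwd")]
```

Because unmapped sectors are dropped at the source, subscribers only ever see file-level events, with no in-guest agent involved.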
Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU
On Tue, Apr 23, 2013 at 2:31 PM, Wolfgang Richter w...@cs.cmu.edu wrote: On Tue, Apr 23, 2013 at 2:21 PM, Stefan Hajnoczi stefa...@gmail.com wrote: Eric's suggestion to use NBD makes sense to me. The block-backup code can be extended fairly easily using sync mode=none (do not perform a background copy of the entire disk) and by disabling the bitmap (essentially tap mode). Also, as another thought, I think I can actually use the bitmap to implement an optimization. In my code, I already use a bitmap to determine which sectors I want to introspect (ignoring portions of the disk greatly reduces required bandwidth and overhead; swap space, for example, isn't generally interesting unless you can interpret memory as well). So I think I can adapt my code here as well.
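The bitmap optimization described above can be sketched as follows; this is purely illustrative (not QEMU's dirty bitmap code), showing only the filtering idea: sectors whose bit is unset never reach the introspection side, so regions like swap cost no bandwidth.

```python
# Illustrative sketch of a sector-selection bitmap: only sectors whose
# bit is set are forwarded to the introspection channel. This is not
# QEMU's dirty bitmap implementation, just the filtering idea.

class SectorBitmap:
    def __init__(self, nb_sectors):
        self.bits = bytearray((nb_sectors + 7) // 8)

    def set(self, sector):
        self.bits[sector >> 3] |= 1 << (sector & 7)

    def test(self, sector):
        return bool(self.bits[sector >> 3] & (1 << (sector & 7)))

bm = SectorBitmap(1024)
bm.set(100)  # mark sector 100 as interesting (e.g. filesystem metadata)
forwarded = [s for s in (99, 100, 101) if bm.test(s)]
assert forwarded == [100]
```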
Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU
-- Wolf On Apr 23, 2013, at 1:22 PM, Eric Blake ebl...@redhat.com wrote: On 04/23/2013 11:12 AM, Wolfgang Richter wrote: I'm interested in adding introspection of disk writes to QEMU for various applications and research potential. What I mean by introspection of disk writes is that, when enabled, each write passing through QEMU to backing storage would also be copied to an introspection channel for further analysis. Sounds like you would benefit from the block-backup series, with an NBD server as the point where you inject your introspection. https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg04629.html The existing drive-mirror command can also target an NBD destination, with similar effects. Yes, as a new member of the list I saw the block-backup series and was starting to have similar thoughts. I'll port my code (the analysis side) to work with it (or with drive-mirror). Has there been any performance analysis of drive-mirror (its impact on the executing guest)? -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
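For concreteness, here is a sketch of the kind of QMP command one might construct to mirror a drive to an NBD destination as Eric suggests. The device name, host, and port are assumptions for illustration, and the exact argument set should be checked against the QMP schema of the QEMU build in use; the snippet only builds and serializes the command, it does not talk to a running QEMU:

```python
import json

# Illustrative sketch: construct the QMP command one might send to mirror
# a drive to an NBD destination for introspection. "drive0" and the NBD
# address are assumptions; verify argument names against your QEMU's QMP
# schema before use.

cmd = {
    "execute": "drive-mirror",
    "arguments": {
        "device": "drive0",               # assumed block device name
        "target": "nbd:localhost:10809",  # assumed NBD listener address
        "sync": "full",
        "mode": "existing",               # do not create the target image
    },
}
wire = json.dumps(cmd)  # what would go over the QMP socket
assert json.loads(wire)["execute"] == "drive-mirror"
```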
Re: [Qemu-devel] Multiple NIC's With Redirected Ports
I am assuming the NICs work with -user-net properties, with a simulated router/firewall/DHCP server at 10.0.2.2. Is it possible to manually assign an IP (such as 10.0.2.5; is 10.0.2.3 still a nameserver?) and still have access to the internet? Wolfgang Richter wrote: Basically, what I want to accomplish is this. eth0 and eth1 are in bridging mode, with eth0 supposedly leading out to the internet, and eth1 supposedly connecting an internal network to the internet. eth2 connects to a third network, but that doesn't really matter too much. eth0 wants a few ports open and so does eth2. Is this possible at all with QEMU? So far I've had no luck, but I will continue trying different configurations. -- Wolfgang Richter [EMAIL PROTECTED] wrote: I am trying to simulate three NICs, with redirected ports from the host to my simulated system. I want port 22 to go to NIC 1, and port 443 to go to NIC 3. Is this possible? So far, I think only eth0 seems to be working on my guest OS, so maybe my -redir tcp:22::22 -redir tcp:443::443 options are breaking the multiple NICs? I have to redirect ports in order for the guest OS to expose servers (SSH, SSL web), right? I am using QEMU 0.7.0. I just want to make sure my invocation of QEMU (under Windows XP) isn't causing problems: qemu.exe -L \Program Files\Qemu\bios -m 256 -hda C:\Program Files\Qemu\RooHoneynet.img -enable-audio -localtime -nics 3 -redir tcp:22::22 -redir tcp:443::443 -- Wolfgang Richter ___ Qemu-devel mailing list Qemu-devel@nongnu.org http://lists.nongnu.org/mailman/listinfo/qemu-devel -- Wolfgang Richter [EMAIL PROTECTED]