[Qemu-devel] Current State of Block Filter

2014-07-15 Thread Wolfgang Richter
The way I see it, a block filter is currently implemented as a special
block driver with `is_filter` set to true.

Is this a correct characterization of the current incarnation?

If so, I was wondering: is it possible to insert a block filter layer on
top of an existing block device while QEMU is executing (via QMP commands)?

It seems possible to add block-filter-managed block devices, but I don't
see a way of adding a block filter to an existing block device.
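
Something along these lines is what I am imagining (hypothetical syntax:
as far as I can tell no such command exists today, so the command name and
arguments below are made up):

{ "execute": "blockdev-insert-filter",
  "arguments": { "device": "virtio0",
                 "driver": "my-filter" } }

i.e. splice a filter BlockDriverState between the device and its current
top-level node at runtime.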

--
Wolf


Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-14 Thread Wolfgang Richter
On Fri, Oct 11, 2013 at 11:38 AM, Eric Blake ebl...@redhat.com wrote:
 On 10/11/2013 09:18 AM, Wolfgang Richter wrote:
 Idea: Introduce a mode for drive-backup that duplicates writes to
 another target, rather than doing CoW.  It is useful for introspection
 (my use case) and for keeping a remote block device in sync with writes
 (helps with migration or backup).




 This is based off of v1.6.0 code.

 Best to rebase it against latest qemu.git.

Done.

 +++ b/qapi-schema.json
 @@ -1311,12 +1311,14 @@
  #
  # @full: copies data from all images to the destination
  #
 -# @none: only copy data written from now on
 +# @none: only copy on write data written from now on
 +#
 +# @stream: copy every new write to target

 Add the designation '(since 1.7)' to make it obvious when this mode was
 introduced.

Done.  Is it better to place the updated patch in this thread or start
a new one?


  #
  # Since: 1.3
  ##
  { 'enum': 'MirrorSyncMode',
 -  'data': ['top', 'full', 'none'] }
 +  'data': ['top', 'full', 'none', 'stream'] }

 MirrorSyncMode is used by multiple commands; your summary mentions how
 it would affect 'drive-backup', but what happens to 'drive-mirror'?  For
 that matter, why isn't 'drive-mirror' with mode 'none' doing what you
 already want?

Okay, I think my impression might be wrong, but I thought
'drive-mirror' would become deprecated with the new 'drive-backup'
command and code.

If we look at what they do (current documentation and code),
'drive-backup' AFAIK behaves the same as 'drive-mirror' for every mode
_except_ 'none', and with _better_ consistency guarantees.  That is,
'drive-backup' clearly provides a point-in-time snapshot, whereas
'drive-mirror' may create a point-in-time snapshot but cannot
guarantee it.

In addition, 'drive-backup's code is cleaner, simpler, and easier to
work with (in my opinion) than 'drive-mirror's.  This is because of the
new hooks in block.c for tracked requests, which let the job insert
code to run on every write in a clean manner (I think).

I think it would be less confusing to subsume 'drive-mirror' into
'drive-backup', so that we have a single command with clear consistency
guarantees; it would also prevent overloading (and further confusion
about) the meaning of the 'MirrorSyncMode' values.

Perhaps a better naming scheme for the modes would then be:

full   - as before (same for both commands AFAIK)
top    - as before (same for both commands AFAIK)
none   - if we only have drive-backup, rename this to 'overlay', as it
         creates a low-overhead CoW overlay point-in-time snapshot
stream - either keep my name 'stream' for what 'none' does in
         drive-mirror, or leave this as the 'none' mode with the same
         drive-mirror semantics

Thus, I think, with a single extra mode, drive-backup can subsume
drive-mirror.  This reduces the number of commands, the documentation,
and the code (all duplicating each other in some manner).
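
As a sketch (illustrative only, not a patch), the consolidated enum in
qapi-schema.json would then read:

{ 'enum': 'MirrorSyncMode',
  'data': ['top', 'full', 'overlay', 'stream'] }

with 'overlay' documented as the low-overhead CoW point-in-time snapshot
and 'stream' as duplicating every new guest write to the target.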

-- 
Wolf



Re: [Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-14 Thread Wolfgang Richter
On Sat, Oct 12, 2013 at 1:47 AM, Fam Zheng f...@redhat.com wrote:
 While mirroring writes is a good idea, doing it with drive-backup is
 probably not.  The function of this command is to 'backup' the existing
 data in the image, not new data.  With your 'stream' mode, this semantic
 is changed.

I'm not so sure.  I think it would be better to switch between
semantics with 'modes' rather than 'commands', to reduce documentation,
duplicated code, and the burden on users of remembering different commands.

Thus, many of the _modes_ of 'drive-backup' might provide you with
point-in-time snapshots of a block device, but some of them might just
mirror writes for backup purposes other than a point-in-time snapshot.

 IMO this feature is best implemented as a block filter, which is currently
 being discussed and not ready yet.  A second option may be doing it with
 another command (e.g. block-mirror, or a new one?)

That may be true, as I haven't followed block filters very closely
yet, but this seemed simple enough to implement on top of the nice
drive-backup code.

Perhaps this 'mode' could be refactored in the future to use block filters.

-- 
Wolf



[Qemu-devel] [RFC PATCH] drive-backup 'stream' mode

2013-10-11 Thread Wolfgang Richter
Idea: Introduce a mode for drive-backup that duplicates writes to
another target, rather than doing CoW.  It is useful for introspection
(my use case) and for keeping a remote block device in sync with writes
(helps with migration or backup).



Issue with current modes:  All of the current modes are well-designed
to support point-in-time snapshots, but none of them handles keeping
another drive up to date as new writes continuously occur.  The 'none'
mode documentation is a bit ambiguous in this regard, but what it
actually implements is a very low-overhead CoW snapshot.



Patch: Fixes the ambiguity in the 'none' mode documentation and
introduces a new mode, 'stream', which duplicates writes without
reading any data from the original disk.

I put the logic for copying the write into a new coroutine called
'backup_do_stream', as it needs almost nothing from the original
'backup_do_cow' function (no bitmap, no reads from a block device,
etc.).  The other major change is that tracked requests now also carry
a handle to the QEMUIOVector involved in the write (and it is passed
along).

This is based off of v1.6.0 code.







diff --git a/block.c b/block.c
index 01b66d8..159f825 100644
--- a/block.c
+++ b/block.c
@@ -1872,12 +1872,14 @@ static void tracked_request_end(BdrvTrackedRequest *req)
 static void tracked_request_begin(BdrvTrackedRequest *req,
                                   BlockDriverState *bs,
                                   int64_t sector_num,
-                                  int nb_sectors, bool is_write)
+                                  int nb_sectors, bool is_write,
+                                  QEMUIOVector *qiov)
 {
     *req = (BdrvTrackedRequest){
         .bs = bs,
         .sector_num = sector_num,
         .nb_sectors = nb_sectors,
+        .qiov = qiov,
         .is_write = is_write,
         .co = qemu_coroutine_self(),
     };
@@ -2528,7 +2530,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         wait_for_overlapping_requests(bs, sector_num, nb_sectors);
     }
 
-    tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
+    tracked_request_begin(&req, bs, sector_num, nb_sectors, false, NULL);
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
         int pnum;
@@ -2634,7 +2636,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         wait_for_overlapping_requests(bs, sector_num, nb_sectors);
     }
 
-    tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
+    tracked_request_begin(&req, bs, sector_num, nb_sectors, true, qiov);
 
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req);

diff --git a/block/backup.c b/block/backup.c
index 6ae8a05..686a53f 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -84,6 +84,37 @@ static void cow_request_end(CowRequest *req)
     qemu_co_queue_restart_all(&req->wait_queue);
 }
 
+static int coroutine_fn backup_do_stream(BlockDriverState *bs,
+                                         int64_t sector_num, int nb_sectors,
+                                         QEMUIOVector *qiov)
+{
+    BackupBlockJob *job = (BackupBlockJob *)bs->job;
+    CowRequest cow_request;
+    int ret = 0;
+    int64_t start = sector_num, end = sector_num + nb_sectors;
+
+    qemu_co_rwlock_rdlock(&job->flush_rwlock);
+
+    wait_for_overlapping_requests(job, start, end);
+    cow_request_begin(&cow_request, job, start, end);
+
+    /* Forward the guest's own payload; no reads from the source disk. */
+    ret = bdrv_co_writev(job->target,
+                         sector_num, nb_sectors,
+                         qiov);
+
+    /* Publish progress, guest I/O counts as progress too.  Note that the
+     * offset field is an opaque progress value, it is not a disk offset.
+     */
+    job->sectors_read += nb_sectors;
+    job->common.offset += nb_sectors * BDRV_SECTOR_SIZE;
+
+    cow_request_end(&cow_request);
+
+    qemu_co_rwlock_unlock(&job->flush_rwlock);
+
+    return ret;
+}
+
 static int coroutine_fn backup_do_cow(BlockDriverState *bs,
                                       int64_t sector_num, int nb_sectors,
                                       bool *error_is_read)
@@ -181,7 +212,13 @@ static int coroutine_fn backup_before_write_notify(
 {
     BdrvTrackedRequest *req = opaque;
+    BackupBlockJob *job = (BackupBlockJob *)req->bs->job;
 
-    return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
+    if (job->sync_mode == MIRROR_SYNC_MODE_STREAM) {
+        return backup_do_stream(req->bs, req->sector_num, req->nb_sectors,
+                                req->qiov);
+    } else {
+        return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
+    }
 }

 static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -248,7 +284,8 @@ static void coroutine_fn backup_run(void *opaque)
 
     bdrv_add_before_write_notifier(bs, &before_write);
 
-    if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
+    if (job->sync_mode == MIRROR_SYNC_MODE_NONE ||
+        job->sync_mode == MIRROR_SYNC_MODE_STREAM) {
         while (!block_job_is_cancelled(&job->common)) {
             /* Yield until the job is cancelled.  We just let our before_write
Re: [Qemu-devel] drive-backup locks VM if target has issues?

2013-09-30 Thread Wolfgang Richter
On Mon, Sep 30, 2013 at 3:41 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 30/09/2013 00:46, Wolfgang Richter wrote:
 All writes to the drive-backup source have to first copy the pre-write
 data to the target.  Thus, drive-backup usually works best if you are
 using werror=stop on the source.  That said, I would have expected the
 job to be cancelled instead.  Looks like there are bugs in the handling
 of on_target_error.

Yes, that makes sense and was what I thought as well: it should have been
cancelled or ended in some bad state.  Instead my VM saw drive write errors
and remounted root read-only.  Not an issue for real work for me; it just
meant my benchmark couldn't run.

 My overall goal is to drop the extra write traffic as early as
 possible to measure overhead of the drive-backup command in a few
 different scenarios, thus I was hoping /dev/null would help here.

 I think you need a null backend instead that drops writes at the QEMU
 level.  Perhaps /dev/zero helps too.

Yeah, /dev/zero has the same issue.  I could make a null backend, or just
make my NBD server drop all the writes.  There will be extra overhead from
TCP, but it'll be good enough for me to measure (NBD is what I am using as
a target eventually anyway).

-- 
Wolf



[Qemu-devel] drive-backup locks VM if target has issues?

2013-09-29 Thread Wolfgang Richter
I wanted to explore overhead with the new drive-backup command, and I
noticed that if I set the target to something like '/dev/null' the guest VM
starts having I/O errors and loses write access to its root file
system.  Here is the qmp-shell command I'm using:

 drive-backup sync=none device=virtio0 target=/dev/null format=raw 
 mode=existing

I have a guest running with a single virtio root disk (ext4, Ubuntu
guest).  After that command, the guest sees write errors to its root
block device (virtio0).

I didn't trace syscalls or dig deeper yet, but I was wondering if you
had an idea why '/dev/null' as a target in a block job would cause
the origin device to lock up/fail?

My overall goal is to drop the extra write traffic as early as
possible to measure overhead of the drive-backup command in a few
different scenarios, thus I was hoping /dev/null would help here.

-- 
Wolf



Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-22 Thread Wolfgang Richter
On Wed, May 15, 2013 at 7:54 AM, Paolo Bonzini pbonz...@redhat.com wrote:

  But does this really cover all use cases a real synchronous active
  mirror would provide? I understood that Wolf wants to get every single
  guest request exposed e.g. on an NBD connection.

 He can use throttling to limit the guest's I/O speed to the size of the
 asynchronous mirror's buffer.


Throttling is fine for me, and is actually what I do today with the tracing
framework (throttling is the highest source of overhead for a system that
wants to see everything).
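
For reference, the throttling I mean is QEMU's block I/O throttling; from
qmp-shell it looks roughly like this (assuming a device named 'virtio0',
and that all six limit arguments are still required):

 block_set_io_throttle device=virtio0 bps=10485760 bps_rd=0 bps_wr=0 iops=0 iops_rd=0 iops_wr=0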

-- 
Wolf


Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-22 Thread Wolfgang Richter
On Thu, May 16, 2013 at 9:44 AM, Richard W.M. Jones rjo...@redhat.com wrote:

 Ideally I'd like to issue some QMP commands which would set up the
 point-in-time snapshot, and then connect to this snapshot over (eg)
 NBD, then when I'm done, send some more QMP commands to tear down the

snapshot.


This is actually interesting.  Does the QEMU NBD server support multiple
readers?

Essentially, if you're RWMJ (not me), and you're keeping a full mirror,
it's clear that the mirror write stream goes to an NBD server, but is it
possible to attach a reader to that same NBD server and read things back
(read-only)?  I know it's possible to name the volumes you attach to, so I
think conceptually the NBD protocol should allow this.
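
For what it's worth, the standalone qemu-nbd server does advertise support
for multiple simultaneous clients via its --shared flag, e.g. (untested by
me with a writer and readers mixed on the same export):

 qemu-nbd --persistent --shared=4 mirror-target.img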

 I think this document would be better with one or more examples
 showing how this would be used.


I think the thread now has me looking at making the mirror command 'active'
:-) rather than adding a new QMP command.

-- 
Wolf


Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-22 Thread Wolfgang Richter
On Wed, May 22, 2013 at 12:11 PM, Paolo Bonzini pbonz...@redhat.com wrote:

  Essentially, if you're RWMJ (not me), and you're keeping a full
  mirror, it's clear that the mirror write stream goes to an nbd server,
  but is it possible to attach a reader to that same nbd server and read
  things back (read-only)?

 Yes, it can be done with both qemu-nbd and the QEMU nbd server commands.


Then this means that, if there was an active mirror (or snapshot being
created), it would be easy to attach an NBD client as a reader to it even
as it is being synchronized (perhaps dangerous?).

-- 
Wolf


Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-22 Thread Wolfgang Richter
On Wed, May 22, 2013 at 12:42 PM, Richard W.M. Jones rjo...@redhat.com wrote:

 Run up to two extra guestfish instances, with the same result.  The
 fourth guestfish instance hangs at the 'run' command until one of the
 first three is told to exit.


And you're interested in being notified when a snapshot is safe to read
from?  Or is it valuable to try reading immediately?

-- 
Wolf


[Qemu-devel] June 3rd Workshop in Pittsburgh, PA, USA

2013-05-22 Thread Wolfgang Richter
I am in charge of a workshop happening at CMU with
21 guests currently registered.

It will be on using QEMU/KVM, coding inside those codebases,
using libvirt, and possibly OpenStack.

We will have several talks during the day on how people have
used QEMU + KVM in their own research, tips and tricks, best
practices they've come across, and any stumbling blocks
encountered.

At the end of the workshop we will have tutorial sessions on just
using QEMU/KVM (possibly in conjunction with libvirt) and also
benchmarking with these systems etc.


If you're in the Pittsburgh area and would like to attend, please
feel free to contact me.  Breakfast and lunch will be included, and
registration is currently free.

-- 
Wolf


Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-22 Thread Wolfgang Richter
On Wed, May 22, 2013 at 3:26 PM, Richard W.M. Jones rjo...@redhat.com wrote:

 On Wed, May 22, 2013 at 02:32:37PM -0400, Wolfgang Richter wrote:
  On Wed, May 22, 2013 at 12:42 PM, Richard W.M. Jones rjo...@redhat.com
 wrote:
 
   Run up to two extra guestfish instances, with the same result.  The
   fourth guestfish instance hangs at the 'run' command until one of the
   first three is told to exit.
 
 
   And you're interested in being notified when a snapshot is safe to read
   from?
   Or is it valuable to try reading immediately?

 I'm not sure I understand the question.

 I assumed (maybe wrongly) that if we had an NBD address (ie. Unix
 socket or IP:port) then we'd just connect to that and go.


I meant: is there interest in reading from a disk that isn't fully
synchronized (yet) with the original disk (it might have old blocks)?  Or
would you only want to connect once a (complete) snapshot is available
(synchronized completely to some point in time)?

-- 
Wolf


Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-14 Thread Wolfgang Richter
On Tue, May 14, 2013 at 4:40 AM, Stefan Hajnoczi stefa...@redhat.com wrote:

 QEMU is accumulating many different approaches to snapshots and
 mirroring.  They all have their pros and cons so it's not possible to
 support only one approach for all use cases.

 The suggested approach is writing a BlockDriver which mirrors I/O to two
 BlockDriverStates.  There has been discussion around breaking
 BlockDriver into smaller interfaces, including a BlockFilter for
 intercepting I/O, but this has not been implemented.  blkverify is an
 example of a BlockDriver that manages two child BlockDriverStates and
 may be a good starting point.


BlockFilter sounds interesting.  The main reason I proposed 'block-trace'
is that it is almost identical to what I currently have implemented with
the tracing framework---I just didn't have a nice QMP command.
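
To check my understanding of the suggestion, the write path of such a
blkverify-style tap driver might look roughly like this (a sketch only:
'BDRVTapState' and 'blktap_co_writev' are made-up names, and how to handle
errors on the tap side is a policy question I'm glossing over):

/* Hypothetical filter state: bs->file is the real disk, s->target the
 * introspection destination (e.g. an NBD volume). */
typedef struct BDRVTapState {
    BlockDriverState *target;
} BDRVTapState;

static int coroutine_fn blktap_co_writev(BlockDriverState *bs,
                                         int64_t sector_num,
                                         int nb_sectors,
                                         QEMUIOVector *qiov)
{
    BDRVTapState *s = bs->opaque;
    int ret;

    /* Write to the real disk first; fail the guest write if that fails. */
    ret = bdrv_co_writev(bs->file, sector_num, nb_sectors, qiov);
    if (ret < 0) {
        return ret;
    }

    /* Duplicate the same payload to the tap target.  Completing the guest
     * write only after both finish makes this a synchronous active
     * mirror. */
    return bdrv_co_writev(s->target, sector_num, nb_sectors, qiov);
}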

-- 
Wolf


Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-14 Thread Wolfgang Richter
On Tue, May 14, 2013 at 4:50 AM, Kevin Wolf kw...@redhat.com wrote:

 Or, to translate it into our existing terminology, drive-mirror
 implements a passive mirror, you're proposing an active one (which we
 do want to have).

 With an active mirror, we'll want to have another choice: The mirror can
 be synchronous (guest writes only complete after the mirrored write has
 completed) or asynchronous (completion is based only on the original
 image). It should be easy enough to support both once an active mirror
 exists.


Yes! Active mirroring is precisely what is needed to implement block-level
introspection.


 You're leaving out the most interesting section: How should block-trace
 be implemented?


Noted, although maybe folding it into 'drive-mirror' as an 'active' option
might be best, now that Paolo has spoken up.


 The other question is how to implement it internally. I don't think
 adding specific code for each new block job into bdrv_co_do_writev() is
 acceptable. We really need a generic way to intercept I/O operations.
 The keyword from earlier discussions is block filters. Essentially the
 idea is that the block job temporarily adds a BlockDriverState on top of
 the format driver and becomes able to implement all callbacks it likes
 to intercept. The bad news is that the infrastructure isn't there yet
 to actually make this happen in a sane way.


Yeah, I'd also really love block filters, and I probably would have used
them instead of the tracing subsystem originally if they had existed.  They
would make implementing all kinds of 'block-level' features much, much
easier.

-- 
Wolf


Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-14 Thread Wolfgang Richter
On Tue, May 14, 2013 at 6:04 AM, Paolo Bonzini pbonz...@redhat.com wrote:

  On 14/05/2013 10:50, Kevin Wolf wrote:
  Or, to translate it into our existing terminology, drive-mirror
  implements a passive mirror, you're proposing an active one (which we
  do want to have).
 
  With an active mirror, we'll want to have another choice: The mirror can
  be synchronous (guest writes only complete after the mirrored write has
  completed) or asynchronous (completion is based only on the original
  image). It should be easy enough to support both once an active mirror
  exists.

 Right, I'm waiting for Stefan's block-backup to give me the right
 hooks for the active mirror.

 The bulk phase will always be passive, but an active-asynchronous mirror
 has some interesting properties and it makes sense to implement it.


Do you mean you'd model the 'active' mode after 'block-backup', or actually
call functions provided by 'block-backup'?  If I knew more about what you
had in mind, I wouldn't mind trying to add this 'active' mode to
'drive-mirror' and test it with my use case.  I want to avoid duplicate
work, so if you want to implement it yourself I can defer this.

-- 
Wolf


Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-14 Thread Wolfgang Richter
On Tue, May 14, 2013 at 12:45 PM, Paolo Bonzini pbonz...@redhat.com wrote:

 No, I'll just reuse the same hooks within block/mirror.c (almost... it
 looks like I need after_write too, not just before_write :( that's a
 pity).  Basically:

 1) before the write, if there is space in the job's buffers, allocate a
 MirrorOp and a data buffer for the write.  Also record whether the block
 was dirty before;

 2) after the write, do nothing if there was no room to allocate the data
 buffer.  Else clear the block from the dirty bitmap.  If the block was
 dirty, read the whole cluster from the source as in passive mirroring.
 If it wasn't, copy the data from guest memory to the preallocated buffer
 and write it to the destination;

  If I knew more about what you
  had in mind, I wouldn't mind trying to add this 'active' mode to
  'drive-mirror'
  and test it with my use case.  I want to avoid duplicate work, so if you
  want to implement it yourself I can defer this.

 Also the other way round.  If you want to give it a shot based on the
 above spec just tell me.


Talked with my group here as well.  I think I'd like to give it a shot
based on the above spec rather than refactor my code into a new command.
This way it will hopefully reduce duplicated effort and provide extra
testing for the active mirroring code.

I'll take a pass through the mirror code to make sure I understand it
better than I currently do.
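
To make sure I'm reading the spec right, this is roughly the shape I have
in mind (an illustrative sketch only, not against real mirror.c internals:
the 'was_dirty' and 'buf' fields on MirrorOp are placeholders of mine, and
buffer accounting is simplified away):

/* 1) Before the write: if the job has buffer space, allocate a MirrorOp
 * plus a data buffer, and record whether the block was dirty before. */
static MirrorOp *hook_before_write(MirrorBlockJob *s,
                                   int64_t sector_num, int nb_sectors)
{
    MirrorOp *op;

    if (s->buf_free_count == 0) {
        return NULL;            /* no room: the passive pass catches up */
    }
    op = g_new0(MirrorOp, 1);
    op->sector_num = sector_num;
    op->nb_sectors = nb_sectors;
    op->buf = qemu_blockalign(s->common.bs, nb_sectors * BDRV_SECTOR_SIZE);
    qemu_iovec_init(&op->qiov, 1);
    qemu_iovec_add(&op->qiov, op->buf, nb_sectors * BDRV_SECTOR_SIZE);
    op->was_dirty = bdrv_get_dirty(s->common.bs, sector_num);
    return op;
}

/* 2) After the write: nothing to do if no buffer was allocated; else
 * clear the dirty bit, then either re-read from the source (block was
 * dirty; cluster rounding elided) or forward the guest payload. */
static void coroutine_fn hook_after_write(MirrorBlockJob *s, MirrorOp *op,
                                          QEMUIOVector *guest_qiov)
{
    if (!op) {
        return;
    }
    bdrv_reset_dirty(s->common.bs, op->sector_num, op->nb_sectors);
    if (op->was_dirty) {
        bdrv_co_readv(s->common.bs, op->sector_num, op->nb_sectors,
                      &op->qiov);
    } else {
        qemu_iovec_to_buf(guest_qiov, 0, op->buf,
                          op->nb_sectors * BDRV_SECTOR_SIZE);
    }
    bdrv_co_writev(s->target, op->sector_num, op->nb_sectors, &op->qiov);
}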

Would you like to coordinate off-list until we have a patch?

-- 
Wolf


[Qemu-devel] drive-mirror sync points

2013-05-13 Thread Wolfgang Richter
Paolo/anyone who knows -

Do drive-mirror sync points (NBD flush commands) reflect guest write
barriers?  Are guest write barriers respected by drive-mirror?  If so, that
would make drive-mirror much more palatable for disk introspection work (a
drop-in usable feature of QEMU!).

-- 
Wolf


[Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection

2013-05-13 Thread Wolfgang Richter
I'm working on a new patch series which will add a new QMP command,
block-trace, which turns on tracing of writes for a specified block device
and sends the stream unmodified to another block device.  The 'trace' is
meant to be precise, meaning that writes are not lost, which differentiates
this command from others.  It can be turned on and off depending on when it
is needed.



How is this different from block-backup or drive-mirror?


block-backup is designed to create point-in-time snapshots, not to clone
the entire write stream of a VM to a particular device.  It implements
copy-on-write to create a snapshot.  Thus, whenever a write occurs,
block-backup is designed to send the original data, not the contents of
the new write.

drive-mirror is designed to mirror a disk to another location.  It operates
by periodically scanning a dirty bitmap and cloning blocks when they are
dirtied.  This is efficient, as it allows for batching of writes, but it
does not maintain the order in which guest writes occurred, and it can miss
intermediate writes when they go to the same location on disk.



How can block-trace be used?


(1) Disk introspection - systems that analyze the writes going to a disk
for introspection require a perfect clone of the write stream to an
original disk in order to stay in sync with updates to guest file systems.

(2) Replicated block device - two block devices could be maintained as
exact copies of each other, up to a point in the disk write stream that has
successfully been written to the destination block device.
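
For concreteness, the QMP interface I have in mind is minimal.  A sketch
of the qapi-schema.json addition (names are strawmen, not final):

##
# @block-trace:
#
# Copy every write to @device, unmodified and in order, to @target.
#
# @device: the name of the device whose writes should be traced
#
# @target: the block device receiving the cloned write stream
##
{ 'command': 'block-trace',
  'data': { 'device': 'str', 'target': 'str' } }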



--
Wolf


Re: [Qemu-devel] drive-mirror sync points

2013-05-13 Thread Wolfgang Richter
On May 13, 2013, at 5:46 PM, Richard W.M. Jones rjo...@redhat.com wrote:

 On Mon, May 13, 2013 at 01:50:00PM -0400, Wolfgang Richter wrote:
 Paolo/anyone who knows -
 
 Are drive-mirror sync points (NBD flush commands) reflecting guest write
 barriers?  Are guest write barriers respected by drive-mirror?  If so, that
 would make drive-mirror much more palatable for disk introspection work (a
 drop-in usable feature of QEMU!).
 
 I'm also interested in this question.  Further extensions to this
 (*not* drive-mirror on its own AIUI) which stefanha is working on
 should allow libguestfs to perform point-in-time snapshots of images,
 which will mean that we can do complex and long-running inspection
 operations on live guests.

And I'm trying to do complex, long-running inspection on live guests without 
needing point-in-time snapshots :-)


Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-24 Thread Wolfgang Richter
On Wed, Apr 24, 2013 at 4:37 AM, Stefan Hajnoczi stefa...@gmail.com wrote:

  Has there been any performance analysis of drive-mirror (impact on
 executing guest)?

 It slows down guest I/O for a couple of reasons:

 1. Writes now require a read from the original device followed by a
write to the target device.  Only after this completes is the write
allowed to proceed.

 2. Overlapping read/write requests are serialized to maintain
 consistency between the guest's I/Os and the block-backup I/Os.


Makes sense, #2 is what I want/need (I don't care about the original data).


 But on second thought, I don't think block-backup fits the bill.  You
 don't care about the original data, you care about what new data the
 guest is writing.


Precisely.  I crawl and index the original data before we start getting
the live stream of new data/writes.


 I think what you really want is a tap block driver which mirrors
 writes to a target device (typically a NBD volume).  You can model this
 on blkverify or check out Benoit Canet's quorum patches.


Something like this, or live replication via drive-mirror which implements
#2.

-- 
Wolf


Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-24 Thread Wolfgang Richter
On Wed, Apr 24, 2013 at 4:39 AM, Stefan Hajnoczi stefa...@gmail.com wrote:

 On Tue, Apr 23, 2013 at 03:11:26PM -0400, Wolfgang Richter wrote:
  On Tue, Apr 23, 2013 at 2:31 PM, Wolfgang Richter w...@cs.cmu.edu
 wrote:
 
   On Tue, Apr 23, 2013 at 2:21 PM, Stefan Hajnoczi stefa...@gmail.com
 wrote:
  
   Eric's suggestion to use NBD makes sense to me.  The block-backup code
   can be extended fairly easily using sync mode=none (do not perform a
   background copy of the entire disk) and by disabling the bitmap
   (essentially tap mode).
  
  
  Also, as another thought, I think I can actually use the bitmap to
  implement an optimization.  In my code, I already use a bitmap to
  determine which sectors I want to introspect (ignoring portions of the
  disk greatly reduces required bandwidth and overhead; swap space, for
  example, isn't generally interesting unless you can interpret memory as
  well).  So I think I can adapt my code here as well.

 Cool.  By the way, do you actually care about the data being written or
 just which sectors were touched?


Excellent question, my example wasn't clear.  I do want the data,
_especially_ for sectors containing file system metadata, because I
interpret metadata (for NTFS and ext4 currently) to figure out new sectors
associated with a file, as well as file creations and deletions.

But if there is a system that is write-heavy, I'm OK with dropping data
writes to regular files (not file system metadata).  In that case, people
interested in, say, monitoring web server logs would lose the data stream
from their log files, but the introspection system as a whole maintains its
view of the file system space.

-- 
Wolf


Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-24 Thread Wolfgang Richter
On Wed, Apr 24, 2013 at 5:24 AM, Paolo Bonzini pbonz...@redhat.com wrote:

 On 24/04/2013 10:37, Stefan Hajnoczi wrote:
   Has there been any performance analysis of drive-mirror (impact on
 executing guest)?

 What Stefan wrote is about block-backup.

 drive-mirror has a limited impact on guest performance, but it doesn't
 pass the writes through to the channel.  Instead, it uses a dirty bitmap
 that it periodically scans to copy new data to the destination.


This was my take on drive-mirror from reading the wiki.  I was excited
about the 'live replication' functionality.

 It slows down guest I/O for a couple of reasons:
 
  1. Writes now require a read from the original device followed by a
 write to the target device.  Only after this completes is the write
 allowed to proceed.
 
  2. Overlapping read/write requests are serialized to maintain
 consistency between the guest's I/Os and the block-backup I/Os.
 
  But on second thought, I don't think block-backup fits the bill.  You
  don't care about the original data, you care about what new data the
  guest is writing.

 Right.  However, when block-backup gets in, I will try to change
 drive-mirror to use an active method.  I don't have a timeframe for
 this, though.


This sounds more ideal for what I want (a more 'active' drive mirror).

-- 
Wolf


Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-24 Thread Wolfgang Richter
On Wed, Apr 24, 2013 at 5:26 AM, Paolo Bonzini pbonz...@redhat.com wrote:

 On 23/04/2013 20:31, Wolfgang Richter wrote:
  On Tue, Apr 23, 2013 at 2:21 PM, Stefan Hajnoczi stefa...@gmail.com
  mailto:stefa...@gmail.com wrote:
 
  The tracing subsystem is geared towards tracepoint instrumentation
  rather than binary dumps.
 
  Can you share some specific applications?
 
 
  Well, my main application is in exposing a cloud-inotify service by
  interpreting sector writes in real-time and publishing the updates as
  file system manipulations.  By using introspection we don't need agents
  running inside the guest.
 
  Example: guest writes to sector 5786907; I reverse-map that sector and
  notice it belongs to '/etc/passwd' within that guest; I immediately emit
  a message (currently using Redis pub-sub functionality) to any
  interested subscribers that '/etc/passwd' changed within this guest
  running on a certain host within the datacenter.

 If you are okay with writes being bundled and you are able to handle
 reordered writes within a small timeframe (usually 0.1-1s), then you can
 use drive-mirror with an NBD destination.


In the purest form, not to miss updates, I'm not OK with it.  But I think
that introspection can still _mostly_ work given these relaxed constraints.

Reordered writes can be difficult to stomach, though: imagine that a file's
inode update goes through before its data writes.  Imagine that the inode
update simply extends the file size, with the last data block write coming
soon after.  We might incorrectly report bytes (and their contents) as
belonging to this file before we see the final data block write, if the
data block is currently cached.

-- 
Wolf


Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-24 Thread Wolfgang Richter
On Wed, Apr 24, 2013 at 12:15 PM, Paolo Bonzini pbonz...@redhat.com wrote:

 On 24/04/2013 18:12, Wolfgang Richter wrote:
  In the purest form, not to miss updates, I'm not OK with it.  But I
  think that introspection can still _mostly_ work given these relaxed
  constraints.

  Reordered writes can be difficult to stomach, though: imagine that a
  file's inode update goes through before its data writes.  Imagine that
  the inode update simply extends the file size, with the last data block
  write coming soon after.  We might incorrectly report bytes (and their
  contents) as belonging to this file before we see the final data block
  write, if the data block is currently cached.

 Yes, it's difficult.

 In case it helps, sync points are marked by a flush command in the NBD
 protocol.  At this point, the disk image is guaranteed to match the source.

 You can make the SLICE_TIME shorter in block/mirror.c to ensure that
 writes are more promptly replicated to the destination, but in general
 it is not a problem.  QEMU can sync 10 times a second or more (with a
 worst-case of 1-1.5 seconds) during a kernel compile (don't remember the
 details, but something like make -j8).


Yes, I was thinking of using this short-term, as a stop-gap solution, until
something with stronger guarantees could be put in place.

I think it's coming down to deciding between:

(1) A new device like 'blkverify' that doesn't actually verify, but just
clones operations
(2) Creating an active version of drive-mirror with the stronger guarantees
(presumably turned on with an option).
Wolf


Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-24 Thread Wolfgang Richter
On Wed, Apr 24, 2013 at 1:37 AM, Stefan Hajnoczi stefa...@gmail.com wrote:

 I think what you really want is a tap block driver which mirrors
 writes to a target device (typically a NBD volume).  You can model this
 on blkverify or check out Benoit Canet's quorum patches.

 Stefan


An interesting thought: what we're basically talking about now is a
RAID 1 block device exposed by QEMU (no OS support needed).
I think (?) it could have wider applicability than just introspection,
and it could someday be extended to other forms of RAID.

I think I'll implement such a block device; I'm unsure of what to call it
(blkraid?).


[Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-23 Thread Wolfgang Richter
I'm interested in adding introspection of disk writes to QEMU for various
applications and research potential.

What I mean by introspection of disk writes is that, when enabled, each
write passing through QEMU to backing storage would also be copied to an
introspection channel for further analysis.

I currently have an implementation piggy-backing on the tracing subsystem,
but adding binary trace events breaks various assumptions about that
subsystem (for example, the stderr backend would no longer be readable
when tracing disk writes).

I'd really like to someday have introspection in the QEMU mainline, and thus
I'm wondering:

(1) Should the tracing subsystem be extended to include binary events?

or

(2) Should a separate introspection subsystem be implemented?

I suppose we should keep in mind that introspection could include memory,
network, etc. if others wanted that in the future (although I am not
working on
that).

--
Wolf


Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-23 Thread Wolfgang Richter
On Tue, Apr 23, 2013 at 2:21 PM, Stefan Hajnoczi stefa...@gmail.com wrote:

 The tracing subsystem is geared towards tracepoint instrumentation
 rather than binary dumps.

 Can you share some specific applications?


Well, my main application is in exposing a cloud-inotify service by
interpreting sector writes in real-time and publishing the updates as file
system manipulations.  By using introspection we don't need agents running
inside the guest.

Example: guest writes to sector 5786907; I reverse-map that sector and
notice that it belongs to '/etc/passwd' within that guest; I immediately
emit a message (currently using Redis pub-sub functionality) to any
interested subscribers that '/etc/passwd' changed within this guest,
running on a certain host within the datacenter.
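
Mechanically, the reverse map is an interval lookup over extents gathered
during the initial metadata crawl.  In spirit (made-up names; a linear
scan for clarity where the real table would be sorted and binary-searched):

typedef struct Extent {
    uint64_t start;      /* first sector of the extent */
    uint64_t length;     /* length in sectors */
    const char *path;    /* guest file the extent belongs to */
} Extent;

static const char *resolve_sector(const Extent *table, size_t n,
                                  uint64_t sector)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (sector >= table[i].start &&
            sector < table[i].start + table[i].length) {
            return table[i].path;   /* e.g. "/etc/passwd" */
        }
    }
    return NULL;  /* free space, or a region we don't track */
}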

Other applications of VMI that I've seen are usually security-related:
detecting rootkits invisible to the guest, etc., because once the guest
is compromised, agents running inside it cannot be trusted.


 Eric's suggestion to use NBD makes sense to me.  The block-backup code
 can be extended fairly easily using sync mode=none (do not perform a
 background copy of the entire disk) and by disabling the bitmap
 (essentially tap mode).


This makes a lot of sense to me as well.  I'm glad there's a built-in mode
that does not copy the whole disk.  I suppose I will have to customize the
patch to disable the bitmap?  Is there any chance we could also expose that
as an option to users?  As in, let them decide the granularity of their
snapshots (and their policies regarding snapshots) in a streaming mode?

--
Wolf


Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-23 Thread Wolfgang Richter
On Tue, Apr 23, 2013 at 2:31 PM, Wolfgang Richter w...@cs.cmu.edu wrote:

 On Tue, Apr 23, 2013 at 2:21 PM, Stefan Hajnoczi stefa...@gmail.com wrote:

 Eric's suggestion to use NBD makes sense to me.  The block-backup code
 can be extended fairly easily using sync mode=none (do not perform a
 background copy of the entire disk) and by disabling the bitmap
 (essentially tap mode).


Also, as another thought, I think I can actually use the bitmap to
implement an optimization.  In my code, I already use a bitmap to determine
which sectors I want to introspect (ignoring portions of the disk greatly
reduces required bandwidth and overhead; swap space, for example, isn't
generally interesting unless you can interpret memory as well).  So I think
I can adapt my code here as well.
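
The filtering step itself is then just a cheap bitmap test before a write
is forwarded to the introspection channel (illustrative only; a flat bit
array rather than an actual QEMU bitmap type):

#define SECTOR_BITS_PER_WORD (8 * sizeof(unsigned long))

/* One bit per sector, set during the initial crawl for regions worth
 * introspecting (file-system metadata, say, but not swap). */
static inline int sector_is_interesting(const unsigned long *bitmap,
                                        uint64_t sector)
{
    return (bitmap[sector / SECTOR_BITS_PER_WORD] >>
            (sector % SECTOR_BITS_PER_WORD)) & 1;
}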


Re: [Qemu-devel] Adding Disk-Level Introspection to QEMU

2013-04-23 Thread Wolfgang Richter

--
Wolf

On Apr 23, 2013, at 1:22 PM, Eric Blake ebl...@redhat.com wrote:

 On 04/23/2013 11:12 AM, Wolfgang Richter wrote:
 I'm interested in adding introspection of disk writes to QEMU for various
 applications and research potential.
 
 What I mean by introspection of disk writes is that, when enabled, each
 write
 passing through QEMU to backing storage would also be copied to an
 introspection channel for further analysis.
 
 Sounds like you would be benefited by the block-backup series, with an
 NBD server as the point where you inject your introspection.
 
 https://lists.gnu.org/archive/html/qemu-devel/2013-04/msg04629.html
 
 The existing drive-mirror command can also target an NBD destination,
 with similar effects.

Yes, OK; as a new member of the list I saw the block-backup series and was
starting to have similar thoughts.  I'll port my code (analysis side) to
work with it (or drive-mirror).

Has there been any performance analysis of drive-mirror (impact on the
executing guest)?


 -- 
 Eric Blake   eblake redhat com   +1-919-301-3266
 Libvirt virtualization library http://libvirt.org



Re: [Qemu-devel] Multiple NIC's With Redirected Ports

2005-06-24 Thread Wolfgang Richter
I am assuming the NICs work with -user-net properties, with a simulated
router/firewall/DHCP server at 10.0.2.2.  Is it possible to manually
assign an IP (such as 10.0.2.5; is 10.0.2.3 still the nameserver?) and
still have access to the internet?
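
For the record, I would expect a static setup inside a Linux guest along
these lines to work with user-net (untested on my side; 10.0.2.2 being the
virtual gateway and 10.0.2.3 the virtual nameserver):

ifconfig eth0 10.0.2.5 netmask 255.255.255.0
route add default gw 10.0.2.2
echo "nameserver 10.0.2.3" > /etc/resolv.conf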

Wolfgang Richter wrote:

Basically, what I want to accomplish is this: eth0 and eth1 are in
bridging mode, with eth0 supposedly leading out to the internet and
eth1 supposedly connecting an internal network to the internet.  eth2
connects to a third network, but that doesn't really matter too much.
eth0 wants a few ports open and so does eth2.  Is this possible at all
with QEMU?  So far I've had no luck... but I will continue trying
different configurations.

--
Wolfgang Richter

[EMAIL PROTECTED] wrote:

  

I am trying to simulate three NICs, with redirected ports from the host to
my simulated system.  I want port 22 to go to NIC 1 and port 443 to go to
NIC 3.  Is this possible?  So far, I think only eth0 seems to be working on
my guest OS, so maybe my -redir tcp:22::22 -redir tcp:443::443 options are
screwing up the multiple NICs?  I do have to redirect ports in order for
the guest OS to have servers, right (SSH, SSL web)?  I am using QEMU 0.7.0.

I just want to make sure my invocation of QEMU (under Windows XP) isn't 
screwing anything up:

qemu.exe -L \Program Files\Qemu\bios -m 256 -hda C:\Program 
Files\Qemu\RooHoneynet.img -enable-audio -localtime -nics 3 -redir 
tcp:22::22 -redir tcp:443::443

--
Wolfgang Richter





-- 
Wolfgang Richter
[EMAIL PROTECTED]



