date:20141118

Re: [Qemu-devel] [PATCH 1/9] l2tpv3: fix fd leak

2014-11-18 Thread Gonglei

On 2014/11/18 15:50, Markus Armbruster wrote:

 Michael Tokarev m...@tls.msk.ru writes:
 
 15.11.2014 13:06, arei.gong...@huawei.com wrote:
 From: Gonglei arei.gong...@huawei.com

 In this false branch, fd will leak when it is zero.
 Change the testing condition.

 Why fd==0 is a concern here?  It is a very unlikely
 situation that fd0 will be picked - firstly because
 fd0 is almost always open, and second - even if it
 isn't open, it will be picked much earlier than this
 code path, ie, some other file will use fd0 before.

 But even if the concern is real (after all, better
 stay correct than spread bad code pattern, even if
 in reality we don't care as this can't happen), why
 not add 0 to the equality?

 Why people especially compare with -1?  Any negative
 value is illegal here and in lots of other places,
 and many software packages used to return -errno in
 error cases, which is definitely != -1.  I'm not
 saying that comparing with -1 is bad in _this_
 particular case, but why not do it generally in
 all cases?

 More, comparing with 0 is faster and shorter than
 comparing with -1...

 So if it were me, I'd change it to = 0, not to
 == -1.  Here and in all other millions of places
 in qemu code where it compares with -1... ;)
 
 Yup.
 
 In the case of close(), I wouldn't even bother to check, except Coverity
 gets excited when it sees an invalid fd flow into close().  Rightfully
 so when the invalid fd is non-negative, overeager when it's negative.

Thank you, guys.
Paolo had fixed it :)

Best regards,
-Gonglei

Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold reporting for block devices

2014-11-18 Thread Francesco Romani

- Original Message -
 From: Stefan Hajnoczi stefa...@gmail.com
 To: Francesco Romani from...@redhat.com
 Cc: kw...@redhat.com, lcapitul...@redhat.com, qemu-devel@nongnu.org, 
 stefa...@redhat.com, mdr...@linux.vnet.ibm.com
 Sent: Monday, November 17, 2014 5:49:36 PM
 Subject: Re: [Qemu-devel] [RFC][PATCH v2] block: add write threshold 
 reporting for block devices

 On Fri, Nov 07, 2014 at 02:12:13PM +0100, Francesco Romani wrote:

 Sorry for the long review delay.  Looks pretty good, just one real issue
 to think about at the bottom.

Hi Stefan, thanks for the review and no problem for the delay :)

  +static void usage_threshold_disable(BlockDriverState *bs)
  +{

 It would be safest to make this idempotent:

 if (!usage_threshold_is_set(bs)) {
 return;
 }

 That way it can be called multiple times.

Will change.

  +notifier_with_return_remove(bs-wr_usage_threshold_notifier);
  +bs-wr_offset_threshold = 0;
  +}
  +
  +static int usage_threshold_is_set(const BlockDriverState *bs)
  +{
  +return !!(bs-wr_offset_threshold  0);
  +}

 Please use the bool type instead of an int return value.

Sure, will fix.

  +static int coroutine_fn before_write_notify(NotifierWithReturn *notifier,
  +void *opaque)
  +{
  +BdrvTrackedRequest *req = opaque;
  +BlockDriverState *bs = req-bs;
  +int64_t amount = 0;
  +
  +assert((req-offset  (BDRV_SECTOR_SIZE - 1)) == 0);
  +assert((req-bytes  (BDRV_SECTOR_SIZE - 1)) == 0);

 Does the code still make these assumptions or are the asserts left over
 from previous versions of the patch?

It's a leftover.
I understood they don't hurt and add a bit of safety, but if they are confusing
I'll remove them.

  +void bdrv_set_usage_threshold(BlockDriverState *bs, int64_t
  threshold_bytes)
  +{
  +BlockDriverState *target_bs = bs;
  +if (bs-file) {
  +target_bs = bs-file;
  +}

 Hmm...I think now I understand why you are trying to use bs-file.  This
 is an attempt to make image formats work with the threshold.

 Unfortunately the BlockDriverState topology can be more complicated than
 just 1 level.

I thought so but I can't reproduce yet more complex topologies.
Disclosure: I'm testing against the topology I know to be used on oVirt, lacking
immediate availability of others: suggestions welcome.

At risk of being redundant, in oVirt the usecase is QCOW2 over lvm block device,
and we'd like to be notified about the allocation of the lvm block device, 
which (IIUC)
is the last bs-file.

This is a simple topology (unless I'm missing something), and that's
the reason why I go down just one level.

Of course I want a general solution for this change, so...

 If we hardcode a strategy to traverse bs-file then it will work in most
 cases:

   while (bs-file) {
   bs = bs-file;
   }

 But there are cases like VMDK extent files where a BlockDriverState
 actually has multiple children.

 One way to solve this is to require that the management tool tells QEMU
 which exact BlockDriverState node the threshold applies to.  Then QEMU
 doesn't need any hardcoded policy.  But I'm not sure how realistic that
 it at the moment (whether management tools are uses node names for each
 node yet), so it may be best to hardcode the bs-file traversal that
 I've suggested.

oVirt relies on libvirt[1], and uses device node (e.g. 'vda').

BTW, I haven't found a way to inspect from the outside (e.g. monitor command) 
the BlockDriverState
topology, there is a way I'm missing?

Another issue I don't yet have a proper solution is related to this one.

The management app will have deal with VM with more than one block device disk, 
so we need
a way to map the notification with the corresponding device.

If we descend the bs-file chain, AFAIU the easiest mapping, being the device 
name,
is easily lost because only the outermost BlockDriverState has a device name 
attached, so when the
notification trigger
bdrv_get_device_name() will return NULL

I believe we don't necessarily need a device name in the notification, for 
example something
like an opaque cookie set together with the threshold and returned back with 
the notification
will suffice. Of course the device name is nicer :)

+++

[1] if that can help further to understand the usecase, these are the libvirt 
APIs being used
by oVirt:
http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBlockStatsFlags
http://libvirt.org/html/libvirt-libvirt-domain.html#virDomainGetBlockInfo
both relies on the output[2] of 'query-blockstats' monitor command.

[2] AFAIU -but this is just my guesswork!- it also assumes a quite simple 
topology like I did

Thanks and bests,

-- 
Francesco Romani
RedHat Engineering Virtualization R  D
Phone: 8261328
IRC: fromani

Re: [Qemu-devel] [Bug 1393486] [NEW] hw/virtio/virtio-rng.c:150: bad test ?

2014-11-18 Thread Amit Shah

On (Mon) 17 Nov 2014 [17:15:03], dcb wrote:
 Public bug reported:
 
 hw/virtio/virtio-rng.c:150:31: warning: logical not is only applied to
 the left hand side of comparison [-Wlogical-not-parentheses]
 
 if (!vrng-conf.period_ms  0) {
 error_setg(errp, 'period' parameter expects a positive integer);
 return;
 }
 
 Maybe better code
 
 if (vrng-conf.period_ms = 0) {
 error_setg(errp, 'period' parameter expects a positive integer);
 return;
 }

Thanks!

Do you want to submit a patch, since you've identified the fix as
well?


Amit

Re: [Qemu-devel] [PATCH v3 for-2.2 3/3] raw-posix: The SEEK_HOLE code is flawed, rewrite it

2014-11-18 Thread Max Reitz


On 2014-11-17 at 17:43, Eric Blake wrote:

On 11/17/2014 03:18 AM, Markus Armbruster wrote:

On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.

Maybe worth a comment that this is not fatal, but also not optimal.


Additionally, unlikely lseek() failures are treated badly:

* When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
   -ENXIO, there's in fact a trailing hole.  Can happen only when
   something truncated the file since we opened it.

* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
   then try_seek_hole() reports a trailing hole.  This is okay only
   when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
   found by SEEK_HOLE has since become trailing somehow).  For other
   failures (unlikely), it's wrong.

* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
   then try_seek_hole() reports bogus data [-1,start), which its caller
   raw_co_get_block_status() turns into zero sectors of data.  Could
   theoretically lead to infinite loops in code that attempts to scan
   data vs. hole forward.

Rewrite from scratch, with very careful comments.

Thanks for the careful commit message as well as the careful comments :)


Signed-off-by: Markus Armbruster arm...@redhat.com
---
  block/raw-posix.c | 111 +-
  1 file changed, 85 insertions(+), 26 deletions(-)

@@ -1542,25 +1600,26 @@ static int64_t coroutine_fn 
raw_co_get_block_status(BlockDriverState *bs,
  nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
  }
  
+} else if (data == start) {

  /* On a data extent, compute sectors to the end of the extent.  */
  *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);

I think we are safe for now that no file system supports holes smaller
than 512 bytes, so (hole - start) should always be a non-zero multiple
of sectors.  Similarly for the hole case of (data - start).  Maybe it's
worth assert(*pnum  0) to ensure that we never hit a situation where we
go into an infinite loop where we aren't progressing because pnum is
never advancing to the next sector?


That's something the callers of bdrv_get_block_status() have to take 
care of. See commit 4b25bbc4c22cf39350b75bd250d568a4d975f7c5, for example.


Max


But that would be okay as a
separate patch, and I don't want to delay getting _this_ patch into 2.2.

Reviewed-by: Eric Blake ebl...@redhat.com

Re: [Qemu-devel] [PATCH v3 for-2.2 0/3] raw-posix: Get rid of FIEMAP, fix SEEK_HOLE

2014-11-18 Thread Max Reitz


On 2014-11-17 at 11:18, Markus Armbruster wrote:

PATCH 1 is just a comment fix.  PATCH 2 drops FIEMAP use and explains
why it needs to go.  PATCH 3 carefully rewrites the SEEK_HOLE code.

Why 2.2?  The series fixes bugs, but the bugs are either not terribly
severe, or not particularly likely to bite.  The reason I want it
included is we've already changed the code fundamentally in 2.2, from

 Incorrect use of FIEMAP if available, else somewhat incorrect use
 of SEEK_HOLE if available, else pretend no holes

to

 Somewhat incorrect use of SEEK_HOLE if available, else correct but
 slow-as-molasses use of FIEMAP if available, else pretend no holes

Let's finish the job:

 Correct use of SEEK_HOLE if available, else pretend no holes

Less complex, more robust, and no half-broken intermediate version
released.

v3:
* PATCH 12 unchanged, except for a commit message typo [Eric]
* PATCH 34 redone as PATCH 3, very carefully [Eric, Kevin]

v2:
* PATCH 1 unchanged
* PATCH 2 revised and split up [Paolo, Fam, Eric, Max]

Markus Armbruster (3):
   raw-posix: Fix comment for raw_co_get_block_status()
   raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
   raw-posix: The SEEK_HOLE code is flawed, rewrite it

  block/raw-posix.c | 167 --
  1 file changed, 86 insertions(+), 81 deletions(-)


Thanks, applied to my block tree with the hunk removing skip_fiemap 
squashed into patch 2:


https://github.com/XanClic/qemu/commits/block

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Amit Shah

On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
 This patchset fixes CVE-2014-7840: invalid
 migration stream can cause arbitrary qemu memory
 overwrite.
 First patch includes the minimal fix for the issue.
 Follow-up patches on top add extra checking to reduce the
 chance this kind of bug recurs.
 
 Note: these are already (tentatively-pending review)
 queued in my tree, so only review/ack
 is necessary.

Any more acks for this?  Dave?

Also, Michael agreed to just push patch 1 for now, and 2-4 can go in
later (2.3)?  Michael, is that fine with you?


Amit

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Amit Shah

On (Mon) 17 Nov 2014 [14:36:45], Michael S. Tsirkin wrote:
 On Mon, Nov 17, 2014 at 05:50:34PM +0530, Amit Shah wrote:
  On (Mon) 17 Nov 2014 [13:48:58], Michael S. Tsirkin wrote:
   On Mon, Nov 17, 2014 at 04:37:50PM +0530, Amit Shah wrote:
On (Mon) 17 Nov 2014 [12:52:59], Michael S. Tsirkin wrote:
 On Mon, Nov 17, 2014 at 04:08:58PM +0530, Amit Shah wrote:
  On (Mon) 17 Nov 2014 [12:32:57], Michael S. Tsirkin wrote:
   On Mon, Nov 17, 2014 at 12:06:38PM +0530, Amit Shah wrote:
On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
 This patchset fixes CVE-2014-7840: invalid
 migration stream can cause arbitrary qemu memory
 overwrite.
 First patch includes the minimal fix for the issue.
 Follow-up patches on top add extra checking to reduce the
 chance this kind of bug recurs.
 
 Note: these are already (tentatively-pending review)
 queued in my tree, so only review/ack
 is necessary.

Why not let this go in via the migration tree?
   
   Well I Cc'd Juan and David, so if they had a problem with this, I 
   expect
   they'd complain.  David acked so I assume it's ok.  Since I 
   wasted time
   testing this and have it on my tree already, might as well just 
   merge.
  
  IMO asking as a courtesy would've been better than just stating it.
 
 Right, thanks for reminding me.
 
 BTW, there is actually a good reason to special-case it: it's a CVE 
 fix,
 which I handle.  So they stay on my private queue and are passed
 to vendors so vendors can fix downstreams, until making fix public is
 cleared with all reporters and vendors.
 After reporting is cleared, I try to collect acks but don't normally 
 route
 patches through separate queues - that would make it harder to
 track the status which we need for CVEs.

Patch is public, so all of this doesn't really matter.

But: involving maintainers in their areas, even if the patch is
embargoed, should be a pre-requisite.  I'm not sure if we're doing
that, but please do that so patches get a proper review from the
maintainers.
   
   Involving more people means more back and forth with reporters which
   must approve any disclosure.  If the issue isn't clear, I do involve
   maintainers.  I send patches on list and try to merge them only after
   they get ack from relevant people. I'm sorry, but this is as far as I
   have the time to go.
  
  The other aspect of the thing is sub-optimal, or patches with bugs,
  get pushed in, because the maintainers didn't get involved.  
 
 Patches don't get merged before they are on list for a while.
 I typically ping people if I don't get acks.

BTW I was talking about embargoed bugs / patches.  That's not relevant
for this discussion.  I'll create a new thread to discuss qemu's
security policy for embargoed bugs.


Amit

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Dr. David Alan Gilbert

* Amit Shah (amit.s...@redhat.com) wrote:
 On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
  This patchset fixes CVE-2014-7840: invalid
  migration stream can cause arbitrary qemu memory
  overwrite.
  First patch includes the minimal fix for the issue.
  Follow-up patches on top add extra checking to reduce the
  chance this kind of bug recurs.
  
  Note: these are already (tentatively-pending review)
  queued in my tree, so only review/ack
  is necessary.
 
 Any more acks for this?  Dave?

Yep,
Reviewed-by: Dr. David Alan Gilbert dgilb...@redhat.com
(for the set)

 Also, Michael agreed to just push patch 1 for now, and 2-4 can go in
 later (2.3)?  Michael, is that fine with you?
 
 
   Amit

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] mute button not work

2014-11-18 Thread li

Hi all,
 
i use qemu-system-x86_64 to run windows 7, and fund that mute/unmute button 
in the right bottom of guest desktop not work.
 
related info is as below:
 
$qemu-system-x86_64 --version
QEMU emulator version 1.4.0, Copyright (c) 2003-2008 Fabrice Bellard
 
$ps -elf | grep qemu-system-x86_64
qemu-system-x86_64 -machine accel=kvm:tcg -m 4096 -smp 4 -M pc-i440fx-1.4 -spi
ce port=5906,addr=0.0.0.0,disable-ticketing,jpeg-wan-compression=always -vga 
qxl -global qxl-vga.vram_size=268435456 
/opt/instances/win7_sh_2.1.1_driver.qcow2 -net 
nic,macaddr=00:16:3e:00:00:04,model=virtio
-net tap,ifname=tap04 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6  -chardev 
pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev 
spicevmc,id=charchannel0,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device 
hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
 
$uname -a
Linux mymachine 3.8.0-34-generic #49~precise1-Ubuntu SMP Wed Nov 13 18:05:00 
UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
 
Best Regards,
Stone

Re: [Qemu-devel] [PATCH 2/5] block: Add AioContextNotifier functions to BB

2014-11-18 Thread Max Reitz


On 2014-11-17 at 18:26, Paolo Bonzini wrote:

On 17/11/2014 16:30, Max Reitz wrote:

Because all BlockDriverStates behind a single BlockBackend reside in a
single AioContext, it is fine to just pass these functions
(blk_add_aio_context_notifier() and blk_remove_aio_context_notifier())
through to the root BlockDriverState.

Signed-off-by: Max Reitz mre...@redhat.com

The logical question then is: are there any function in BlockDriverState
that do not make sense as a BlockBackend API?


Well, surely bdrv_swap(), or bdrv_drop_intermediate(). These are 
functions which work on multiple BDSs (in the same tree, that is, behind 
the same BB) so they don't make sense on the BB.


Others could simply be passed through to the root BDS but it somehow 
doesn't make sense to execute them on the BB. For instance, 
bdrv_set_key(); this is something for an individual BDS, in contrast to 
other operations like reading and writing which will probably be passed 
through the BDS tree; or bdrv_get_info().


The functions added in this patch do make sense on a BB level: Since all 
BDSs behind one BB are always in the same AioContext, it makes sense to 
consider that the BB's AioContext.


The next patch is more difficult to justify. Closing a BDS is somehow 
passed through, but at first glance it doesn't make a whole lot of sense 
on the BB level. However, when you consider (as far as I looked into it) 
that a BDS is only closed when there are either no references to it 
(which will not happen as long as it has a BB) or when it is ejected, it 
suddenly does make sense: Ejecting really is something for the BB, so 
it makes sense to wait for that event (even though the name close 
notifier doesn't sound much like it...). Maybe I should sometimes take 
a deeper look into when a BDS belonging to a BB may be closed and if 
it's really only due to ejection, rename the close notifiers to 
something like eject notifiers (on the BB level).


Max

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Michael S. Tsirkin

On Tue, Nov 18, 2014 at 02:31:34PM +0530, Amit Shah wrote:
 On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
  This patchset fixes CVE-2014-7840: invalid
  migration stream can cause arbitrary qemu memory
  overwrite.
  First patch includes the minimal fix for the issue.
  Follow-up patches on top add extra checking to reduce the
  chance this kind of bug recurs.
  
  Note: these are already (tentatively-pending review)
  queued in my tree, so only review/ack
  is necessary.
 
 Any more acks for this?  Dave?
 
 Also, Michael agreed to just push patch 1 for now, and 2-4 can go in
 later (2.3)?  Michael, is that fine with you?
 
 
   Amit

I'm fine with putting them off.
Someone wants to take them or should I?


-- 
MST

Re: [Qemu-devel] [PATCH 5/5] nbd: Use BlockBackend internally

2014-11-18 Thread Max Reitz


On 2014-11-17 at 18:29, Paolo Bonzini wrote:


On 17/11/2014 16:30, Max Reitz wrote:

With all externally visible functions changed to use BlockBackend, this
patch makes nbd use BlockBackend for everything internally as well.

While touching them, substitute 512 by BDRV_SECTOR_SIZE in the calls to
blk_read(), blk_write() and blk_co_discard().

Signed-off-by: Max Reitz mre...@redhat.com

qemu-nbd.c has more uses of bs that probably should be changed to
blk.  Not sure if it should be in this patch, patch 4 or an additional
patch 6 though.


Good point. I'll go for patch 6.


If the third possibility, series

Reviewed-by: Paolo Bonzini pbonz...@redhat.com


Thanks!

Max

Re: [Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-18 Thread Dr. David Alan Gilbert

* Michael S. Tsirkin (m...@redhat.com) wrote:
 On Tue, Nov 18, 2014 at 02:31:34PM +0530, Amit Shah wrote:
  On (Wed) 12 Nov 2014 [11:44:35], Michael S. Tsirkin wrote:
   This patchset fixes CVE-2014-7840: invalid
   migration stream can cause arbitrary qemu memory
   overwrite.
   First patch includes the minimal fix for the issue.
   Follow-up patches on top add extra checking to reduce the
   chance this kind of bug recurs.
   
   Note: these are already (tentatively-pending review)
   queued in my tree, so only review/ack
   is necessary.
  
  Any more acks for this?  Dave?
  
  Also, Michael agreed to just push patch 1 for now, and 2-4 can go in
  later (2.3)?  Michael, is that fine with you?
  
  
  Amit
 
 I'm fine with putting them off.
 Someone wants to take them or should I?

exec.c stuff seems to be pretty much all over, so I think you're
as good as anyone.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH 2/5] block: Add AioContextNotifier functions to BB

2014-11-18 Thread Paolo Bonzini

On 18/11/2014 10:26, Max Reitz wrote:
 However, when you consider (as far as I looked into it) that a BDS is
 only closed when there are either no references to it (which will not
 happen as long as it has a BB) or when it is ejected, it suddenly does
 make sense: Ejecting really is something for the BB, so it makes sense
 to wait for that event (even though the name close notifier doesn't
 sound much like it...).

I agree, and indeed the notifier was added to deal with ejection.

Thanks for clarifying.  I guess if QEMU were written in C++ we would
have a hierarchy like this:

BlockDevice (abstract)
BlockBackend
BlockDriverState

and there would be no duplication of function names; the BlockBackend
implementation would still have to forward to the top BlockDriverState.

Paolo

[Qemu-devel] hotremoving a disk qmp/hmp

2014-11-18 Thread William Dauchy

Hello,

When hotremoving a disk I'm using the QMP API with device_del command;

Previous query-block command result:

{   u'device': u'disk1',
u'inserted': {   u'backing_file_depth': 0,
 u'bps': 0,
 u'bps_rd': 0,
 u'bps_wr': 0,
 u'detect_zeroes': u'off',
 u'drv': u'raw',
 u'encrypted': False,
 u'encryption_key_missing': False,
 u'file': u'/dev/sda',
 u'image': {   u'actual-size': 0,
   u'dirty-flag': False,
   u'filename': u'/dev/sda',
   u'format': u'raw',
   u'virtual-size': 3221225472},
 u'iops': 0,
 u'iops_rd': 0,
 u'iops_wr': 0,
 u'ro': False},
u'io-status': u'ok',
u'locked': True,
u'removable': True,
u'tray_open': False,
u'type': u'unknown'}

After a device_del command I have the same thing but 'locked': False
Then I can also do `eject device=disk1` which bring me to:

{   u'device': u'disk1',
   u'io-status': u'ok',
   u'locked': False,
   u'removable': True,
   u'tray_open': False,
   u'type': u'unknown'}

But I did not found a way to completely remove the disk1 entry in order
to be able to add it again.
To complete the operation I need to use the old HMP API and use
`drive_del` command.

Did I miss something? or is it still something missing in QMP API?

Best regards,
-- 
William


signature.asc
Description: Digital signature

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-18 Thread Paolo Bonzini



On 17/11/2014 22:20, Don Slutz wrote:
 The other callers to blk_set_enable_write_cache() in this file
 already check for s-blk == NULL.
 
 Signed-off-by: Don Slutz dsl...@verizon.com
 ---
 
 I think this is a bugfix that should be back ported to stable
 releases.
 
 I also think this should be done in xen's copy of QEMU for 4.5 with
 back port(s) to active stable releases.
 
 Note: In 2.1 and earlier the routine is
 bdrv_set_enable_write_cache(); variable is s-bs.
 
  hw/ide/core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/hw/ide/core.c b/hw/ide/core.c
 index 00e21cf..d4af5e2 100644
 --- a/hw/ide/core.c
 +++ b/hw/ide/core.c
 @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int 
 version_id)
  {
  IDEState *s = opaque;
  
 -if (s-identify_set) {
 +if (s-blk  s-identify_set) {
  blk_set_enable_write_cache(s-blk, !!(s-identify_data[85]  (1  
 5)));
  }
  return 0;
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

[Qemu-devel] [PATCH for-2.2 0/3] block/raw-posix: Fixes for preallocation

2014-11-18 Thread Max Reitz

This series brings some (minor) fixes for full preallocation in
raw-posix; thanks to László for finding these bugs.

Since the fixes are not too grave, I would be fine with not getting them
into 2.2. But because they are rather simple and they are bug fixes
after all, I sent this series with the for-2.2 infix.


Max Reitz (3):
  block/raw-posix: Fix preallocating write() loop
  block/raw-posix: Only sync after successful preallocation
  block/raw-posix: Catch fsync() errors

 block/raw-posix.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH for-2.2 2/3] block/raw-posix: Only sync after successful preallocation

2014-11-18 Thread Max Reitz

The loop which filled the file with zeroes may have been left early due
to an error. In that case, the fsync() should be skipped.

Signed-off-by: Max Reitz mre...@redhat.com
---
 block/raw-posix.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 4e6552f..f67fb11 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1459,7 +1459,9 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 left -= result;
 }
-fsync(fd);
+if (result = 0) {
+fsync(fd);
+}
 g_free(buf);
 break;
 }
-- 
1.9.3

[Qemu-devel] [PATCH for-2.2 1/3] block/raw-posix: Fix preallocating write() loop

2014-11-18 Thread Max Reitz

write() may write less bytes than requested; in this case, the number of
bytes written is returned. This is the byte count we should be
subtracting from the number of bytes still to be written, and not the
byte count we requested to write.

Reported-by: László Érsek ler...@redhat.com
Signed-off-by: Max Reitz mre...@redhat.com
---
An interesting anecdote: My German man page for write(2) says the
following about its return value:

 Bei Erfolg wird Null zurückgegeben.

Which translates to:

 On success, zero is returned.

Whereas write(2) for LANG=C says the following (which is correct):

 On success, the number of bytes written is returned (zero indicates
 nothing was written).

I think I know why I somehow prefer the English versions...
---
 block/raw-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e100ae2..4e6552f 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1457,7 +1457,7 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
  Could not write to the new file);
 break;
 }
-left -= num;
+left -= result;
 }
 fsync(fd);
 g_free(buf);
-- 
1.9.3

[Qemu-devel] [PATCH for-2.2 3/3] block/raw-posix: Catch fsync() errors

2014-11-18 Thread Max Reitz

fsync() may fail, and that case should be handled.

Reported-by: László Érsek ler...@redhat.com
Signed-off-by: Max Reitz mre...@redhat.com
---
 block/raw-posix.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index f67fb11..666cdec 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1460,7 +1460,12 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 left -= result;
 }
 if (result = 0) {
-fsync(fd);
+result = fsync(fd);
+if (result  0) {
+result = -errno;
+error_setg_errno(errp, -result,
+ Could not flush new file to disk);
+}
 }
 g_free(buf);
 break;
-- 
1.9.3

Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series

2014-11-18 Thread Vladimir Sementsov-Ogievskiy




(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt 
if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE? 

- The name of the bitmap and also the size of this name



(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's the 
primary benefit you're seeking? 
Hmm. It may be used for faster sync. Maybe, save some of bitmap levels 
on timer while vm is running and save the last level on shutdown?


CC qemu-devel - ok.

Best regards,
Vladimir

On 18.11.2014 02:46, John Snow wrote:



On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi

I'd just like to start working on persistent dirty bitmap. My thoughts
about it are the following:
- qemu -drive file=file,dirty_bitmap=file
 so,  bitmap will be loaded with drive open and saved with drive 
close.

- save only meaningful (the last) level of the bitmap, restore all
levels on bitmap loading
- bool parameter persistent for bdrv_create_dirty_bitmap and
BdrvDirtyBitmap
- internal dirty_bitmaps, saved in qcow2 file

Best regards,
Vladimir


I am thinking:

(1) Command Lines

If you enable dirty bitmaps and give it a file that doesn't exist, it 
should error out on you.


If you enable dirty bitmaps and give it a file that's blank, it 
understands that it is to create a persistent bitmap file in this 
location and it should enable persistence.


If a bitmap file is given and it has valid magic, this should imply 
persistence.


I am hesitant to have it auto-create files that don't already exist in 
case the files become large in size and a misconfiguration leads to 
repeated creation of these files that get orphaned in random folders. 
Perhaps we can add a create=auto flag or similar to allow this 
behavior if wanted.


(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt 
if we give ourselves a sector's worth to write metadata within.)

- Data starting at... PAGESIZE?

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent bitmap, delay the write until 
we can mark the bitmap as dirty first. This incurs a write penalty 
when we try to use the bitmap at first...

- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty 
information to the file and mark the file as clean once more and 
re-take the persistence lock.


(4) Synchronization Policy

- Sync after so many bits become dirty in the bitmap, either as an 
absolute threshold or a density percentage?

- Sync periodically on a fixed timer?
- Sync periodically opportunistically when I/O utilization becomes 
relatively low? (With some sort of starvation prevention timer?)

- Sync only at shutdown?

In discussing with Stefan, I think we rather liked the idea of a timer 
that tries to re-commit the block data during lulls in the I/O.


(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's the 
primary benefit you're seeking?


(6) Inclusion as qcow2 Metadata

And lastly, we did discuss the inclusion of the bitmap as qcow2 
metadata, but decided it wasn't our principle target for the format to 
allow generality to other file formats. We didn't really discuss the 
idea of having it as an option or an extension, but I don't (off the 
top of my head) have any reasonings against it, but I will likely not 
work on it myself.



You didn't CC qemu-devel on this (so I won't!), but perhaps we should 
re-send out our ideas to the wider list for feedback before we proceed 
any further. Maybe we can split the work if we agree upon a design.


Thanks!
--js

P.S.: I'm still cleaning up Fam's first patchset based on Max's and 
your feedback. Hope to have it out by the end of this week.



On 11.11.2014 18:59, John Snow wrote:



On 11/10/2014 03:15 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi Fam, hi Jorn.

Jagane's project - http://wiki.qemu.org/Features/Livebackup

In two words:
Normal delta - like in qemu, while backuping, we save all new 
writes to

separate virtual disk - delta. When backup is done, we can merge delta
back to original image.
Reverse delta - while backuping, we don't stop writing to original 
image
(and qemu works with it, not with delta), but before every write

Re: [Qemu-devel] [PATCH for-2.2 0/3] block/raw-posix: Fixes for preallocation

2014-11-18 Thread Kevin Wolf

Am 18.11.2014 um 11:23 hat Max Reitz geschrieben:
 This series brings some (minor) fixes for full preallocation in
 raw-posix; thanks to László for finding these bugs.
 
 Since the fixes are not too grave, I would be fine with not getting them
 into 2.2. But because they are rather simple and they are bug fixes
 after all, I sent this series with the for-2.2 infix.

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH] exec: Handle multipage ranges in invalidate_and_set_dirty()

2014-11-18 Thread Peter Maydell

On 17 November 2014 11:03, Paolo Bonzini pbonz...@redhat.com wrote:


 On 16/11/2014 20:44, Peter Maydell wrote:
 The code in invalidate_and_set_dirty() needs to handle addr/length
 combinations which cross guest physical page boundaries. This can happen,
 for example, when disk I/O reads large blocks into guest RAM which previously
 held code that we have cached translations for. Unfortunately we were only
 checking the clean/dirty status of the first page in the range, and then
 were calling a tb_invalidate function which only handles ranges that don't
 cross page boundaries. Fix the function to deal with multipage ranges.

 The symptoms of this bug were that guest code would misbehave (eg segfault),
 in particular after a guest reboot but potentially any time the guest
 reused a page of its physical RAM for new code.

 Cc: qemu-sta...@nongnu.org
 Signed-off-by: Peter Maydell peter.mayd...@linaro.org
 ---
 This seems pretty nasty, and I have no idea why it hasn't been wreaking
 more havoc than this before. I'm not entirely sure why we invalidate TBs
 if any of the dirty bits is set rather than only if the code bit is set,
 but I left that logic as it is.

 I think it's a remain of when we had a single bitmap with three bits in
 it.  We can clean up in 2.3.

 Review appreciated -- it would be nice to get this into rc2 if
 we can, I think.

 Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Applied to master; thanks.

-- PMM

[Qemu-devel] [PATCH v2 3/6] block: Add blk_add_close_notifier() for BB

2014-11-18 Thread Max Reitz

Adding something like a delete notifier to a BlockBackend would not
make much sense, because whoever is interested in registering there will
probably hold a reference to that BlockBackend; therefore, the notifier
will never be called (or only when the notifiee already relinquished its
reference and thus most probably is no longer interested in that
notification).

Therefore, this patch just passes through the close notifier interface
of the root BDS. This will be called when the device is ejected, for
instance, and therefore does make sense.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
---
 block/block-backend.c  | 5 +
 include/sysemu/block-backend.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 7a7f690..ef16d73 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -642,6 +642,11 @@ void blk_remove_aio_context_notifier(BlockBackend *blk,
  detach_aio_context, opaque);
 }
 
+void blk_add_close_notifier(BlockBackend *blk, Notifier *notify)
+{
+bdrv_add_close_notifier(blk-bs, notify);
+}
+
 void blk_io_plug(BlockBackend *blk)
 {
 bdrv_io_plug(blk-bs);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index d9c1337..8871a02 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -143,6 +143,7 @@ void blk_remove_aio_context_notifier(BlockBackend *blk,
   void *),
  void (*detach_aio_context)(void *),
  void *opaque);
+void blk_add_close_notifier(BlockBackend *blk, Notifier *notify);
 void blk_io_plug(BlockBackend *blk);
 void blk_io_unplug(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
-- 
1.9.3

[Qemu-devel] [PATCH v2 4/6] nbd: Change external interface to BlockBackend

2014-11-18 Thread Max Reitz

Substitute BlockDriverState by BlockBackend in every globally visible
function provided by nbd.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
---
 blockdev-nbd.c  | 15 ---
 include/block/nbd.h |  7 +++
 nbd.c   | 11 ++-
 qemu-nbd.c  |  2 +-
 4 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 06f901e..22e95d1 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -10,6 +10,7 @@
  */
 
 #include sysemu/blockdev.h
+#include sysemu/block-backend.h
 #include hw/block/block.h
 #include monitor/monitor.h
 #include qapi/qmp/qerror.h
@@ -73,7 +74,7 @@ static void nbd_close_notifier(Notifier *n, void *data)
 void qmp_nbd_server_add(const char *device, bool has_writable, bool writable,
 Error **errp)
 {
-BlockDriverState *bs;
+BlockBackend *blk;
 NBDExport *exp;
 NBDCloseNotifier *n;
 
@@ -87,12 +88,12 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
 return;
 }
 
-bs = bdrv_find(device);
-if (!bs) {
+blk = blk_by_name(device);
+if (!blk) {
 error_set(errp, QERR_DEVICE_NOT_FOUND, device);
 return;
 }
-if (!bdrv_is_inserted(bs)) {
+if (!blk_is_inserted(blk)) {
 error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
 return;
 }
@@ -100,18 +101,18 @@ void qmp_nbd_server_add(const char *device, bool 
has_writable, bool writable,
 if (!has_writable) {
 writable = false;
 }
-if (bdrv_is_read_only(bs)) {
+if (blk_is_read_only(blk)) {
 writable = false;
 }
 
-exp = nbd_export_new(bs, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL);
+exp = nbd_export_new(blk, 0, -1, writable ? 0 : NBD_FLAG_READ_ONLY, NULL);
 
 nbd_export_set_name(exp, device);
 
 n = g_new0(NBDCloseNotifier, 1);
 n-n.notify = nbd_close_notifier;
 n-exp = exp;
-bdrv_add_close_notifier(bs, n-n);
+blk_add_close_notifier(blk, n-n);
 QTAILQ_INSERT_TAIL(close_notifiers, n, next);
 }
 
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 9e835d2..348302c 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -85,14 +85,13 @@ int nbd_disconnect(int fd);
 typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;
 
-NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
-  off_t size, uint32_t nbdflags,
-  void (*close)(NBDExport *));
+NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
+  uint32_t nbdflags, void (*close)(NBDExport *));
 void nbd_export_close(NBDExport *exp);
 void nbd_export_get(NBDExport *exp);
 void nbd_export_put(NBDExport *exp);
 
-BlockDriverState *nbd_export_get_blockdev(NBDExport *exp);
+BlockBackend *nbd_export_get_blockdev(NBDExport *exp);
 
 NBDExport *nbd_export_find(const char *name);
 void nbd_export_set_name(NBDExport *exp, const char *name);
diff --git a/nbd.c b/nbd.c
index a7bce45..3fd5743 100644
--- a/nbd.c
+++ b/nbd.c
@@ -19,6 +19,7 @@
 #include block/nbd.h
 #include block/block.h
 #include block/block_int.h
+#include sysemu/block-backend.h
 
 #include block/coroutine.h
 
@@ -957,10 +958,10 @@ static void bs_aio_detach(void *opaque)
 exp-ctx = NULL;
 }
 
-NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset,
-  off_t size, uint32_t nbdflags,
-  void (*close)(NBDExport *))
+NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
+  uint32_t nbdflags, void (*close)(NBDExport *))
 {
+BlockDriverState *bs = blk_bs(blk);
 NBDExport *exp = g_malloc0(sizeof(NBDExport));
 exp-refcount = 1;
 QTAILQ_INIT(exp-clients);
@@ -1056,9 +1057,9 @@ void nbd_export_put(NBDExport *exp)
 }
 }
 
-BlockDriverState *nbd_export_get_blockdev(NBDExport *exp)
+BlockBackend *nbd_export_get_blockdev(NBDExport *exp)
 {
-return exp-bs;
+return exp-bs-blk;
 }
 
 void nbd_export_close_all(void)
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 5cd6c6d..60ce50f 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -730,7 +730,7 @@ int main(int argc, char **argv)
 }
 }
 
-exp = nbd_export_new(bs, dev_offset, fd_size, nbdflags, nbd_export_closed);
+exp = nbd_export_new(blk, dev_offset, fd_size, nbdflags, 
nbd_export_closed);
 
 if (sockpath) {
 fd = unix_socket_incoming(sockpath);
-- 
1.9.3

[Qemu-devel] [PATCH v2 0/6] nbd: Use BlockBackend

2014-11-18 Thread Max Reitz

From the block layer's perspective, the nbd server is pretty similar to
a guest device. Therefore, it should use BlockBackend to access block
devices, just like any other guest device does.

This series consequently makes the nbd server use BlockBackend for
referencing block devices.


v2:
- Added patch 6, which converts qemu-nbd to BlockBackend as far as
  reasonable [Paolo]


git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/6:[] [--] 'block: Lift more functions into BlockBackend'
002/6:[] [--] 'block: Add AioContextNotifier functions to BB'
003/6:[] [--] 'block: Add blk_add_close_notifier() for BB'
004/6:[] [--] 'nbd: Change external interface to BlockBackend'
005/6:[] [--] 'nbd: Use BlockBackend internally'
006/6:[down] 'qemu-nbd: Use BlockBackend where reasonable'


Max Reitz (6):
  block: Lift more functions into BlockBackend
  block: Add AioContextNotifier functions to BB
  block: Add blk_add_close_notifier() for BB
  nbd: Change external interface to BlockBackend
  nbd: Use BlockBackend internally
  qemu-nbd: Use BlockBackend where reasonable

 block/block-backend.c  | 38 +
 blockdev-nbd.c | 15 +-
 include/block/nbd.h|  7 ++---
 include/sysemu/block-backend.h | 12 
 nbd.c  | 63 +-
 qemu-nbd.c | 12 
 6 files changed, 99 insertions(+), 48 deletions(-)

-- 
1.9.3

[Qemu-devel] [PATCH v2 5/6] nbd: Use BlockBackend internally

2014-11-18 Thread Max Reitz

With all externally visible functions changed to use BlockBackend, this
patch makes nbd use BlockBackend for everything internally as well.

While touching them, substitute 512 by BDRV_SECTOR_SIZE in the calls to
blk_read(), blk_write() and blk_co_discard().

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
---
 nbd.c | 56 
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/nbd.c b/nbd.c
index 3fd5743..53cf82b 100644
--- a/nbd.c
+++ b/nbd.c
@@ -17,8 +17,6 @@
  */
 
 #include block/nbd.h
-#include block/block.h
-#include block/block_int.h
 #include sysemu/block-backend.h
 
 #include block/coroutine.h
@@ -102,7 +100,7 @@ struct NBDExport {
 int refcount;
 void (*close)(NBDExport *exp);
 
-BlockDriverState *bs;
+BlockBackend *blk;
 char *name;
 off_t dev_offset;
 off_t size;
@@ -930,7 +928,7 @@ static void nbd_request_put(NBDRequest *req)
 nbd_client_put(client);
 }
 
-static void bs_aio_attached(AioContext *ctx, void *opaque)
+static void blk_aio_attached(AioContext *ctx, void *opaque)
 {
 NBDExport *exp = opaque;
 NBDClient *client;
@@ -944,7 +942,7 @@ static void bs_aio_attached(AioContext *ctx, void *opaque)
 }
 }
 
-static void bs_aio_detach(void *opaque)
+static void blk_aio_detach(void *opaque)
 {
 NBDExport *exp = opaque;
 NBDClient *client;
@@ -961,24 +959,23 @@ static void bs_aio_detach(void *opaque)
 NBDExport *nbd_export_new(BlockBackend *blk, off_t dev_offset, off_t size,
   uint32_t nbdflags, void (*close)(NBDExport *))
 {
-BlockDriverState *bs = blk_bs(blk);
 NBDExport *exp = g_malloc0(sizeof(NBDExport));
 exp-refcount = 1;
 QTAILQ_INIT(exp-clients);
-exp-bs = bs;
+exp-blk = blk;
 exp-dev_offset = dev_offset;
 exp-nbdflags = nbdflags;
-exp-size = size == -1 ? bdrv_getlength(bs) : size;
+exp-size = size == -1 ? blk_getlength(blk) : size;
 exp-close = close;
-exp-ctx = bdrv_get_aio_context(bs);
-bdrv_ref(bs);
-bdrv_add_aio_context_notifier(bs, bs_aio_attached, bs_aio_detach, exp);
+exp-ctx = blk_get_aio_context(blk);
+blk_ref(blk);
+blk_add_aio_context_notifier(blk, blk_aio_attached, blk_aio_detach, exp);
 /*
  * NBD exports are used for non-shared storage migration.  Make sure
  * that BDRV_O_INCOMING is cleared and the image is ready for write
  * access since the export could be available before migration handover.
  */
-bdrv_invalidate_cache(bs, NULL);
+blk_invalidate_cache(blk, NULL);
 return exp;
 }
 
@@ -1025,11 +1022,11 @@ void nbd_export_close(NBDExport *exp)
 }
 nbd_export_set_name(exp, NULL);
 nbd_export_put(exp);
-if (exp-bs) {
-bdrv_remove_aio_context_notifier(exp-bs, bs_aio_attached,
- bs_aio_detach, exp);
-bdrv_unref(exp-bs);
-exp-bs = NULL;
+if (exp-blk) {
+blk_remove_aio_context_notifier(exp-blk, blk_aio_attached,
+blk_aio_detach, exp);
+blk_unref(exp-blk);
+exp-blk = NULL;
 }
 }
 
@@ -1059,7 +1056,7 @@ void nbd_export_put(NBDExport *exp)
 
 BlockBackend *nbd_export_get_blockdev(NBDExport *exp)
 {
-return exp-bs-blk;
+return exp-blk;
 }
 
 void nbd_export_close_all(void)
@@ -1138,7 +1135,7 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, 
struct nbd_request *reque
 
 command = request-type  NBD_CMD_MASK_COMMAND;
 if (command == NBD_CMD_READ || command == NBD_CMD_WRITE) {
-req-data = qemu_blockalign(client-exp-bs, request-len);
+req-data = blk_blockalign(client-exp-blk, request-len);
 }
 if (command == NBD_CMD_WRITE) {
 TRACE(Reading %u byte(s), request-len);
@@ -1204,7 +1201,7 @@ static void nbd_trip(void *opaque)
 TRACE(Request type is READ);
 
 if (request.type  NBD_CMD_FLAG_FUA) {
-ret = bdrv_co_flush(exp-bs);
+ret = blk_co_flush(exp-blk);
 if (ret  0) {
 LOG(flush failed);
 reply.error = -ret;
@@ -1212,8 +1209,9 @@ static void nbd_trip(void *opaque)
 }
 }
 
-ret = bdrv_read(exp-bs, (request.from + exp-dev_offset) / 512,
-req-data, request.len / 512);
+ret = blk_read(exp-blk,
+   (request.from + exp-dev_offset) / BDRV_SECTOR_SIZE,
+   req-data, request.len / BDRV_SECTOR_SIZE);
 if (ret  0) {
 LOG(reading from file failed);
 reply.error = -ret;
@@ -1235,8 +1233,9 @@ static void nbd_trip(void *opaque)
 
 TRACE(Writing to device);
 
-ret = bdrv_write(exp-bs, (request.from + exp-dev_offset) / 512,
- req-data, request.len / 512);
+ret = blk_write(exp-blk,
+(request.from + exp-dev_offset) / BDRV_SECTOR_SIZE,
+

[Qemu-devel] [PATCH v2 6/6] qemu-nbd: Use BlockBackend where reasonable

2014-11-18 Thread Max Reitz

Because qemu-nbd creates the BlockBackend by itself, it should create
the according BlockDriverState tree by itself as well; that means, it
has call bdrv_open() on its own. This is one of the places where
qemu-nbd still needs to use a BlockDriverState directly (the root BDS
below the BB); other places are the configuration of zero detection
(which may be lifted into the BB eventually, but is not yet) and
temporarily loading a snapshot.

Everywhere else, though, qemu-nbd can and thus should use BlockBackend.

Suggested-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Max Reitz mre...@redhat.com
---
 qemu-nbd.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 60ce50f..d222512 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -146,7 +146,7 @@ static void read_partition(uint8_t *p, struct 
partition_record *r)
 r-nb_sectors_abs = p[12] | p[13]  8 | p[14]  16 | p[15]  24;
 }
 
-static int find_partition(BlockDriverState *bs, int partition,
+static int find_partition(BlockBackend *blk, int partition,
   off_t *offset, off_t *size)
 {
 struct partition_record mbr[4];
@@ -155,7 +155,7 @@ static int find_partition(BlockDriverState *bs, int 
partition,
 int ext_partnum = 4;
 int ret;
 
-if ((ret = bdrv_read(bs, 0, data, 1))  0) {
+if ((ret = blk_read(blk, 0, data, 1))  0) {
 errno = -ret;
 err(EXIT_FAILURE, error while reading);
 }
@@ -175,7 +175,7 @@ static int find_partition(BlockDriverState *bs, int 
partition,
 uint8_t data1[512];
 int j;
 
-if ((ret = bdrv_read(bs, mbr[i].start_sector_abs, data1, 1))  0) {
+if ((ret = blk_read(blk, mbr[i].start_sector_abs, data1, 1))  0) {
 errno = -ret;
 err(EXIT_FAILURE, error while reading);
 }
@@ -720,10 +720,10 @@ int main(int argc, char **argv)
 }
 
 bs-detect_zeroes = detect_zeroes;
-fd_size = bdrv_getlength(bs);
+fd_size = blk_getlength(blk);
 
 if (partition != -1) {
-ret = find_partition(bs, partition, dev_offset, fd_size);
+ret = find_partition(blk, partition, dev_offset, fd_size);
 if (ret  0) {
 errno = -ret;
 err(EXIT_FAILURE, Could not find partition %d, partition);
-- 
1.9.3

[Qemu-devel] [PATCH v2 1/6] block: Lift more functions into BlockBackend

2014-11-18 Thread Max Reitz

There are already some blk_aio_* functions, so we might as well have
blk_co_* functions (as far as we need them). This patch adds
blk_co_flush(), blk_co_discard(), and also blk_invalidate_cache() (which
is not a blk_co_* function but is needed nonetheless).

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
---
 block/block-backend.c  | 15 +++
 include/sysemu/block-backend.h |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index d0692b1..89f69b7 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -497,6 +497,16 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long 
int req, void *buf,
 return bdrv_aio_ioctl(blk-bs, req, buf, cb, opaque);
 }
 
+int blk_co_discard(BlockBackend *blk, int64_t sector_num, int nb_sectors)
+{
+return bdrv_co_discard(blk-bs, sector_num, nb_sectors);
+}
+
+int blk_co_flush(BlockBackend *blk)
+{
+return bdrv_co_flush(blk-bs);
+}
+
 int blk_flush(BlockBackend *blk)
 {
 return bdrv_flush(blk-bs);
@@ -549,6 +559,11 @@ void blk_set_enable_write_cache(BlockBackend *blk, bool 
wce)
 bdrv_set_enable_write_cache(blk-bs, wce);
 }
 
+void blk_invalidate_cache(BlockBackend *blk, Error **errp)
+{
+bdrv_invalidate_cache(blk-bs, errp);
+}
+
 int blk_is_inserted(BlockBackend *blk)
 {
 return bdrv_is_inserted(blk-bs);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 52d13c1..0c46b82 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -108,6 +108,8 @@ int blk_aio_multiwrite(BlockBackend *blk, BlockRequest 
*reqs, int num_reqs);
 int blk_ioctl(BlockBackend *blk, unsigned long int req, void *buf);
 BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
   BlockCompletionFunc *cb, void *opaque);
+int blk_co_discard(BlockBackend *blk, int64_t sector_num, int nb_sectors);
+int blk_co_flush(BlockBackend *blk);
 int blk_flush(BlockBackend *blk);
 int blk_flush_all(void);
 void blk_drain_all(void);
@@ -120,6 +122,7 @@ int blk_is_read_only(BlockBackend *blk);
 int blk_is_sg(BlockBackend *blk);
 int blk_enable_write_cache(BlockBackend *blk);
 void blk_set_enable_write_cache(BlockBackend *blk, bool wce);
+void blk_invalidate_cache(BlockBackend *blk, Error **errp);
 int blk_is_inserted(BlockBackend *blk);
 void blk_lock_medium(BlockBackend *blk, bool locked);
 void blk_eject(BlockBackend *blk, bool eject_flag);
-- 
1.9.3

[Qemu-devel] [PATCH v2 2/6] block: Add AioContextNotifier functions to BB

2014-11-18 Thread Max Reitz

Because all BlockDriverStates behind a single BlockBackend reside in a
single AioContext, it is fine to just pass these functions
(blk_add_aio_context_notifier() and blk_remove_aio_context_notifier())
through to the root BlockDriverState.

Signed-off-by: Max Reitz mre...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
---
 block/block-backend.c  | 18 ++
 include/sysemu/block-backend.h |  8 
 2 files changed, 26 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 89f69b7..7a7f690 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -624,6 +624,24 @@ void blk_set_aio_context(BlockBackend *blk, AioContext 
*new_context)
 bdrv_set_aio_context(blk-bs, new_context);
 }
 
+void blk_add_aio_context_notifier(BlockBackend *blk,
+void (*attached_aio_context)(AioContext *new_context, void *opaque),
+void (*detach_aio_context)(void *opaque), void *opaque)
+{
+bdrv_add_aio_context_notifier(blk-bs, attached_aio_context,
+  detach_aio_context, opaque);
+}
+
+void blk_remove_aio_context_notifier(BlockBackend *blk,
+ void (*attached_aio_context)(AioContext *,
+  void *),
+ void (*detach_aio_context)(void *),
+ void *opaque)
+{
+bdrv_remove_aio_context_notifier(blk-bs, attached_aio_context,
+ detach_aio_context, opaque);
+}
+
 void blk_io_plug(BlockBackend *blk)
 {
 bdrv_io_plug(blk-bs);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 0c46b82..d9c1337 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -135,6 +135,14 @@ void blk_op_block_all(BlockBackend *blk, Error *reason);
 void blk_op_unblock_all(BlockBackend *blk, Error *reason);
 AioContext *blk_get_aio_context(BlockBackend *blk);
 void blk_set_aio_context(BlockBackend *blk, AioContext *new_context);
+void blk_add_aio_context_notifier(BlockBackend *blk,
+void (*attached_aio_context)(AioContext *new_context, void *opaque),
+void (*detach_aio_context)(void *opaque), void *opaque);
+void blk_remove_aio_context_notifier(BlockBackend *blk,
+ void (*attached_aio_context)(AioContext *,
+  void *),
+ void (*detach_aio_context)(void *),
+ void *opaque);
 void blk_io_plug(BlockBackend *blk);
 void blk_io_unplug(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
-- 
1.9.3

[Qemu-devel] [PATCH 1/1] migration: fix parameter validation on ram load

2014-11-18 Thread Amit Shah

From: Michael S. Tsirkin m...@redhat.com

During migration, the values read from migration stream during ram load
are not validated. Especially offset in host_from_stream_offset() and
also the length of the writes in the callers of said function.

To fix this, we need to make sure that the [offset, offset + length]
range fits into one of the allocated memory regions.

Validating addr  len should be sufficient since data seems to always be
managed in TARGET_PAGE_SIZE chunks.

Fixes: CVE-2014-7840

Note: follow-up patches add extra checks on each block-host access.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Reviewed-by: Dr. David Alan Gilbert dgilb...@redhat.com
Signed-off-by: Amit Shah amit.s...@redhat.com
---
 arch_init.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 88a5ba0..593a990 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1006,7 +1006,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 uint8_t len;
 
 if (flags  RAM_SAVE_FLAG_CONTINUE) {
-if (!block) {
+if (!block || block-length = offset) {
 error_report(Ack, bad migration stream!);
 return NULL;
 }
@@ -1019,8 +1019,9 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 id[len] = 0;
 
 QTAILQ_FOREACH(block, ram_list.blocks, next) {
-if (!strncmp(id, block-idstr, sizeof(id)))
+if (!strncmp(id, block-idstr, sizeof(id))  block-length  offset) {
 return memory_region_get_ram_ptr(block-mr) + offset;
+}
 }
 
 error_report(Can't find block %s!, id);
-- 
2.1.0

[Qemu-devel] [PULL] migration: fix for CVE-2014-7840

2014-11-18 Thread Amit Shah

The following changes since commit d6be29e3fb5659102ac0e48e295d177cb67e32c5:

  target-arm: handle address translations that start at level 3 (2014-11-17 
19:30:28 +)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/qemu/amit/migration.git tags/for-2.2

for you to fetch changes up to 0be839a2701369f669532ea5884c15bead1c6e08:

  migration: fix parameter validation on ram load (2014-11-18 16:49:44 +0530)


Fix for CVE-2014-7840, avoiding arbitrary qemu memory overwrite for
migration by Michael S. Tsirkin.


Michael S. Tsirkin (1):
  migration: fix parameter validation on ram load

 arch_init.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.1.0

[Qemu-devel] [PULL 2/6] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP

2014-11-18 Thread Kevin Wolf

From: Markus Armbruster arm...@redhat.com

Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
follows:

1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl

2. Else if defined(SEEK_HOLE)  defined(SEEK_DATA), use lseek()

3. Else pretend there are no holes

Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().

Commit 4f11aa8 (May 2014) changed it to try the three methods in order
until success, because there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa.

Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
speed hit, the next commit 7c159037 put SEEK_HOLE/SEEK_DATA first.

As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow.  I guess this puts it somewhere between -7 The obvious use is
wrong and -10 It's impossible to get right on Rusty Russel's Hard
to Misuse scale[*].

Fortunately, the FIEMAP code is used only when

* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is

  Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
  was introduced for ext4 and btrfs) and 2012.

* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails

  Unlikely.

Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
bugs.  Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.

I don't want to worry about this crap, not even theoretically.  Get
rid of it.

[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

Signed-off-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Max Reitz mre...@redhat.com
---
 block/raw-posix.c | 63 ---
 1 file changed, 4 insertions(+), 59 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 706d3c0..a29130e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,9 +60,6 @@
 #define FS_NOCOW_FL 0x0080 /* Do not cow file */
 #endif
 #endif
-#ifdef CONFIG_FIEMAP
-#include linux/fiemap.h
-#endif
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
 #include linux/falloc.h
 #endif
@@ -151,9 +148,6 @@ typedef struct BDRVRawState {
 bool has_write_zeroes:1;
 bool discard_zeroes:1;
 bool needs_alignment;
-#ifdef CONFIG_FIEMAP
-bool skip_fiemap;
-#endif
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {
@@ -1481,52 +1475,6 @@ out:
 return result;
 }
 
-static int try_fiemap(BlockDriverState *bs, off_t start, off_t *data,
-  off_t *hole, int nb_sectors)
-{
-#ifdef CONFIG_FIEMAP
-BDRVRawState *s = bs-opaque;
-int ret = 0;
-struct {
-struct fiemap fm;
-struct fiemap_extent fe;
-} f;
-
-if (s-skip_fiemap) {
-return -ENOTSUP;
-}
-
-f.fm.fm_start = start;
-f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
-f.fm.fm_flags = FIEMAP_FLAG_SYNC;
-f.fm.fm_extent_count = 1;
-f.fm.fm_reserved = 0;
-if (ioctl(s-fd, FS_IOC_FIEMAP, f) == -1) {
-s-skip_fiemap = true;
-return -errno;
-}
-
-if (f.fm.fm_mapped_extents == 0) {
-/* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length.
- * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
- */
-off_t length = lseek(s-fd, 0, SEEK_END);
-*hole = f.fm.fm_start;
-*data = MIN(f.fm.fm_start + f.fm.fm_length, length);
-} else {
-*data = f.fe.fe_logical;
-*hole = f.fe.fe_logical + f.fe.fe_length;
-if (f.fe.fe_flags  FIEMAP_EXTENT_UNWRITTEN) {
-ret |= BDRV_BLOCK_ZERO;
-}
-}
-
-return ret;
-#else
-return -ENOTSUP;
-#endif
-}
-
 static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
  off_t *hole)
 {
@@ -1593,13 +1541,10 @@ static int64_t coroutine_fn 
raw_co_get_block_status(BlockDriverState *bs,
 
 ret = try_seek_hole(bs, start, data, hole);
 if (ret  0) {
-ret = try_fiemap(bs, start, data, hole, nb_sectors);
-if (ret  0) {
-/* Assume everything is allocated. */
-data = 0;
-hole = start + nb_sectors * BDRV_SECTOR_SIZE;
-ret = 0;
-}
+/* Assume everything is allocated. */
+data = 0;
+hole = start + nb_sectors * BDRV_SECTOR_SIZE;
+ret = 0;
 }
 
 assert(ret = 0);
-- 
1.8.3.1

[Qemu-devel] [PULL 6/6] block/raw-posix: Catch fsync() errors

2014-11-18 Thread Kevin Wolf

From: Max Reitz mre...@redhat.com

fsync() may fail, and that case should be handled.

Reported-by: László Érsek ler...@redhat.com
Signed-off-by: Max Reitz mre...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/raw-posix.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index d106fc4..b1af77e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1454,7 +1454,12 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 left -= result;
 }
 if (result = 0) {
-fsync(fd);
+result = fsync(fd);
+if (result  0) {
+result = -errno;
+error_setg_errno(errp, -result,
+ Could not flush new file to disk);
+}
 }
 g_free(buf);
 break;
-- 
1.8.3.1

[Qemu-devel] [PULL 1/6] raw-posix: Fix comment for raw_co_get_block_status()

2014-11-18 Thread Kevin Wolf

From: Markus Armbruster arm...@redhat.com

Missed in commit 705be72.

Signed-off-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Reviewed-by: Fam Zheng f...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
Signed-off-by: Max Reitz mre...@redhat.com
---
 block/raw-posix.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e100ae2..706d3c0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1555,9 +1555,7 @@ static int try_seek_hole(BlockDriverState *bs, off_t 
start, off_t *data,
 }
 
 /*
- * Returns true iff the specified sector is present in the disk image. Drivers
- * not implementing the functionality are assumed to not support backing files,
- * hence all their sectors are reported as allocated.
+ * Returns the allocation status of the specified sectors.
  *
  * If 'sector_num' is beyond the end of the disk image the return value is 0
  * and 'pnum' is set to 0.
-- 
1.8.3.1

[Qemu-devel] [PULL 0/6] Block patches for 2.2.0-rc2

2014-11-18 Thread Kevin Wolf

The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2014-11-17 17:22:03 +)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 098ffa6674a82ceac0e3ccb3a8a5bf6ca44adcd5:

  block/raw-posix: Catch fsync() errors (2014-11-18 12:09:00 +0100)


Block patches for 2.2.0-rc2


Kevin Wolf (1):
  Merge remote-tracking branch 'mreitz/block' into queue-block

Markus Armbruster (3):
  raw-posix: Fix comment for raw_co_get_block_status()
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  raw-posix: The SEEK_HOLE code is flawed, rewrite it

Max Reitz (3):
  block/raw-posix: Fix preallocating write() loop
  block/raw-posix: Only sync after successful preallocation
  block/raw-posix: Catch fsync() errors

 block/raw-posix.c | 173 --
 1 file changed, 91 insertions(+), 82 deletions(-)

[Qemu-devel] [PULL 3/6] raw-posix: The SEEK_HOLE code is flawed, rewrite it

2014-11-18 Thread Kevin Wolf

From: Markus Armbruster arm...@redhat.com

On systems where SEEK_HOLE in a trailing hole seeks to EOF (Solaris,
but not Linux), try_seek_hole() reports trailing data instead.

Additionally, unlikely lseek() failures are treated badly:

* When SEEK_HOLE fails, try_seek_hole() reports trailing data.  For
  -ENXIO, there's in fact a trailing hole.  Can happen only when
  something truncated the file since we opened it.

* When SEEK_HOLE succeeds, SEEK_DATA fails, and SEEK_END succeeds,
  then try_seek_hole() reports a trailing hole.  This is okay only
  when SEEK_DATA failed with -ENXIO (which means the non-trailing hole
  found by SEEK_HOLE has since become trailing somehow).  For other
  failures (unlikely), it's wrong.

* When SEEK_HOLE succeeds, SEEK_DATA fails, SEEK_END fails (unlikely),
  then try_seek_hole() reports bogus data [-1,start), which its caller
  raw_co_get_block_status() turns into zero sectors of data.  Could
  theoretically lead to infinite loops in code that attempts to scan
  data vs. hole forward.

Rewrite from scratch, with very careful comments.

Signed-off-by: Markus Armbruster arm...@redhat.com
Reviewed-by: Max Reitz mre...@redhat.com
Reviewed-by: Eric Blake ebl...@redhat.com
Signed-off-by: Max Reitz mre...@redhat.com
---
 block/raw-posix.c | 111 +-
 1 file changed, 85 insertions(+), 26 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index a29130e..414e6d1 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1475,28 +1475,86 @@ out:
 return result;
 }
 
-static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
- off_t *hole)
+/*
+ * Find allocation range in @bs around offset @start.
+ * May change underlying file descriptor's file offset.
+ * If @start is not in a hole, store @start in @data, and the
+ * beginning of the next hole in @hole, and return 0.
+ * If @start is in a non-trailing hole, store @start in @hole and the
+ * beginning of the next non-hole in @data, and return 0.
+ * If @start is in a trailing hole or beyond EOF, return -ENXIO.
+ * If we can't find out, return a negative errno other than -ENXIO.
+ */
+static int find_allocation(BlockDriverState *bs, off_t start,
+   off_t *data, off_t *hole)
 {
 #if defined SEEK_HOLE  defined SEEK_DATA
 BDRVRawState *s = bs-opaque;
+off_t offs;
 
-*hole = lseek(s-fd, start, SEEK_HOLE);
-if (*hole == -1) {
-return -errno;
+/*
+ * SEEK_DATA cases:
+ * D1. offs == start: start is in data
+ * D2. offs  start: start is in a hole, next data at offs
+ * D3. offs  0, errno = ENXIO: either start is in a trailing hole
+ *  or start is beyond EOF
+ * If the latter happens, the file has been truncated behind
+ * our back since we opened it.  All bets are off then.
+ * Treating like a trailing hole is simplest.
+ * D4. offs  0, errno != ENXIO: we learned nothing
+ */
+offs = lseek(s-fd, start, SEEK_DATA);
+if (offs  0) {
+return -errno;  /* D3 or D4 */
+}
+assert(offs = start);
+
+if (offs  start) {
+/* D2: in hole, next data at offs */
+*hole = start;
+*data = offs;
+return 0;
 }
 
-if (*hole  start) {
+/* D1: in data, end not yet known */
+
+/*
+ * SEEK_HOLE cases:
+ * H1. offs == start: start is in a hole
+ * If this happens here, a hole has been dug behind our back
+ * since the previous lseek().
+ * H2. offs  start: either start is in data, next hole at offs,
+ *   or start is in trailing hole, EOF at offs
+ * Linux treats trailing holes like any other hole: offs ==
+ * start.  Solaris seeks to EOF instead: offs  start (blech).
+ * If that happens here, a hole has been dug behind our back
+ * since the previous lseek().
+ * H3. offs  0, errno = ENXIO: start is beyond EOF
+ * If this happens, the file has been truncated behind our
+ * back since we opened it.  Treat it like a trailing hole.
+ * H4. offs  0, errno != ENXIO: we learned nothing
+ * Pretend we know nothing at all, i.e. forget about D1.
+ */
+offs = lseek(s-fd, start, SEEK_HOLE);
+if (offs  0) {
+return -errno;  /* D1 and (H3 or H4) */
+}
+assert(offs = start);
+
+if (offs  start) {
+/*
+ * D1 and H2: either in data, next hole at offs, or it was in
+ * data but is now in a trailing hole.  In the latter case,
+ * all bets are off.  Treating it as if it there was data all
+ * the way to EOF is safe, so simply do that.
+ */
 *data = start;
-} else {
-/* On a hole.  We need another syscall to find its end.  */
-*data = lseek(s-fd, start, SEEK_DATA);
-if (*data == -1) {
-*data = lseek(s-fd, 0, SEEK_END);
-

[Qemu-devel] [PULL 5/6] block/raw-posix: Only sync after successful preallocation

2014-11-18 Thread Kevin Wolf

From: Max Reitz mre...@redhat.com

The loop which filled the file with zeroes may have been left early due
to an error. In that case, the fsync() should be skipped.

Signed-off-by: Max Reitz mre...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/raw-posix.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e0e48c5..d106fc4 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1453,7 +1453,9 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 left -= result;
 }
-fsync(fd);
+if (result = 0) {
+fsync(fd);
+}
 g_free(buf);
 break;
 }
-- 
1.8.3.1

[Qemu-devel] [PULL 4/6] block/raw-posix: Fix preallocating write() loop

2014-11-18 Thread Kevin Wolf

From: Max Reitz mre...@redhat.com

write() may write less bytes than requested; in this case, the number of
bytes written is returned. This is the byte count we should be
subtracting from the number of bytes still to be written, and not the
byte count we requested to write.

Reported-by: László Érsek ler...@redhat.com
Signed-off-by: Max Reitz mre...@redhat.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/raw-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 414e6d1..e0e48c5 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1451,7 +1451,7 @@ static int raw_create(const char *filename, QemuOpts 
*opts, Error **errp)
  Could not write to the new file);
 break;
 }
-left -= num;
+left -= result;
 }
 fsync(fd);
 g_free(buf);
-- 
1.8.3.1

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-18 Thread Stefan Hajnoczi

On Mon, Nov 17, 2014 at 04:20:39PM -0500, Don Slutz wrote:
 The other callers to blk_set_enable_write_cache() in this file
 already check for s-blk == NULL.
 
 Signed-off-by: Don Slutz dsl...@verizon.com
 ---
 
 I think this is a bugfix that should be back ported to stable
 releases.
 
 I also think this should be done in xen's copy of QEMU for 4.5 with
 back port(s) to active stable releases.
 
 Note: In 2.1 and earlier the routine is
 bdrv_set_enable_write_cache(); variable is s-bs.
 
  hw/ide/core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgp82QmAe9sMB.pgp
Description: PGP signature

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-18 Thread Kevin Wolf

Am 17.11.2014 um 22:20 hat Don Slutz geschrieben:
 The other callers to blk_set_enable_write_cache() in this file
 already check for s-blk == NULL.
 
 Signed-off-by: Don Slutz dsl...@verizon.com
 ---
 
 I think this is a bugfix that should be back ported to stable
 releases.

Please remember to include a 'Cc: qemu-sta...@nongnu.org' line in your
commit messages for such patches. I've added it and applied the patch to
my block branch, thanks. (Saw it too late for my -rc2 pull request,
though, so it'll have to wait for -rc3.)

Kevin

Re: [Qemu-devel] [PATCH 4/5] memory: interface to allocate device ram

2014-11-18 Thread Michael S. Tsirkin

On Mon, Nov 17, 2014 at 08:21:00PM +, Peter Maydell wrote:
 On 17 November 2014 20:08, Michael S. Tsirkin m...@redhat.com wrote:
  Add API to allocate on-device RAM.
  This looks just like regular RAM from migration POV,
  but has two special properties internally:
   - it is never exposed to guest
 
 If it's not exposed to the guest why is it a MemoryRegion?
 Those are pretty much the definition of regions of memory
 we expose to the guest (via one address space or another)...
 
 -- PMM

Mostly for migration: we are using page migration machinery to
migrate this memory early rather than as part of device state
after VM is stopped.

-- 
MST

Re: [Qemu-devel] [PATCH v2 6/6] qemu-nbd: Use BlockBackend where reasonable

2014-11-18 Thread Paolo Bonzini



On 18/11/2014 12:21, Max Reitz wrote:
 Because qemu-nbd creates the BlockBackend by itself, it should create
 the according BlockDriverState tree by itself as well; that means, it
 has call bdrv_open() on its own. This is one of the places where
 qemu-nbd still needs to use a BlockDriverState directly (the root BDS
 below the BB); other places are the configuration of zero detection
 (which may be lifted into the BB eventually, but is not yet) and
 temporarily loading a snapshot.
 
 Everywhere else, though, qemu-nbd can and thus should use BlockBackend.

... which is not much, but is nevertheless a start. :)

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

 Suggested-by: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  qemu-nbd.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)
 
 diff --git a/qemu-nbd.c b/qemu-nbd.c
 index 60ce50f..d222512 100644
 --- a/qemu-nbd.c
 +++ b/qemu-nbd.c
 @@ -146,7 +146,7 @@ static void read_partition(uint8_t *p, struct 
 partition_record *r)
  r-nb_sectors_abs = p[12] | p[13]  8 | p[14]  16 | p[15]  24;
  }
  
 -static int find_partition(BlockDriverState *bs, int partition,
 +static int find_partition(BlockBackend *blk, int partition,
off_t *offset, off_t *size)
  {
  struct partition_record mbr[4];
 @@ -155,7 +155,7 @@ static int find_partition(BlockDriverState *bs, int 
 partition,
  int ext_partnum = 4;
  int ret;
  
 -if ((ret = bdrv_read(bs, 0, data, 1))  0) {
 +if ((ret = blk_read(blk, 0, data, 1))  0) {
  errno = -ret;
  err(EXIT_FAILURE, error while reading);
  }
 @@ -175,7 +175,7 @@ static int find_partition(BlockDriverState *bs, int 
 partition,
  uint8_t data1[512];
  int j;
  
 -if ((ret = bdrv_read(bs, mbr[i].start_sector_abs, data1, 1))  
 0) {
 +if ((ret = blk_read(blk, mbr[i].start_sector_abs, data1, 1))  
 0) {
  errno = -ret;
  err(EXIT_FAILURE, error while reading);
  }
 @@ -720,10 +720,10 @@ int main(int argc, char **argv)
  }
  
  bs-detect_zeroes = detect_zeroes;
 -fd_size = bdrv_getlength(bs);
 +fd_size = blk_getlength(blk);
  
  if (partition != -1) {
 -ret = find_partition(bs, partition, dev_offset, fd_size);
 +ret = find_partition(blk, partition, dev_offset, fd_size);
  if (ret  0) {
  errno = -ret;
  err(EXIT_FAILURE, Could not find partition %d, partition);

Re: [Qemu-devel] [PATCH] linux-headers: update to 3.18-rc5

2014-11-18 Thread Peter Maydell

On 18 November 2014 05:19, Paolo Bonzini pbonz...@redhat.com wrote:


 On 17/11/2014 19:32, Peter Maydell wrote:
 On 17 November 2014 18:28, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 This updates the Linux header to version 3.18-rc5, adding support for
 (among other things) read-only memslots on ARM and arm64.

 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org

 So, to be clear, the idea is that this should go into 2.2 because
 it (effectively) fixes a bug where running from emulated flash
 devices under KVM/ARM doesn't work?

 A KVM header update seems pretty safe to me; cc'ing Paolo
 as KVM maintainer to ack/nak it.

 Sure, this is a good time (kernel-wise) to update headers.  And if it
 fixes a bug, it's acceptable for hard freeze.

OK; applied to master.

thanks
-- PMM

[Qemu-devel] [PATCH 0/4] MAINTAINERS: add myself to migration, rng; update others

2014-11-18 Thread Amit Shah

Hello,

These patches add myself to the migration maintainers list, and a new
entry for virtio-rng.  Also update the virtio-serial entry for an
include file.

Please ack.

Amit Shah (4):
  MAINTAINERS: Add myself to migration maintainers
  MAINTAINERS: migration: add vmstate static checker files
  MAINTAINERS: add entry for virtio-rng
  MAINTAINERS: add include files to virtio-serial entry

 MAINTAINERS | 11 +++
 1 file changed, 11 insertions(+)

-- 
2.1.0

[Qemu-devel] [PATCH 1/4] MAINTAINERS: Add myself to migration maintainers

2014-11-18 Thread Amit Shah

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bcb69e8..e517c41 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -928,6 +928,7 @@ F: scripts/checkpatch.pl
 
 Migration
 M: Juan Quintela quint...@redhat.com
+M: Amit Shah amit.s...@redhat.com
 S: Maintained
 F: include/migration/
 F: migration*
-- 
2.1.0

[Qemu-devel] [PATCH 2/4] MAINTAINERS: migration: add vmstate static checker files

2014-11-18 Thread Amit Shah

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e517c41..af8d856 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -935,6 +935,8 @@ F: migration*
 F: savevm.c
 F: arch_init.c
 F: vmstate.c
+F: scripts/vmstate-static-checker.py
+F: tests/vmstate-static-checker-data/
 
 Seccomp
 M: Eduardo Otubo eduardo.ot...@profitbricks.com
-- 
2.1.0

[Qemu-devel] [PATCH 3/4] MAINTAINERS: add entry for virtio-rng

2014-11-18 Thread Amit Shah

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index af8d856..4c46cfe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -697,6 +697,13 @@ S: Supported
 F: hw/char/virtio-serial-bus.c
 F: hw/char/virtio-console.c
 
+virtio-rng
+M: Amit Shah amit.s...@redhat.com
+S: Supported
+F: hw/virtio/virtio-rng.c
+F: include/hw/virtio/virtio-rng.h
+F: backends/rng*.c
+
 nvme
 M: Keith Busch keith.bu...@intel.com
 S: Supported
-- 
2.1.0

[Qemu-devel] [PATCH 4/4] MAINTAINERS: add include files to virtio-serial entry

2014-11-18 Thread Amit Shah

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4c46cfe..d2f4a11 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -696,6 +696,7 @@ M: Amit Shah amit.s...@redhat.com
 S: Supported
 F: hw/char/virtio-serial-bus.c
 F: hw/char/virtio-console.c
+F: include/hw/virtio/virtio-serial.h
 
 virtio-rng
 M: Amit Shah amit.s...@redhat.com
-- 
2.1.0

Re: [Qemu-devel] [PATCH Part1 1/5] acpi, pc: Add hotunplug request cb for pc machine.

2014-11-18 Thread Igor Mammedov

On Mon, 17 Nov 2014 13:03:13 +0800
Tang Chen tangc...@cn.fujitsu.com wrote:

in subj s/cb/callback|handler/

 Memory and CPU hot unplug are both asynchronize procedures.
s/asynchronize/asynchronous/
 They both need unplug request cb when the unplug operation happens.
s/cb when the unplug operation happens/callback to initiate unplug operation/

 
 This patch adds hotunplug request cb for pc machine, and memory and CPU
 hot unplug will base on it.
Add unplug handler to pc machine that will be used by following
CPU and memory unplug patches.


 
 Signed-off-by: Tang Chen tangc...@cn.fujitsu.com
 ---
  hw/i386/pc.c | 8 
  1 file changed, 8 insertions(+)
 
 diff --git a/hw/i386/pc.c b/hw/i386/pc.c
 index 1205db8..5c48435 100644
 --- a/hw/i386/pc.c
 +++ b/hw/i386/pc.c
 @@ -1647,6 +1647,13 @@ static void pc_machine_device_plug_cb(HotplugHandler 
 *hotplug_dev,
  }
  }
  
 +static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 +DeviceState *dev, Error 
 **errp)
 +{
 +error_setg(errp, acpi: device unplug request for not supported device
 +type: %s, object_get_typename(OBJECT(dev)));
it's not necessarily acpi related in general so maybe drop 'acpi:' prefix.
Also it would be nice to add device's ID or use it instead of type name.

 +}
 +
  static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
   DeviceState *dev)
  {
 @@ -1753,6 +1760,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
 *data)
  pcmc-get_hotplug_handler = mc-get_hotplug_handler;
  mc-get_hotplug_handler = pc_get_hotpug_handler;
  hc-plug = pc_machine_device_plug_cb;
 +hc-unplug_request = pc_machine_device_unplug_request_cb;
  }
  
  static const TypeInfo pc_machine_info = {

Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support

2014-11-18 Thread Frank Blaschka

On Mon, Nov 10, 2014 at 04:14:16PM +0100, Alexander Graf wrote:
 
 
 On 10.11.14 15:20, Frank Blaschka wrote:
  From: Frank Blaschka frank.blasc...@de.ibm.com
  
  This patch implements a pci bus for s390x together with infrastructure
  to generate and handle hotplug events, to configure/unconfigure via
  sclp instruction, to do iommu translations and provide s390 support for
  MSI/MSI-X notification processing.
  
  Signed-off-by: Frank Blaschka frank.blasc...@de.ibm.com
  ---
   default-configs/s390x-softmmu.mak |   1 +
   hw/s390x/Makefile.objs|   1 +
   hw/s390x/css.c|   5 +
   hw/s390x/css.h|   1 +
   hw/s390x/s390-pci-bus.c   | 485 
  ++
   hw/s390x/s390-pci-bus.h   | 254 
   hw/s390x/s390-virtio-ccw.c|   3 +
   hw/s390x/sclp.c   |  10 +-
   include/hw/s390x/sclp.h   |   8 +
   target-s390x/ioinst.c |  52 
   target-s390x/ioinst.h |   1 +
   11 files changed, 820 insertions(+), 1 deletion(-)
   create mode 100644 hw/s390x/s390-pci-bus.c
   create mode 100644 hw/s390x/s390-pci-bus.h
  
  diff --git a/default-configs/s390x-softmmu.mak 
  b/default-configs/s390x-softmmu.mak
  index 126d88d..6ee2ff8 100644
  --- a/default-configs/s390x-softmmu.mak
  +++ b/default-configs/s390x-softmmu.mak
  @@ -1,3 +1,4 @@
  +include pci.mak
   CONFIG_VIRTIO=y
   CONFIG_SCLPCONSOLE=y
   CONFIG_S390_FLIC=y
  diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
  index 1ba6c3a..428d957 100644
  --- a/hw/s390x/Makefile.objs
  +++ b/hw/s390x/Makefile.objs
  @@ -8,3 +8,4 @@ obj-y += ipl.o
   obj-y += css.o
   obj-y += s390-virtio-ccw.o
   obj-y += virtio-ccw.o
  +obj-y += s390-pci-bus.o
  diff --git a/hw/s390x/css.c b/hw/s390x/css.c
  index b67c039..7553085 100644
  --- a/hw/s390x/css.c
  +++ b/hw/s390x/css.c
  @@ -1299,6 +1299,11 @@ void css_generate_chp_crws(uint8_t cssid, uint8_t 
  chpid)
   /* TODO */
   }
   
  +void css_generate_css_crws(uint8_t cssid)
  +{
  +css_queue_crw(CRW_RSC_CSS, 0, 0, 0);
  +}
  +
   int css_enable_mcsse(void)
   {
   trace_css_enable_facility(mcsse);
  diff --git a/hw/s390x/css.h b/hw/s390x/css.h
  index 33104ac..7e53148 100644
  --- a/hw/s390x/css.h
  +++ b/hw/s390x/css.h
  @@ -101,6 +101,7 @@ void css_queue_crw(uint8_t rsc, uint8_t erc, int chain, 
  uint16_t rsid);
   void css_generate_sch_crws(uint8_t cssid, uint8_t ssid, uint16_t schid,
  int hotplugged, int add);
   void css_generate_chp_crws(uint8_t cssid, uint8_t chpid);
  +void css_generate_css_crws(uint8_t cssid);
   void css_adapter_interrupt(uint8_t isc);
   
   #define CSS_IO_ADAPTER_VIRTIO 1
  diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
  new file mode 100644
  index 000..f2fa6ba
  --- /dev/null
  +++ b/hw/s390x/s390-pci-bus.c
  @@ -0,0 +1,485 @@
  +/*
  + * s390 PCI BUS
  + *
  + * Copyright 2014 IBM Corp.
  + * Author(s): Frank Blaschka frank.blasc...@de.ibm.com
  + *Hong Bo Li lih...@cn.ibm.com
  + *Yi Min Zhao zyi...@cn.ibm.com
  + *
  + * This work is licensed under the terms of the GNU GPL, version 2 or (at
  + * your option) any later version. See the COPYING file in the top-level
  + * directory.
  + */
  +
  +#include hw/pci/pci.h
  +#include hw/pci/pci_bus.h
  +#include hw/s390x/css.h
  +#include hw/s390x/sclp.h
  +#include hw/pci/msi.h
  +#include qemu/error-report.h
  +#include s390-pci-bus.h
  +
  +/* #define DEBUG_S390PCI_BUS */
  +#ifdef DEBUG_S390PCI_BUS
  +#define DPRINTF(fmt, ...) \
  +do { fprintf(stderr, S390pci-bus:  fmt, ## __VA_ARGS__); } while (0)
  +#else
  +#define DPRINTF(fmt, ...) \
  +do { } while (0)
  +#endif
  +
  +static const unsigned long be_to_le = BITS_PER_LONG - 1;
  +static QTAILQ_HEAD(, SeiContainer) pending_sei =
  +QTAILQ_HEAD_INITIALIZER(pending_sei);
  +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
  +QTAILQ_HEAD_INITIALIZER(device_list);
 
 Please get rid of all statics ;). All state has to live in objects.


be_to_le was misleading and unnecesary will remove this one but
static QTAILQ_HEAD seems to be a common practice for list anchors.
If you really want me to change this do you have any prefered way,
or can you point me to some code doing this?

  +
  +int chsc_sei_nt2_get_event(void *res)
  +{
  +ChscSeiNt2Res *nt2_res = (ChscSeiNt2Res *)res;
  +PciCcdfAvail *accdf;
  +PciCcdfErr *eccdf;
  +int rc = 1;
  +SeiContainer *sei_cont;
  +
  +sei_cont = QTAILQ_FIRST(pending_sei);
  +if (sei_cont) {
  +QTAILQ_REMOVE(pending_sei, sei_cont, link);
  +nt2_res-nt = 2;
  +nt2_res-cc = sei_cont-cc;
  +switch (sei_cont-cc) {
  +case 1: /* error event */
  +eccdf = (PciCcdfErr *)nt2_res-ccdf;
  +eccdf-fid = cpu_to_be32(sei_cont-fid);
  +eccdf-fh = cpu_to_be32(sei_cont-fh);
  +break;
  +

Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series

2014-11-18 Thread Vladimir Sementsov-Ogievskiy


(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent bitmap, delay the write until 
we can mark the bitmap as dirty first. This incurs a write penalty 
when we try to use the bitmap at first...

- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty 
information to the file and mark the file as clean once more and 
re-take the persistence lock. 

Correct me if I'm wrong.

#Read bitmap:
read in blockdev_init, before any write to device, so no lock is needed.

#Set bits in bitmap:
if bitmap.dirty_flag:
   set bits
else:
   LOCK
   set bits
   set bitmap.dirty_flag
   set dirty_flag in bitmap file
   UNLOCK

#Sync:
if not bitmap.dirty_flag:
   skip sync
else:
   LOCK
   save one of bitmap levels (saving the last one is too long and not 
very good idea, because it is fast-updateing)

   unset dirty_flag in bitmap file
   unset bitmap.dirty_flag
   UNLOCK

#Last sync in bdrv_close:
Just save the last bitmap level and unset dirty_flag in bitmap file

Also.. I'm not quite sure about locking.. As I understand, co-routines 
in qemu are not running in parallel, is locking required? Or sync timer 
will not be co-routine based?


Best regards,
Vladimir

On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote:



(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt 
if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE? 

- The name of the bitmap and also the size of this name



(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's 
the primary benefit you're seeking? 
Hmm. It may be used for faster sync. Maybe, save some of bitmap levels 
on timer while vm is running and save the last level on shutdown?


CC qemu-devel - ok.

Best regards,
Vladimir

On 18.11.2014 02:46, John Snow wrote:



On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi

I'd just like to start working on persistent dirty bitmap. My thoughts
about it are the following:
- qemu -drive file=file,dirty_bitmap=file
 so,  bitmap will be loaded with drive open and saved with drive 
close.

- save only meaningful (the last) level of the bitmap, restore all
levels on bitmap loading
- bool parameter persistent for bdrv_create_dirty_bitmap and
BdrvDirtyBitmap
- internal dirty_bitmaps, saved in qcow2 file

Best regards,
Vladimir


I am thinking:

(1) Command Lines

If you enable dirty bitmaps and give it a file that doesn't exist, it 
should error out on you.


If you enable dirty bitmaps and give it a file that's blank, it 
understands that it is to create a persistent bitmap file in this 
location and it should enable persistence.


If a bitmap file is given and it has valid magic, this should imply 
persistence.


I am hesitant to have it auto-create files that don't already exist 
in case the files become large in size and a misconfiguration leads 
to repeated creation of these files that get orphaned in random 
folders. Perhaps we can add a create=auto flag or similar to allow 
this behavior if wanted.


(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt 
if we give ourselves a sector's worth to write metadata within.)

- Data starting at... PAGESIZE?

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent bitmap, delay the write until 
we can mark the bitmap as dirty first. This incurs a write penalty 
when we try to use the bitmap at first...

- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty 
information to the file and mark the file as clean once more and 
re-take the persistence lock.


(4) Synchronization Policy

- Sync after so many bits become dirty in the bitmap, either as an 
absolute threshold or a density percentage?

- Sync periodically on a fixed timer?
- Sync periodically opportunistically when I/O utilization becomes

Re: [Qemu-devel] runtime configurable semihosting

2014-11-18 Thread Peter Maydell

On 13 November 2014 09:44, Liviu Ionescu i...@livius.net wrote:
 Peter Maydell wrote:
 I took a quick look at the syntax of command options and monitor
 commands, and I would suggest the following:

 - extend the option -semihosting with an optional
 target=native|gdb|auto, default auto

 unfortunately the parser is not able to detect a missing optional, and
 will always consume the next option, so this syntax is not available.

Sorry, yes, you're right; I should have looked more carefully.
So we do need a new option; however I think it would be better
for the new option to be a general qemuopts option, like this:

 -semihosting-config target=[native|gdb|auto]

because that then gives us a place to put future semihosting
related config options, and also allows the config files used
by -readconfig/-writeconfig to change this setting. (We can
have -semihosting-config have an implied 'enable=true', so
you don't have to use the old -semihosting option if you're
specifying -semihosting-config.)

thanks
-- PMM

Re: [Qemu-devel] [PULL] migration: fix for CVE-2014-7840

2014-11-18 Thread Peter Maydell

On 18 November 2014 11:29, Amit Shah amit.s...@redhat.com wrote:
 The following changes since commit d6be29e3fb5659102ac0e48e295d177cb67e32c5:

   target-arm: handle address translations that start at level 3 (2014-11-17 
 19:30:28 +)

 are available in the git repository at:

   git://git.kernel.org/pub/scm/virt/qemu/amit/migration.git tags/for-2.2

 for you to fetch changes up to 0be839a2701369f669532ea5884c15bead1c6e08:

   migration: fix parameter validation on ram load (2014-11-18 16:49:44 +0530)

 
 Fix for CVE-2014-7840, avoiding arbitrary qemu memory overwrite for
 migration by Michael S. Tsirkin.

 

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH 00/13] linux-aio/virtio-scsi: support AioContext wide IO submission as batch

2014-11-18 Thread Paolo Bonzini



On 09/11/2014 08:42, Ming Lei wrote:
 This patch implements AioContext wide IO submission as batch, and
 the idea behind is very simple:
 
   - linux native aio(io_submit) supports to enqueue read/write requests
   to different files
 
   - in one AioContext, I/O requests from VM can be submitted to different
   backend in host, one typical example is multi-lun scsi 
 
 This patch changes 'struct qemu_laio_state' as per AioContext, and
 multiple 'bs' can be associted with one single instance of
 'struct qemu_laio_state', then AioContext wide IO submission as batch
 becomes easy to implement.
 
 One simple test in my laptop shows ~20% throughput improvement
 on randread from VM(using AioContext wide IO batch vs. not using io batch)
 with below config:
 
   -drive 
 id=drive_scsi1-0-0-0,if=none,format=raw,cache=none,aio=native,file=/dev/nullb2
  \
   -drive 
 id=drive_scsi1-0-0-1,if=none,format=raw,cache=none,aio=native,file=/dev/nullb3
  \
   -device 
 virtio-scsi-pci,num_queues=4,id=scsi1,addr=07,iothread=iothread0 \
   -device 
 scsi-disk,bus=scsi1.0,channel=0,scsi-id=1,lun=0,drive=drive_scsi1-0-0-0,id=scsi1-0-0-0
  \
   -device 
 scsi-disk,bus=scsi1.0,channel=0,scsi-id=1,lun=1,drive=drive_scsi1-0-0-1,id=scsi1-0-0-1
  \
 
 BTW, maybe more boost can be obtained since ~33K/sec write() system call
 can be observed when this test case is running, and it might be a recent
 regression(BH?).

Ming,

these patches are interesting.  I would like to compare them with the
opposite approach (and, I think, more similar to your old work) where
the qemu_laio_state API is moved entirely into AioContext, with lazy
allocation (reference-counted too, probably).

Most of the patches would be the same, but you would replace
aio_attach_aio_bs/aio_detach_aio_bs with something like
aio_native_get/aio_native_unref.  Ultimately block/{linux,win32}-aio.c
could be merged into block/aio-{posix,win32}.c, but you do not have to
do that now.

Could you try that?  This way we can see which API turns out to be nicer.

Thanks,

Paolo

 This patchset can be found on below tree too:
 
   git://kernel.ubuntu.com/ming/qemu.git aio-io-batch.2
 
 and these patches depend on linux-aio: fix batch submission patches
 in below link:
 
   http://marc.info/?l=qemu-develm=141528663106557w=2
 
 Any comments and suggestions are welcome.
 
  async.c |1 +
  block.c |   16 +++
  block/linux-aio.c   |  251 
 ++-
  block/raw-aio.h |6 +-
  block/raw-posix.c   |4 +-
  hw/scsi/virtio-scsi-dataplane.c |8 ++
  hw/scsi/virtio-scsi.c   |2 -
  include/block/aio.h |   27 +
  include/block/block.h   |3 +
  9 files changed, 259 insertions(+), 59 deletions(-)
 
 Thanks,
 Ming Lei

Re: [Qemu-devel] [PATCH v2 1/5] libqos: Change use of pointers to uint64_t in virtio

2014-11-18 Thread Stefan Hajnoczi

On Mon, Nov 17, 2014 at 04:25:50PM +0100, Andreas Färber wrote:
 Am 17.11.2014 um 16:19 schrieb Marc Marí:
  El Mon, 17 Nov 2014 15:16:21 +
  Stefan Hajnoczi stefa...@gmail.com escribió:
  On Sat, Nov 01, 2014 at 06:02:26PM +0100, Marc Marí wrote:
  Convert use of pointers in functions of virtio to uint64_t in order
  to make it platform-independent.
 
  Add casting from pointers (in PCI functions) to uint64_t and vice
  versa through uintptr_t.
 
  Signed-off-by: Marc Marí marc.mari.barc...@gmail.com
  ---
   tests/libqos/virtio-pci.c |   20 +++-
   tests/libqos/virtio.c |8 
   tests/libqos/virtio.h |   16 
   tests/virtio-blk-test.c   |   21 ++---
   4 files changed, 37 insertions(+), 28 deletions(-)
 
  This makes sense.  To fully abolish pointers the PCI code also needs
  to be converted.  Do plan plan to do that?
 
  Stefan
  
  Not planned, but if asked, I may do it.
 
 Pretty please. :) I was wondering the same thing when reading the patch.

That said, you do not need to convert PCI in this patch series.

If you do the PCI conversion in a separate patch series we can keep
merging code incrementally.  It's more satisfying for you that way and
less intimidating for code reviews to review short series.

Stefan


pgpK9RujbspDI.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH] Tracing: Fix simpletrace.py error on tcg enabled binary traces

2014-11-18 Thread Stefan Hajnoczi

On Mon, Nov 17, 2014 at 07:57:08PM -0600, Lluís Vilanova wrote:
 Stefan Hajnoczi writes:
 
  On Sun, Nov 02, 2014 at 10:37:59PM +0100, christoph.seif...@posteo.de wrote:
  From: Christoph Seifert christoph.seif...@posteo.de
  
  simpletrace.py does not recognize the tcg option while reading trace-events
  file. In result simpletrace does not work on binary traces and tcg enabled
  events. Moved transformation of tcg enabled events to _read_events() which 
  is
  used by simpletrace.
  
  Signed-off-by: Christoph Seifert christoph.seif...@posteo.de
  ---
  scripts/tracetool/__init__.py | 67 
  +--
  1 file changed, 33 insertions(+), 34 deletions(-)
 
  Looks good to me.
 
  Lluís: Any comments?
 
 I've looked at how it affects some patches I did not have time yet to send, 
 and
 everything looks good.

Fantastic, thanks for your time, Lluís!

Stefan


pgpVCTHO1OlEy.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 2/3] linux-aio: handling -EAGAIN for !s-io_q.plugged case

2014-11-18 Thread Paolo Bonzini



On 06/11/2014 16:10, Ming Lei wrote:
 +/* don't submit until next completion for -EAGAIN of non plug case */
 +if (unlikely(!s-io_q.plugged)) {
 +return 0;
 +}
 +

Is this an optimization or a fix for something?

 +/*
 + * Switch to queue mode until -EAGAIN is handled, we suppose
 + * there is always uncompleted I/O, so try to enqueue it first,
 + * and will be submitted again in following aio completion cb.
 + */
 +if (ret == -EAGAIN) {
 +goto enqueue;
 +} else if (ret  0) {
  goto out_free_aiocb;
  }

Better:

 if (!s-io_q.plugged  !s-io_q.idx) {
ret = io_submit(s-ctx, 1, iocbs);
if (ret = 0) {
return laiocb-common;
}
if (ret != -EAGAIN) {
goto out_free_aiocb;
}
}

/* code for queue mode.  */

Paolo

Re: [Qemu-devel] [PATCH v3 3/3] linux-aio: remove 'node' from 'struct qemu_laiocb'

2014-11-18 Thread Paolo Bonzini



On 06/11/2014 16:10, Ming Lei wrote:
 No one uses the 'node' field any more, so remove it
 from 'struct qemu_laiocb', and this can save 16byte
 for the struct on 64bit arch.
 
 Signed-off-by: Ming Lei ming@canonical.com
 ---
  block/linux-aio.c |1 -
  1 file changed, 1 deletion(-)
 
 diff --git a/block/linux-aio.c b/block/linux-aio.c
 index f5ca41d..b12da25 100644
 --- a/block/linux-aio.c
 +++ b/block/linux-aio.c
 @@ -35,7 +35,6 @@ struct qemu_laiocb {
  size_t nbytes;
  QEMUIOVector *qiov;
  bool is_read;
 -QLIST_ENTRY(qemu_laiocb) node;
  };
  
  /*
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Re: [Qemu-devel] [PATCH] Tracing: Fix simpletrace.py error on tcg enabled binary traces

2014-11-18 Thread Stefan Hajnoczi

On Sun, Nov 02, 2014 at 10:37:59PM +0100, christoph.seif...@posteo.de wrote:
 From: Christoph Seifert christoph.seif...@posteo.de
 
 simpletrace.py does not recognize the tcg option while reading trace-events  
 file. In result simpletrace does not work on binary traces and tcg enabled 
 events. Moved transformation of tcg enabled events to _read_events() which is 
 used by simpletrace.
 
 Signed-off-by: Christoph Seifert christoph.seif...@posteo.de
 ---
  scripts/tracetool/__init__.py | 67 
 +--
  1 file changed, 33 insertions(+), 34 deletions(-)

Thanks, applied to my tracing tree:
https://github.com/stefanha/qemu/commits/tracing

Stefan


pgpZe1fWea7WZ.pgp
Description: PGP signature

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-18 Thread Stefano Stabellini

Konrad,
I think we should have this fix in Xen 4.5. Should I go ahead and
backport it?

On Mon, 17 Nov 2014, Don Slutz wrote:
 The other callers to blk_set_enable_write_cache() in this file
 already check for s-blk == NULL.
 
 Signed-off-by: Don Slutz dsl...@verizon.com
 ---
 
 I think this is a bugfix that should be back ported to stable
 releases.
 
 I also think this should be done in xen's copy of QEMU for 4.5 with
 back port(s) to active stable releases.
 
 Note: In 2.1 and earlier the routine is
 bdrv_set_enable_write_cache(); variable is s-bs.
 
  hw/ide/core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/hw/ide/core.c b/hw/ide/core.c
 index 00e21cf..d4af5e2 100644
 --- a/hw/ide/core.c
 +++ b/hw/ide/core.c
 @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int 
 version_id)
  {
  IDEState *s = opaque;
  
 -if (s-identify_set) {
 +if (s-blk  s-identify_set) {
  blk_set_enable_write_cache(s-blk, !!(s-identify_data[85]  (1  
 5)));
  }
  return 0;
 -- 
 1.8.4

Re: [Qemu-devel] [PATCH v3 1/3] linux-aio: fix submit aio as a batch

2014-11-18 Thread Paolo Bonzini



 @@ -137,6 +145,12 @@ static void qemu_laio_completion_bh(void *opaque)
  }
  }
  
 +static void qemu_laio_start_retry(struct qemu_laio_state *s)
 +{
 +if (s-io_q.idx)
 +qemu_bh_schedule(s-io_q.retry);
 +}
 +
  static void qemu_laio_completion_cb(EventNotifier *e)
  {
  struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
 @@ -144,6 +158,7 @@ static void qemu_laio_completion_cb(EventNotifier *e)
  if (event_notifier_test_and_clear(s-e)) {
  qemu_bh_schedule(s-completion_bh);
  }
 +qemu_laio_start_retry(s);

I think you do not even need two bottom halves.  Just call ioq_submit
from completion_bh instead, after the call to io_getevents.

  }
  
  static void laio_cancel(BlockAIOCB *blockacb)
 @@ -163,6 +178,9 @@ static void laio_cancel(BlockAIOCB *blockacb)
  }
  
  laiocb-common.cb(laiocb-common.opaque, laiocb-ret);
 +
 +/* check if there are requests in io queue */
 +qemu_laio_start_retry(laiocb-ctx);
  }
  
  static const AIOCBInfo laio_aiocb_info = {
 @@ -177,45 +195,80 @@ static void ioq_init(LaioQueue *io_q)
  io_q-plugged = 0;
  }
  
 -static int ioq_submit(struct qemu_laio_state *s)
 +static void abort_queue(struct qemu_laio_state *s)
 +{
 +int i;
 +for (i = 0; i  s-io_q.idx; i++) {
 +struct qemu_laiocb *laiocb = container_of(s-io_q.iocbs[i],
 +  struct qemu_laiocb,
 +  iocb);
 +laiocb-ret = -EIO;
 +qemu_laio_process_completion(s, laiocb);
 +}
 +}
 +
 +static int ioq_submit(struct qemu_laio_state *s, bool enqueue)
  {
  int ret, i = 0;
  int len = s-io_q.idx;
 +int j = 0;
  
 -do {
 -ret = io_submit(s-ctx, len, s-io_q.iocbs);
 -} while (i++  3  ret == -EAGAIN);
 +if (!len) {
 +return 0;
 +}
  
 -/* empty io queue */
 -s-io_q.idx = 0;
 +ret = io_submit(s-ctx, len, s-io_q.iocbs);
 +if (ret == -EAGAIN) { /* retry in following completion cb */
 +return 0;
 +} else if (ret  0) {
 +if (enqueue) {
 +return ret;
 +}
  
 -if (ret  0) {
 -i = 0;
 -} else {
 -i = ret;
 +/* in non-queue path, all IOs have to be completed */
 +abort_queue(s);
 +ret = len;
 +} else if (ret == 0) {
 +goto out;

No need for goto; just move the for loop inside this conditional.  Or
better, just use memmove.  That is:

if (ret  0) {
if (ret == -EAGAIN) {
return 0;
}
if (enqueue) {
return ret;
}

abort_queue(s);
ret = len;
}

if (ret  0) {
memmove(...)
s-io_q.idx -= ret;
}
return ret;

 + * update io queue, for partial completion, retry will be
 + * started automatically in following completion cb.
 + */
 +s-io_q.idx -= ret;
 +
  return ret;
  }
  
 -static void ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
 +static void ioq_submit_retry(void *opaque)
 +{
 +struct qemu_laio_state *s = opaque;
 +ioq_submit(s, false);
 +}
 +
 +static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
  {
  unsigned int idx = s-io_q.idx;
  
 +if (unlikely(idx == s-io_q.size)) {
 +return -1;

return -EAGAIN?

Thanks,

Paolo

 +}
 +
  s-io_q.iocbs[idx++] = iocb;
  s-io_q.idx = idx;
  
 -/* submit immediately if queue is full */
 -if (idx == s-io_q.size) {
 -ioq_submit(s);
 +/* submit immediately if queue depth is above 2/3 */
 +if (idx  s-io_q.size * 2 / 3) {
 +return ioq_submit(s, true);
  }
 +
 +return 0;
  }
  
  void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
 @@ -237,7 +290,7 @@ int laio_io_unplug(BlockDriverState *bs, void *aio_ctx, 
 bool unplug)
  }
  
  if (s-io_q.idx  0) {
 -ret = ioq_submit(s);
 +ret = ioq_submit(s, false);
  }
  
  return ret;
 @@ -281,7 +334,9 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, void 
 *aio_ctx, int fd,
  goto out_free_aiocb;
  }
  } else {
 -ioq_enqueue(s, iocbs);
 +if (ioq_enqueue(s, iocbs)  0) {
 +goto out_free_aiocb;
 +}
  }
  return laiocb-common;
  
 @@ -296,12 +351,14 @@ void laio_detach_aio_context(void *s_, AioContext 
 *old_context)
  
  aio_set_event_notifier(old_context, s-e, NULL);
  qemu_bh_delete(s-completion_bh);
 +qemu_bh_delete(s-io_q.retry);
  }
  
  void laio_attach_aio_context(void *s_, AioContext *new_context)
  {
  struct qemu_laio_state *s = s_;
  
 +s-io_q.retry = aio_bh_new(new_context, ioq_submit_retry, s);
  s-completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
  aio_set_event_notifier(new_context, s-e, qemu_laio_completion_cb);
  }

Re: [Qemu-devel] hotremoving a disk qmp/hmp

2014-11-18 Thread Markus Armbruster

William Dauchy will...@gandi.net writes:

 Hello,

 When hotremoving a disk I'm using the QMP API with device_del command;

 Previous query-block command result:

 {   u'device': u'disk1',
 u'inserted': {   u'backing_file_depth': 0,
  u'bps': 0,
  u'bps_rd': 0,
  u'bps_wr': 0,
  u'detect_zeroes': u'off',
  u'drv': u'raw',
  u'encrypted': False,
  u'encryption_key_missing': False,
  u'file': u'/dev/sda',
  u'image': {   u'actual-size': 0,
u'dirty-flag': False,
u'filename': u'/dev/sda',
u'format': u'raw',
u'virtual-size': 3221225472},
  u'iops': 0,
  u'iops_rd': 0,
  u'iops_wr': 0,
  u'ro': False},
 u'io-status': u'ok',
 u'locked': True,
 u'removable': True,
 u'tray_open': False,
 u'type': u'unknown'}

This is block backend disk1.

 After a device_del command I have the same thing but 'locked': False

I presume you're deleting the device using backend disk1.

What kind of device is this?  PCI, perhaps?

 Then I can also do `eject device=disk1` which bring me to:

 {   u'device': u'disk1',
u'io-status': u'ok',
u'locked': False,
u'removable': True,
u'tray_open': False,
u'type': u'unknown'}

 But I did not found a way to completely remove the disk1 entry in order
 to be able to add it again.
 To complete the operation I need to use the old HMP API and use
 `drive_del` command.

 Did I miss something? or is it still something missing in QMP API?

Please show us a complete QMP conversation.

Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series

2014-11-18 Thread Vladimir Sementsov-Ogievskiy

Also, if we sync not the last level, bitmap.dirty_flag should be related 
to syncing level, not to the whole bitmap, to reduce sync overhead.
Or, if we implement difficult sync policy, there should be dirty flags 
for each bitmap level. Despite this, only one level is needed to be 
saved in the bitmap file.


PS: more ideas about file format - thanks to Denis V. Lunev 
d...@parallels.com
1) Shouldn't we consider a possibility of storing several bitmaps in one 
file? Or one bitmap = one file?

2) Implement header extensions like in qcow2.

Best regards,
Vladimir

On 18.11.2014 16:09, Vladimir Sementsov-Ogievskiy wrote:

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent bitmap, delay the write until 
we can mark the bitmap as dirty first. This incurs a write penalty 
when we try to use the bitmap at first...

- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty 
information to the file and mark the file as clean once more and 
re-take the persistence lock. 

Correct me if I'm wrong.

#Read bitmap:
read in blockdev_init, before any write to device, so no lock is needed.

#Set bits in bitmap:
if bitmap.dirty_flag:
   set bits
else:
   LOCK
   set bits
   set bitmap.dirty_flag
   set dirty_flag in bitmap file
   UNLOCK

#Sync:
if not bitmap.dirty_flag:
   skip sync
else:
   LOCK
   save one of bitmap levels (saving the last one is too long and not 
very good idea, because it is fast-updateing)

   unset dirty_flag in bitmap file
   unset bitmap.dirty_flag
   UNLOCK

#Last sync in bdrv_close:
Just save the last bitmap level and unset dirty_flag in bitmap file

Also.. I'm not quite sure about locking.. As I understand, co-routines 
in qemu are not running in parallel, is locking required? Or sync 
timer will not be co-routine based?


Best regards,
Vladimir

On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote:



(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't 
hurt if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE? 

- The name of the bitmap and also the size of this name



(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's 
the primary benefit you're seeking? 
Hmm. It may be used for faster sync. Maybe, save some of bitmap 
levels on timer while vm is running and save the last level on shutdown?


CC qemu-devel - ok.

Best regards,
Vladimir

On 18.11.2014 02:46, John Snow wrote:



On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi

I'd just like to start working on persistent dirty bitmap. My thoughts
about it are the following:
- qemu -drive file=file,dirty_bitmap=file
 so,  bitmap will be loaded with drive open and saved with 
drive close.

- save only meaningful (the last) level of the bitmap, restore all
levels on bitmap loading
- bool parameter persistent for bdrv_create_dirty_bitmap and
BdrvDirtyBitmap
- internal dirty_bitmaps, saved in qcow2 file

Best regards,
Vladimir


I am thinking:

(1) Command Lines

If you enable dirty bitmaps and give it a file that doesn't exist, 
it should error out on you.


If you enable dirty bitmaps and give it a file that's blank, it 
understands that it is to create a persistent bitmap file in this 
location and it should enable persistence.


If a bitmap file is given and it has valid magic, this should imply 
persistence.


I am hesitant to have it auto-create files that don't already exist 
in case the files become large in size and a misconfiguration leads 
to repeated creation of these files that get orphaned in random 
folders. Perhaps we can add a create=auto flag or similar to allow 
this behavior if wanted.


(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't 
hurt if we give ourselves a sector's worth to write metadata within.)

- Data starting at... PAGESIZE?

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to 
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock 
against any of the bitmap functions to prevent them from marking any 
bits dirty.
- On first write to a clean persistent

Re: [Qemu-devel] [PULL 0/6] Block patches for 2.2.0-rc2

2014-11-18 Thread Peter Maydell

On 18 November 2014 11:35, Kevin Wolf kw...@redhat.com wrote:
 The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:

   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
 staging (2014-11-17 17:22:03 +)

 are available in the git repository at:


   git://repo.or.cz/qemu/kevin.git tags/for-upstream

 for you to fetch changes up to 098ffa6674a82ceac0e3ccb3a8a5bf6ca44adcd5:

   block/raw-posix: Catch fsync() errors (2014-11-18 12:09:00 +0100)

 
 Block patches for 2.2.0-rc2

 

Applied, thanks.

-- PMM

Re: [Qemu-devel] exec.c:invalidate_and_set_dirty() only checks whether first page in its range is dirty...

2014-11-18 Thread Stefan Hajnoczi

On Sun, Nov 16, 2014 at 06:11:48PM +, Peter Maydell wrote:
 I'm trying to track down a bug in ARM TCG where we:
  * boot a guest
  * run 'shutdown -r now' to trigger a reboot
  * on reboot, crash when running userspace because the contents
of physical RAM have changed but the translated code from
before the shutdown was never invalidated
 
 This is with a virtio-mmio block device as the disk.
 
 Debugging indicates that when the post-reboot guest reloads
 binaries from disk into ram we fail to invalidate the cached
 translations. For the specific case I looked at, we have a
 translation of code at ramaddr_t 0x806e000. The disk load
 pulls 0x16000 bytes of data off disk to address 0x806a000.
 Virtio correctly calls address_space_unmap(), which is supposed
 to be what marks the ram range as dirty. It in turn calls
 invalidate_and_set_clean(). However invalidate_and_set_clean()
 just does this:
 
 if (cpu_physical_memory_is_clean(addr)) {
 /* invalidate code */
 tb_invalidate_phys_page_range(addr, addr + length, 0);
 /* set dirty bit */
 cpu_physical_memory_set_dirty_range_nocode(addr, length);
 }
 
 So if the first page in the range (here 0x806a000) happens
 to be dirty then we won't do anything, even if later pages
 in the range do need to be invalidated. Also, we'll call
 tb_invalidate_phys_page_range() with a start/end which may
 be in different physical pages, which is forbidden by that
 function's API.
 
 I guess invalidate_and_set_clean() really needs to be
 fixed to loop through each page in the range; does anybody
 know how this is supposed to work (or why nobody's noticed
 this bug before :-)) ?

Not directly but I don't like this code because it's not atomic.  I'll
send patches soon for atomic test-and-set and test-and-clear.  Hopefully
it won't impact performance too much.

What you've discovered seems like a plain old bug.  It needs a loop.

Stefan


pgpG50EJ2bRrO.pgp
Description: PGP signature

Re: [Qemu-devel] [PATCH 0/5] pc: make ROMs resizeable

2014-11-18 Thread Michael S. Tsirkin

On Tue, Nov 18, 2014 at 03:47:16PM +0100, Markus Armbruster wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  At the moment we migrate ROMs which reside in fw cfg, which allows
  changing ROM code at will, and supports migrating largish blocks early,
  with good performance.
  However, we are running into a problem: changing size breaks
  migration every time.
 
 Pardon my ignorance... changing ROM in migration?  I would expect
 migration to be as transparent for ROM as it is for RAM.  What am I
 missing?
 
 [...]

Nothing really.

We migrate RAM size too - but if there's a mismatch, migration fails.

RAM size is user configurable.

ROM is used internally so we have to figure out the size,
and it turned out to be a hard problem, so we end up maintaining
logic for ROM size like we did in 1.7 like we did in 2.0 etc.

I don't want to add any more: let's just accept whatever is migrated,
and stick to it.



-- 
MST

[Qemu-devel] [PULL 1/2] Tracing docs fix configure option and description

2014-11-18 Thread Stefan Hajnoczi

From: Dr. David Alan Gilbert dgilb...@redhat.com

Fix the example trace configure option.
Update the text to say that multiple backends are allowed and what
happens when multiple backends are enabled.

Signed-off-by: Dr. David Alan Gilbert dgilb...@redhat.com
Message-id: 1412691161-31785-1-git-send-email-dgilb...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 docs/tracing.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index 7d38926..7117c5e 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -139,12 +139,12 @@ events are not tightly coupled to a specific trace 
backend, such as LTTng or
 SystemTap.  Support for trace backends can be added by extending the 
tracetool
 script.
 
-The trace backend is chosen at configure time and only one trace backend can
-be built into the binary:
+The trace backends are chosen at configure time:
 
-./configure --trace-backends=simple
+./configure --enable-trace-backends=simple
 
 For a list of supported trace backends, try ./configure --help or see below.
+If multiple backends are enabled, the trace is sent to them all.
 
 The following subsections describe the supported trace backends.
 
-- 
2.1.0

[Qemu-devel] [PULL 0/2] Tracing patches

2014-11-18 Thread Stefan Hajnoczi

The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2014-11-17 17:22:03 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/tracing-pull-request

for you to fetch changes up to 776ec96f790e2c943c13313d8ecab4713b47ab65:

  Tracing: Fix simpletrace.py error on tcg enabled binary traces (2014-11-18 
14:05:58 +)





Christoph Seifert (1):
  Tracing: Fix simpletrace.py error on tcg enabled binary traces

Dr. David Alan Gilbert (1):
  Tracing docs fix configure option and description

 docs/tracing.txt  |  6 ++--
 scripts/tracetool/__init__.py | 67 +--
 2 files changed, 36 insertions(+), 37 deletions(-)

-- 
2.1.0

[Qemu-devel] [PULL 2/2] Tracing: Fix simpletrace.py error on tcg enabled binary traces

2014-11-18 Thread Stefan Hajnoczi

From: Christoph Seifert christoph.seif...@posteo.de

simpletrace.py does not recognize the tcg option while reading trace-events  
file. In result simpletrace does not work on binary traces and tcg enabled 
events. Moved transformation of tcg enabled events to _read_events() which is 
used by simpletrace.

Signed-off-by: Christoph Seifert christoph.seif...@posteo.de
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 scripts/tracetool/__init__.py | 67 +--
 1 file changed, 33 insertions(+), 34 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 3d5743f..181675f 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -253,14 +253,44 @@ class Event(object):
 
 
 def _read_events(fobj):
-res = []
+events = []
 for line in fobj:
 if not line.strip():
 continue
 if line.lstrip().startswith('#'):
 continue
-res.append(Event.build(line))
-return res
+
+event = Event.build(line)
+
+# transform TCG-enabled events
+if tcg not in event.properties:
+events.append(event)
+else:
+event_trans = event.copy()
+event_trans.name += _trans
+event_trans.properties += [tcg-trans]
+event_trans.fmt = event.fmt[0]
+args_trans = []
+for atrans, aorig in zip(
+event_trans.transform(tracetool.transform.TCG_2_HOST).args,
+event.args):
+if atrans == aorig:
+args_trans.append(atrans)
+event_trans.args = Arguments(args_trans)
+event_trans = event_trans.copy()
+
+event_exec = event.copy()
+event_exec.name += _exec
+event_exec.properties += [tcg-exec]
+event_exec.fmt = event.fmt[1]
+event_exec = event_exec.transform(tracetool.transform.TCG_2_HOST)
+
+new_event = [event_trans, event_exec]
+event.event_trans, event.event_exec = new_event
+
+events.extend(new_event)
+
+return events
 
 
 class TracetoolError (Exception):
@@ -333,35 +363,4 @@ def generate(fevents, format, backends,
 
 events = _read_events(fevents)
 
-# transform TCG-enabled events
-new_events = []
-for event in events:
-if tcg not in event.properties:
-new_events.append(event)
-else:
-event_trans = event.copy()
-event_trans.name += _trans
-event_trans.properties += [tcg-trans]
-event_trans.fmt = event.fmt[0]
-args_trans = []
-for atrans, aorig in zip(
-event_trans.transform(tracetool.transform.TCG_2_HOST).args,
-event.args):
-if atrans == aorig:
-args_trans.append(atrans)
-event_trans.args = Arguments(args_trans)
-event_trans = event_trans.copy()
-
-event_exec = event.copy()
-event_exec.name += _exec
-event_exec.properties += [tcg-exec]
-event_exec.fmt = event.fmt[1]
-event_exec = event_exec.transform(tracetool.transform.TCG_2_HOST)
-
-new_event = [event_trans, event_exec]
-event.event_trans, event.event_exec = new_event
-
-new_events.extend(new_event)
-events = new_events
-
 tracetool.format.generate(events, format, backend)
-- 
2.1.0

Re: [Qemu-devel] hotremoving a disk qmp/hmp

2014-11-18 Thread William Dauchy

On Nov18 15:22, Markus Armbruster wrote:
 This is block backend disk1.

yes indeed.

 I presume you're deleting the device using backend disk1.

yes

 What kind of device is this?  PCI, perhaps?
 Please show us a complete QMP conversation.

Here it is:

live vm with one disk:

(QEMU) query-block
{   u'return': [   {   u'device': u'disk0',
   u'inserted': {   u'backing_file_depth': 0,
u'bps': 0,
u'bps_rd': 0,
u'bps_wr': 0,
u'detect_zeroes': u'off',
u'drv': u'raw',
u'encrypted': False,
u'encryption_key_missing': False,
u'file': u'/dev/sda',
u'image': {   u'actual-size': 0,
  u'dirty-flag': False,
  u'filename': u'/dev/sda',
  u'format': u'raw',
  u'virtual-size': 
3221225472},
u'iops': 0,
u'iops_rd': 0,
u'iops_wr': 0,
u'ro': False},
   u'io-status': u'ok',
   u'locked': True,
   u'removable': True,
   u'tray_open': False,
   u'type': u'unknown'}]}

hotplugging one disk:

(QEMU) blockdev-add
with
{'options' : {
'driver': 'raw',
'id': 'disk1',
'file': {
'driver': 'file',
'filename': /dev/sdb,
}
}}

(QEMU) device_add
with:
{
'driver': 'scsi-hd',
'id': 'disk1',
'drive': 'disk1',
'scsi-id': 1,
'removable': 'on',
'dpofua': 'off'
}

(QEMU) query-block
{   u'return': [   {   u'device': u'disk0',
   u'inserted': {   u'backing_file_depth': 0,
u'bps': 0,
u'bps_rd': 0,
u'bps_wr': 0,
u'detect_zeroes': u'off',
u'drv': u'raw',
u'encrypted': False,
u'encryption_key_missing': False,
u'file': u'/dev/sda',
u'image': {   u'actual-size': 0,
  u'dirty-flag': False,
  u'filename': u'/dev/sda',
  u'format': u'raw',
  u'virtual-size': 
3221225472},
u'iops': 0,
u'iops_rd': 0,
u'iops_wr': 0,
u'ro': False},
   u'io-status': u'ok',
   u'locked': True,
   u'removable': True,
   u'tray_open': False,
   u'type': u'unknown'},,
   {   u'device': u'disk1',
   u'inserted': {   u'backing_file_depth': 0,
u'bps': 0,
u'bps_rd': 0,
u'bps_wr': 0,
u'detect_zeroes': u'off',
u'drv': u'raw',
u'encrypted': False,
u'encryption_key_missing': False,
u'file': u'/dev/sdb',
u'image': {   u'actual-size': 0,
  u'dirty-flag': False,
  u'filename': u'/dev/sdb',
  u'format': u'raw',
  u'virtual-size': 
3221225472},
u'iops': 0,
u'iops_rd': 0,
u'iops_wr': 0,
u'ro': False},
   u'io-status': u'ok',
   u'locked': True,
   u'removable': True,
   u'tray_open': False,
   u'type': u'unknown'}]}

hotremoving disk1:

(QEMU) device_del
with:
{'id': 'disk1'}

(QEMU) query-block
{   u'return': [   {

[Qemu-devel] [PULL] Net patches

2014-11-18 Thread Stefan Hajnoczi

The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2014-11-17 17:22:03 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tags/net-pull-request

for you to fetch changes up to ed6273e26fdfb94a282dbbf1234a75422c6b4c4b:

  net: The third parameter of getsockname should be initialized (2014-11-18 
15:04:35 +)





zhanghailiang (1):
  net: The third parameter of getsockname should be initialized

 net/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.1.0

Re: [Qemu-devel] [PATCH 0/5] pc: make ROMs resizeable

2014-11-18 Thread Markus Armbruster

Michael S. Tsirkin m...@redhat.com writes:

 At the moment we migrate ROMs which reside in fw cfg, which allows
 changing ROM code at will, and supports migrating largish blocks early,
 with good performance.
 However, we are running into a problem: changing size breaks
 migration every time.

Pardon my ignorance... changing ROM in migration?  I would expect
migration to be as transparent for ROM as it is for RAM.  What am I
missing?

[...]

[Qemu-devel] [PULL] net: The third parameter of getsockname should be initialized

2014-11-18 Thread Stefan Hajnoczi

From: zhanghailiang zhang.zhanghaili...@huawei.com

Signed-off-by: zhanghailiang zhang.zhanghaili...@huawei.com
Reviewed-by: Markus Armbruster arm...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
---
 net/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index fb21e20..ca4b8ba 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -352,7 +352,7 @@ static NetSocketState 
*net_socket_fd_init_dgram(NetClientState *peer,
 {
 struct sockaddr_in saddr;
 int newfd;
-socklen_t saddr_len;
+socklen_t saddr_len = sizeof(saddr);
 NetClientState *nc;
 NetSocketState *s;
 
-- 
2.1.0

Re: [Qemu-devel] [PATCH v6 00/10] block: Incremental backup series

2014-11-18 Thread John Snow




On 11/18/2014 08:09 AM, Vladimir Sementsov-Ogievskiy wrote:

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock
against any of the bitmap functions to prevent them from marking any
bits dirty.
- On first write to a clean persistent bitmap, delay the write until
we can mark the bitmap as dirty first. This incurs a write penalty
when we try to use the bitmap at first...
- Unlock the bitmap functions and allow them to mark blocks as needed.
- At some point, based on a sync policy, re-commit the dirty
information to the file and mark the file as clean once more and
re-take the persistence lock.

Correct me if I'm wrong.

#Read bitmap:
read in blockdev_init, before any write to device, so no lock is needed.

#Set bits in bitmap:
if bitmap.dirty_flag:
set bits
else:
LOCK
set bits
set bitmap.dirty_flag
set dirty_flag in bitmap file
UNLOCK

#Sync:
if not bitmap.dirty_flag:
skip sync
else:
LOCK
save one of bitmap levels (saving the last one is too long and not
very good idea, because it is fast-updateing)
unset dirty_flag in bitmap file
unset bitmap.dirty_flag
UNLOCK

#Last sync in bdrv_close:
Just save the last bitmap level and unset dirty_flag in bitmap file

Also.. I'm not quite sure about locking.. As I understand, co-routines
in qemu are not running in parallel, is locking required? Or sync timer
will not be co-routine based?

Best regards,
Vladimir


Might be being too informal. I just meant a lock or barrier to prevent 
actual IO throughput until we can confirm the dirty flag has been 
adjusted to indicate that the persistent bitmap is now officially out of 
date. Nothing fancy.


Wasn't trying to imply that we needed threading protection, just 
locking the IO until we can configure the bitmap as we need it to be.



On 18.11.2014 13:54, Vladimir Sementsov-Ogievskiy wrote:



(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt
if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE?

- The name of the bitmap and also the size of this name



(5) Partial Persistence

We did not discuss only saving higher levels of the bitmap. What's
the primary benefit you're seeking?

Hmm. It may be used for faster sync. Maybe, save some of bitmap levels
on timer while vm is running and save the last level on shutdown?

CC qemu-devel - ok.

Best regards,
Vladimir

On 18.11.2014 02:46, John Snow wrote:



On 11/13/2014 08:54 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi

I'd just like to start working on persistent dirty bitmap. My thoughts
about it are the following:
- qemu -drive file=file,dirty_bitmap=file
 so,  bitmap will be loaded with drive open and saved with drive
close.
- save only meaningful (the last) level of the bitmap, restore all
levels on bitmap loading
- bool parameter persistent for bdrv_create_dirty_bitmap and
BdrvDirtyBitmap
- internal dirty_bitmaps, saved in qcow2 file

Best regards,
Vladimir


I am thinking:

(1) Command Lines

If you enable dirty bitmaps and give it a file that doesn't exist, it
should error out on you.

If you enable dirty bitmaps and give it a file that's blank, it
understands that it is to create a persistent bitmap file in this
location and it should enable persistence.

If a bitmap file is given and it has valid magic, this should imply
persistence.

I am hesitant to have it auto-create files that don't already exist
in case the files become large in size and a misconfiguration leads
to repeated creation of these files that get orphaned in random
folders. Perhaps we can add a create=auto flag or similar to allow
this behavior if wanted.

(2) File Format

Some standard file magic, which includes:

- Some magic byte(s)
- Dirty flag. Needed to tell if we can trust this data or not.
- The size of the bitmap
- The granularity of the bitmap
- The offset to the first sector of bitmap data (Maybe? It can't hurt
if we give ourselves a sector's worth to write metadata within.)
- Data starting at... PAGESIZE?

(3) Data Integrity

The dirty flag could work something like:

- If, on first open, the file has the dirty flag set, we need to
discard the bitmap data because we can no longer trust it.
- If the bitmap file is clean, proceed as normal, but take a lock
against any of the bitmap functions to prevent them from marking any
bits dirty.
- On first write to a clean persistent bitmap, delay the write until
we can mark the bitmap as dirty first. This incurs a write penalty
when we try to use the bitmap at first...
- Unlock the bitmap functions and allow them to mark blocks as needed.
- At

[Qemu-devel] Fwd: [PATCH v2 1/6] geometry: add bdrv functions for geometry and blocksize

2014-11-18 Thread Ekaterina Tumanova



copying to mail list

 Forwarded Message 
Subject: [PATCH v2 1/6] geometry: add bdrv functions for geometry and 
blocksize

Date: Tue, 18 Nov 2014 17:09:56 +0100
From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 
tuman...@linux.vnet.ibm.com


Add driver functions for geometry and blocksize detection

Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
---
 block.c   | 26 ++
 include/block/block.h | 20 
 include/block/block_int.h |  3 +++
 3 files changed, 49 insertions(+)

diff --git a/block.c b/block.c
index a612594..5df35cf 100644
--- a/block.c
+++ b/block.c
@@ -548,6 +548,32 @@ void bdrv_refresh_limits(BlockDriverState *bs, 
Error **errp)

 }
 }

+struct ProbeBlockSize bdrv_probe_blocksizes(BlockDriverState *bs)
+{
+BlockDriver *drv = bs-drv;
+struct ProbeBlockSize err_geo = { .rc = -1 };
+
+assert(drv != NULL);
+if (drv-bdrv_probe_blocksizes) {
+return drv-bdrv_probe_blocksizes(bs);
+}
+
+return err_geo;
+}
+
+struct ProbeGeometry bdrv_probe_geometry(BlockDriverState *bs)
+{
+BlockDriver *drv = bs-drv;
+struct ProbeGeometry err_geo = { .rc = -1 };
+
+assert(drv != NULL);
+if (drv-bdrv_probe_geometry) {
+return drv-bdrv_probe_geometry(bs);
+}
+
+return err_geo;
+}
+
 /*
  * Create a uniquely-named empty temporary file.
  * Return 0 upon success, otherwise a negative errno value.
diff --git a/include/block/block.h b/include/block/block.h
index 5450610..3287dbc 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -60,6 +60,24 @@ typedef enum {
 BDRV_REQ_MAY_UNMAP= 0x4,
 } BdrvRequestFlags;

+struct ProbeBlockSize {
+int rc;
+struct BlockSize {
+uint16_t phys;
+uint16_t log;
+} size;
+};
+
+struct ProbeGeometry {
+int rc;
+struct HDGeometry {
+uint32_t heads;
+uint32_t sectors;
+uint32_t cylinders;
+uint32_t start;
+} geo;
+};
+
 #define BDRV_O_RDWR0x0002
 #define BDRV_O_SNAPSHOT0x0008 /* open the file read only and save 
writes in a snapshot */

 #define BDRV_O_TEMPORARY   0x0010 /* delete the file after use */
@@ -538,6 +556,8 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs);
  * the old #AioContext is not executing.
  */
 void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context);
+struct ProbeBlockSize bdrv_probe_blocksizes(BlockDriverState *bs);
+struct ProbeGeometry bdrv_probe_geometry(BlockDriverState *bs);

 void bdrv_io_plug(BlockDriverState *bs);
 void bdrv_io_unplug(BlockDriverState *bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a1c17b9..830e564 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -271,6 +271,9 @@ struct BlockDriver {
 void (*bdrv_io_unplug)(BlockDriverState *bs);
 void (*bdrv_flush_io_queue)(BlockDriverState *bs);

+struct ProbeBlockSize (*bdrv_probe_blocksizes)(BlockDriverState *bs);
+struct ProbeGeometry (*bdrv_probe_geometry)(BlockDriverState *bs);
+
 QLIST_ENTRY(BlockDriver) list;
 };

--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 2/6] geometry: Detect blocksize via ioctls in separate static functions

2014-11-18 Thread Ekaterina Tumanova


copying to mail list


 Forwarded Message 
Subject: [PATCH v2 2/6] geometry: Detect blocksize via ioctls in 
separate static functions

Date: Tue, 18 Nov 2014 17:09:57 +0100
From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 
tuman...@linux.vnet.ibm.com


Move the IOCTL calls that detect logical blocksize from raw_probe_alignment
into separate function (probe_logical_blocksize).
Introduce function which detect physical blocksize via IOCTL
(probe_physical_blocksize).
Both functions will be used in the next patch.

Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
---
 block/raw-posix.c | 58 
+--

 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e100ae2..45f1d79 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -223,50 +223,70 @@ static int raw_normalize_devicepath(const char 
**filename)

 }
 #endif

-static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
+static unsigned int probe_logical_blocksize(BlockDriverState *bs, int fd)
 {
-BDRVRawState *s = bs-opaque;
-char *buf;
-unsigned int sector_size;
-
-/* For /dev/sg devices the alignment is not really used.
-   With buffered I/O, we don't have any restrictions. */
-if (bs-sg || !s-needs_alignment) {
-bs-request_alignment = 1;
-s-buf_align = 1;
-return;
-}
+unsigned int sector_size = 0;

 /* Try a few ioctls to get the right size */
-bs-request_alignment = 0;
-s-buf_align = 0;
-
 #ifdef BLKSSZGET
 if (ioctl(fd, BLKSSZGET, sector_size) = 0) {
-bs-request_alignment = sector_size;
+return sector_size;
 }
 #endif
 #ifdef DKIOCGETBLOCKSIZE
 if (ioctl(fd, DKIOCGETBLOCKSIZE, sector_size) = 0) {
-bs-request_alignment = sector_size;
+return sector_size;
 }
 #endif
 #ifdef DIOCGSECTORSIZE
 if (ioctl(fd, DIOCGSECTORSIZE, sector_size) = 0) {
-bs-request_alignment = sector_size;
+return sector_size;
 }
 #endif
 #ifdef CONFIG_XFS
 if (s-is_xfs) {
 struct dioattr da;
 if (xfsctl(NULL, fd, XFS_IOC_DIOINFO, da) = 0) {
-bs-request_alignment = da.d_miniosz;
+sector_size = da.d_miniosz;
 /* The kernel returns wrong information for d_mem */
 /* s-buf_align = da.d_mem; */
+return sector_size;
 }
 }
 #endif

+return 0;
+}
+
+static unsigned int probe_physical_blocksize(BlockDriverState *bs, int fd)
+{
+unsigned int blk_size = 0;
+#ifdef BLKPBSZGET
+if (ioctl(fd, BLKPBSZGET, blk_size) = 0) {
+return blk_size;
+}
+#endif
+
+return 0;
+}
+
+static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
+{
+BDRVRawState *s = bs-opaque;
+char *buf;
+
+/* For /dev/sg devices the alignment is not really used.
+   With buffered I/O, we don't have any restrictions. */
+if (bs-sg || !s-needs_alignment) {
+bs-request_alignment = 1;
+s-buf_align = 1;
+return;
+}
+
+s-buf_align = 0;
+/* Let's try to use the logical blocksize for the alignment. */
+bs-request_alignment = probe_logical_blocksize(bs, fd);
+
 /* If we could not get the sizes so far, we can only guess them */
 if (!s-buf_align) {
 size_t align;
--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 3/6] geometry: Add driver methods to probe blocksizes and geometry

2014-11-18 Thread Ekaterina Tumanova



copying to mail list

 Forwarded Message 
Subject: [PATCH v2 3/6] geometry: Add driver methods to probe blocksizes 
and geometry

Date: Tue, 18 Nov 2014 17:09:58 +0100
From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 
tuman...@linux.vnet.ibm.com


This patch introduces driver methods of defining disk blocksizes
(physical and logical) and hard drive geometry.
The method is only implemented for host_device. For raw devices
driver calls child's method.
The detection will only work for DASD devices. In order to check that
a local CheckForDASD function was introduced, which calls BIODASDINFO2 ioctl
and returns 0 only if it succeeds.

Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
---
 block/raw-posix.c | 65 
+++

 block/raw_bsd.c   | 12 ++
 2 files changed, 77 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 45f1d79..274ceda 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -56,6 +56,7 @@
 #include linux/cdrom.h
 #include linux/fd.h
 #include linux/fs.h
+#include linux/hdreg.h
 #ifndef FS_NOCOW_FL
 #define FS_NOCOW_FL 0x0080 /* Do not cow file */
 #endif
@@ -93,6 +94,10 @@
 #include xfs/xfs.h
 #endif

+#ifdef __s390__
+#include asm/dasd.h
+#endif
+
 //#define DEBUG_FLOPPY

 //#define DEBUG_BLOCK
@@ -678,6 +683,64 @@ static void raw_refresh_limits(BlockDriverState 
*bs, Error **errp)

 bs-bl.opt_mem_alignment = s-buf_align;
 }

+static int CheckForDASD(int fd)
+{
+#ifdef BIODASDINFO2
+struct dasd_information2_t info = {0};
+
+return ioctl(fd, BIODASDINFO2, info);
+#endif
+return -1;
+}
+
+static struct ProbeBlockSize hdev_probe_blocksizes(BlockDriverState *bs)
+{
+BDRVRawState *s = bs-opaque;
+struct ProbeBlockSize block_sizes = {0};
+
+block_sizes.rc = CheckForDASD(s-fd);
+/* If DASD, get blocksizes */
+if (block_sizes.rc == 0) {
+block_sizes.size.log = probe_logical_blocksize(bs, s-fd);
+block_sizes.size.phys = probe_physical_blocksize(bs, s-fd);
+}
+
+return block_sizes;
+}
+
+static struct ProbeGeometry hdev_probe_geometry(BlockDriverState *bs)
+{
+BDRVRawState *s = bs-opaque;
+struct ProbeGeometry geometry = {0};
+struct hd_geometry ioctl_geo = {0};
+
+geometry.rc = CheckForDASD(s-fd);
+if (geometry.rc != 0) {
+goto done;
+}
+/* If DASD, get it's geometry */
+geometry.rc = ioctl(s-fd, HDIO_GETGEO, ioctl_geo);
+/* Do not return a geometry for partition */
+if (ioctl_geo.start != 0) {
+geometry.rc = -1;
+goto done;
+}
+/* HDIO_GETGEO may return success even though geo contains zeros
+   (e.g. certain multipath setups) */
+if (!ioctl_geo.heads || !ioctl_geo.sectors || !ioctl_geo.cylinders) {
+geometry.rc = -1;
+goto done;
+}
+if (geometry.rc == 0) {
+geometry.geo.heads = (uint32_t)ioctl_geo.heads;
+geometry.geo.sectors = (uint32_t)ioctl_geo.sectors;
+geometry.geo.cylinders = (uint32_t)ioctl_geo.cylinders;
+geometry.geo.start = (uint32_t)ioctl_geo.start;
+}
+done:
+   return geometry;
+}
+
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
 {
 int ret;
@@ -2128,6 +2191,8 @@ static BlockDriver bdrv_host_device = {
 .bdrv_get_info = raw_get_info,
 .bdrv_get_allocated_file_size
 = raw_get_allocated_file_size,
+.bdrv_probe_blocksizes = hdev_probe_blocksizes,
+.bdrv_probe_geometry = hdev_probe_geometry,

 .bdrv_detach_aio_context = raw_detach_aio_context,
 .bdrv_attach_aio_context = raw_attach_aio_context,
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 401b967..d164eba 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -173,6 +173,16 @@ static int raw_probe(const uint8_t *buf, int 
buf_size, const char *filename)

 return 1;
 }

+static struct ProbeBlockSize raw_probe_blocksizes(BlockDriverState *bs)
+{
+return bdrv_probe_blocksizes(bs-file);
+}
+
+static struct ProbeGeometry raw_probe_geometry(BlockDriverState *bs)
+{
+return bdrv_probe_geometry(bs-file);
+}
+
 static BlockDriver bdrv_raw = {
 .format_name  = raw,
 .bdrv_probe   = raw_probe,
@@ -190,6 +200,8 @@ static BlockDriver bdrv_raw = {
 .has_variable_length  = true,
 .bdrv_get_info= raw_get_info,
 .bdrv_refresh_limits  = raw_refresh_limits,
+.bdrv_probe_blocksizes = raw_probe_blocksizes,
+.bdrv_probe_geometry  = raw_probe_geometry,
 .bdrv_is_inserted = raw_is_inserted,
 .bdrv_media_changed   = raw_media_changed,
 .bdrv_eject   = raw_eject,
--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 5/6] geometry: Call backend function to detect geometry and blocksize

2014-11-18 Thread Ekaterina Tumanova


copying to mail list


 Forwarded Message 
Subject: [PATCH v2 5/6] geometry: Call backend function to detect 
geometry and blocksize

Date: Tue, 18 Nov 2014 17:10:00 +0100
From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 
tuman...@linux.vnet.ibm.com


hd_geometry_guess function autodetects the drive geometry. This patch
adds a block backend call, that probes the backing device geometry.
If the inner driver method is implemented and succeeds (currently only 
DASDs),

the blkconf_geometry will pass-through the backing device geometry.

Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
---
 hw/block/block.c | 11 +++
 hw/block/hd-geometry.c   |  9 +
 hw/block/virtio-blk.c|  1 +
 include/hw/block/block.h |  1 +
 4 files changed, 22 insertions(+)

diff --git a/hw/block/block.c b/hw/block/block.c
index a625773..f1d29bc 100644
--- a/hw/block/block.c
+++ b/hw/block/block.c
@@ -25,6 +25,17 @@ void blkconf_serial(BlockConf *conf, char **serial)
 }
 }

+void blkconf_blocksizes(BlockConf *conf)
+{
+BlockBackend *blk = conf-blk;
+struct ProbeBlockSize blocksize = blk_probe_blocksizes(blk);
+
+if (blocksize.rc == 0) {
+conf-physical_block_size = blocksize.size.phys;
+conf-logical_block_size = blocksize.size.log;
+}
+}
+
 void blkconf_geometry(BlockConf *conf, int *ptrans,
   unsigned cyls_max, unsigned heads_max, unsigned 
secs_max,

   Error **errp)
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 6fcf74d..4972114 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -121,6 +121,14 @@ void hd_geometry_guess(BlockBackend *blk,
int *ptrans)
 {
 int cylinders, heads, secs, translation;
+struct ProbeGeometry geometry = blk_probe_geometry(blk);
+
+if (geometry.rc == 0) {
+*pcyls = geometry.geo.cylinders;
+*psecs = geometry.geo.sectors;
+*pheads = geometry.geo.heads;
+goto done;
+}

 if (guess_disk_lchs(blk, cylinders, heads, secs)  0) {
 /* no LCHS guess: use a standard physical disk geometry  */
@@ -143,6 +151,7 @@ void hd_geometry_guess(BlockBackend *blk,
the logical geometry */
 translation = BIOS_ATA_TRANSLATION_NONE;
 }
+done:
 if (ptrans) {
 *ptrans = translation;
 }
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index b19b102..6f01565 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -745,6 +745,7 @@ static void virtio_blk_device_realize(DeviceState 
*dev, Error **errp)

 error_propagate(errp, err);
 return;
 }
+blkconf_blocksizes(conf-conf);

 virtio_init(vdev, virtio-blk, VIRTIO_ID_BLOCK,
 sizeof(struct virtio_blk_config));
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
index 0d0ce9a..856bf75 100644
--- a/include/hw/block/block.h
+++ b/include/hw/block/block.h
@@ -63,6 +63,7 @@ void blkconf_serial(BlockConf *conf, char **serial);
 void blkconf_geometry(BlockConf *conf, int *trans,
   unsigned cyls_max, unsigned heads_max, unsigned 
secs_max,

   Error **errp);
+void blkconf_blocksizes(BlockConf *conf);

 /* Hard disk geometry */

--
1.8.5.5

Re: [Qemu-devel] [PULL 0/2] Tracing patches

2014-11-18 Thread Peter Maydell

On 18 November 2014 15:04, Stefan Hajnoczi stefa...@redhat.com wrote:
 The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:

   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
 staging (2014-11-17 17:22:03 +)

 are available in the git repository at:

   git://github.com/stefanha/qemu.git tags/tracing-pull-request

 for you to fetch changes up to 776ec96f790e2c943c13313d8ecab4713b47ab65:

   Tracing: Fix simpletrace.py error on tcg enabled binary traces (2014-11-18 
 14:05:58 +)

 

 

Applied, thanks.

-- PMM

[Qemu-devel] Fwd: [PATCH v2 4/6] geometry: Add block-backend wrappers for geometry probing

2014-11-18 Thread Ekaterina Tumanova


copying to mail list


 Forwarded Message 
Subject: [PATCH v2 4/6] geometry: Add block-backend wrappers for 
geometry probing

Date: Tue, 18 Nov 2014 17:09:59 +0100
From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 
tuman...@linux.vnet.ibm.com


Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
---
 block/block-backend.c  | 10 ++
 include/sysemu/block-backend.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index d0692b1..6717301 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -629,3 +629,13 @@ void *blk_aio_get(const AIOCBInfo *aiocb_info, 
BlockBackend *blk,

 {
 return qemu_aio_get(aiocb_info, blk_bs(blk), cb, opaque);
 }
+
+struct ProbeBlockSize blk_probe_blocksizes(BlockBackend *blk)
+{
+return bdrv_probe_blocksizes(blk-bs);
+}
+
+struct ProbeGeometry blk_probe_geometry(BlockBackend *blk)
+{
+return bdrv_probe_geometry(blk-bs);
+}
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 52d13c1..e8b497a 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -138,5 +138,7 @@ BlockAcctStats *blk_get_stats(BlockBackend *blk);

 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque);
+struct ProbeBlockSize blk_probe_blocksizes(BlockBackend *blk);
+struct ProbeGeometry blk_probe_geometry(BlockBackend *blk);

 #endif
--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 6/6] geometry: Target specific hook for s390x in geometry guessing

2014-11-18 Thread Ekaterina Tumanova



copying to mail list

 Forwarded Message 
Subject: [PATCH v2 6/6] geometry: Target specific hook for s390x in 
geometry guessing

Date: Tue, 18 Nov 2014 17:10:01 +0100
From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 
tuman...@linux.vnet.ibm.com


For target-s390x, the behaviour is chosen as follows:
If no geo could be guessed from backing device, guess_disk_lchs (partition
table guessing) is called.
If this is not successful (e.g. image files) geometry is derived from the
size of the disk (as before).
We have no translation on s390, so we simply set in to NONE.

Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
---
 hw/block/Makefile.objs |  6 +-
 hw/block/hd-geometry.c | 36 +++-
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
index d4c3ab7..1f7da7a 100644
--- a/hw/block/Makefile.objs
+++ b/hw/block/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += block.o cdrom.o hd-geometry.o
+common-obj-y += block.o cdrom.o
 common-obj-$(CONFIG_FDC) += fdc.o
 common-obj-$(CONFIG_SSI_M25P80) += m25p80.o
 common-obj-$(CONFIG_NAND) += nand.o
@@ -13,3 +13,7 @@ obj-$(CONFIG_SH4) += tc58128.o

 obj-$(CONFIG_VIRTIO) += virtio-blk.o
 obj-$(CONFIG_VIRTIO) += dataplane/
+
+# geometry is target/architecture dependent and therefore needs to be built
+# for every target platform
+obj-y += hd-geometry.o
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 4972114..b462225 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -49,11 +49,12 @@ struct partition {

 /* try to guess the disk logical geometry from the MSDOS partition table.
Return 0 if OK, -1 if could not guess */
-static int guess_disk_lchs(BlockBackend *blk,
-   int *pcylinders, int *pheads, int *psectors)
+static int guess_disk_lchs(BlockBackend *blk, uint32_t *pcylinders,
+   uint32_t *pheads, uint32_t *psectors)
 {
 uint8_t buf[BDRV_SECTOR_SIZE];
-int i, heads, sectors, cylinders;
+int i;
+uint32_t heads, sectors, cylinders;
 struct partition *p;
 uint32_t nr_sects;
 uint64_t nb_sectors;
@@ -116,11 +117,11 @@ static void guess_chs_for_size(BlockBackend *blk,
 *psecs = 63;
 }

+#ifdef TARGET_S390X
 void hd_geometry_guess(BlockBackend *blk,
uint32_t *pcyls, uint32_t *pheads, uint32_t *psecs,
int *ptrans)
 {
-int cylinders, heads, secs, translation;
 struct ProbeGeometry geometry = blk_probe_geometry(blk);

 if (geometry.rc == 0) {
@@ -129,7 +130,32 @@ void hd_geometry_guess(BlockBackend *blk,
 *pheads = geometry.geo.heads;
 goto done;
 }
+if (guess_disk_lchs(blk, pcyls, pheads, psecs) == 0) {
+goto done;
+}
+guess_chs_for_size(blk, pcyls, pheads, psecs);
+done:
+if (ptrans) {
+*ptrans = BIOS_ATA_TRANSLATION_NONE;
+}

+trace_hd_geometry_guess(blk, *pcyls, *pheads, *psecs,
+BIOS_ATA_TRANSLATION_NONE);
+}
+#else
+void hd_geometry_guess(BlockBackend *blk,
+   uint32_t *pcyls, uint32_t *pheads, uint32_t *psecs,
+   int *ptrans)
+{
+int cylinders, heads, secs, translation;
+struct ProbeGeometry geometry = blk_probe_geometry(blk);
+
+if (geometry.rc == 0) {
+*pcyls = geometry.geo.cylinders;
+*psecs = geometry.geo.sectors;
+*pheads = geometry.geo.heads;
+goto done;
+}
 if (guess_disk_lchs(blk, cylinders, heads, secs)  0) {
 /* no LCHS guess: use a standard physical disk geometry  */
 guess_chs_for_size(blk, pcyls, pheads, psecs);
@@ -157,7 +183,7 @@ done:
 }
 trace_hd_geometry_guess(blk, *pcyls, *pheads, *psecs, translation);
 }
-
+#endif
 int hd_bios_chs_auto_trans(uint32_t cyls, uint32_t heads, uint32_t secs)
 {
 return cyls = 1024  heads = 16  secs = 63
--
1.8.5.5

[Qemu-devel] Fwd: [PATCH v2 0/6] Geometry and blocksize support for backing devices

2014-11-18 Thread Ekaterina Tumanova



copying to mail list

 Forwarded Message 
Subject: [PATCH v2 0/6] Geometry and blocksize support for backing devices
Date: Tue, 18 Nov 2014 17:09:55 +0100
From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
To: arm...@redhat.com, pbonz...@redhat.com, stefa...@redhat.com, 
kw...@redhat.com
CC: borntrae...@de.ibm.com, cornelia.h...@de.ibm.com, 
d...@linux.vnet.ibm.com, mihaj...@linux.vnet.ibm.com, Ekaterina Tumanova 
tuman...@linux.vnet.ibm.com


This is the rework of the geometry+blocksize patch, which was
recently discussed here:
http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg01148.html

Markus suggested that we only detect blocksize and geometry for DASDs.

According to this agreement new version contains DASD special casing.
The driver methods are implemented only for host_device and inner hdev_xxx
functions check if the backing storage is a DASD by means of
BIODASDINFO2 ioctl.

Original patchset can be found here:
http://lists.gnu.org/archive/html/qemu-devel/2014-07/msg03791.html

Ekaterina Tumanova (6):
  geometry: add bdrv functions for geometry and blocksize
  geometry: Detect blocksize via ioctls in separate static functions
  geometry: Add driver methods to probe blocksizes and geometry
  geometry: Add block-backend wrappers for geometry probing
  geometry: Call backend function to detect geometry and blocksize
  geometry: Target specific hook for s390x in geometry guessing

 block.c|  26 +
 block/block-backend.c  |  10 
 block/raw-posix.c  | 123 
++---

 block/raw_bsd.c|  12 
 hw/block/Makefile.objs |   6 +-
 hw/block/block.c   |  11 
 hw/block/hd-geometry.c |  43 --
 hw/block/virtio-blk.c  |   1 +
 include/block/block.h  |  20 +++
 include/block/block_int.h  |   3 +
 include/hw/block/block.h   |   1 +
 include/sysemu/block-backend.h |   2 +
 12 files changed, 234 insertions(+), 24 deletions(-)

--
1.8.5.5

Re: [Qemu-devel] Sending packets up to VM using vhost-net User.

2014-11-18 Thread Anshul Makkar

Sorry, forgot to mention I am using  git clone -b vhost-user-v5
https://github.com/virtualopensystems/qemu.git; for vhost-user backend
implementation.

and git clone https://github.com/virtualopensystems/vapp.git  for
reference implementation.

Anshul Makkar

On Tue, Nov 18, 2014 at 5:29 PM, Anshul Makkar 
anshul.mak...@profitbricks.com wrote:

 Hi,

 I am developing an application that is using vhost-user backend for packet
 transfer.

 The architecture:

 1) VM1 is using Vhost-user and executing on server1.

 .qemu-system-x86_64 -m 1024 -mem-path /dev/hugepages,prealloc=on,share=on
 -drive
 file=/home/amakkar/test.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
 -device
 virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
 -vga std -vnc 0.0.0.0:3 -netdev
 type=vhost-user,id=net0,file=/home/amakkar/qemu.sock -device
 virtio-net-pci,netdev=net0

 2) App1 on server1: executing in user-mode connects with vhost-user
 backend over qemu.sock. As expected, initialization is done and guest
 addresses including addresses of descriptor ring , available ring and used
 ring and mapped to my userspace app and I can directly access them.

 I launch PACKETH on VM1 and transfer some packets using eth0 on VM1
 (packet transfer uses virtio-net backend. ifconfig eth0 shows correct TX
 stats)

 In App1 I directly access the avail_ring buffer and consume the packet and
 then I do RDMA transfer to server 2 .

 3) VM2 and App2 executing on server2 and again using VHost-User.

 App2: Vring initializations are successfully done and vring buffers are
 mapped. I get the buffer from App1 and now *I want to transfer this
 buffer (Raw packet) to VM2.*

 To transfer the buffer from App2 to VM2, I directly access the descriptor
 ring, place the buffer in it and update the available index and then issue
 the kick.

 code snippet for it:

 dest_buf = (void *)handler-map_handler(handler-context,
 desc[a_idx].addr);
 memcpy(dest_buf + hdr_len, buf, size);
 avail-ring[avail-idx % num] = a_idx;
 avail-idx++;
 fprintf(stdout, put_vring, synching memory \n);
 sync_shm(dest_buf, size);
 sync_shm((void *)(avail), sizeof(struct vring_avail));

 kick(vhost_user-vring_table, rx_idx);

 But the buffer never reaches to VM2. (I do ifconfig eth0 in VM2 and RX
 stats are 0)

 Please can you share if my approach is correct in transferring the packet
 from App2 to VM. Can I directly place the buffer in descriptor ring and
 issue a kick to notify virtio-net that a packet is available or you can
 smell some implementation problem.

 Thanks
 Anshul Makkar

[Qemu-devel] Sending packets up to VM using vhost-net User.

2014-11-18 Thread Anshul Makkar

Hi,

I am developing an application that is using vhost-user backend for packet
transfer.

The architecture:

1) VM1 is using Vhost-user and executing on server1.

.qemu-system-x86_64 -m 1024 -mem-path /dev/hugepages,prealloc=on,share=on
-drive
file=/home/amakkar/test.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
-device
virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-vga std -vnc 0.0.0.0:3 -netdev
type=vhost-user,id=net0,file=/home/amakkar/qemu.sock -device
virtio-net-pci,netdev=net0

2) App1 on server1: executing in user-mode connects with vhost-user backend
over qemu.sock. As expected, initialization is done and guest addresses
including addresses of descriptor ring , available ring and used ring and
mapped to my userspace app and I can directly access them.

I launch PACKETH on VM1 and transfer some packets using eth0 on VM1 (packet
transfer uses virtio-net backend. ifconfig eth0 shows correct TX stats)

In App1 I directly access the avail_ring buffer and consume the packet and
then I do RDMA transfer to server 2 .

3) VM2 and App2 executing on server2 and again using VHost-User.

App2: Vring initializations are successfully done and vring buffers are
mapped. I get the buffer from App1 and now *I want to transfer this buffer
(Raw packet) to VM2.*

To transfer the buffer from App2 to VM2, I directly access the descriptor
ring, place the buffer in it and update the available index and then issue
the kick.

code snippet for it:

dest_buf = (void *)handler-map_handler(handler-context, desc[a_idx].addr);
memcpy(dest_buf + hdr_len, buf, size);
avail-ring[avail-idx % num] = a_idx;
avail-idx++;
fprintf(stdout, put_vring, synching memory \n);
sync_shm(dest_buf, size);
sync_shm((void *)(avail), sizeof(struct vring_avail));

kick(vhost_user-vring_table, rx_idx);

But the buffer never reaches to VM2. (I do ifconfig eth0 in VM2 and RX
stats are 0)

Please can you share if my approach is correct in transferring the packet
from App2 to VM. Can I directly place the buffer in descriptor ring and
issue a kick to notify virtio-net that a packet is available or you can
smell some implementation problem.

Thanks
Anshul Makkar

[Qemu-devel] qmp-commands.hx and inherited types (Was: Re: [PATCH v6 02/10] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove)

2014-11-18 Thread John Snow




On 11/07/2014 08:00 AM, Eric Blake wrote:

On 10/30/2014 04:22 AM, Fam Zheng wrote:


[snip]


+++ b/qapi/block-core.json
@@ -865,6 +865,61 @@
  '*on-target-error': 'BlockdevOnError' } }

  ##
+# @BlockDirtyBitmap
+#
+# @device: name of device which the bitmap is tracking
+#
+# @name: name of the dirty bitmap
+#
+# Since 2.3
+##
+{ 'type': 'BlockDirtyBitmap',
+  'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @BlockDirtyBitmapAdd
+#
+# @device: name of device which the bitmap is tracking
+#
+# @name: name of the dirty bitmap
+#
+# @granularity: #optional the bitmap granularity, default is 64k for
+#   block-dirty-bitmap-add


Do you still need to call out the command, given that it is the only
client of this type?


+#
+# Since 2.3
+##
+{ 'type': 'BlockDirtyBitmapAdd',
+  'data': { 'device': 'str', 'name': 'str', '*granularity': 'int' } }


Is it worth using type inheritance, as in:

{ 'type': 'BlockDirtyBitmapAdd',
   'base': 'BlockDirtyBitmap',
   'data': { '*granularity': 'int' } }



Strictly speaking, I would argue against inheritance here because 
BlockDirtyBitmapAdd is not isa BlockDirtyBitmap. It's more of a 
Hasa relationship.


At any rate, I tried to implement this for giggles to see if I could, 
and ran into the following issue with which I'd be curious to get an 
answer for.


As an example, If you have some type:

{ 'type': 'example',
  'data': { 'foo': 'int' } }

And an extension of it:

{ 'type': 'academicExample'
  'base': 'example',
  'data': { 'bar': 'str' } }

How would you write a command that expected both foo and bar?
The following doesn't seem appropriate (the generated code SKIPS the 
base fields, which leads to missing arguments in the prototype:


{ 'command': 'academic-command',
  'data': 'academicExample' }

...

{
.name = academic-command,
.args_type = foo:i,bar:s,
.mhandler.cmd_new = qmp_marshal_input_academic_command,
},


The generated prototype appears to skip the foo argument, including 
only the arguments associated with the base type, in this case, 'bar'.


Do we support this kind of use? I didn't see it in-use currently, but I 
only gave it a cursory skimming.

Re: [Qemu-devel] runtime configurable semihosting

2014-11-18 Thread Liviu Ionescu


On 18 Nov 2014, at 15:31, Peter Maydell peter.mayd...@linaro.org wrote:

 Sorry, yes, you're right; I should have looked more carefully.
 So we do need a new option;

or we should fix the parser. if the current keyword is not in the list, it can 
be pushed back, and defaults used.

 -semihosting-config target=[native|gdb|auto]

ok, I'll implement this, and if you fix the parser we'll change it to the plain 
-semihosting.

 have -semihosting-config have an implied 'enable=true', so

ok

Liviu

[Qemu-devel] Tunneled Migration with Non-Shared Storage

2014-11-18 Thread Gary R Hook


What I really need to figure out is why, when performing a migration
using non-shared storage, an entire VM is copied into memory before it’s 
sent across the wire using this method. A copy-on-read operation

is performed first, then the disk is sent, then the dirty pages, then
the RAM.

The odd thing is that non-tunneled migrations use a completely different
code path that does _not_ execute a copy-on-read first.

The problem: VMs (VM disks) that are larger than available memory can't
be migrated due to space constraints.

I'd like to disable this COR opration but I think it's important to
understand why the code is written this way. I've yet to run across any
comments that explain the need for this extra copy, and am looking
for background and advice.

Any insights are welcome.

--
Gary R Hook
Senior Kernel Engineer
NIMBOXX, Inc

Re: [Qemu-devel] runtime configurable semihosting

2014-11-18 Thread Peter Maydell

On 18 November 2014 16:46, Liviu Ionescu i...@livius.net wrote:

 On 18 Nov 2014, at 15:31, Peter Maydell peter.mayd...@linaro.org wrote:

 Sorry, yes, you're right; I should have looked more carefully.
 So we do need a new option;

 or we should fix the parser. if the current keyword is not in
 the list, it can be pushed back, and defaults used.

This is veering quite close to magic, really. It changes
the current handling of some valid command lines (admittedly
ones with implausible image names, but still).

 -semihosting-config target=[native|gdb|auto]

 ok, I'll implement this, and if you fix the parser we'll change it to the 
 plain -semihosting.

Thanks.

-- PMM

Re: [Qemu-devel] [PATCH 1/3] s390: Add PCI bus support

2014-11-18 Thread Alexander Graf



On 18.11.14 13:50, Frank Blaschka wrote:
 On Mon, Nov 10, 2014 at 04:14:16PM +0100, Alexander Graf wrote:


 On 10.11.14 15:20, Frank Blaschka wrote:
 From: Frank Blaschka frank.blasc...@de.ibm.com

 This patch implements a pci bus for s390x together with infrastructure
 to generate and handle hotplug events, to configure/unconfigure via
 sclp instruction, to do iommu translations and provide s390 support for
 MSI/MSI-X notification processing.

 Signed-off-by: Frank Blaschka frank.blasc...@de.ibm.com

[...]

 diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
 new file mode 100644
 index 000..f2fa6ba
 --- /dev/null
 +++ b/hw/s390x/s390-pci-bus.c
 @@ -0,0 +1,485 @@
 +/*
 + * s390 PCI BUS
 + *
 + * Copyright 2014 IBM Corp.
 + * Author(s): Frank Blaschka frank.blasc...@de.ibm.com
 + *Hong Bo Li lih...@cn.ibm.com
 + *Yi Min Zhao zyi...@cn.ibm.com
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or (at
 + * your option) any later version. See the COPYING file in the top-level
 + * directory.
 + */
 +
 +#include hw/pci/pci.h
 +#include hw/pci/pci_bus.h
 +#include hw/s390x/css.h
 +#include hw/s390x/sclp.h
 +#include hw/pci/msi.h
 +#include qemu/error-report.h
 +#include s390-pci-bus.h
 +
 +/* #define DEBUG_S390PCI_BUS */
 +#ifdef DEBUG_S390PCI_BUS
 +#define DPRINTF(fmt, ...) \
 +do { fprintf(stderr, S390pci-bus:  fmt, ## __VA_ARGS__); } while (0)
 +#else
 +#define DPRINTF(fmt, ...) \
 +do { } while (0)
 +#endif
 +
 +static const unsigned long be_to_le = BITS_PER_LONG - 1;
 +static QTAILQ_HEAD(, SeiContainer) pending_sei =
 +QTAILQ_HEAD_INITIALIZER(pending_sei);
 +static QTAILQ_HEAD(, S390PCIBusDevice) device_list =
 +QTAILQ_HEAD_INITIALIZER(device_list);

 Please get rid of all statics ;). All state has to live in objects.

 
 be_to_le was misleading and unnecesary will remove this one but
 static QTAILQ_HEAD seems to be a common practice for list anchors.
 If you really want me to change this do you have any prefered way,
 or can you point me to some code doing this?

For PCI devices, I don't think you need a list at all. Your PHB device
should already have a proper qbus that knows about all its child devices.

As for pending_sei, what is this about?

 
 +
 +int chsc_sei_nt2_get_event(void *res)

[...]

 +
 +int chsc_sei_nt2_get_event(void *res);
 +int chsc_sei_nt2_have_event(void);
 +void s390_pci_sclp_configure(int configure, SCCB *sccb);
 +S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx);
 +S390PCIBusDevice *s390_pci_find_dev_by_fh(uint32_t fh);
 +S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid);

 I think it makes sense to pass the PHB device as parameter on these.
 Don't assume you only have one.
 
 We need to lookup our device mainly in the instruction handlers and there
 we do not have a PHB available.

Then have a way to find your PHB - either put a variable into the
machine object, or find it by path via QOM tree lookups. Maybe we need
multiple PHBs, identified by part of the ID? I know too little about the
way PCI works on s390x to really tell.

Again, are there specs?

 Also having one list for our S390PCIBusDevices
 devices does not prevent us from supporting more PHBs.
 

 +void s390_pci_bus_init(void);
 +uint64_t s390_pci_get_table_origin(uint64_t iota);
 +uint64_t s390_guest_io_table_walk(uint64_t guest_iota,
 +  uint64_t guest_dma_address);

 Why are these exported?

 +
 +#endif
 diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
 index bc4dc2a..2e25834 100644
 --- a/hw/s390x/s390-virtio-ccw.c
 +++ b/hw/s390x/s390-virtio-ccw.c
 @@ -18,6 +18,7 @@
  #include css.h
  #include virtio-ccw.h
  #include qemu/config-file.h
 +#include s390-pci-bus.h
  
  #define TYPE_S390_CCW_MACHINE   s390-ccw-machine
  
 @@ -127,6 +128,8 @@ static void ccw_init(MachineState *machine)
machine-initrd_filename, s390-ccw.img);
  s390_flic_init();
  
 +s390_pci_bus_init();

 Please just inline that function here.

 
 What do you mean by just inline?

The contents of the s390_pci_bus_init() function should just be standing
right here. There's no value in creating a public wrapper function for
initialization. We only did this back in the old days before qdev was
around, because initialization was difficult back then and some devices
didn't make the jump to get rid of their public init functions.

 
 +
  /* register hypercalls */
  virtio_ccw_register_hcalls();
  
 diff --git a/hw/s390x/sclp.c b/hw/s390x/sclp.c
 index a759da7..a969975 100644
 --- a/hw/s390x/sclp.c
 +++ b/hw/s390x/sclp.c
 @@ -20,6 +20,7 @@
  #include qemu/config-file.h
  #include hw/s390x/sclp.h
  #include hw/s390x/event-facility.h
 +#include hw/s390x/s390-pci-bus.h
  
  static inline SCLPEventFacility *get_event_facility(void)
  {
 @@ -62,7 +63,8 @@ static void read_SCP_info(SCCB *sccb)
  read_info-entries[i].type = 0;
  }
  
 -

Re: [Qemu-devel] qmp-commands.hx and inherited types (Was: Re: [PATCH v6 02/10] qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-remove)

2014-11-18 Thread Eric Blake

On 11/18/2014 09:44 AM, John Snow wrote:
 Is it worth using type inheritance, as in:

 { 'type': 'BlockDirtyBitmapAdd',
'base': 'BlockDirtyBitmap',
'data': { '*granularity': 'int' } }

 
 Strictly speaking, I would argue against inheritance here because
 BlockDirtyBitmapAdd is not isa BlockDirtyBitmap. It's more of a
 Hasa relationship.

Fair enough.

 
 At any rate, I tried to implement this for giggles to see if I could,
 and ran into the following issue with which I'd be curious to get an
 answer for.
 
 As an example, If you have some type:
 
 { 'type': 'example',
   'data': { 'foo': 'int' } }
 
 And an extension of it:
 
 { 'type': 'academicExample'
   'base': 'example',
   'data': { 'bar': 'str' } }
 
 How would you write a command that expected both foo and bar?
 The following doesn't seem appropriate (the generated code SKIPS the
 base fields, which leads to missing arguments in the prototype:
 
 { 'command': 'academic-command',
   'data': 'academicExample' }

Ouch.  Sounds like a bug in the code generator.  Obviously, someone will
have to patch that (and add a testsuite entry to make sure it doesn't
regress) before we can rely on it.

 
 ...
 
 {
 .name = academic-command,
 .args_type = foo:i,bar:s,
 .mhandler.cmd_new = qmp_marshal_input_academic_command,
 },
 
 
 The generated prototype appears to skip the foo argument, including
 only the arguments associated with the base type, in this case, 'bar'.
 
 Do we support this kind of use? I didn't see it in-use currently, but I
 only gave it a cursory skimming.

We supposedly document it as working, but as no one is using it
(including no testsuite entry), I'm not surprised that it doesn't work yet.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] hotremoving a disk qmp/hmp

2014-11-18 Thread Markus Armbruster

William Dauchy will...@gandi.net writes:

 On Nov18 15:22, Markus Armbruster wrote:
 This is block backend disk1.

 yes indeed.

 I presume you're deleting the device using backend disk1.

 yes

 What kind of device is this?  PCI, perhaps?
 Please show us a complete QMP conversation.

 Here it is:

 live vm with one disk:

 (QEMU) query-block
 {   u'return': [   {   u'device': u'disk0',
u'inserted': {   u'backing_file_depth': 0,
 u'bps': 0,
 u'bps_rd': 0,
 u'bps_wr': 0,
 u'detect_zeroes': u'off',
 u'drv': u'raw',
 u'encrypted': False,
 u'encryption_key_missing': False,
 u'file': u'/dev/sda',
 u'image': {   u'actual-size': 0,
   u'dirty-flag': False,
   u'filename': 
 u'/dev/sda',
   u'format': u'raw',
   u'virtual-size': 
 3221225472},
 u'iops': 0,
 u'iops_rd': 0,
 u'iops_wr': 0,
 u'ro': False},
u'io-status': u'ok',
u'locked': True,
u'removable': True,
u'tray_open': False,
u'type': u'unknown'}]}

 hotplugging one disk:

 (QEMU) blockdev-add
 with
 {'options' : {
 'driver': 'raw',
 'id': 'disk1',
 'file': {
 'driver': 'file',
 'filename': /dev/sdb,
 }
 }}

 (QEMU) device_add
 with:
 {
 'driver': 'scsi-hd',
 'id': 'disk1',

Using the same ID for different things currently works, but it's awfully
confusing.  Please don't :)

 'drive': 'disk1',
 'scsi-id': 1,
 'removable': 'on',
 'dpofua': 'off'
 }

 (QEMU) query-block
 {   u'return': [   {   u'device': u'disk0',
u'inserted': {   u'backing_file_depth': 0,
 u'bps': 0,
 u'bps_rd': 0,
 u'bps_wr': 0,
 u'detect_zeroes': u'off',
 u'drv': u'raw',
 u'encrypted': False,
 u'encryption_key_missing': False,
 u'file': u'/dev/sda',
 u'image': {   u'actual-size': 0,
   u'dirty-flag': False,
   u'filename': 
 u'/dev/sda',
   u'format': u'raw',
   u'virtual-size': 
 3221225472},
 u'iops': 0,
 u'iops_rd': 0,
 u'iops_wr': 0,
 u'ro': False},
u'io-status': u'ok',
u'locked': True,
u'removable': True,
u'tray_open': False,
u'type': u'unknown'},,
{   u'device': u'disk1',
u'inserted': {   u'backing_file_depth': 0,
 u'bps': 0,
 u'bps_rd': 0,
 u'bps_wr': 0,
 u'detect_zeroes': u'off',
 u'drv': u'raw',
 u'encrypted': False,
 u'encryption_key_missing': False,
 u'file': u'/dev/sdb',
 u'image': {   u'actual-size': 0,
   u'dirty-flag': False,
   u'filename': 
 u'/dev/sdb',
   u'format': u'raw',
   u'virtual-size': 
 3221225472},
 u'iops': 0,
 u'iops_rd': 0,
 u'iops_wr': 0,
 u'ro': False},
u'io-status': u'ok',
u'locked':

Re: [Qemu-devel] [PULL] Net patches

2014-11-18 Thread Peter Maydell

On 18 November 2014 15:04, Stefan Hajnoczi stefa...@redhat.com wrote:
 The following changes since commit 1aba4be97eb01b650d146c7f01dc961d55da62ab:

   Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into 
 staging (2014-11-17 17:22:03 +)

 are available in the git repository at:

   git://github.com/stefanha/qemu.git tags/net-pull-request

 for you to fetch changes up to ed6273e26fdfb94a282dbbf1234a75422c6b4c4b:

   net: The third parameter of getsockname should be initialized (2014-11-18 
 15:04:35 +)

 

 
Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment

2014-11-18 Thread Eric Blake

On 11/14/2014 06:06 AM, Max Reitz wrote:
 Add a function qcow2_change_refcount_order() which allows changing the
 refcount order of a qcow2 image.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  block/qcow2-refcount.c | 457 
 +
  block/qcow2.h  |   4 +
  2 files changed, 461 insertions(+)
 

 +static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,

 +
 +status_cb(bs, (uint64_t)index * s-refcount_table_size + 
 reftable_index,
 +  (uint64_t)total * s-refcount_table_size, cb_opaque);

Not sure if the casts are needed (isn't s-refcount_table_size already
uint64_t, and 'int * uint64_t' does the right thing); but I guess it
doesn't hurt to leave them.

 +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
 +BlockDriverAmendStatusCB *status_cb,
 +void *cb_opaque, Error **errp)
 +{

 +do {
 +int total_walks;
 +
 +new_allocation = false;
 +
 +/* At least we have to do this walk and the one which writes the
 + * refblocks; also, at least we have to do this loop here at least
 + * twice (normally), first to do the allocations, and second to
 + * determine that everything is correctly allocated, this then makes
 + * three walks in total */
 +total_walks = MIN(walk_index + 2, 3);

This feels wrong...

 +
 +/* First, allocate the structures so they are present in the refcount
 + * structures */
 +ret = walk_over_reftable(bs, new_reftable, new_reftable_index,
 + new_reftable_size, NULL, new_refblock_size,
 + new_refcount_bits, alloc_refblock,
 + new_allocation, NULL, status_cb, cb_opaque,
 + walk_index++, total_walks, errp);

...In the common case of just two iterations of the do loop (second
iteration confirms no allocations needed), you call with index 0/2, 1/3,
and then the later non-allocation walk is index 2/3.

In the rare case of three iterations of the do loop, you call with index
0/2, 1/3, 2/3, and then the later non-allocation walk is 3/4.

I highly doubt that it is possible to trigger four iterations of the do
loop, but if it were, you would call with 0/2, 1/3, 2/3, 3/3, and then 4/5.

I think you instead want to have:

total_walks = MAX(walk_index + 2, 3)

then the common case will call with 0/3, 1/3, and the later walk as 2/3

the three-iteration loop will call with 0/3, 1/3, 2/4, and the later
walk as 3/4

the unlikely four-iteration loop will call with 0/3, 1/3, 2/4, 3/5, and
the later walk as 4/5.

 +
 +new_reftable_index = 0;
 +
 +if (new_allocation) {
 +if (new_reftable_offset) {
 +qcow2_free_clusters(bs, new_reftable_offset,
 +allocated_reftable_size * 
 sizeof(uint64_t),
 +QCOW2_DISCARD_NEVER);

Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy?
 Why not punch holes in the file when throwing out a failed too-small
new table, or when cleaning up the old table once the new table is good?

 +}
 +
 +new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size 
 *
 +   sizeof(uint64_t));
 +if (new_reftable_offset  0) {
 +error_setg_errno(errp, -new_reftable_offset,
 + Failed to allocate the new reftable);
 +ret = new_reftable_offset;
 +goto done;
 +}
 +allocated_reftable_size = new_reftable_size;
 +
 +new_allocation = true;

This assignment is dead code (it already occurs inside an 'if
(new_allocation)' condition).

 +}
 +} while (new_allocation);
 +
 +/* Second, write the new refblocks */
 +new_allocation = false;

This assignment is dead code (it can only be reached if the earlier do
loop ended, which is only possible when no allocations are recorded).

 +ret = walk_over_reftable(bs, new_reftable, new_reftable_index,
 + new_reftable_size, new_refblock,
 + new_refblock_size, new_refcount_bits,
 + flush_refblock, new_allocation, 
 new_set_refcount,
 + status_cb, cb_opaque, walk_index, walk_index + 
 1,
 + errp);
 +if (ret  0) {
 +goto done;
 +}
 +assert(!new_allocation);
 +

Correct.

 +done:
 +if (new_reftable) {
 +/* On success, new_reftable actually points to the old reftable (and
 + * new_reftable_size is the old reftable's size); but that is just
 + * fine */
 +for (i = 0; i  new_reftable_size; i++) {
 +uint64_t offset =

Re: [Qemu-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration

2014-11-18 Thread Peter Maydell

On 18 November 2014 11:41, Kevin Wolf kw...@redhat.com wrote:
 Am 17.11.2014 um 22:20 hat Don Slutz geschrieben:
 The other callers to blk_set_enable_write_cache() in this file
 already check for s-blk == NULL.

 Signed-off-by: Don Slutz dsl...@verizon.com
 ---

 I think this is a bugfix that should be back ported to stable
 releases.

 Please remember to include a 'Cc: qemu-sta...@nongnu.org' line in your
 commit messages for such patches. I've added it and applied the patch to
 my block branch, thanks. (Saw it too late for my -rc2 pull request,
 though, so it'll have to wait for -rc3.)

Now applied directly to master so it will make -rc2.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment

2014-11-18 Thread Max Reitz


On 18.11.2014 18:55, Eric Blake wrote:

On 11/14/2014 06:06 AM, Max Reitz wrote:

Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz mre...@redhat.com
---
  block/qcow2-refcount.c | 457 +
  block/qcow2.h  |   4 +
  2 files changed, 461 insertions(+)

+static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,
+
+status_cb(bs, (uint64_t)index * s-refcount_table_size + 
reftable_index,
+  (uint64_t)total * s-refcount_table_size, cb_opaque);

Not sure if the casts are needed (isn't s-refcount_table_size already
uint64_t,


Surprise, it isn't. I thought otherwise, too, but then got told by 
clang_complete (it's uint32_t).



and 'int * uint64_t' does the right thing); but I guess it
doesn't hurt to leave them.


+int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
+BlockDriverAmendStatusCB *status_cb,
+void *cb_opaque, Error **errp)
+{
+do {
+int total_walks;
+
+new_allocation = false;
+
+/* At least we have to do this walk and the one which writes the
+ * refblocks; also, at least we have to do this loop here at least
+ * twice (normally), first to do the allocations, and second to
+ * determine that everything is correctly allocated, this then makes
+ * three walks in total */
+total_walks = MIN(walk_index + 2, 3);

This feels wrong...


Yes, I noticed already when preparing v3 and it's already fixed in my 
local v3 branch. *cough*



+
+/* First, allocate the structures so they are present in the refcount
+ * structures */
+ret = walk_over_reftable(bs, new_reftable, new_reftable_index,
+ new_reftable_size, NULL, new_refblock_size,
+ new_refcount_bits, alloc_refblock,
+ new_allocation, NULL, status_cb, cb_opaque,
+ walk_index++, total_walks, errp);

...In the common case of just two iterations of the do loop (second
iteration confirms no allocations needed), you call with index 0/2, 1/3,
and then the later non-allocation walk is index 2/3.

In the rare case of three iterations of the do loop, you call with index
0/2, 1/3, 2/3, and then the later non-allocation walk is 3/4.

I highly doubt that it is possible to trigger four iterations of the do
loop, but if it were, you would call with 0/2, 1/3, 2/3, 3/3, and then 4/5.

I think you instead want to have:

total_walks = MAX(walk_index + 2, 3)

then the common case will call with 0/3, 1/3, and the later walk as 2/3

the three-iteration loop will call with 0/3, 1/3, 2/4, and the later
walk as 3/4

the unlikely four-iteration loop will call with 0/3, 1/3, 2/4, 3/5, and
the later walk as 4/5.


+
+new_reftable_index = 0;
+
+if (new_allocation) {
+if (new_reftable_offset) {
+qcow2_free_clusters(bs, new_reftable_offset,
+allocated_reftable_size * sizeof(uint64_t),
+QCOW2_DISCARD_NEVER);

Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy?


Ah, discarding is always interesting... Last year I used 
QCOW2_DISCARD_ALWAYS, then asked Kevin and he basically said never to 
use ALWAYS unless one is really sure about it. I could have used 
QCOW2_DISCARD_OTHER... But the idea behind using NEVER in cases like 
this is that the clusters may get picked up by the following allocation, 
in which case having discarded them is not a good idea (there are some 
other places in the qcow2 code which use NEVER for the same reason).


So, in this case, I think NEVER is good.


Why not punch holes in the file when throwing out a failed too-small
new table, or when cleaning up the old table once the new table is good?


+}
+
+new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
+   sizeof(uint64_t));
+if (new_reftable_offset  0) {
+error_setg_errno(errp, -new_reftable_offset,
+ Failed to allocate the new reftable);
+ret = new_reftable_offset;
+goto done;
+}
+allocated_reftable_size = new_reftable_size;
+
+new_allocation = true;

This assignment is dead code (it already occurs inside an 'if
(new_allocation)' condition).


Right. Though I somehow like its explicitness... I'll remove it.


+}
+} while (new_allocation);
+
+/* Second, write the new refblocks */
+new_allocation = false;

This assignment is dead code (it can only be reached if the earlier do
loop ended, which is only possible when no allocations are recorded).


Right again.


+

[Qemu-devel] [PATCH] add semihosting-config option

2014-11-18 Thread Liviu Ionescu

- allow to define where the semihosting calls will be addressed

Signed-off-by: Liviu Ionescu i...@livius.net
---
 gdbstub.c   | 12 
 include/sysemu/sysemu.h |  6 ++
 qemu-options.hx | 12 +++-
 vl.c| 49 +
 4 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/gdbstub.c b/gdbstub.c
index 0faca56..372aa67 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -355,6 +355,18 @@ static enum {
remote gdb syscalls.  Otherwise use native file IO.  */
 int use_gdb_syscalls(void)
 {
+if (semihosting_target == SEMIHOSTING_TARGET_NATIVE) {
+if (gdb_syscall_mode == GDB_SYS_UNKNOWN) {
+gdb_syscall_mode = GDB_SYS_DISABLED;
+}
+return FALSE;
+} else if (semihosting_target == SEMIHOSTING_TARGET_GDB) {
+if (gdb_syscall_mode == GDB_SYS_UNKNOWN) {
+gdb_syscall_mode = GDB_SYS_ENABLED;
+}
+return TRUE;
+}
+
 if (gdb_syscall_mode == GDB_SYS_UNKNOWN) {
 gdb_syscall_mode = (gdbserver_state ? GDB_SYS_ENABLED
 : GDB_SYS_DISABLED);
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 9fea3bc..c145d94 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -127,7 +127,13 @@ extern int cursor_hide;
 extern int graphic_rotate;
 extern int no_quit;
 extern int no_shutdown;
+
 extern int semihosting_enabled;
+extern int semihosting_target;
+#define SEMIHOSTING_TARGET_AUTO 0
+#define SEMIHOSTING_TARGET_NATIVE   1
+#define SEMIHOSTING_TARGET_GDB  2
+
 extern int old_param;
 extern int boot_menu;
 extern bool boot_strict;
diff --git a/qemu-options.hx b/qemu-options.hx
index da9851d..3e81222 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3216,7 +3216,17 @@ DEF(semihosting, 0, QEMU_OPTION_semihosting,
 STEXI
 @item -semihosting
 @findex -semihosting
-Semihosting mode (ARM, M68K, Xtensa only).
+Enable semihosting mode (ARM, M68K, Xtensa only).
+ETEXI
+DEF(semihosting-config, HAS_ARG, QEMU_OPTION_semihosting_config,
+-semihosting-config [enable=on|off,]target=native|gdb|auto   semihosting 
configuration\n,
+QEMU_ARCH_ARM | QEMU_ARCH_M68K | QEMU_ARCH_XTENSA | QEMU_ARCH_LM32)
+STEXI
+@item -semihosting-config [enable=on|off,]target=native|gdb|auto
+@findex -semihosting-config
+Enable semihosting and define where the semihosting calls will be addressed,
+to QEMU (@code{native}) or to GDB (@code{gdb}). The default is @code{auto}, 
which means
+@code{gdb} during debug sessions and @code{native} otherwise (ARM, M68K, 
Xtensa only).
 ETEXI
 DEF(old-param, 0, QEMU_OPTION_old_param,
 -old-param  old param mode\n, QEMU_ARCH_ARM)
diff --git a/vl.c b/vl.c
index f4a6e5e..8844983 100644
--- a/vl.c
+++ b/vl.c
@@ -172,6 +172,7 @@ const char *watchdog;
 QEMUOptionRom option_rom[MAX_OPTION_ROMS];
 int nb_option_roms;
 int semihosting_enabled = 0;
+int semihosting_target = SEMIHOSTING_TARGET_AUTO;
 int old_param = 0;
 const char *qemu_name;
 int alt_grab = 0;
@@ -554,6 +555,22 @@ static QemuOptsList qemu_icount_opts = {
 },
 };
 
+static QemuOptsList qemu_semihosting_config_opts = {
+.name = semihosting-config,
+.implied_opt_name = enable,
+.head = QTAILQ_HEAD_INITIALIZER(qemu_semihosting_config_opts.head),
+.desc = {
+{
+.name = enable,
+.type = QEMU_OPT_BOOL,
+}, {
+.name = target,
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
 /**
  * Get machine options
  *
@@ -2811,6 +2828,7 @@ int main(int argc, char **argv, char **envp)
 qemu_add_opts(qemu_name_opts);
 qemu_add_opts(qemu_numa_opts);
 qemu_add_opts(qemu_icount_opts);
+qemu_add_opts(qemu_semihosting_config_opts);
 
 runstate_init();
 
@@ -3618,6 +3636,37 @@ int main(int argc, char **argv, char **envp)
break;
 case QEMU_OPTION_semihosting:
 semihosting_enabled = 1;
+semihosting_target = SEMIHOSTING_TARGET_AUTO;
+break;
+case QEMU_OPTION_semihosting_config:
+semihosting_enabled = 1;
+opts = qemu_opts_parse(qemu_find_opts(semihosting-config),
+   optarg, 0);
+if (opts != NULL) {
+semihosting_enabled = qemu_opt_get_bool(opts, enable,
+true);
+const char *target = qemu_opt_get(opts, target);
+if (target != NULL) {
+if (strcmp(native, target) == 0) {
+semihosting_target = SEMIHOSTING_TARGET_NATIVE;
+} else if (strcmp(gdb, target) == 0) {
+semihosting_target = SEMIHOSTING_TARGET_GDB;
+} else  if (strcmp(auto, target) == 0) {
+

[Qemu-devel] [PATCH] pass semihosting exit code back to system

2014-11-18 Thread Liviu Ionescu

Signed-off-by: Liviu Ionescu i...@livius.net
---
 target-arm/arm-semi.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target-arm/arm-semi.c b/target-arm/arm-semi.c
index ebb5235..4b982ad 100644
--- a/target-arm/arm-semi.c
+++ b/target-arm/arm-semi.c
@@ -58,6 +58,11 @@
 #define TARGET_SYS_HEAPINFO0x16
 #define TARGET_SYS_EXIT0x18
 
+/* ADP_Stopped_ApplicationExit is used for exit(0),
+ * anything else is implemented as exit(1) */
+#define ADP_Stopped_ApplicationExit ((2  16) + 38)
+#define ADP_Stopped_RunTimeError((2  16) + 35)
+
 #ifndef O_BINARY
 #define O_BINARY 0
 #endif
@@ -551,8 +556,11 @@ uint32_t do_arm_semihosting(CPUARMState *env)
 return 0;
 }
 case TARGET_SYS_EXIT:
-gdb_exit(env, 0);
-exit(0);
+/* ARM specifies only Stopped_ApplicationExit as normal
+ * exit, everything else is considered an error */
+ret = (args == ADP_Stopped_ApplicationExit) ? 0 : 1;
+gdb_exit(env, ret);
+exit(ret);
 default:
 fprintf(stderr, qemu: Unsupported SemiHosting SWI 0x%02x\n, nr);
 cpu_dump_state(cs, stderr, fprintf, 0);
-- 
1.9.3 (Apple Git-50)

[Qemu-devel] [PATCH] pass ARM semihosting exit code back to system

2014-11-18 Thread Liviu Ionescu

In order to run unit tests under semihosting, it is necessary to pass the
application exit code back to the system.

ARM defines only the code to be used for non-error application exit
(ADP_Stopped_ApplicationExit), all other codes should return non-zero
exit codes.

This patch checks if the application code passed via TARGET_SYS_EXIT is
ADP_Stopped_ApplicationExit, and return 0, otherwise return 1.

Liviu Ionescu (1):
  pass semihosting exit code back to system

 target-arm/arm-semi.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

-- 
1.9.3 (Apple Git-50)

[Qemu-devel] [PATCH] add -semihosting-config option

2014-11-18 Thread Liviu Ionescu

The usual semihosting behaviour is to process the system calls locally and
return; unfortuantelly the initial implementation dinamically changed the
target to GDB during debug sessions, which, for the usual arm-none-eabi-gdb,
is not implemented. The result was that during debug sessions the semihosting
calls were discarded.

This patch adds a configuration variable and an option to set it on the
command line:

-semihosting-config [enable=on|off,]target=native|gdb|auto

This option enables semihosting and defines where the semihosting calls will
be addressed, to QEMU ('native') or to GDB ('gdb'). The default is auto, which
means 'gdb' during debug sessions and 'native' otherwise.

Liviu Ionescu (1):
  add semihosting-config option

 gdbstub.c   | 12 
 include/sysemu/sysemu.h |  6 ++
 qemu-options.hx | 12 +++-
 vl.c| 49 +
 4 files changed, 78 insertions(+), 1 deletion(-)

-- 
1.9.3 (Apple Git-50)

1 2 >

1 - 100 of 135 matches

Mail list logo