Re: [Qemu-block] [PATCH v6 25/39] blockdev: Pull out blockdev option extraction
Am 12.10.2015 um 22:00 hat Max Reitz geschrieben: > Extract some of the blockdev option extraction code from blockdev_init() > into its own function. This simplifies blockdev_init() and will allow > reusing the code in a different function added in a follow-up patch. > > Signed-off-by: Max Reitz> Reviewed-by: Alberto Garcia Reviewed-by: Kevin Wolf > +#ifdef CONFIG_LINUX_AIO > +if ((aio = qemu_opt_get(opts, "aio")) != NULL) { > +if (!strcmp(aio, "native")) { > +*bdrv_flags |= BDRV_O_NATIVE_AIO; > +} else if (!strcmp(aio, "threads")) { > +/* this is the default */ > +} else { > + error_setg(errp, "invalid aio option"); > + return; > +} > +} > +#endif Unrelated to your patch, but didn't we want to remove this #ifdef? I thought we had already done so... Oh, I've never flushed block-next after the 2.4 release. :-/ Expect a merge conflict coming up here... Kevin
Re: [Qemu-block] [PATCH v6 23/39] block: Prepare for NULL BDS
On 13.10.2015 17:37, Kevin Wolf wrote: > Am 12.10.2015 um 22:00 hat Max Reitz geschrieben: >> blk_bs() will not necessarily return a non-NULL value any more (unless >> blk_is_available() is true or it can be assumed to otherwise, e.g. >> because it is called immediately after a successful blk_new_with_bs() or >> blk_new_open()). >> >> Signed-off-by: Max Reitz> >> @@ -1584,12 +1594,16 @@ static void drive_backup_prepare(BlkTransactionState >> *common, Error **errp) >>"Device '%s' not found", backup->device); >> return; >> } >> -bs = blk_bs(blk); >> >> /* AioContext is released in .clean() */ >> -state->aio_context = bdrv_get_aio_context(bs); >> +state->aio_context = blk_get_aio_context(blk); >> aio_context_acquire(state->aio_context); >> >> +if (!blk_is_available(blk)) { >> +error_setg(errp, "Device '%s' has no medium", backup->device); >> +return; >> +} >> + >> qmp_drive_backup(backup->device, backup->target, >> backup->has_format, backup->format, >> backup->sync, >> @@ -1604,7 +1618,7 @@ static void drive_backup_prepare(BlkTransactionState >> *common, Error **errp) >> return; >> } >> >> -state->bs = bs; >> +state->bs = blk_bs(blk); >> state->job = state->bs->job; >> } >> >> @@ -1639,8 +1653,7 @@ static void >> blockdev_backup_prepare(BlkTransactionState *common, Error **errp) >> { >> BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, >> common); >> BlockdevBackup *backup; >> -BlockDriverState *bs, *target; >> -BlockBackend *blk; >> +BlockBackend *blk, *target; >> Error *local_err = NULL; >> >> assert(common->action->kind == TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP); >> @@ -1651,18 +1664,16 @@ static void >> blockdev_backup_prepare(BlkTransactionState *common, Error **errp) >> error_setg(errp, "Device '%s' not found", backup->device); >> return; >> } >> -bs = blk_bs(blk); >> >> -blk = blk_by_name(backup->target); >> -if (!blk) { >> +target = blk_by_name(backup->target); >> +if (!target) { >> error_setg(errp, "Device '%s' not found", backup->target); >> return; >> } >> -target = blk_bs(blk); >> >> /* AioContext is released in .clean() */ >> -state->aio_context = bdrv_get_aio_context(bs); >> -if (state->aio_context != bdrv_get_aio_context(target)) { >> +state->aio_context = blk_get_aio_context(blk); >> +if (state->aio_context != blk_get_aio_context(target)) { >> state->aio_context = NULL; >> error_setg(errp, "Backup between two IO threads is not >> implemented"); >> return; >> @@ -1680,7 +1691,7 @@ static void >> blockdev_backup_prepare(BlkTransactionState *common, Error **errp) >> return; >> } >> >> -state->bs = bs; >> +state->bs = blk_bs(blk); >> state->job = state->bs->job; >> } > > It's somewhat inconsistent that blockdev_backup_prepare() doesn't do an > explicit blk_is_available() check whereas drive_backup_prepare() does. > As far as I can tell, both don't necessarily need their for their own > code and in both cases the called qmp_*_backup() function checks it > again. Probably some kind of a rebase conflict. blockdev-backup was added only rather recently, and so I made a different decision then than for drive-backup. I think I'll drop the check from drive_backup_prepare(). Having it in qmp_drive_backup() is enough. > Not really a problem, though, it just looks a bit odd. > >> @@ -1818,10 +1829,10 @@ static void eject_device(BlockBackend *blk, int >> force, Error **errp) >> BlockDriverState *bs = blk_bs(blk); >> AioContext *aio_context; >> >> -aio_context = bdrv_get_aio_context(bs); >> +aio_context = blk_get_aio_context(blk); >> aio_context_acquire(aio_context); >> >> -if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_EJECT, errp)) { >> +if (bs && bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_EJECT, errp)) { >> goto out; >> } >> if (!blk_dev_has_removable_media(blk)) { >> error_setg(errp, "Device '%s' is not removable", >> bdrv_get_device_name(bs)); > > bs isn't checked to be non-NULL here. You could argue that if it's not > removable, it shouldn't be NULL, but I think we let drive_del forcibly > remove the medium even for devices that can't deal with it and start > producing I/O errors then. I'd mainly argue that this function is to be removed anyway. :-) I should just return immediately for NULL bs, as you're suggesting. > >> goto out; >> } >> >> if (blk_dev_is_medium_locked(blk) && !blk_dev_is_tray_open(blk)) { >> blk_dev_eject_request(blk, force); >> if (!force) { >> error_setg(errp, "Device '%s' is locked", >> bdrv_get_device_name(bs)); > > And here. > >>
Re: [Qemu-block] [PATCH v6 24/39] blockdev: Do not create BDS for empty drive
On 14.10.2015 15:27, Kevin Wolf wrote: > Am 12.10.2015 um 22:00 hat Max Reitz geschrieben: >> Do not use "rudimentary" BDSs for empty drives any longer (for >> freshly created drives). >> >> After a follow-up patch, empty drives will generally use a NULL BDS, not >> only the freshly created drives. >> >> Signed-off-by: Max Reitz>> --- >> blockdev.c | 72 >> ++ >> 1 file changed, 44 insertions(+), 28 deletions(-) >> >> diff --git a/blockdev.c b/blockdev.c >> index 35efe84..845a1c1 100644 >> --- a/blockdev.c >> +++ b/blockdev.c >> @@ -514,16 +514,44 @@ static BlockBackend *blockdev_init(const char *file, >> QDict *bs_opts, >> goto early_err; >> } >> >> +if (snapshot) { >> +/* always use cache=unsafe with snapshot */ >> +bdrv_flags &= ~BDRV_O_CACHE_MASK; >> +bdrv_flags |= (BDRV_O_SNAPSHOT|BDRV_O_CACHE_WB|BDRV_O_NO_FLUSH); >> +} >> + >> +if (copy_on_read) { >> +bdrv_flags |= BDRV_O_COPY_ON_READ; >> +} >> + >> +if (runstate_check(RUN_STATE_INMIGRATE)) { >> +bdrv_flags |= BDRV_O_INCOMING; >> +} > > Bad conflict resolution when you added patch 2, which moved the > BDRV_O_INCOMING code to block.c? Oops, indeed. I was wondering why I only removed it once from blockdev_init(). Thanks. > That patch is actually essential here as well: The flag would ends up in > the root state, and bdrv_invalidate_cache_all() doesn't clear the flag > any more. So we might end up opening an image with BDRV_O_INCOMING even > though migration has long finished. Yep. Will fix. Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH V2 3/5] virtio-blk: disable scsi passthrough by default
On 14/10/2015 12:29, Cornelia Huck wrote: > Do we want to change anything for 2.5 about the default? > > Currently, we still default scsi to true, and you have to disable it > explicitly if you want to use virtio-1 compliant virtio-blk devices > (which is a bit annoying, as scsi passthrough is not something people > usually want). I think disabling it is best. Can you send a patch? Paolo
Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote: > On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote: > > On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote: > >> +/* start backup job now */ > >> +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET, > >> +s->active_disk->backing_blocker); > >> +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE, > >> +s->hidden_disk->backing_blocker); > > > > Why is it safe to unblock these operations? > > > > Why do they have to be blocked for non-replication users? > > hidden_disk and secondary disk are opened as backing file, so it is blocked > for > non-replication users. > What can I do if I don't unblock it and want to do backup? CCing Jeff Cody, block jobs maintainer You need to explain why it is safe remove this protection. We can't merge code that may be unsafe. I think we can investigate further by asking: when does QEMU code assume the backing file is read-only? I haven't checked but these cases come to mind: Operations that move data between BDS in the backing chain (e.g. commit and stream block jobs) will lose or overwrite data if the backing file is being written to by another coroutine. We need to prevent users from running these operations at the same time. Also, accessing bs->backing_blocker is a layering violation. No one outside block.c:bdrv_set_backing_hd() is supposed to access this field. Let's figure out the safety concerns first and then the bs->backing_blocker access will probably be eliminated as part of the solution. Stefan
Re: [Qemu-block] [PATCH v6 25/39] blockdev: Pull out blockdev option extraction
On 14.10.2015 17:12, Kevin Wolf wrote: > Am 12.10.2015 um 22:00 hat Max Reitz geschrieben: >> Extract some of the blockdev option extraction code from blockdev_init() >> into its own function. This simplifies blockdev_init() and will allow >> reusing the code in a different function added in a follow-up patch. >> >> Signed-off-by: Max Reitz>> Reviewed-by: Alberto Garcia > > Reviewed-by: Kevin Wolf > >> +#ifdef CONFIG_LINUX_AIO >> +if ((aio = qemu_opt_get(opts, "aio")) != NULL) { >> +if (!strcmp(aio, "native")) { >> +*bdrv_flags |= BDRV_O_NATIVE_AIO; >> +} else if (!strcmp(aio, "threads")) { >> +/* this is the default */ >> +} else { >> + error_setg(errp, "invalid aio option"); >> + return; >> +} >> +} >> +#endif > > Unrelated to your patch, but didn't we want to remove this #ifdef? I > thought we had already done so... Oh, I've never flushed block-next > after the 2.4 release. :-/ > > Expect a merge conflict coming up here... It won't be the first to hit this series. ;-) Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 3/3] block: use bdrv_lookup_bs() over blk_by_name() for BDS only results
On Wed, Oct 14, 2015 at 08:05:34PM +0200, Max Reitz wrote: > On 14.10.2015 15:16, Jeff Cody wrote: > > To find a BlockDriverState interface, it can be done via blk_by_name(), > > bdrv_find_node(), and bdrv_lookup_bs(). The latter can take the place > > of the other two, in the instances where we are only concerned with the > > BlockDriverState. > > > > In much of the usage of blk_by_name(), we can simplify the code by > > calling bdrv_lookup_bs() instead of blk_by_name(), when what we need is > > just the BDS, and not the BB. > > > > Signed-off-by: Jeff Cody> > --- > > blockdev.c| 84 > > --- > > migration/block.c | 6 ++-- > > 2 files changed, 32 insertions(+), 58 deletions(-) > > Again, if this series is based on Berto's blockdev-snapshot series, it > should be based on my BB series as well. But with that series applied, > this patch has some (non-trivial) rebase conflicts on it. > > Also, it has a fundamental conflict with that series: If a BB can be > found by bdrv_lookup_bs() but it doesn't have a BDS, it will return NULL > and set *errp accordingly ("No medium in device"). However, this patch > discards that error message and just keeps the one of the former > blk_by_name() caller, which is generally "Device not found". That is > wrong, however, if there is simply no medium in the device. > Good point. We can just drop this patch, then - maybe at some point, we can have a consolidated API to obtain a BDS (if having such a thing is even all that important), that fits well with the design goal of your series. I think the first two patches are probably worth keeping, however. > In some cases, that doesn't matter (since we just assume that if there > is a BB, it will have a BDS, like for hmp_commit), but it most probably > does matter for all the places which conflict with my BB series, the > reason being that they generally conflict because I added a > blk_is_available() check. > > Note that this is a "design conflict", too: bdrv_lookup_bs() will return > NULL only if blk_bs() is NULL. blk_is_available() may return false even > if there is a BDS (if the BDS is unavailable, e.g. one of the BDSs in > the tree not being inserted or the guest device's tray is open). But if > you already have a BDS (because you used bdrv_lookup_bs()), calling > blk_is_available() on bs->blk afterwards looks kind of strange... > > Max >
Re: [Qemu-block] [PATCH 2/3] block: make bdrv_find_node() static
On 14.10.2015 15:16, Jeff Cody wrote: > This patch does two things: it moves bdrv_find_node() up before the > first usage in block.c, and it makes the function static so that it > is only internal to block.c. > > Signed-off-by: Jeff Cody> --- > block.c | 30 +++--- > include/block/block.h | 1 - > 2 files changed, 15 insertions(+), 16 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 3/3] block: use bdrv_lookup_bs() over blk_by_name() for BDS only results
On 14.10.2015 15:16, Jeff Cody wrote: > To find a BlockDriverState interface, it can be done via blk_by_name(), > bdrv_find_node(), and bdrv_lookup_bs(). The latter can take the place > of the other two, in the instances where we are only concerned with the > BlockDriverState. > > In much of the usage of blk_by_name(), we can simplify the code by > calling bdrv_lookup_bs() instead of blk_by_name(), when what we need is > just the BDS, and not the BB. > > Signed-off-by: Jeff Cody> --- > blockdev.c| 84 > --- > migration/block.c | 6 ++-- > 2 files changed, 32 insertions(+), 58 deletions(-) Again, if this series is based on Berto's blockdev-snapshot series, it should be based on my BB series as well. But with that series applied, this patch has some (non-trivial) rebase conflicts on it. Also, it has a fundamental conflict with that series: If a BB can be found by bdrv_lookup_bs() but it doesn't have a BDS, it will return NULL and set *errp accordingly ("No medium in device"). However, this patch discards that error message and just keeps the one of the former blk_by_name() caller, which is generally "Device not found". That is wrong, however, if there is simply no medium in the device. In some cases, that doesn't matter (since we just assume that if there is a BB, it will have a BDS, like for hmp_commit), but it most probably does matter for all the places which conflict with my BB series, the reason being that they generally conflict because I added a blk_is_available() check. Note that this is a "design conflict", too: bdrv_lookup_bs() will return NULL only if blk_bs() is NULL. blk_is_available() may return false even if there is a BDS (if the BDS is unavailable, e.g. one of the BDSs in the tree not being inserted or the guest device's tray is open). But if you already have a BDS (because you used bdrv_lookup_bs()), calling blk_is_available() on bs->blk afterwards looks kind of strange... Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 1/3] block: Use bdrv_lookup_bs() instead of bdrv_find_node()
On 14.10.2015 19:29, Max Reitz wrote: > On 14.10.2015 15:16, Jeff Cody wrote: >> This is a precursor to making bdrv_find_node() static, and internal >> to block.c >> >> To find a BlockDriverState interface, it can be done via blk_by_name(), >> bdrv_find_node(), and bdrv_lookup_bs(). The latter can take the place >> of the other two, in the instances where we are only concerned with >> the BlockDriverState. >> >> There is no benefit in calling bdrv_find_node() directly. This patch >> replaces all calls to bdrv_find_node() outside of block.c with >> bdrv_lookup_bs(). >> >> Signed-off-by: Jeff Cody>> --- >> block/block-backend.c | 2 +- >> block/mirror.c | 2 +- >> block/write-threshold.c | 2 +- >> 3 files changed, 3 insertions(+), 3 deletions(-) > > Reviewed-by: Max Reitz Oh, wait, on patch 2 gcc tells me I should take that back. If this series is based on Berto's series, that means it's also based on my BlockBackend series, and that one adds another instance of bdrv_find_node(). Since I'll have to send a v7 anyway, I'll make it a bdrv_lookup_bs() there, so my R-b stands. Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [PATCH 1/3] block: Use bdrv_lookup_bs() instead of bdrv_find_node()
On Wed, Oct 14, 2015 at 07:34:43PM +0200, Max Reitz wrote: > On 14.10.2015 19:29, Max Reitz wrote: > > On 14.10.2015 15:16, Jeff Cody wrote: > >> This is a precursor to making bdrv_find_node() static, and internal > >> to block.c > >> > >> To find a BlockDriverState interface, it can be done via blk_by_name(), > >> bdrv_find_node(), and bdrv_lookup_bs(). The latter can take the place > >> of the other two, in the instances where we are only concerned with > >> the BlockDriverState. > >> > >> There is no benefit in calling bdrv_find_node() directly. This patch > >> replaces all calls to bdrv_find_node() outside of block.c with > >> bdrv_lookup_bs(). > >> > >> Signed-off-by: Jeff Cody> >> --- > >> block/block-backend.c | 2 +- > >> block/mirror.c | 2 +- > >> block/write-threshold.c | 2 +- > >> 3 files changed, 3 insertions(+), 3 deletions(-) > > > > Reviewed-by: Max Reitz > > Oh, wait, on patch 2 gcc tells me I should take that back. If this > series is based on Berto's series, that means it's also based on my > BlockBackend series, and that one adds another instance of bdrv_find_node(). > > Since I'll have to send a v7 anyway, I'll make it a bdrv_lookup_bs() > there, so my R-b stands. > > Max > Thanks. I actually managed to apply Berto's without yours, but I just went back and rectified that, and applied yours. Jeff
Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
On 10/14/2015 02:19 PM, Peter Lieven wrote: > Am 08.10.2015 um 18:44 schrieb John Snow: >> >> On 10/08/2015 08:06 AM, Peter Lieven wrote: >>> Hi all, >>> >>> short summary from my side. The whole thing seems to get complicated, >>> let me explain why: >>> >>> 1) During review I found that the code in ide_atapi_cmd_reply_end can't >>> work correctly if the >>> byte_count_limit is not a divider or a multiple of cd_sector_size. The >>> reason is that as soon >>> as we load the next sector we start at io_buffer offset 0 overwriting >>> whatever is left in there >>> for transfer. We also reset the io_buffer_index to 0 which means if we >>> continue with the >>> elementary transfer we always transfer a whole sector (of corrupt data) >>> regardless if we >>> are allowed to transfer that much data. Before we consider fixing this I >>> wonder if it >>> is legal at all to have an unaligned byte_count_limit. It obviously has >>> never caused trouble in >>> practice so maybe its not happening in real life. >>> >> I had overlooked that part. Good catch. I do suspect that in practice >> nobody will be asking for bizarre values. >> >> There's no rule against an unaligned byte_count_limit as far as I have >> read, but suspect nobody would have a reason to use it in practice. >> >>> 2) I found that whatever cool optimization I put in to buffer multiple >>> sectors at once I end >>> up with code that breaks migration because older versions would either >>> not fill the io_buffer >>> as expected or we introduce variables that older versions do not >>> understand. This will >>> lead to problems if we migrate in the middle of a transfer. >>> >> Ech. This sounds like a bit of a problem. I'll need to think about this >> one... >> >>> 3) My current plan to get this patch to a useful state would be to use >>> my initial patch and just >>> change the code to use a sync request if we need to buffer additional >>> sectors in an elementary >>> transfer. I found that in real world operating systems the >>> byte_count_limit seems to be equal to >>> the cd_sector_size. After all its just a PIO transfer an operating >>> system will likely switch to DMA >>> as soon as the kernel ist loaded. >>> >>> Thanks, >>> Peter >>> >> It sounds like that might be "good enough" for now, and won't make >> behavior *worse* than it currently is. You can adjust the test I had >> checked in to not use a "tricky" value and we can amend support for this >> later if desired. > > Have you had a chance to look at the series with the "good enough" fix? > > Thanks, > Peter > Will do so Friday, thanks! --js
Re: [Qemu-block] [Qemu-devel] [PATCH 1/5] ide/atapi: make PIO read requests async
Am 08.10.2015 um 18:44 schrieb John Snow: > > On 10/08/2015 08:06 AM, Peter Lieven wrote: >> Hi all, >> >> short summary from my side. The whole thing seems to get complicated, >> let me explain why: >> >> 1) During review I found that the code in ide_atapi_cmd_reply_end can't >> work correctly if the >> byte_count_limit is not a divider or a multiple of cd_sector_size. The >> reason is that as soon >> as we load the next sector we start at io_buffer offset 0 overwriting >> whatever is left in there >> for transfer. We also reset the io_buffer_index to 0 which means if we >> continue with the >> elementary transfer we always transfer a whole sector (of corrupt data) >> regardless if we >> are allowed to transfer that much data. Before we consider fixing this I >> wonder if it >> is legal at all to have an unaligned byte_count_limit. It obviously has >> never caused trouble in >> practice so maybe its not happening in real life. >> > I had overlooked that part. Good catch. I do suspect that in practice > nobody will be asking for bizarre values. > > There's no rule against an unaligned byte_count_limit as far as I have > read, but suspect nobody would have a reason to use it in practice. > >> 2) I found that whatever cool optimization I put in to buffer multiple >> sectors at once I end >> up with code that breaks migration because older versions would either >> not fill the io_buffer >> as expected or we introduce variables that older versions do not >> understand. This will >> lead to problems if we migrate in the middle of a transfer. >> > Ech. This sounds like a bit of a problem. I'll need to think about this > one... > >> 3) My current plan to get this patch to a useful state would be to use >> my initial patch and just >> change the code to use a sync request if we need to buffer additional >> sectors in an elementary >> transfer. I found that in real world operating systems the >> byte_count_limit seems to be equal to >> the cd_sector_size. After all its just a PIO transfer an operating >> system will likely switch to DMA >> as soon as the kernel ist loaded. >> >> Thanks, >> Peter >> > It sounds like that might be "good enough" for now, and won't make > behavior *worse* than it currently is. You can adjust the test I had > checked in to not use a "tricky" value and we can amend support for this > later if desired. Have you had a chance to look at the series with the "good enough" fix? Thanks, Peter
Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu
Am 14.10.2015 um 13:06 hat Stefano Stabellini geschrieben: > On Wed, 14 Oct 2015, Kevin Wolf wrote: > > [ CC qemu-block ] > > > > Am 13.10.2015 um 19:10 hat Stefano Stabellini geschrieben: > > > On Tue, 13 Oct 2015, John Snow wrote: > > > > On 10/13/2015 11:55 AM, Fabio Fantoni wrote: > > > > > I added ahci disk support in libxl and using it for week seems that > > > > > was > > > > > ok, after a reply of Stefano Stabellini seems that xen disk unplug > > > > > support only ide disks: > > > > > http://git.qemu.org/?p=qemu.git;a=commitdiff;h=679f4f8b178e7c66fbc2f39c905374ee8663d5d8 > > > > > > > > > > Today Paul Durrant told me that even if pv disk is ok also with ahci > > > > > and > > > > > the emulated one is offline can be a risk: > > > > > http://lists.xenproject.org/archives/html/win-pv-devel/2015-10/msg00021.html > > > > > > > > > > > > > > > I tried to take a fast look in qemu code but I not understand the > > > > > needed > > > > > thing for add the xen disk unplug support also for ahci, can someone > > > > > do > > > > > it or tell me useful information for do it please? > > > > > > > > > > Thanks for any reply and sorry for my bad english. > > > > > > > > > > > > > I'm not entirely sure what features you need AHCI to support in order > > > > for Xen to be happy. > > > > > > > > I'd guess hotplugging, but where I get confused is that IDE disks don't > > > > support hotplugging either, so I guess I'm not sure sure what you need. > > > > > > > > Stefano, can you help bridge my Xen knowledge gap? > > > > > > Hi John, > > > > > > we need something like hw/i386/xen/xen_platform.c:unplug_disks but that > > > can unplug AHCI disk. And by unplug, I mean "make disappear" like > > > pci_piix3_xen_ide_unplug does for ide. > > > > Maybe this would be the right time to stop the craziness with your > > hybrid IDE/xendisk setup. It's a horrible thing that would never happen > > on real hardware. > > I would be quite happy to stop (or even get rid of) the craziness. > > > > Can't you just teach SeaBIOS how to deal with your PV disks and then > > only add that to your VM and forget about IDE/AHCI? I mean, that's how > > it's done for virtio-blk, and it doesn't involve any insanities like > > ripping out non-hotpluggable devices. > > Teaching SeaBIOS to deal with PV disks can be done, in fact we already > support PV disks in OVMF. It is possible to boot Windows with OVMF > without any IDE disks (patch pending for libxl to create a VM without > emulated IDE disks). > > However we have to be honest that implementing PV disk support in > SeaBIOS is a different magnitude of effort compared to implementing AHCI > "unplug". Agreed. But we generally try to do the right thing and not the easy thing. Maybe I'm missing something, but my impression was that the hybrid setup was only used so that you can boot from a PV disk even though the BIOS doesn't have a PV driver. In that case, why would you even want to use AHCI instead of IDE? During the early boot performance shouldn't be much different as there is no parallelism, and afterwards the PV driver is used. > I would suggest Fabio to avoid AHCI disks altogether and just use OVMF > with PV disks only and Anthony's patch to libxl to avoid creating any > IDE disks: http://marc.info/?l=xen-devel=144482080812353. > > Would that work for you? That sounds certainly like a better step forward than adding a crude hack to AHCI. And if you want to make me completely happy, plan to extend SeaBIOS so that we can even drop the existing hack in IDE. Kevin
Re: [Qemu-block] [PATCH V2 3/5] virtio-blk: disable scsi passthrough by default
On Wed, 15 Jul 2015 16:45:52 +0200 Paolo Bonziniwrote: > On 15/07/2015 16:28, Michael S. Tsirkin wrote: > > On Wed, Jul 15, 2015 at 04:18:49PM +0200, Paolo Bonzini wrote: > >> > >> > >> On 15/07/2015 16:14, Michael S. Tsirkin wrote: > >>> On Wed, Jul 15, 2015 at 02:47:24PM +0200, Paolo Bonzini wrote: > > > On 15/07/2015 14:21, Michael S. Tsirkin wrote: > >>> Disable scsi passthrough by default since it was incompatible with > >>> virtio 1.0. For legacy machine types, keep this on by default. > >>> > >>> Cc: Stefan Hajnoczi > >>> Cc: Kevin Wolf > >>> Cc: qemu-block@nongnu.org > >>> Signed-off-by: Jason Wang > > Seems risky for 2.4. modern is off by default for now. Can't we limit > > the change to when modern is enabled? > > That would have the effect of disabling a feature when you turn on > modern. > >>> > >>> What's wrong with that? > >> > >> Weren't you complaining about it a few hours ago? :) > > > > No, I complained about guest driver update disabling it. > > Ah, sorry for the confusion. > > > I suggested changing this from bool to on/off/auto, and > > make auto mean !modern. > > No, please do it like Jason did. The SCSI feature effectively had to be > enabled explicitly already, the requests were marked as unsupported. > >>> > >>> I didn't know. How is it enabled? > >> > >> It's enabled by default in QEMU, but disabled by default in libvirt. > >> And it only works if you pass a whole _disk_ (not a partition or logical > >> volume) to QEMU, which is definitely not the common case. > >> > >> It can just be documented in the release notes; the feature is still > >> available, and libvirt won't be broken because it adds explicitly both > >> scsi=on and scsi=off. > > > > So for libvirt, we don't really care about the default, right? > > For command line, would it not be friendlier to make it follow the > > modern flag automatically? > > I wouldn't mind if we grabbed the occasion to disable it altogether. > However, that would indeed work as well. Do we want to change anything for 2.5 about the default? Currently, we still default scsi to true, and you have to disable it explicitly if you want to use virtio-1 compliant virtio-blk devices (which is a bit annoying, as scsi passthrough is not something people usually want).
Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu
On Wed, 14 Oct 2015, Kevin Wolf wrote: > Am 14.10.2015 um 13:06 hat Stefano Stabellini geschrieben: > > On Wed, 14 Oct 2015, Kevin Wolf wrote: > > > [ CC qemu-block ] > > > > > > Am 13.10.2015 um 19:10 hat Stefano Stabellini geschrieben: > > > > On Tue, 13 Oct 2015, John Snow wrote: > > > > > On 10/13/2015 11:55 AM, Fabio Fantoni wrote: > > > > > > I added ahci disk support in libxl and using it for week seems that > > > > > > was > > > > > > ok, after a reply of Stefano Stabellini seems that xen disk unplug > > > > > > support only ide disks: > > > > > > http://git.qemu.org/?p=qemu.git;a=commitdiff;h=679f4f8b178e7c66fbc2f39c905374ee8663d5d8 > > > > > > > > > > > > Today Paul Durrant told me that even if pv disk is ok also with > > > > > > ahci and > > > > > > the emulated one is offline can be a risk: > > > > > > http://lists.xenproject.org/archives/html/win-pv-devel/2015-10/msg00021.html > > > > > > > > > > > > > > > > > > I tried to take a fast look in qemu code but I not understand the > > > > > > needed > > > > > > thing for add the xen disk unplug support also for ahci, can > > > > > > someone do > > > > > > it or tell me useful information for do it please? > > > > > > > > > > > > Thanks for any reply and sorry for my bad english. > > > > > > > > > > > > > > > > I'm not entirely sure what features you need AHCI to support in order > > > > > for Xen to be happy. > > > > > > > > > > I'd guess hotplugging, but where I get confused is that IDE disks > > > > > don't > > > > > support hotplugging either, so I guess I'm not sure sure what you > > > > > need. > > > > > > > > > > Stefano, can you help bridge my Xen knowledge gap? > > > > > > > > Hi John, > > > > > > > > we need something like hw/i386/xen/xen_platform.c:unplug_disks but that > > > > can unplug AHCI disk. And by unplug, I mean "make disappear" like > > > > pci_piix3_xen_ide_unplug does for ide. > > > > > > Maybe this would be the right time to stop the craziness with your > > > hybrid IDE/xendisk setup. It's a horrible thing that would never happen > > > on real hardware. > > > > I would be quite happy to stop (or even get rid of) the craziness. > > > > > > > Can't you just teach SeaBIOS how to deal with your PV disks and then > > > only add that to your VM and forget about IDE/AHCI? I mean, that's how > > > it's done for virtio-blk, and it doesn't involve any insanities like > > > ripping out non-hotpluggable devices. > > > > Teaching SeaBIOS to deal with PV disks can be done, in fact we already > > support PV disks in OVMF. It is possible to boot Windows with OVMF > > without any IDE disks (patch pending for libxl to create a VM without > > emulated IDE disks). > > > > However we have to be honest that implementing PV disk support in > > SeaBIOS is a different magnitude of effort compared to implementing AHCI > > "unplug". > > Agreed. But we generally try to do the right thing and not the easy > thing. Sure. I am quite happy to let other people do it though :-) > Maybe I'm missing something, but my impression was that the hybrid setup > was only used so that you can boot from a PV disk even though the BIOS > doesn't have a PV driver. In that case, why would you even want to use > AHCI instead of IDE? During the early boot performance shouldn't be much > different as there is no parallelism, and afterwards the PV driver is > used. The "unplug" code and the design choice predate me -- I am not sure why it was done like that. In those days there was no SeaBIOS: maybe they thought that writing a PV disk frontend in rombios would be madness? In addition isn't it true that some guests, once they see a PIIX3 chipset and they know that there is one disk, they might just try to access it directly via the IDE interface? I am thinking of some old versions of Windows. Are they really able to cope with the case where they can only access the disk via the legacy BIOS until non-IDE drivers come along? > > I would suggest Fabio to avoid AHCI disks altogether and just use OVMF > > with PV disks only and Anthony's patch to libxl to avoid creating any > > IDE disks: http://marc.info/?l=xen-devel=144482080812353. > > > > Would that work for you? > > That sounds certainly like a better step forward than adding a crude > hack to AHCI. > > And if you want to make me completely happy, plan to extend SeaBIOS so > that we can even drop the existing hack in IDE. There are lots of existing guests which write to the magic ioport to "unplug" disks. We would need some sort of deprecation plan.
Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu
> -Original Message- > From: Fabio Fantoni [mailto:fabio.fant...@m2r.biz] > Sent: 14 October 2015 12:12 > To: Kevin Wolf; Stefano Stabellini > Cc: John Snow; Anthony Perard; qemu-de...@nongnu.org; xen- > de...@lists.xen.org; qemu-block@nongnu.org; Paul Durrant > Subject: Re: [Qemu-devel] Question about xen disk unplug support for ahci > missed in qemu > > > > Il 14/10/2015 11:47, Kevin Wolf ha scritto: > > [ CC qemu-block ] > > > > Am 13.10.2015 um 19:10 hat Stefano Stabellini geschrieben: > >> On Tue, 13 Oct 2015, John Snow wrote: > >>> On 10/13/2015 11:55 AM, Fabio Fantoni wrote: > I added ahci disk support in libxl and using it for week seems that was > ok, after a reply of Stefano Stabellini seems that xen disk unplug > support only ide disks: > > http://git.qemu.org/?p=qemu.git;a=commitdiff;h=679f4f8b178e7c66fbc2f39 > c905374ee8663d5d8 > > Today Paul Durrant told me that even if pv disk is ok also with ahci and > the emulated one is offline can be a risk: > http://lists.xenproject.org/archives/html/win-pv-devel/2015- > 10/msg00021.html > > > I tried to take a fast look in qemu code but I not understand the > needed > thing for add the xen disk unplug support also for ahci, can someone do > it or tell me useful information for do it please? > > Thanks for any reply and sorry for my bad english. > > >>> I'm not entirely sure what features you need AHCI to support in order > >>> for Xen to be happy. > >>> > >>> I'd guess hotplugging, but where I get confused is that IDE disks don't > >>> support hotplugging either, so I guess I'm not sure sure what you need. > >>> > >>> Stefano, can you help bridge my Xen knowledge gap? > >> > >> Hi John, > >> > >> we need something like hw/i386/xen/xen_platform.c:unplug_disks but > that > >> can unplug AHCI disk. And by unplug, I mean "make disappear" like > >> pci_piix3_xen_ide_unplug does for ide. > > Maybe this would be the right time to stop the craziness with your > > hybrid IDE/xendisk setup. It's a horrible thing that would never happen > > on real hardware. Unfortunately, it's going to be difficult to remove such 'craziness' when you don't know a priori whether the VM has PV drivers or not. > > > > Can't you just teach SeaBIOS how to deal with your PV disks and then > > only add that to your VM and forget about IDE/AHCI? I mean, that's how > > it's done for virtio-blk, and it doesn't involve any insanities like > > ripping out non-hotpluggable devices. > > Does Windows have in-box virtio-blk drivers? If not, how does it boot? > > Hm... How does a reboot of a machine that had its IDE already removed > > actually work? Do you restart the qemu process on reboot? > > The IDE disks are always present during boot, but before the OS enumerates them they are 'unplugged' and then PV drivers are used instead. The only other way would be to somehow obscure them from OS enumeration so they could be left lying around but remain unused. That's feasible for Windows, but I don't know about other OS. Paul > > Kevin > I thinkthat would be a good idea, unfortunately I'm not able to do that > and I do not know if the developers of xen would agree to such modification. > > If will be done, for example having new disk type "xenpv" require the pv > drivers installed, with linux having drivers included should not be a > problem but on windows yes FWIK. > Like virtio driver should be installed before (or adding them in windows > install), I don't know if will be ok specifying them in install for with > xen driver, another possibility can be start domU with ide disk type, > install the xen pv drivers and reboot selecting xenpv disk type instead. > Doing it FWIK require not only xen and qemu changes but also pv drivers > change, I added on cc also Paul Durrant for see what think about. > With actual unplug based on my tests in some years I had many problem > with windows domUs (with both old and new pv drivers) resulting in > "windows boot blue screen error", I suppose that changing/improving > unplug method can decrease them, is it correct? > > About reboot xen do different from kvm recreating full domU each time > (and new qemu process), I don't know if is possible and good change the > method. > > Thanks for any reply and sorry for my bad english.
Re: [Qemu-block] [PATCH 1/3] block: Use bdrv_lookup_bs() instead of bdrv_find_node()
On 14.10.2015 15:16, Jeff Cody wrote: > This is a precursor to making bdrv_find_node() static, and internal > to block.c > > To find a BlockDriverState interface, it can be done via blk_by_name(), > bdrv_find_node(), and bdrv_lookup_bs(). The latter can take the place > of the other two, in the instances where we are only concerned with > the BlockDriverState. > > There is no benefit in calling bdrv_find_node() directly. This patch > replaces all calls to bdrv_find_node() outside of block.c with > bdrv_lookup_bs(). > > Signed-off-by: Jeff Cody> --- > block/block-backend.c | 2 +- > block/mirror.c | 2 +- > block/write-threshold.c | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH 3/3] block: use bdrv_lookup_bs() over blk_by_name() for BDS only results
To find a BlockDriverState interface, it can be done via blk_by_name(), bdrv_find_node(), and bdrv_lookup_bs(). The latter can take the place of the other two, in the instances where we are only concerned with the BlockDriverState. In much of the usage of blk_by_name(), we can simplify the code by calling bdrv_lookup_bs() instead of blk_by_name(), when what we need is just the BDS, and not the BB. Signed-off-by: Jeff Cody--- blockdev.c| 84 --- migration/block.c | 6 ++-- 2 files changed, 32 insertions(+), 58 deletions(-) diff --git a/blockdev.c b/blockdev.c index fe182bb..6a893e1 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1033,18 +1033,18 @@ fail: void hmp_commit(Monitor *mon, const QDict *qdict) { const char *device = qdict_get_str(qdict, "device"); -BlockBackend *blk; +BlockDriverState *bs; int ret; if (!strcmp(device, "all")) { ret = bdrv_commit_all(); } else { -blk = blk_by_name(device); -if (!blk) { +bs = bdrv_lookup_bs(device, NULL, NULL); +if (!bs) { monitor_printf(mon, "Device '%s' not found\n", device); return; } -ret = bdrv_commit(blk_bs(blk)); +ret = bdrv_commit(bs); } if (ret < 0) { monitor_printf(mon, "'commit' error for '%s': %s\n", device, @@ -1122,20 +1122,18 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device, Error **errp) { BlockDriverState *bs; -BlockBackend *blk; AioContext *aio_context; QEMUSnapshotInfo sn; Error *local_err = NULL; SnapshotInfo *info = NULL; int ret; -blk = blk_by_name(device); -if (!blk) { +bs = bdrv_lookup_bs(device, NULL, NULL); +if (!bs) { error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Device '%s' not found", device); return NULL; } -bs = blk_bs(blk); if (!has_id) { id = NULL; @@ -1300,7 +1298,6 @@ static void internal_snapshot_prepare(BlkTransactionState *common, Error *local_err = NULL; const char *device; const char *name; -BlockBackend *blk; BlockDriverState *bs; QEMUSnapshotInfo old_sn, *sn; bool ret; @@ -1319,13 +1316,12 @@ static void internal_snapshot_prepare(BlkTransactionState *common, name = internal->name; /* 2. check for validation */ -blk = blk_by_name(device); -if (!blk) { +bs = bdrv_lookup_bs(device, NULL, NULL); +if (!bs) { error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Device '%s' not found", device); return; } -bs = blk_bs(blk); /* AioContext is released in .clean() */ state->aio_context = bdrv_get_aio_context(bs); @@ -1613,20 +1609,18 @@ static void drive_backup_prepare(BlkTransactionState *common, Error **errp) { DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common); BlockDriverState *bs; -BlockBackend *blk; DriveBackup *backup; Error *local_err = NULL; assert(common->action->kind == TRANSACTION_ACTION_KIND_DRIVE_BACKUP); backup = common->action->drive_backup; -blk = blk_by_name(backup->device); -if (!blk) { +bs = bdrv_lookup_bs(backup->device, NULL, NULL); +if (!bs) { error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Device '%s' not found", backup->device); return; } -bs = blk_bs(blk); /* AioContext is released in .clean() */ state->aio_context = bdrv_get_aio_context(bs); @@ -1682,25 +1676,22 @@ static void blockdev_backup_prepare(BlkTransactionState *common, Error **errp) BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, common); BlockdevBackup *backup; BlockDriverState *bs, *target; -BlockBackend *blk; Error *local_err = NULL; assert(common->action->kind == TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP); backup = common->action->blockdev_backup; -blk = blk_by_name(backup->device); -if (!blk) { +bs = bdrv_lookup_bs(backup->device, NULL, NULL); +if (!bs) { error_setg(errp, "Device '%s' not found", backup->device); return; } -bs = blk_bs(blk); -blk = blk_by_name(backup->target); -if (!blk) { +target = bdrv_lookup_bs(backup->target, NULL, NULL); +if (!target) { error_setg(errp, "Device '%s' not found", backup->target); return; } -target = blk_bs(blk); /* AioContext is released in .clean() */ state->aio_context = bdrv_get_aio_context(bs); @@ -2324,7 +2315,6 @@ void qmp_block_stream(const char *device, bool has_on_error, BlockdevOnError on_error, Error **errp) { -BlockBackend *blk; BlockDriverState *bs; BlockDriverState *base_bs = NULL; AioContext *aio_context;
[Qemu-block] [PATCH 2/3] block: make bdrv_find_node() static
This patch does two things: it moves bdrv_find_node() up before the first usage in block.c, and it makes the function static so that it is only internal to block.c. Signed-off-by: Jeff Cody--- block.c | 30 +++--- include/block/block.h | 1 - 2 files changed, 15 insertions(+), 16 deletions(-) diff --git a/block.c b/block.c index f95a59d..da1a6a9 100644 --- a/block.c +++ b/block.c @@ -759,6 +759,21 @@ static int bdrv_open_flags(BlockDriverState *bs, int flags) return open_flags; } +/* This function is to find a node in the bs graph */ +static BlockDriverState *bdrv_find_node(const char *node_name) +{ +BlockDriverState *bs; + +assert(node_name); + +QTAILQ_FOREACH(bs, _bdrv_states, node_list) { +if (!strcmp(node_name, bs->node_name)) { +return bs; +} +} +return NULL; +} + static void bdrv_assign_node_name(BlockDriverState *bs, const char *node_name, Error **errp) @@ -2720,21 +2735,6 @@ void bdrv_iterate_format(void (*it)(void *opaque, const char *name), g_free(formats); } -/* This function is to find a node in the bs graph */ -BlockDriverState *bdrv_find_node(const char *node_name) -{ -BlockDriverState *bs; - -assert(node_name); - -QTAILQ_FOREACH(bs, _bdrv_states, node_list) { -if (!strcmp(node_name, bs->node_name)) { -return bs; -} -} -return NULL; -} - /* Put this QMP function here so it can access the static graph_bdrv_states. */ BlockDeviceInfoList *bdrv_named_nodes_list(Error **errp) { diff --git a/include/block/block.h b/include/block/block.h index 1520dee..2b6e1b2 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -404,7 +404,6 @@ int bdrv_media_changed(BlockDriverState *bs); void bdrv_lock_medium(BlockDriverState *bs, bool locked); void bdrv_eject(BlockDriverState *bs, bool eject_flag); const char *bdrv_get_format_name(BlockDriverState *bs); -BlockDriverState *bdrv_find_node(const char *node_name); BlockDeviceInfoList *bdrv_named_nodes_list(Error **errp); BlockDriverState *bdrv_lookup_bs(const char *device, const char *node_name, -- 1.9.3
[Qemu-block] [PATCH 1/3] block: Use bdrv_lookup_bs() instead of bdrv_find_node()
This is a precursor to making bdrv_find_node() static, and internal to block.c To find a BlockDriverState interface, it can be done via blk_by_name(), bdrv_find_node(), and bdrv_lookup_bs(). The latter can take the place of the other two, in the instances where we are only concerned with the BlockDriverState. There is no benefit in calling bdrv_find_node() directly. This patch replaces all calls to bdrv_find_node() outside of block.c with bdrv_lookup_bs(). Signed-off-by: Jeff Cody--- block/block-backend.c | 2 +- block/mirror.c | 2 +- block/write-threshold.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index 2256551..7026a3f 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -67,7 +67,7 @@ BlockBackend *blk_new(const char *name, Error **errp) error_setg(errp, "Device with id '%s' already exists", name); return NULL; } -if (bdrv_find_node(name)) { +if (bdrv_lookup_bs(NULL, name, NULL)) { error_setg(errp, "Device name '%s' conflicts with an existing node name", name); diff --git a/block/mirror.c b/block/mirror.c index 7e43511..cb3c765 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -644,7 +644,7 @@ static void mirror_complete(BlockJob *job, Error **errp) if (s->replaces) { AioContext *replace_aio_context; -s->to_replace = bdrv_find_node(s->replaces); +s->to_replace = bdrv_lookup_bs(NULL, s->replaces, NULL); if (!s->to_replace) { error_setg(errp, "Node name '%s' not found", s->replaces); return; diff --git a/block/write-threshold.c b/block/write-threshold.c index a53c1f5..908fa7f 100644 --- a/block/write-threshold.c +++ b/block/write-threshold.c @@ -110,7 +110,7 @@ void qmp_block_set_write_threshold(const char *node_name, BlockDriverState *bs; AioContext *aio_context; -bs = bdrv_find_node(node_name); +bs = bdrv_lookup_bs(NULL, node_name, NULL); if (!bs) { error_setg(errp, "Device '%s' not found", node_name); return; -- 1.9.3
Re: [Qemu-block] [PATCH v6 24/39] blockdev: Do not create BDS for empty drive
Am 12.10.2015 um 22:00 hat Max Reitz geschrieben: > Do not use "rudimentary" BDSs for empty drives any longer (for > freshly created drives). > > After a follow-up patch, empty drives will generally use a NULL BDS, not > only the freshly created drives. > > Signed-off-by: Max Reitz> --- > blockdev.c | 72 > ++ > 1 file changed, 44 insertions(+), 28 deletions(-) > > diff --git a/blockdev.c b/blockdev.c > index 35efe84..845a1c1 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -514,16 +514,44 @@ static BlockBackend *blockdev_init(const char *file, > QDict *bs_opts, > goto early_err; > } > > +if (snapshot) { > +/* always use cache=unsafe with snapshot */ > +bdrv_flags &= ~BDRV_O_CACHE_MASK; > +bdrv_flags |= (BDRV_O_SNAPSHOT|BDRV_O_CACHE_WB|BDRV_O_NO_FLUSH); > +} > + > +if (copy_on_read) { > +bdrv_flags |= BDRV_O_COPY_ON_READ; > +} > + > +if (runstate_check(RUN_STATE_INMIGRATE)) { > +bdrv_flags |= BDRV_O_INCOMING; > +} Bad conflict resolution when you added patch 2, which moved the BDRV_O_INCOMING code to block.c? That patch is actually essential here as well: The flag would ends up in the root state, and bdrv_invalidate_cache_all() doesn't clear the flag any more. So we might end up opening an image with BDRV_O_INCOMING even though migration has long finished. Kevin
Re: [Qemu-block] [PATCH v10 08/10] Implement new driver for block replication
On 10/14/2015 10:27 PM, Stefan Hajnoczi wrote: > On Tue, Oct 13, 2015 at 05:08:17PM +0800, Wen Congyang wrote: >> On 10/13/2015 12:27 AM, Stefan Hajnoczi wrote: >>> On Fri, Sep 25, 2015 at 02:17:36PM +0800, Wen Congyang wrote: +/* start backup job now */ +bdrv_op_unblock(s->hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET, +s->active_disk->backing_blocker); +bdrv_op_unblock(s->secondary_disk, BLOCK_OP_TYPE_BACKUP_SOURCE, +s->hidden_disk->backing_blocker); >>> >>> Why is it safe to unblock these operations? >>> >>> Why do they have to be blocked for non-replication users? >> >> hidden_disk and secondary disk are opened as backing file, so it is blocked >> for >> non-replication users. >> What can I do if I don't unblock it and want to do backup? > > CCing Jeff Cody, block jobs maintainer > > You need to explain why it is safe remove this protection. We can't > merge code that may be unsafe. > > I think we can investigate further by asking: when does QEMU code assume > the backing file is read-only? The backing file is opened in read-only mode. I want to reopen it in read-write mode here in the next version(So the patch 1 will be dropped) > > I haven't checked but these cases come to mind: > > Operations that move data between BDS in the backing chain (e.g. commit > and stream block jobs) will lose or overwrite data if the backing file > is being written to by another coroutine. > > We need to prevent users from running these operations at the same time. Yes, but qemu doesn't provide such API. > > Also, accessing bs->backing_blocker is a layering violation. No one > outside block.c:bdrv_set_backing_hd() is supposed to access this field. I agree with it. Thanks Wen Congyang > > Let's figure out the safety concerns first and then the > bs->backing_blocker access will probably be eliminated as part of the > solution. > > Stefan > . >
Re: [Qemu-block] [Qemu-devel] [PATCH] Revert "blockdev: add note that block_job_cb() must be thread-safe"
- Original Message - > On Tue, Oct 13, 2015 at 06:16:15PM +0800, Fam Zheng wrote: > > This reverts commit 723c5d93c51bdb3adbc238ce90195c0864aa6cd5. > > > > block_job_cb is called by block_job_completed, which is always called in > > a main loop bottom half in existing block jobs. So we don't need to > > worry about thread-safety here. > > This is not correct. Search for block_job_completed() callers. > > For example, block/stream.c has early exit cases that call > block_job_completed() from the coroutine (i.e. dispatched from a > coroutine in another AioContext+IOThread). > > I think you are assuming that all block_job_completed() callers are > called from a function scheduled using block_job_defer_to_main_loop(). No, I'm assuming all block_job_completed() callers are (and they should be) from main thread. Even the early exit cases in stream are so, because they are in the same thread as stream_start, which is main thread. > > Please double-check this. > > Thanks, > Stefan > >
Re: [Qemu-block] [PATCH] Revert "blockdev: add note that block_job_cb() must be thread-safe"
On Tue, Oct 13, 2015 at 06:16:15PM +0800, Fam Zheng wrote: > This reverts commit 723c5d93c51bdb3adbc238ce90195c0864aa6cd5. > > block_job_cb is called by block_job_completed, which is always called in > a main loop bottom half in existing block jobs. So we don't need to > worry about thread-safety here. This is not correct. Search for block_job_completed() callers. For example, block/stream.c has early exit cases that call block_job_completed() from the coroutine (i.e. dispatched from a coroutine in another AioContext+IOThread). I think you are assuming that all block_job_completed() callers are called from a function scheduled using block_job_defer_to_main_loop(). Please double-check this. Thanks, Stefan
Re: [Qemu-block] [PATCH v10 02/10] Backup: clear all bitmap when doing block checkpoint
On Tue, Oct 13, 2015 at 05:13:14PM +0800, Wen Congyang wrote: > On 10/12/2015 09:45 PM, Stefan Hajnoczi wrote: > > On Fri, Sep 25, 2015 at 02:17:30PM +0800, Wen Congyang wrote: > >> Signed-off-by: Wen Congyang> >> Signed-off-by: zhanghailiang > >> Signed-off-by: Gonglei > >> Reviewed-by: Jeff Cody > >> --- > >> block/backup.c | 14 ++ > >> blockjob.c | 11 +++ > >> include/block/blockjob.h | 12 > >> 3 files changed, 37 insertions(+) > >> > >> diff --git a/block/backup.c b/block/backup.c > >> index c61e4c3..5e5995e 100644 > >> --- a/block/backup.c > >> +++ b/block/backup.c > >> @@ -214,11 +214,25 @@ static void backup_iostatus_reset(BlockJob *job) > >> } > >> } > >> > >> +static void backup_do_checkpoint(BlockJob *job, Error **errp) > >> +{ > >> +BackupBlockJob *backup_job = container_of(job, BackupBlockJob, > >> common); > >> + > >> +if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) { > >> +error_setg(errp, "The backup job only supports block checkpoint > >> in" > >> + " sync=none mode"); > >> +return; > >> +} > >> + > >> +hbitmap_reset_all(backup_job->bitmap); > >> +} > > > > Is this a fast way to stop and then start a new backup blockjob without > > emitting block job lifecycle events? > > > > Not sure the blockjob_do_checkpoint() interface is appropriate. Is > > there any other block job type that will implement .do_checkpoint()? > > Currently, the answer is no. > > > > > COLO block replication could call a public backup_do_checkpoint() > > function. That way the direct coupling between COLO and the backup > > block job is obvious. I'm not convinced a generic interface like > > blockjob_do_checkpoint() makes sense since it's really not a generic > > operation that makes sense for other block job types. > > > > void backup_do_checkpoint(BlockJob *job, Error **errp) > > { > > BackupBlockJob *s; > > > > if (job->driver != backup_job_driver) { > > error_setg(errp, "expected backup block job type for " > >"checkpoint, got %d", job->driver->job_type); > > return; > > } > > > > s = container_of(job, BackupBlockJob, common); > > ... > > } > > In a older version, I implement it like this, but Paolo didn't like it. It's a question of taste. In this case it seems to me that there is really a direct coupling between COLO and the backup block job. This isn't really a generic interface that makes sense in other scenarios. That's why I advocate for direct coupling instead of pretending this is a generic interface. I wish COLO could just stop the existing block job and start a new one for each checkpoint. In reality we probably don't want QMP events and the full block job lifecycle for each checkpoint... But anyway, I like this approach because it does not require a new interface at all. > > Please also make the function name and documentation more specific. > > Instead of "do" maybe this should be "pre" or "post" to indicate whether > > this happens before or after the checkpoint commit. What happens if > > OK > > > this function returns an error? > > We just return this error to COLO, and COLO will do failover. Okay, I ask these questions because the information should be part of the doc comment for this new interface. Stefan