date:20181101

Re: [libvirt] [PATCH] nwfilter: intantiate filters on loading driver

2018-11-01 Thread John Ferlan



On 11/1/18 3:26 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 01.11.2018 03:48, John Ferlan wrote:
>>
>>
>> On 10/31/18 10:41 AM, Nikolay Shirokovskiy wrote:
>>> On 31.10.2018 16:14, John Ferlan wrote:


 On 10/30/18 3:24 AM, Nikolay Shirokovskiy wrote:
>
>
> On 29.10.2018 22:37, John Ferlan wrote:
>>
>>
>> On 10/15/18 4:26 AM, Nikolay Shirokovskiy wrote:
>>> Before using filters binding filters instantiation was done by 
>>> hypervisors
>>> drivers initialization code (qemu was the only such hypervisor). Now 
>>> qemu
>>> reconnection done supposes it should be done by nwfilter driver 
>>> probably.
>>> Let's add this missing step.
>>>
>>> Signed-off-by: Nikolay Shirokovskiy 
>>> ---
>>>  src/nwfilter/nwfilter_driver.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>
>> If there's research you've done where the instantiation was done before
>> introduction of the nwfilter bindings that would be really helpful...
>>
>> I found that virNWFilterBuildAll was introduced in commit 3df907bfff.
>> There were 2 callers for it:
>>
>>1. virNWFilterTriggerRebuildImpl
>>2. nwfilterStateReload
>>
>> The former called as part of the virNWFilterConfLayerInit callback
>> during nwfilterStateInitialize (about 50 lines earlier).

 First off let me say you certainly find a lot of interesting bugs/issues
 that are complex and that's good. Often times I wonder how you trip
 across things or how to quantify what you've found. Some are simpler
 than others. This one on the surface would seem to be simple, but I keep
 wondering is there a downside.
>>>
>>> The issue is related to my recent posts:
>>>
>>> [1] [RFC] Faster libvirtd restart with nwfilter rules
>>> https://www.redhat.com/archives/libvir-list/2018-September/msg01206.html
>>> which continues here:
>>> https://www.redhat.com/archives/libvir-list/2018-October/msg00657.html
>>>
>>> In short if there is lots of VMs with filters then libvirtd restart takes a 
>>> lot of time
>>> during which libvirtd is unresponsive. By the way the issue is found in 
>>> libvirt
>>> versions 3.9.0 and ealier (don't know of newer versions, Virtuozzo is based 
>>> on 3.9.0 now,
>>> just like Centos7 :) )
>>
>> So many different issues - trying to focus on just the one for this
>> particular patch.
>>
>>>
>>> And attempts to fix it:
>>>
>>> [2] [PATCH RFC 0/4] nwfilter: don't reinstantiate filters if they are not 
>>> changed
>>> https://www.redhat.com/archives/libvir-list/2018-October/msg00904.html
>>>
>>> [3]  [PATCH v2 0/2] nwfilter: don't reinstantiate rules if there is no need 
>>> to
>>> https://www.redhat.com/archives/libvir-list/2018-October/msg01317.html
>>>
>>> And the reason I started v2 is Laine mentioned that it is important to
>>> reinstantiate rules on restart (v1 do not care if somebody messed tables).
>>>
>>
>> I've seen the patches and even read some briefly, but still need to take
>> this particular patch as separate from those.
>>
>>> And I discovered that upstream version stops reinstantiating rules at all
>>> after introducing filters bindings. And this functionality is old:
>>
>> So the key is "when" did that happen?  That is can you pinpoint or
>> bisect a time in history where the filters were instantiated? And by
>> instantiated what exactly (call) are you referencing?
> 
> The below commit is the one. It adds instantiating filters on
> libvirtd restart. By instantiating I mean that firewall rules
> correspondent to filters are actually get [re]added to iptables etc.
> 
>>
>>>
>>> commit cf6f8b9a9720fe5323a84e690de9fbf8ba41f6ac
>>> Author: Stefan Berger 
>>> Date:   Mon Aug 16 12:59:54 2010 -0400
>>>
>>> nwfilter: extend nwfilter reload support
>>>
>>
>> Yes, nwfilter is old, brittle, and in need of attention. Not sure anyone
>> really wants to take on the task though! I just realized I never got the
>> common object code implemented there. Mostly because of lock issues.
>> Suffice to say there's more "interesting" changes in the nwfilter space
>> being discussed internally.
>>

 The nwfilter processing is kindly said rather strange, complex, and to a
 degree fragile. Thus when patches come along they are met with greater
 scrutiny. From just reading the commit message here I don't get a sense
 for the problem and why the solution fixes it. So I'm left to try and
 research myself and ask a lot of questions.

 BTW, some of the detail provided in this response is really useful as
 either part of a cover or after the --- in the single patch. I would
 think you'd "know" when you've done lots of research into a problem that
 providing reviews more than a mere hint would be useful! For nwfilter
 bindings, it's hard to imagine this is one of those - it just happened
 type events. There seems to be a very specific sequence that's missing
 from th

Re: [libvirt] [PATCH] nodedev: Document the udevEventHandleThread

2018-11-01 Thread John Ferlan




On 11/1/18 7:36 AM, Erik Skultety wrote:
> On Fri, Oct 19, 2018 at 08:04:43AM -0400, John Ferlan wrote:
>> Add details of the algorithm and why it was used to help future
>> readers understand the issues encountered. Based largely off a
>> description provided by Erik Skultety .
> 
> Since the link to the archive would be missing from the commit message, I feel
> like simply blaming commit cdbe13329a7 for lacking documentation/justification
> of the algorithm used is the right way of composing the commit message.
> 

OK - thanks for the commit link...

> I appreciate the effort to document it properly, although I feel it's a bit
> overwhelming, so the comments I have below is just a sheer attempt to reduce
> the amount of text documenting a single function, you know, if a function 
> needs
> a comment like that, it suggests that there's something wrong with that
> function...
> 

Understood - I threw a lot of spaghetti on the wall and altering what's
here is perfectly fine. I didn't want to leave something out that you
may have found important...

BTW: Another explanation for a lengthy function header - it may suggest
that if you're going to post a patch changing it's logic, you had better
know what you're doing ;-)

>>
>> Signed-off-by: John Ferlan 
>> ---
>>
>>  see: https://www.redhat.com/archives/libvir-list/2018-October/msg00915.html
>>
>>  src/node_device/node_device_udev.c | 49 ++
>>  1 file changed, 49 insertions(+)
>>
>> diff --git a/src/node_device/node_device_udev.c 
>> b/src/node_device/node_device_udev.c
>> index 22897591de..8ca81e65ad 100644
>> --- a/src/node_device/node_device_udev.c
>> +++ b/src/node_device/node_device_udev.c
>> @@ -1591,6 +1591,53 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
>>  }
>>
>>
>> +/**
>> + * udevEventHandleThread
>> + * @opaque: unused
>> + *
>> + * Thread to handle the udevEventHandleCallback processing when udev
>> + * tells us there's a device change for us (add, modify, delete, etc).
>> + *
>> + * Management of the processing udev device notification is a delicate
>> + * balance between the udev process, the scheduler, and when exactly the
>> + * data from/for the socket is received.
> 
> I'd drop ^this sentence, the issue with this unfortunate mix of factors is
> explained further down, let's try keeping the amount of commentary down.
> 

I'll move it to the commit message ;-)

>> + * The udev processing code notifies
>> + * the udevEventHandleCallback that it has data; however, the actual recv()
>> + * done in the udev code to fill in the @device structure can be delayed
>> + * due to how threads are scheduled.
> 
> How about:
> "When we get notified by udev that there are data to be processed, the actual
> data retrieval by libudev may be delayed due to how threads are scheduled."
> 
> + *
>> + * As it turns out, the event loop could be scheduled (and it in fact
>> + * was) earlier than the handler thread. What that essentially means is
>> + * that by the time the thread actually handled the event and read the
>> + * data from the monitor, the event loop fired the very same event, simply
>> + * because the data hadn't been retrieved from the socket at that point yet.
> 
> ^This doesn't need to be a separate paragraph, you can connect it to the
> previous one. Also, some words don't feel right for commentary purposes (I 
> know
> they're mine :D). How about:
> 
> "In fact, the event loop could be scheduled earlier than the handler thread,
> thus potentially emitting the very same event the handler thread is currently
> trying to process, simply because the data hadn't been retrieved from the
> socket at that time yet."
> 
>> + *
>> + * Thus this algorithm is that once udevEventHandleCallback is notified
>> + * there is data,
> 
> ^this can be dropped...
> 
>> + * this loop will process as much data as possible until
>> + * udev_monitor_receive_device fails to get the @device. This may be
>> + * because we've processed everything or because the scheduling thread
>> + * hasn't been able to populate data in the socket. Once we're done
>> + * processing everything we can, wait on the next event/notification.
> 
> How about:
> 
> "Instead of waiting for the next event after retrieving a device data we keep
> reading from the udev monitor until we encounter one of EAGAIN or EWOULDBLOCK
> errors, only then we'll wait for the next event."
> 
> Also, I'd move that bit into the function body, so it's directly tied to
> the piece of code responsible for it.
> 
>> + * Although this may incur a slight delay if the socket data wasn't
>> + * populated, the event/notifications are fairly consistent so there's
>> + * no need to use virCondWaitUntil.
> 
> ^This could be dropped.
> 
>> + *
>> + * NB: Usage of event based socket algorithm causes some issues with
>> + * older platforms such as CentOS 6 in which the data socket is opened
>> + * without the NONBLOCK bit set. Still even if the reset of priv->dataReady
>> + * were moved to ju

[libvirt] [PATCH v2] nodedev: Document the udevEventHandleThread

2018-11-01 Thread John Ferlan

Commit cdbe1332 neglected to document the API. So let's add some
details about the algorithm and why it was used to help future
readers understand the issues encountered. Based largely off a
description provided by Erik Skultety .

NB: Management of the processing udev device notification is a
delicate balance between the udev process, the scheduler, and when
exactly the data from/for the socket is received. The balance is
particularly important for environments when multiple devices are
added into the system more or less simultaneously such as is done
for mdev. In these cases scheduler blocking on the udev recv() call
is more noticeable. It's expected that future devices will follow
similar algorithms. Even though the algorithm does present some
challenges for older OS's (such as Centos 6), trying to rewrite
the algorithm to fit both models would be more complex and involve
pulling the monitor object out of the private data lockable object
and would need to be guarded by a separate lock. Devising such an
algorithm to work around issues with older OS's at the expense
of more modern OS algorithms in newer event processing code may
result in unexpected issues, so the choice is to encourage use
of newer OS's with newer udev event processing code.

Signed-off-by: John Ferlan 
---

 v1: https://www.redhat.com/archives/libvir-list/2018-October/msg01053.html

 Changes are from code review with some minor tweaks/editing as I
 went along. Mistakes are my own!

 src/node_device/node_device_udev.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/src/node_device/node_device_udev.c 
b/src/node_device/node_device_udev.c
index 22897591de..f2c2299d4d 100644
--- a/src/node_device/node_device_udev.c
+++ b/src/node_device/node_device_udev.c
@@ -1591,6 +1591,26 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
 }
 
 
+/**
+ * udevEventHandleThread
+ * @opaque: unused
+ *
+ * Thread to handle the udevEventHandleCallback processing when udev
+ * tells us there's a device change for us (add, modify, delete, etc).
+ *
+ * Once notified there is data to be processed, the actual @device
+ * data retrieval by libudev may be delayed due to how threads are
+ * scheduled. In fact, the event loop could be scheduled earlier than
+ * the handler thread, thus potentially emitting the very same event
+ * the handler thread is currently trying to process, simply because
+ * the data hadn't been retrieved from the socket.
+ *
+ * NB: Usage of event based socket algorithm causes some issues with
+ * older platforms such as CentOS 6 in which the data socket is opened
+ * without the NONBLOCK bit set. That can be avoided by moving the reset
+ * priv->dataReady to just after the udev_monitor_receive_device; however,
+ * scheduler issues would still come into play.
+ */
 static void
 udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
 {
@@ -1637,6 +1657,9 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
 return;
 }
 
+/* Trying to move the reset of the @priv->dataReady flag to
+ * after the udev_monitor_receive_device wouldn't help much
+ * due to event mgmt and scheduler timing. */
 virObjectLock(priv);
 priv->dataReady = false;
 virObjectUnlock(priv);
@@ -1646,6 +1669,11 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
 
 udevHandleOneDevice(device);
 udev_device_unref(device);
+
+/* Instead of waiting for the next event after processing @device
+ * data, let's keep reading from the udev monitor and only wait
+ * for the next event once either a EAGAIN or a EWOULDBLOCK error
+ * is encountered. */
 }
 }
 
-- 
2.17.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH v2 3/3] qemu: Narrow the shutdown reconnection failure reason window

2018-11-01 Thread John Ferlan

The current qemuProcessReconnect logic paints a broad brush
determining that the shutdown reason must be crashed if it was
determined that the domain was started with -no-shutdown; however,
there's many other ways to get to the error label, so let's narrow
our reasoning window for using VIR_DOMAIN_SHUTOFF_CRASHED to the
period where we essentially know we've tried to create to the
monitor and before we were successful in opening the connection.

Failures that occur outside that window would thus be considered
as VIR_DOMAIN_SHUTOFF_UNKNOWN, at least for now.

Signed-off-by: John Ferlan 
---
 src/qemu/qemu_process.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 7112f3c048..85f30bf9d4 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7757,6 +7757,7 @@ qemuProcessReconnect(void *opaque)
 bool jobStarted = false;
 virCapsPtr caps = NULL;
 bool retry = true;
+bool tryMonReconn = false;
 
 VIR_FREE(data);
 
@@ -7797,6 +7798,8 @@ qemuProcessReconnect(void *opaque)
 VIR_DEBUG("Reconnect monitor to def=%p name='%s' retry=%d",
   obj, obj->def->name, retry);
 
+tryMonReconn = true;
+
 /* XXX check PID liveliness & EXE path */
 if (qemuConnectMonitor(driver, obj, QEMU_ASYNC_JOB_NONE, retry, NULL) < 0)
 goto error;
@@ -7993,7 +7996,8 @@ qemuProcessReconnect(void *opaque)
  * If we cannot get to the monitor when the QEMU command
  * line used -no-shutdown, then we can safely say that the
  * domain crashed; otherwise we don't really know. */
-if (qemuDomainIsUsingNoShutdown(priv))
+if (!priv->mon && tryMonReconn &&
+qemuDomainIsUsingNoShutdown(priv))
 state = VIR_DOMAIN_SHUTOFF_CRASHED;
 else
 state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
-- 
2.17.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH v2 2/3] qemu: Restore lost shutdown reason

2018-11-01 Thread John Ferlan

When qemuProcessReconnectHelper was introduced (commit d38897a5d)
reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that
was changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED
or VIR_DOMAIN_SHUTOFF_UNKNOWN.

When QEMU_CAPS_NO_SHUTDOWN checking was removed in commit fe35b1ad6
the conditional state was just left at VIR_DOMAIN_SHUTOFF_CRASHED.

So introduce qemuDomainIsUsingNoShutdown which will manage the
condition when the domain was started with -no-shutdown so that
when/if reconnection failure occurs we can restore the decision
point used to determine whether CRASHED or UNKNOWN is provided.

Signed-off-by: John Ferlan 
---
 src/qemu/qemu_command.c |  6 +-
 src/qemu/qemu_domain.c  | 17 +
 src/qemu/qemu_domain.h  |  3 +++
 src/qemu/qemu_process.c | 15 ++-
 4 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 6e3ff67660..5b94931af6 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -6415,11 +6415,7 @@ qemuBuildPMCommandLine(virCommandPtr cmd,
 if (priv->allowReboot == VIR_TRISTATE_BOOL_NO)
 virCommandAddArg(cmd, "-no-reboot");
 
-/* If JSON monitor is enabled, we can receive an event
- * when QEMU stops. If we use no-shutdown, then we can
- * watch for this event and do a soft/warm reboot.
- */
-if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES)
+if (qemuDomainIsUsingNoShutdown(priv))
 virCommandAddArg(cmd, "-no-shutdown");
 
 if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_NO_ACPI)) {
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index ba3fff607a..045a7b4ac0 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -13561,3 +13561,20 @@ 
qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason)
 
 return VIR_DOMAIN_EVENT_RESUMED_UNPAUSED;
 }
+
+
+/* qemuDomainIsUsingNoShutdown:
+ * @priv: Domain private data
+ *
+ * If JSON monitor is enabled, we can receive an event when QEMU stops. If
+ * we use no-shutdown, then we can watch for this event and do a soft/warm
+ * reboot.
+ *
+ * Returns: @true when -no-shutdown either should be or was added to the
+ * command line.
+ */
+bool
+qemuDomainIsUsingNoShutdown(qemuDomainObjPrivatePtr priv)
+{
+return priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES;
+}
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 80bd4bde91..554435e0a9 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -1089,4 +1089,7 @@ void qemuDomainStorageIdReset(qemuDomainObjPrivatePtr 
priv);
 virDomainEventResumedDetailType
 qemuDomainRunningReasonToResumeEvent(virDomainRunningReason reason);
 
+bool
+qemuDomainIsUsingNoShutdown(qemuDomainObjPrivatePtr priv);
+
 #endif /* __QEMU_DOMAIN_H__ */
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 5232f761af..7112f3c048 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7988,11 +7988,16 @@ qemuProcessReconnect(void *opaque)
 if (virDomainObjIsActive(obj)) {
 /* We can't get the monitor back, so must kill the VM
  * to remove danger of it ending up running twice if
- * user tries to start it again later
- * If we couldn't get the monitor since QEMU supports
- * no-shutdown, we can safely say that the domain
- * crashed ... */
-state = VIR_DOMAIN_SHUTOFF_CRASHED;
+ * user tries to start it again later.
+ *
+ * If we cannot get to the monitor when the QEMU command
+ * line used -no-shutdown, then we can safely say that the
+ * domain crashed; otherwise we don't really know. */
+if (qemuDomainIsUsingNoShutdown(priv))
+state = VIR_DOMAIN_SHUTOFF_CRASHED;
+else
+state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
+
 /* If BeginJob failed, we jumped here without a job, let's hope another
  * thread didn't have a chance to start playing with the domain yet
  * (it's all we can do anyway).
-- 
2.17.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH v2 0/3] qemu: Restore lost shutdown reason

2018-11-01 Thread John Ferlan

v1:
https://www.redhat.com/archives/libvir-list/2018-October/msg00945.html

Changes since v1:

Patch 1 (NEW)

  - Set the priv->allowReboot prior to any error processing since we
could/should be using that.

Patch 2 (ADJUSTED LOGIC)

  - Rather than open code the "reason" to use the -no-shutdown flag,
let's create a qemu_domain helper for that so that both the command
line building and the Reconnection failure logic can use it.

Patch 3 (NEW)

  - Let's narrow the window for using VIR_DOMAIN_SHUTOFF_CRASHED to
just the period where we try to open a monitor channel. If we
cannot do so, then "assume" the reason was crashed. There are
a few open failure steps that may not exactly fit the model, but
those are probably splitting hairs.

John Ferlan (3):
  qemu: Move allow reboot check setting
  qemu: Restore lost shutdown reason
  qemu: Narrow the shutdown reconnection failure reason window

 src/qemu/qemu_command.c |  6 +-
 src/qemu/qemu_domain.c  | 17 +
 src/qemu/qemu_domain.h  |  3 +++
 src/qemu/qemu_process.c | 27 ++-
 4 files changed, 39 insertions(+), 14 deletions(-)

-- 
2.17.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH v2 1/3] qemu: Move allow reboot check setting

2018-11-01 Thread John Ferlan

Checking and setting the priv->allowReboot can be done before we start
processing the job. A subsequent patch will make use of the value to
make decisions in the error label, so we need to have it set properly.

Signed-off-by: John Ferlan 
---
 src/qemu/qemu_process.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 9cf971808c..5232f761af 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -7767,6 +7767,10 @@ qemuProcessReconnect(void *opaque)
 cfg = virQEMUDriverGetConfig(driver);
 priv = obj->privateData;
 
+/* If we are connecting to a guest started by old libvirt there is no
+ * allowReboot in status XML and we need to initialize it. */
+qemuProcessPrepareAllowReboot(obj);
+
 if (!(caps = virQEMUDriverGetCapabilities(driver, false)))
 goto error;
 
@@ -7783,10 +7787,6 @@ qemuProcessReconnect(void *opaque)
 if (qemuDomainMasterKeyReadFile(priv) < 0)
 goto error;
 
-/* If we are connecting to a guest started by old libvirt there is no
- * allowReboot in status XML and we need to initialize it. */
-qemuProcessPrepareAllowReboot(obj);
-
 if (qemuHostdevUpdateActiveDomainDevices(driver, obj->def) < 0)
 goto error;
 
-- 
2.17.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2] qemu: Introduce caching whether /dev/kvm is accessible

2018-11-01 Thread Michal Privoznik

On 10/30/2018 01:55 PM, Daniel P. Berrangé wrote:
> On Tue, Oct 30, 2018 at 10:32:08AM +, Daniel P. Berrangé wrote:
>> On Tue, Oct 30, 2018 at 11:08:45AM +0100, Michal Privoznik wrote:
>>> On 10/30/2018 10:35 AM, Daniel P. Berrangé wrote:
 On Tue, Oct 30, 2018 at 09:13:50AM +0100, Michal Privoznik wrote:
> On 10/29/2018 06:34 PM, Marc Hartmayer wrote:
>> Introduce caching whether /dev/kvm is usable as the QEMU user:QEMU
>> group. This reduces the overhead of the QEMU capabilities cache
>> lookup. Before this patch there were many fork() calls used for
>> checking whether /dev/kvm is accessible. Now we store the result
>> whether /dev/kvm is accessible or not and we only need to re-run the
>> virFileAccessibleAs check if the ctime of /dev/kvm has changed.
>>
>> Suggested-by: Daniel P. Berrangé 
>> Signed-off-by: Marc Hartmayer 
>> ---
>>  src/qemu/qemu_capabilities.c | 54 ++--
>>  1 file changed, 52 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
>> index e228f52ec0bb..85516954149b 100644
>> --- a/src/qemu/qemu_capabilities.c
>> +++ b/src/qemu/qemu_capabilities.c
>> @@ -3238,6 +3238,10 @@ struct _virQEMUCapsCachePriv {
>>  virArch hostArch;
>>  unsigned int microcodeVersion;
>>  char *kernelVersion;
>> +
>> +/* cache whether /dev/kvm is usable as runUid:runGuid */
>> +virTristateBool kvmUsable;
>> +time_t kvmCtime;
>>  };
>>  typedef struct _virQEMUCapsCachePriv virQEMUCapsCachePriv;
>>  typedef virQEMUCapsCachePriv *virQEMUCapsCachePrivPtr;
>> @@ -3824,6 +3828,52 @@ virQEMUCapsSaveFile(void *data,
>>  }
>>  
>>  
>> +/* Determine whether '/dev/kvm' is usable as QEMU user:QEMU group. */
>> +static bool
>> +virQEMUCapsKVMUsable(virQEMUCapsCachePrivPtr priv)
>> +{
>> +struct stat sb;
>> +static const char *kvm_device = "/dev/kvm";
>> +virTristateBool value;
>> +virTristateBool cached_value = priv->kvmUsable;
>> +time_t kvm_ctime;
>> +time_t cached_kvm_ctime = priv->kvmCtime;
>> +
>> +if (stat(kvm_device, &sb) < 0) {
>> +virReportSystemError(errno,
>> + _("Failed to stat %s"), kvm_device);
>> +return false;
>> +}
>> +kvm_ctime = sb.st_ctime;
>
> This doesn't feel right. /dev/kvm ctime is changed every time qemu is
> started or powered off (try running stat over it before and after a
> domain is started/shut off). So effectively we will fork more often than
> we would think. Should we cache inode number instead? Because for all
> that we care is simply if the file is there.

 Urgh, that is a bit strange and not the usual semantics for timestamps :-(
>>>
>>> Indeed.
>>>

 We can't stat the inode - the code was explicitly trying to cope with the
 way /dev/kvm can change permissions when udev rules get applied. We would
 have to compare the user, group, permissions mask and even ACL, or a hash
 of those.
>>>
>>> Well, we can use ctime as suggested and post a patch for kernel to fix
>>> ctime behaviour. Until the patch is merged our behaviour would be
>>> suboptimal, but still better than it is now.
>>
>> I guess lets talk to KVM team for their input on this and then decide
>> what todo.
> 
> Hmm, I wonder if it is not actually a kernel problem, but rather something
> in userspace genuinely touching the device in a way that caues these
> timestamps to be updated.
> 
> eg I vaguely recall a udev rule that resets permissions on device nodes
> whenever an FD is closed, which might cause this kind of behaviour

After trying hard to find the rule that mangles the ctime I've found the
following bug in udev:

https://github.com/zippy2/systemd/commit/51915e4e6a64c581da321c25448d80626d8e8408?diff=unified

I've send that as a pull request to systemd. So after all, ctime might
be usable again.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH] nodedev: Add new module node_device_util

2018-11-01 Thread Erik Skultety

There's a lot of stuff going on in src/conf/nodedev_conf which not
always has to do anything with config, so even though we're trying,
we're not really consistent in putting only parser/formatter related
stuff here like we do for domains. So, start the cleanup simply by adding
a new module to the nodedev driver and put a few helper APIs which want
to open a secondary driver connection in there (similar to storage_util
module).

Signed-off-by: Erik Skultety 
---

I verified the build with debian 9, centos 7, fedora 28, rawhide, and freebsd
11

 src/conf/Makefile.inc.am |   1 +
 src/conf/node_device_conf.c  | 199 ---
 src/conf/node_device_conf.h  |  11 --
 src/conf/virstorageobj.c |   1 +
 src/libvirt_private.syms |   8 +-
 src/node_device/Makefile.inc.am  |  17 +-
 src/node_device/node_device_driver.c |   1 +
 src/node_device/node_device_util.c   | 229 +++
 src/node_device/node_device_util.h   |  35 
 src/storage/Makefile.inc.am  |   1 +
 src/storage/storage_backend_scsi.c   |   1 +
 11 files changed, 290 insertions(+), 214 deletions(-)
 create mode 100644 src/node_device/node_device_util.c
 create mode 100644 src/node_device/node_device_util.h

diff --git a/src/conf/Makefile.inc.am b/src/conf/Makefile.inc.am
index af23810640..7cb6c29d70 100644
--- a/src/conf/Makefile.inc.am
+++ b/src/conf/Makefile.inc.am
@@ -163,6 +163,7 @@ libvirt_la_BUILT_LIBADD += libvirt_conf.la
 libvirt_conf_la_SOURCES = $(CONF_SOURCES)
 libvirt_conf_la_CFLAGS = \
-I$(srcdir)/conf \
+   -I$(srcdir)/node_device \
$(AM_CFLAGS) \
$(NULL)
 libvirt_conf_la_LDFLAGS = $(AM_LDFLAGS)
diff --git a/src/conf/node_device_conf.c b/src/conf/node_device_conf.c
index 03bd794dc0..74a7bc3933 100644
--- a/src/conf/node_device_conf.c
+++ b/src/conf/node_device_conf.c
@@ -2220,205 +2220,6 @@ virNodeDevCapsDefFree(virNodeDevCapsDefPtr caps)
 }


-/* virNodeDeviceGetParentName
- * @conn: Connection pointer
- * @nodedev_name: Node device to lookup
- *
- * Lookup the node device by name and return the parent name
- *
- * Returns parent name on success, caller is responsible for freeing;
- * otherwise, returns NULL on failure
- */
-char *
-virNodeDeviceGetParentName(virConnectPtr conn,
-   const char *nodedev_name)
-{
-virNodeDevicePtr device = NULL;
-char *parent;
-
-if (!(device = virNodeDeviceLookupByName(conn, nodedev_name))) {
-virReportError(VIR_ERR_XML_ERROR,
-   _("Cannot find '%s' in node device database"),
-   nodedev_name);
-return NULL;
-}
-
-ignore_value(VIR_STRDUP(parent, virNodeDeviceGetParent(device)));
-virObjectUnref(device);
-
-return parent;
-}
-
-
-/**
- * @fchost: Pointer to vHBA adapter
- *
- * Create a vHBA for Storage. This code accomplishes this via searching
- * through the sysfs for scsi_host/fc_host in order to first ensure some
- * vHBA doesn't already exist for the requested wwnn/wwpn (e.g. an unmanaged
- * vHBA) and to search for the parent vport capable scsi_host by name,
- * wwnn/wwpn, or fabric_wwn (if provided). If no parent is provided, then
- * a vport capable scsi_host will be selected.
- *
- * Returns vHBA name on success, NULL on failure with an error message set
- */
-char *
-virNodeDeviceCreateVport(virStorageAdapterFCHostPtr fchost)
-{
-unsigned int parent_host;
-char *name = NULL;
-char *parent_hoststr = NULL;
-bool skip_capable_check = false;
-
-VIR_DEBUG("parent='%s', wwnn='%s' wwpn='%s'",
-  NULLSTR(fchost->parent), fchost->wwnn, fchost->wwpn);
-
-if (fchost->parent) {
-if (VIR_STRDUP(parent_hoststr, fchost->parent) < 0)
-goto cleanup;
-} else if (fchost->parent_wwnn && fchost->parent_wwpn) {
-if (!(parent_hoststr = virVHBAGetHostByWWN(NULL, fchost->parent_wwnn,
-   fchost->parent_wwpn))) {
-virReportError(VIR_ERR_XML_ERROR, "%s",
-   _("cannot find parent using provided wwnn/wwpn"));
-goto cleanup;
-}
-} else if (fchost->parent_fabric_wwn) {
-if (!(parent_hoststr =
-  virVHBAGetHostByFabricWWN(NULL, fchost->parent_fabric_wwn))) {
-virReportError(VIR_ERR_XML_ERROR, "%s",
-   _("cannot find parent using provided fabric_wwn"));
-goto cleanup;
-}
-} else {
-if (!(parent_hoststr = virVHBAFindVportHost(NULL))) {
-virReportError(VIR_ERR_XML_ERROR, "%s",
-   _("'parent' for vHBA not specified, and "
- "cannot find one on this host"));
-goto cleanup;
-}
-skip_capable_check = true;
-}
-
-if (virSCSIHostGetNumber(parent_hoststr, &parent_host) < 0)
-goto cleanup;
-
-/* NOTE:
- * We do not save the parent_host

Re: [libvirt] [PATCH] qemu: Restore lost shutdown reason

2018-11-01 Thread John Ferlan




On 10/22/18 3:36 AM, Nikolay Shirokovskiy wrote:
> 
> 
> On 18.10.2018 18:28, John Ferlan wrote:
>> When qemuProcessReconnectHelper was introduced (commit d38897a5d)
>> reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that
>> was changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED
>> or VIR_DOMAIN_SHUTOFF_UNKNOWN.
>>
>> When QEMU_CAPS_NO_SHUTDOWN checking was removed in commit fe35b1ad6
>> the conditional state was just left at VIR_DOMAIN_SHUTOFF_CRASHED.
>>
>> Restore the logic adjustment using the same conditions as command
>> line building and alter the comment to describe the reasoning.
>>
>> Signed-off-by: John Ferlan 
>> ---
>>
>>  This was "discovered" while reviewing Nikolay's patch related
>>  to adding "DAEMON" as the shutdown reason:
>>
>> https://www.redhat.com/archives/libvir-list/2018-October/msg00493.html
>>
>>  While working through the iterations of that - figured I'd at
>>  least post this.
>>
>>
>>  src/qemu/qemu_process.c | 15 ++-
>>  1 file changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 3955eda17c..4a39111d51 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -7985,11 +7985,16 @@ qemuProcessReconnect(void *opaque)
>>  if (virDomainObjIsActive(obj)) {
>>  /* We can't get the monitor back, so must kill the VM
>>   * to remove danger of it ending up running twice if
>> - * user tries to start it again later
>> - * If we couldn't get the monitor since QEMU supports
>> - * no-shutdown, we can safely say that the domain
>> - * crashed ... */
>> -state = VIR_DOMAIN_SHUTOFF_CRASHED;
>> + * user tries to start it again later.
>> + *
>> + * If we cannot get to the monitor when the QEMU command
>> + * line used -no-shutdown, then we can safely say that the
>> + * domain crashed; otherwise we don't really know. */
>> +if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES)
>> +state = VIR_DOMAIN_SHUTOFF_CRASHED;
>> +else
>> +state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
>> +
>>  /* If BeginJob failed, we jumped here without a job, let's hope 
>> another
>>   * thread didn't have a chance to start playing with the domain yet
>>   * (it's all we can do anyway).
>>
> 
> This is reasonable but not complete. This is only applied to case when we do 
> connect(2)

I understand your point; however, the goal of this patch wasn't to fix
the various other failure scenarios/reasons. It was to merely restore
the lost UNKNOWN reason in the event that either priv->monJSON == false
(unlikely) or priv->allowReboot != VIR_TRISTATE_BOOL_YES (possible).

Rather than copy-pasta, I'll create qemuDomainIsUsingNoShutdown(priv)
which will perform the condition checks for both command line generation
and reconnection failure paths.

If that's not desirable, then that's fine. I won't spend more cycles on
this.

John

> to qemu and it is failed with ECONNREFUSED which means qemu process does not 
> exists
> (in which case it is either crashed if was configured with -no-shutdown or 
> terminated
> for unknown reason) or just closed monitor connection (in this case we are 
> going
> to terminate it by ourselves).
> 
> But there are other cases. For example we can jump to error path even before 
> connect(2)
> or after successfull connect. See discussion of such cases in [1].
> 
> [1] https://www.redhat.com/archives/libvir-list/2018-October/msg01031.html
> 
> Nikolay
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v1 2/7] pcihp: overwrite hotplug handler recursively from the start

2018-11-01 Thread Igor Mammedov

On Wed, 24 Oct 2018 12:19:25 +0200
David Hildenbrand  wrote:

> For now, the hotplug handler is not called for devices that are
> being cold plugged. The hotplug handler is setup when the machine
> initialization is fully done. Only bridges that were cold plugged are
> considered.
> 
> Set the hotplug handler for the root piix bus directly when realizing.
> Overwrite the hotplug handler of bridges when hotplugging/coldplugging
> them.
> 
> This will now make sure that the ACPI PCI hotplug handler is also called
> for cold-plugged devices (also on bridges) and for bridges that were
> hotplugged.
> 
> When trying to hotplug a device to a hotplugged bridge, we now correctly
> get the error message
>  "Unsupported bus. Bus doesn't have property 'acpi-pcihp-bsel' set"
> Insted of going via the standard PCI hotplug handler.
Erroring out is probably not ok, since it can break existing setups
where SHPC hotplugging to hotplugged bridge was working just fine before.

Marcel/Michael what's your take on this change in behaviour?
CCing libvirt in case they are doing this stuff


> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/acpi/pcihp.c | 10 ++
>  hw/acpi/piix4.c | 16 +---
>  2 files changed, 11 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 5e7cef173c..647b5e8d82 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -30,6 +30,7 @@
>  #include "hw/hw.h"
>  #include "hw/i386/pc.h"
>  #include "hw/pci/pci.h"
> +#include "hw/pci/pci_bridge.h"
>  #include "hw/acpi/acpi.h"
>  #include "sysemu/sysemu.h"
>  #include "exec/address-spaces.h"
> @@ -236,6 +237,15 @@ void acpi_pcihp_device_plug_cb(HotplugHandler 
> *hotplug_dev, AcpiPciHpState *s,
>  int slot = PCI_SLOT(pdev->devfn);
>  int bsel;
>  
> +if (!s->legacy_piix && object_dynamic_cast(OBJECT(dev), 
> TYPE_PCI_BRIDGE)) {
> +PCIBus *sec = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev));
> +
> +/* Overwrite the default hotplug handler with the ACPI PCI one. */
> +qbus_set_hotplug_handler(BUS(sec), DEVICE(hotplug_dev), 
> &error_abort);
> +/* No need to do it recursively */
> +assert(QLIST_EMPTY(&sec->child));
> +}
> +
>  /* Don't send event when device is enabled during qemu machine creation:
>   * it is present on boot, no hotplug event is necessary. We do send an
>   * event when the device is disabled later. */
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 167afe2cea..e611778daf 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -446,15 +446,6 @@ static void piix4_device_unplug_cb(HotplugHandler 
> *hotplug_dev,
>  }
>  }
>  
> -static void piix4_update_bus_hotplug(PCIBus *pci_bus, void *opaque)
> -{
> -PIIX4PMState *s = opaque;
> -
> -/* pci_bus cannot outlive PIIX4PMState, because /machine keeps it alive
> - * and it's not hot-unpluggable */
> -qbus_set_hotplug_handler(BUS(pci_bus), DEVICE(s), &error_abort);
> -}
> -
>  static void piix4_pm_machine_ready(Notifier *n, void *opaque)
>  {
>  PIIX4PMState *s = container_of(n, PIIX4PMState, machine_ready);
> @@ -468,12 +459,6 @@ static void piix4_pm_machine_ready(Notifier *n, void 
> *opaque)
>  pci_conf[0x63] = 0x60;
>  pci_conf[0x67] = (memory_region_present(io_as, 0x3f8) ? 0x08 : 0) |
>  (memory_region_present(io_as, 0x2f8) ? 0x90 : 0);
> -
> -if (s->use_acpi_pci_hotplug) {
> -pci_for_each_bus(pci_get_bus(d), piix4_update_bus_hotplug, s);
> -} else {
> -piix4_update_bus_hotplug(pci_get_bus(d), s);
> -}
>  }
>  
>  static void piix4_pm_add_propeties(PIIX4PMState *s)
> @@ -547,6 +532,7 @@ static void piix4_pm_realize(PCIDevice *dev, Error **errp)
>  
>  piix4_acpi_system_hot_add_init(pci_address_space_io(dev),
> pci_get_bus(dev), s);
> +qbus_set_hotplug_handler(BUS(pci_get_bus(dev)), DEVICE(s), &error_abort);
>  
>  piix4_pm_add_propeties(s);
>  }

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] No change in git since RC1 so no RC2 for libvirt 4.9.0

2018-11-01 Thread Daniel Veillard

  Since nothing was commited in git since RC1 it doesn't really make
sense for me to push an RC2 at this point :-)

  So please continue testing RC1, I will check status over the w.e. 
as I plan to be offline for a bit more than a day, and depending on
status either cut an RC2 or push directly 4.9.0 GA,

   thanks in advance,

Daniel

-- 
Daniel Veillard  | Red Hat Developers Tools http://developer.redhat.com/
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2] qemu: Introduce caching whether /dev/kvm is accessible

2018-11-01 Thread Daniel P . Berrangé

On Mon, Oct 29, 2018 at 06:34:58PM +0100, Marc Hartmayer wrote:
> Introduce caching whether /dev/kvm is usable as the QEMU user:QEMU
> group. This reduces the overhead of the QEMU capabilities cache
> lookup. Before this patch there were many fork() calls used for
> checking whether /dev/kvm is accessible. Now we store the result
> whether /dev/kvm is accessible or not and we only need to re-run the
> virFileAccessibleAs check if the ctime of /dev/kvm has changed.
> 
> Suggested-by: Daniel P. Berrangé 
> Signed-off-by: Marc Hartmayer 
> ---
>  src/qemu/qemu_capabilities.c | 54 ++--
>  1 file changed, 52 insertions(+), 2 deletions(-)

For benefit of those who were not at KVM Forum last week, this patch
is addressing the performance problem described starting here:

   https://youtu.be/H3lw50IKqGo?t=1055


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Report warning if QoS is set for vhostuser interface

2018-11-01 Thread Erik Skultety

On Thu, Nov 01, 2018 at 02:07:21PM +0100, Michal Privoznik wrote:
> On 11/01/2018 10:33 AM, Erik Skultety wrote:
> > On Wed, Oct 31, 2018 at 04:47:45PM +0100, Michal Privoznik wrote:
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1524230
> >>
> >> Because of historical reasons, we are not denying starting a
> >> domain which has QoS set for unsupported type of device. We do
> >> report just a warning instead. And even though we perhaps used to
> >> do so for vhostuser it got lost somewhere. Bring it back.
> >
> > So, if my blame-fu isn't flawed, then I don't think that vhostuser ever
> > reported a warning during machine startup, have a look at commits 
> > 4a74ccdb92f
> > and 0bce012d7f0 respectively - there always was a goto cleanup statement.
> >
>
> Ah, so we did not report warning for vhostuser, but for everything else
> we do.
>
> >>
> >> Signed-off-by: Michal Privoznik 
> >> ---
> >>  src/qemu/qemu_command.c | 6 ++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> >> index 6e3ff67660..489e8bc689 100644
> >> --- a/src/qemu/qemu_command.c
> >> +++ b/src/qemu/qemu_command.c
> >> @@ -8246,6 +8246,12 @@ qemuBuildVhostuserCommandLine(virQEMUDriverPtr 
> >> driver,
> >>  goto cleanup;
> >>  }
> >>
> >> +if (virDomainNetGetActualBandwidth(net)) {
> >> +VIR_WARN("setting bandwidth on interfaces of "
> >> + "type '%s' is not implemented yet",
> >> + virDomainNetTypeToString(VIR_DOMAIN_NET_TYPE_VHOSTUSER));
> >> +}
> >> +
> >
> > We already do this kind of checking on line 8593 in the same source file. 
> > It's
> > just because of the goto cleanup jump after calling
> > qemuBuildVhostuserCommandLine that causes the logic to never reach the 
> > condition
> > (Not to mention the code is a mess). I'd suggest moving the already existing
> > check to the top of the qemuBuildInterfaceCommandLine function instead of
> > duplicating the code.
>
> I'm not sure I understand. The code looks something like this:
>
>   switch (actualType) {
> case VIR_DOMAIN_NET_TYPE_NETWORK:
> case VIR_DOMAIN_NET_TYPE_BRIDGE:
> createTapDevice();
> break;
>
> 
> case VIR_DOMAIN_NET_TYPE_VHOSTUSER:
> ret = qemuBuildVhostuserCommandLine();
> goto cleanup;
>   }
>
>   if (bandwidth) {
> if (bandwidthSupported(actualType))
>   setBandwidth();
> else
>   VIR_WARN();
>   }
>
> Obviously, I can't just move 'if (bandwidth)' chunk before switch()
> because setBandwidth() wouldn't have an interface to operate on (as it
> is created in the switch). Or did you have something else on mind?

Nah, my bad, I missed that there actually was the setBandwidth call. That
basically leaves us with 2 options (if don't count your original patch) and both
of them include some refactor:

1) extract the common bits of NET_TYPE_(DIRECT|ETHERNET|BRIDGE) to a helper,
then extract the bandwidth bits to a helper and call this from the
per-interface-type helpers
2) do a massive refactor of the whole interface-cmdline generating function, so
that there wouldn't be short exit paths just like it is the case now for
vhostuser so that in the end the bandwidth snippet we have can stay and will be
applied to all the interface types without exception.

Erik

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH libvirt-python] event-test.py: Report ERROR events

2018-11-01 Thread Michal Privoznik

On 11/01/2018 11:20 AM, Philipp Hahn wrote:
> VIR_DOMAIN_EVENT_ID_IO_ERROR and VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON
> callbacks receive the same 'action' parameter, so also translate that
> numeric action to a descriptive text for the first callback.
> 
> Signed-off-by: Philipp Hahn 
> ---
>  examples/event-test.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

ACKed and pushed.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Report warning if QoS is set for vhostuser interface

2018-11-01 Thread Michal Privoznik

On 11/01/2018 10:33 AM, Erik Skultety wrote:
> On Wed, Oct 31, 2018 at 04:47:45PM +0100, Michal Privoznik wrote:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1524230
>>
>> Because of historical reasons, we are not denying starting a
>> domain which has QoS set for unsupported type of device. We do
>> report just a warning instead. And even though we perhaps used to
>> do so for vhostuser it got lost somewhere. Bring it back.
> 
> So, if my blame-fu isn't flawed, then I don't think that vhostuser ever
> reported a warning during machine startup, have a look at commits 4a74ccdb92f
> and 0bce012d7f0 respectively - there always was a goto cleanup statement.
> 

Ah, so we did not report warning for vhostuser, but for everything else
we do.

>>
>> Signed-off-by: Michal Privoznik 
>> ---
>>  src/qemu/qemu_command.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
>> index 6e3ff67660..489e8bc689 100644
>> --- a/src/qemu/qemu_command.c
>> +++ b/src/qemu/qemu_command.c
>> @@ -8246,6 +8246,12 @@ qemuBuildVhostuserCommandLine(virQEMUDriverPtr driver,
>>  goto cleanup;
>>  }
>>
>> +if (virDomainNetGetActualBandwidth(net)) {
>> +VIR_WARN("setting bandwidth on interfaces of "
>> + "type '%s' is not implemented yet",
>> + virDomainNetTypeToString(VIR_DOMAIN_NET_TYPE_VHOSTUSER));
>> +}
>> +
> 
> We already do this kind of checking on line 8593 in the same source file. It's
> just because of the goto cleanup jump after calling
> qemuBuildVhostuserCommandLine that causes the logic to never reach the 
> condition
> (Not to mention the code is a mess). I'd suggest moving the already existing
> check to the top of the qemuBuildInterfaceCommandLine function instead of
> duplicating the code.

I'm not sure I understand. The code looks something like this:

  switch (actualType) {
case VIR_DOMAIN_NET_TYPE_NETWORK:
case VIR_DOMAIN_NET_TYPE_BRIDGE:
createTapDevice();
break;


case VIR_DOMAIN_NET_TYPE_VHOSTUSER:
ret = qemuBuildVhostuserCommandLine();
goto cleanup;
  }

  if (bandwidth) {
if (bandwidthSupported(actualType))
  setBandwidth();
else
  VIR_WARN();
  }

Obviously, I can't just move 'if (bandwidth)' chunk before switch()
because setBandwidth() wouldn't have an interface to operate on (as it
is created in the switch). Or did you have something else on mind?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH libvirt-python] event-test.py: Fix ERROR event

2018-11-01 Thread Michal Privoznik

On 11/01/2018 10:55 AM, Philipp Hahn wrote:
> ERROR_EVENTS translates the numeric 'action' argument to a description,
> not the 'reason' argument which already contains a descriptive string
> like 'enospc'.
> 
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 4661, in 
>> _dispatchDomainEventIOErrorReasonCallback
>> reason, opaque)
>>   File "libvirt-python/examples/event-test.py", line 536, in 
>> myDomainEventIOErrorReasonCallback
>> dom.name(), dom.ID(), srcpath, devalias, action, ERROR_EVENTS[reason]))
>>   File "libvirt-python/examples/event-test.py", line 474, in __getitem__
>> data = self.args[item]
>> TypeError: tuple indices must be integers, not str
> 
> Fixes: f5928c6711654f1496707ca77f626b3192843d57
> Signed-off-by: Philipp Hahn 
> ---
>  examples/event-test.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

ACKed and pushed.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 7/7] tests: fix dry run handling in network firewall test

2018-11-01 Thread Daniel P . Berrangé

The networkxml2firewalltest sets virCommand to dry run mode but doesn't
provide a callback to fill in stdout/stderr. As a result when the
firewall code queries rules it gets a NULL output and so never triggers
the callback to process output.

We only need to return an empty string to make the firewall code work
and thus trigger adding of the libvirt private chains to the builtin
chains.

Signed-off-by: Daniel P. Berrangé 
---
 .../nat-default-linux.args| 48 +++
 .../nat-ipv6-linux.args   | 48 +++
 .../nat-many-ips-linux.args   | 48 +++
 .../nat-no-dhcp-linux.args| 48 +++
 .../nat-tftp-linux.args   | 48 +++
 .../route-default-linux.args  | 48 +++
 tests/networkxml2firewalltest.c   | 16 ++-
 7 files changed, 303 insertions(+), 1 deletion(-)

diff --git a/tests/networkxml2firewalldata/nat-default-linux.args 
b/tests/networkxml2firewalldata/nat-default-linux.args
index 69995181ad..e7d71817c7 100644
--- a/tests/networkxml2firewalldata/nat-default-linux.args
+++ b/tests/networkxml2firewalldata/nat-default-linux.args
@@ -72,6 +72,54 @@ ip6tables \
 --list POSTROUTING
 iptables \
 --table filter \
+--insert INPUT \
+--jump INP_libvirt
+iptables \
+--table filter \
+--insert OUTPUT \
+--jump OUT_libvirt
+iptables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_out
+iptables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_in
+iptables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_cross
+iptables \
+--table nat \
+--insert POSTROUTING \
+--jump PRT_libvirt
+ip6tables \
+--table filter \
+--insert INPUT \
+--jump INP_libvirt
+ip6tables \
+--table filter \
+--insert OUTPUT \
+--jump OUT_libvirt
+ip6tables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_out
+ip6tables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_in
+ip6tables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_cross
+ip6tables \
+--table nat \
+--insert POSTROUTING \
+--jump PRT_libvirt
+iptables \
+--table filter \
 --insert INP_libvirt \
 --in-interface virbr0 \
 --protocol tcp \
diff --git a/tests/networkxml2firewalldata/nat-ipv6-linux.args 
b/tests/networkxml2firewalldata/nat-ipv6-linux.args
index f93d8face2..620ebb8d14 100644
--- a/tests/networkxml2firewalldata/nat-ipv6-linux.args
+++ b/tests/networkxml2firewalldata/nat-ipv6-linux.args
@@ -72,6 +72,54 @@ ip6tables \
 --list POSTROUTING
 iptables \
 --table filter \
+--insert INPUT \
+--jump INP_libvirt
+iptables \
+--table filter \
+--insert OUTPUT \
+--jump OUT_libvirt
+iptables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_out
+iptables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_in
+iptables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_cross
+iptables \
+--table nat \
+--insert POSTROUTING \
+--jump PRT_libvirt
+ip6tables \
+--table filter \
+--insert INPUT \
+--jump INP_libvirt
+ip6tables \
+--table filter \
+--insert OUTPUT \
+--jump OUT_libvirt
+ip6tables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_out
+ip6tables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_in
+ip6tables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_cross
+ip6tables \
+--table nat \
+--insert POSTROUTING \
+--jump PRT_libvirt
+iptables \
+--table filter \
 --insert INP_libvirt \
 --in-interface virbr0 \
 --protocol tcp \
diff --git a/tests/networkxml2firewalldata/nat-many-ips-linux.args 
b/tests/networkxml2firewalldata/nat-many-ips-linux.args
index faae4b881c..7c378b8c7e 100644
--- a/tests/networkxml2firewalldata/nat-many-ips-linux.args
+++ b/tests/networkxml2firewalldata/nat-many-ips-linux.args
@@ -72,6 +72,54 @@ ip6tables \
 --list POSTROUTING
 iptables \
 --table filter \
+--insert INPUT \
+--jump INP_libvirt
+iptables \
+--table filter \
+--insert OUTPUT \
+--jump OUT_libvirt
+iptables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_out
+iptables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_in
+iptables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_cross
+iptables \
+--table nat \
+--insert POSTROUTING \
+--jump PRT_libvirt
+ip6tables \
+--table filter \
+--insert INPUT \
+--jump INP_libvirt
+ip6tables \
+--table filter \
+--insert OUTPUT \
+--jump OUT_libvirt
+ip6tables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_out
+ip6tables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_in
+ip6tables \
+--table filter \
+--insert FORWARD \
+--jump FWD_libvirt_cross
+ip6tables \
+--table nat \
+--insert POSTROUTING \
+--jump PRT_libvirt
+iptables \
+--table filter \
 --insert INP_libvirt \
 --in-interface virbr0 \
 --protocol tcp \
diff --git a/tests/networkxml2firewalldata/nat-no-dhcp-linux.args 
b/tests/networkxml2firewalldata/nat-no-dhcp-linux.args
index cb0d908506..afa8c3a0ca 100644

[libvirt] [PATCH 4/7] network: setup default iptables chains

2018-11-01 Thread Daniel P . Berrangé

Register the default chains that will be used to hold firewall
rules at network startup.

Signed-off-by: Daniel P. Berrangé 
---
 src/network/bridge_driver_linux.c |  3 +
 .../nat-default-linux.args| 72 +++
 .../nat-ipv6-linux.args   | 72 +++
 .../nat-many-ips-linux.args   | 72 +++
 .../nat-no-dhcp-linux.args| 72 +++
 .../nat-tftp-linux.args   | 72 +++
 .../route-default-linux.args  | 72 +++
 7 files changed, 435 insertions(+)

diff --git a/src/network/bridge_driver_linux.c 
b/src/network/bridge_driver_linux.c
index fb09954b8f..6992653b4a 100644
--- a/src/network/bridge_driver_linux.c
+++ b/src/network/bridge_driver_linux.c
@@ -640,6 +640,9 @@ int networkAddFirewallRules(virNetworkDefPtr def)
 virFirewallPtr fw = NULL;
 int ret = -1;
 
+if (iptablesSetupPrivateChains() < 0)
+return -1;
+
 fw = virFirewallNew();
 
 virFirewallStartTransaction(fw, 0);
diff --git a/tests/networkxml2firewalldata/nat-default-linux.args 
b/tests/networkxml2firewalldata/nat-default-linux.args
index ffdafdff0e..9928da715b 100644
--- a/tests/networkxml2firewalldata/nat-default-linux.args
+++ b/tests/networkxml2firewalldata/nat-default-linux.args
@@ -1,5 +1,77 @@
 iptables \
 --table filter \
+--new-chain INP_libvirt
+iptables \
+--table filter \
+--new-chain OUT_libvirt
+iptables \
+--table filter \
+--new-chain FWD_libvirt_out
+iptables \
+--table filter \
+--new-chain FWD_libvirt_in
+iptables \
+--table filter \
+--new-chain FWD_libvirt_cross
+iptables \
+--table nat \
+--new-chain PRT_libvirt
+ip6tables \
+--table filter \
+--new-chain INP_libvirt
+ip6tables \
+--table filter \
+--new-chain OUT_libvirt
+ip6tables \
+--table filter \
+--new-chain FWD_libvirt_out
+ip6tables \
+--table filter \
+--new-chain FWD_libvirt_in
+ip6tables \
+--table filter \
+--new-chain FWD_libvirt_cross
+ip6tables \
+--table nat \
+--new-chain PRT_libvirt
+iptables \
+--table filter \
+--list INPUT
+iptables \
+--table filter \
+--list OUTPUT
+iptables \
+--table filter \
+--list FORWARD
+iptables \
+--table filter \
+--list FORWARD
+iptables \
+--table filter \
+--list FORWARD
+iptables \
+--table nat \
+--list POSTROUTING
+ip6tables \
+--table filter \
+--list INPUT
+ip6tables \
+--table filter \
+--list OUTPUT
+ip6tables \
+--table filter \
+--list FORWARD
+ip6tables \
+--table filter \
+--list FORWARD
+ip6tables \
+--table filter \
+--list FORWARD
+ip6tables \
+--table nat \
+--list POSTROUTING
+iptables \
+--table filter \
 --insert INPUT \
 --in-interface virbr0 \
 --protocol tcp \
diff --git a/tests/networkxml2firewalldata/nat-ipv6-linux.args 
b/tests/networkxml2firewalldata/nat-ipv6-linux.args
index 22285afa10..440896de18 100644
--- a/tests/networkxml2firewalldata/nat-ipv6-linux.args
+++ b/tests/networkxml2firewalldata/nat-ipv6-linux.args
@@ -1,5 +1,77 @@
 iptables \
 --table filter \
+--new-chain INP_libvirt
+iptables \
+--table filter \
+--new-chain OUT_libvirt
+iptables \
+--table filter \
+--new-chain FWD_libvirt_out
+iptables \
+--table filter \
+--new-chain FWD_libvirt_in
+iptables \
+--table filter \
+--new-chain FWD_libvirt_cross
+iptables \
+--table nat \
+--new-chain PRT_libvirt
+ip6tables \
+--table filter \
+--new-chain INP_libvirt
+ip6tables \
+--table filter \
+--new-chain OUT_libvirt
+ip6tables \
+--table filter \
+--new-chain FWD_libvirt_out
+ip6tables \
+--table filter \
+--new-chain FWD_libvirt_in
+ip6tables \
+--table filter \
+--new-chain FWD_libvirt_cross
+ip6tables \
+--table nat \
+--new-chain PRT_libvirt
+iptables \
+--table filter \
+--list INPUT
+iptables \
+--table filter \
+--list OUTPUT
+iptables \
+--table filter \
+--list FORWARD
+iptables \
+--table filter \
+--list FORWARD
+iptables \
+--table filter \
+--list FORWARD
+iptables \
+--table nat \
+--list POSTROUTING
+ip6tables \
+--table filter \
+--list INPUT
+ip6tables \
+--table filter \
+--list OUTPUT
+ip6tables \
+--table filter \
+--list FORWARD
+ip6tables \
+--table filter \
+--list FORWARD
+ip6tables \
+--table filter \
+--list FORWARD
+ip6tables \
+--table nat \
+--list POSTROUTING
+iptables \
+--table filter \
 --insert INPUT \
 --in-interface virbr0 \
 --protocol tcp \
diff --git a/tests/networkxml2firewalldata/nat-many-ips-linux.args 
b/tests/networkxml2firewalldata/nat-many-ips-linux.args
index aff9f69664..d80a9551d4 100644
--- a/tests/networkxml2firewalldata/nat-many-ips-linux.args
+++ b/tests/networkxml2firewalldata/nat-many-ips-linux.args
@@ -1,5 +1,77 @@
 iptables \
 --table filter \
+--new-chain INP_libvirt
+iptables \
+--table filter \
+--new-chain OUT_libvirt
+iptables \
+--table filter \
+--new-chain FWD_libvirt_out
+iptables \
+--table filter \
+--new-chain FWD_libvirt_in
+iptables \
+--table filter \
+--new-chain FWD_libvirt_cross
+iptables \
+--table nat \
+--new-chain PRT_libvirt
+ip6tables \
+--

[libvirt] [PATCH 6/7] tests: remove duplicated test case in networkxml2firewalltest

2018-11-01 Thread Daniel P . Berrangé

Signed-off-by: Daniel P. Berrangé 
---
 tests/networkxml2firewalltest.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/networkxml2firewalltest.c b/tests/networkxml2firewalltest.c
index 242b645767..505ff0c740 100644
--- a/tests/networkxml2firewalltest.c
+++ b/tests/networkxml2firewalltest.c
@@ -154,7 +154,6 @@ mymain(void)
 DO_TEST("nat-no-dhcp");
 DO_TEST("nat-ipv6");
 DO_TEST("route-default");
-DO_TEST("route-default");
 
  cleanup:
 return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
-- 
2.19.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 5/7] util: switch over to creating rules in private chains

2018-11-01 Thread Daniel P . Berrangé

All rules are now created in the libvirt private firewall chains. The
code for deleting rules will try to delete from both the original
builtin chains and the new private chains in order to cleanup properly
during upgrades.

This finally fixes a very old bug (from 2008!) related to traffic
between guests on distinct virtual networks. The intention is that
networks never allow incoming connections, but the old ordering of rules
meant that we would mistakenly allow accept traffic from whichever
network was most recently created.

With everything going into the FORWARD chain there was interleaving of
rules for outbound traffic and inbound traffic for each network:

  ACCEPT all  --  *  virbr2  0.0.0.0/0192.168.123.0/24 
ctstate RELATED,ESTABLISHED
  ACCEPT all  --  virbr2 *   192.168.123.0/24 0.0.0.0/0
  ACCEPT all  --  virbr2 virbr2  0.0.0.0/00.0.0.0/0
  REJECT all  --  *  virbr2  0.0.0.0/00.0.0.0/0
reject-with icmp-port-unreachable
  REJECT all  --  virbr2 *   0.0.0.0/00.0.0.0/0
reject-with icmp-port-unreachable
  ACCEPT all  --  *  virbr0  0.0.0.0/0192.168.122.0/24 
ctstate RELATED,ESTABLISHED
  ACCEPT all  --  virbr0 *   192.168.122.0/24 0.0.0.0/0
  ACCEPT all  --  virbr0 virbr0  0.0.0.0/00.0.0.0/0
  REJECT all  --  *  virbr0  0.0.0.0/00.0.0.0/0
reject-with icmp-port-unreachable
  REJECT all  --  virbr0 *   0.0.0.0/00.0.0.0/0
reject-with icmp-port-unreachable

So the rule allowing outbound traffic from virbr2 would mistakenly
allow packets from virbr2 to virbr0, before the rule denying input
to virbr0 gets a chance to run

With the split up forwarding chains, all incoming deny rules are checked
before any of the outgoing allow rules, as rules are grouped into three
distinct sets

Cross rules

  ACCEPT all  --  virbr2 virbr2  0.0.0.0/00.0.0.0/0
  ACCEPT all  --  virbr0 virbr0  0.0.0.0/00.0.0.0/0

Incoming rules

  ACCEPT all  --  *  virbr2  0.0.0.0/0192.168.123.0/24 
ctstate RELATED,ESTABLISHED
  ACCEPT all  --  *  virbr0  0.0.0.0/0192.168.122.0/24 
ctstate RELATED,ESTABLISHED
  REJECT all  --  *  virbr2  0.0.0.0/00.0.0.0/0
reject-with icmp-port-unreachable
  REJECT all  --  *  virbr0  0.0.0.0/00.0.0.0/0
reject-with icmp-port-unreachable

Outgoing rules

  ACCEPT all  --  virbr2 *   192.168.123.0/24 0.0.0.0/0
  REJECT all  --  virbr2 *   0.0.0.0/00.0.0.0/0
reject-with icmp-port-unreachable
  ACCEPT all  --  virbr0 *   192.168.122.0/24 0.0.0.0/0
  REJECT all  --  virbr0 *   0.0.0.0/00.0.0.0/0
reject-with icmp-port-unreachable

Signed-off-by: Daniel P. Berrangé 
---
 src/util/viriptables.c| 71 +--
 .../nat-default-linux.args| 32 -
 .../nat-ipv6-linux.args   | 48 ++---
 .../nat-many-ips-linux.args   | 60 
 .../nat-no-dhcp-linux.args| 46 ++--
 .../nat-tftp-linux.args   | 34 -
 .../route-default-linux.args  | 22 +++---
 7 files changed, 171 insertions(+), 142 deletions(-)

diff --git a/src/util/viriptables.c b/src/util/viriptables.c
index b4a4bf9a12..ad029e6465 100644
--- a/src/util/viriptables.c
+++ b/src/util/viriptables.c
@@ -209,7 +209,7 @@ iptablesAddTcpInput(virFirewallPtr fw,
 const char *iface,
 int port)
 {
-iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, ADD, 1);
+iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_PRIVATE, iface, port, ADD, 1);
 }
 
 /**
@@ -228,6 +228,7 @@ iptablesRemoveTcpInput(virFirewallPtr fw,
int port)
 {
 iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, REMOVE, 
1);
+iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_PRIVATE, iface, port, REMOVE, 
1);
 }
 
 /**
@@ -245,7 +246,7 @@ iptablesAddUdpInput(virFirewallPtr fw,
 const char *iface,
 int port)
 {
-iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, ADD, 0);
+iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_PRIVATE, iface, port, ADD, 0);
 }
 
 /**
@@ -263,7 +264,8 @@ iptablesRemoveUdpInput(virFirewallPtr fw,
const char *iface,
int port)
 {
-return iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, 
REMOVE, 0);
+iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, REMOVE, 
0);
+iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_PRIVATE, iface, port, REMOVE, 
0);
 }
 
 /**
@@ -281,7 +283,7 @@ iptablesAddUdpOutput(virFirewallPtr fw,

[libvirt] [PATCH 2/7] util: add iptables API for creating base chains

2018-11-01 Thread Daniel P . Berrangé

Historically rules were added straight into the base chains. This works
but it is inflexible for admins adding extra rules via hook scripts, and
it is not clear which rules are libvirt created.

There is a further complexity with the FORWARD chain where a specific
ordering of rules is needed to ensure traffic is matched correctly. This
would require complex interleaving of rules instead of plain appending.
By splitting the FORWARD chain into three chains management will be
simpler. Thus we create

  INPUT -> INP_libvirt
  OUTPUT -> OUT_libvirt
  FORWARD -> FWD_libvirt_cross
  FORWARD -> FWD_libvirt_in
  FORWARD -> FWD_libvirt_out
  POSTROUTING -> PRT_libvirt

Signed-off-by: Daniel P. Berrangé 
---
 src/libvirt_private.syms |  1 +
 src/util/viriptables.c   | 81 
 src/util/viriptables.h   |  2 +
 3 files changed, 84 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 335210c31d..e42c946de6 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2062,6 +2062,7 @@ iptablesRemoveOutputFixUdpChecksum;
 iptablesRemoveTcpInput;
 iptablesRemoveUdpInput;
 iptablesRemoveUdpOutput;
+iptablesSetupPrivateChains;
 
 
 # util/viriscsi.h
diff --git a/src/util/viriptables.c b/src/util/viriptables.c
index f379844d28..4a7ea54b38 100644
--- a/src/util/viriptables.c
+++ b/src/util/viriptables.c
@@ -51,6 +51,87 @@ enum {
 };
 
 
+
+typedef struct {
+virFirewallLayer layer;
+const char *table;
+const char *parent;
+const char *child;
+} iptablesChain;
+
+static int
+iptablesCheckPrivateChain(virFirewallPtr fw,
+  const char *const *lines,
+  void *opaque)
+{
+iptablesChain *data = opaque;
+bool found = false;
+
+while (lines && *lines && !found) {
+if (STRPREFIX(*lines, data->child))
+found = true;
+lines++;
+}
+
+if (!found)
+virFirewallAddRule(fw, data->layer,
+   "--table", data->table,
+   "--insert", data->parent,
+   "--jump", data->child, NULL);
+
+ return 0;
+}
+
+
+int
+iptablesSetupPrivateChains(void)
+{
+virFirewallPtr fw;
+int ret = -1;
+iptablesChain chains[] = {
+{VIR_FIREWALL_LAYER_IPV4, "filter", "INPUT", "INP_libvirt"},
+{VIR_FIREWALL_LAYER_IPV4, "filter", "OUTPUT", "OUT_libvirt"},
+{VIR_FIREWALL_LAYER_IPV4, "filter", "FORWARD", "FWD_libvirt_out"},
+{VIR_FIREWALL_LAYER_IPV4, "filter", "FORWARD", "FWD_libvirt_in"},
+{VIR_FIREWALL_LAYER_IPV4, "filter", "FORWARD", "FWD_libvirt_cross"},
+{VIR_FIREWALL_LAYER_IPV4, "nat", "POSTROUTING", "PRT_libvirt"},
+{VIR_FIREWALL_LAYER_IPV6, "filter", "INPUT", "INP_libvirt"},
+{VIR_FIREWALL_LAYER_IPV6, "filter", "OUTPUT", "OUT_libvirt"},
+{VIR_FIREWALL_LAYER_IPV6, "filter", "FORWARD", "FWD_libvirt_out"},
+{VIR_FIREWALL_LAYER_IPV6, "filter", "FORWARD", "FWD_libvirt_in"},
+{VIR_FIREWALL_LAYER_IPV6, "filter", "FORWARD", "FWD_libvirt_cross"},
+{VIR_FIREWALL_LAYER_IPV6, "nat", "POSTROUTING", "PRT_libvirt"},
+};
+size_t i;
+
+fw = virFirewallNew();
+
+virFirewallStartTransaction(fw, VIR_FIREWALL_TRANSACTION_IGNORE_ERRORS);
+
+for (i = 0; i < ARRAY_CARDINALITY(chains); i++) {
+virFirewallAddRule(fw, chains[i].layer,
+   "--table", chains[i].table,
+   "--new-chain", chains[i].child, NULL);
+}
+
+virFirewallStartTransaction(fw, 0);
+
+for (i = 0; i < ARRAY_CARDINALITY(chains); i++) {
+virFirewallAddRuleFull(fw, chains[i].layer,
+   false, iptablesCheckPrivateChain,
+   &chains[i],
+   "--table", chains[i].table,
+   "--list", chains[i].parent, NULL);
+}
+
+if (virFirewallApply(fw) < 0)
+goto cleanup;
+
+ret = 0;
+ cleanup:
+return ret;
+}
+
 static void
 iptablesInput(virFirewallPtr fw,
   virFirewallLayer layer,
diff --git a/src/util/viriptables.h b/src/util/viriptables.h
index 9ea25fc096..1db97937a1 100644
--- a/src/util/viriptables.h
+++ b/src/util/viriptables.h
@@ -27,6 +27,8 @@
 # include "virsocketaddr.h"
 # include "virfirewall.h"
 
+int  iptablesSetupPrivateChains  (void);
+
 void iptablesAddTcpInput (virFirewallPtr fw,
   virFirewallLayer layer,
   const char *iface,
-- 
2.19.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 3/7] util: prepare iptables for putting rules into private chains

2018-11-01 Thread Daniel P . Berrangé

Currently all rules are created directly in the INPUT, FORWARD,
OUTPUT and POSTROUTING chains. This change prepares for putting
the rules into private changes, but does not actually do the
switch yet.

Signed-off-by: Daniel P. Berrangé 
---
 src/util/viriptables.c | 152 +
 1 file changed, 108 insertions(+), 44 deletions(-)

diff --git a/src/util/viriptables.c b/src/util/viriptables.c
index 4a7ea54b38..b4a4bf9a12 100644
--- a/src/util/viriptables.c
+++ b/src/util/viriptables.c
@@ -50,6 +50,12 @@ enum {
 REMOVE
 };
 
+enum {
+VIR_IPTABLES_CHAIN_BUILTIN,
+VIR_IPTABLES_CHAIN_PRIVATE,
+
+VIR_IPTABLES_CHAIN_LAST,
+};
 
 
 typedef struct {
@@ -135,19 +141,24 @@ iptablesSetupPrivateChains(void)
 static void
 iptablesInput(virFirewallPtr fw,
   virFirewallLayer layer,
+  int chain,
   const char *iface,
   int port,
   int action,
   int tcp)
 {
 char portstr[32];
+static const char *chainName[VIR_IPTABLES_CHAIN_LAST] = {
+"INPUT",
+"INP_libvirt",
+};
 
 snprintf(portstr, sizeof(portstr), "%d", port);
 portstr[sizeof(portstr) - 1] = '\0';
 
 virFirewallAddRule(fw, layer,
"--table", "filter",
-   action == ADD ? "--insert" : "--delete", "INPUT",
+   action == ADD ? "--insert" : "--delete", 
chainName[chain],
"--in-interface", iface,
"--protocol", tcp ? "tcp" : "udp",
"--destination-port", portstr,
@@ -158,19 +169,24 @@ iptablesInput(virFirewallPtr fw,
 static void
 iptablesOutput(virFirewallPtr fw,
virFirewallLayer layer,
+   int chain,
const char *iface,
int port,
int action,
int tcp)
 {
 char portstr[32];
+static const char *chainName[VIR_IPTABLES_CHAIN_LAST] = {
+"OUTPUT",
+"OUT_libvirt",
+};
 
 snprintf(portstr, sizeof(portstr), "%d", port);
 portstr[sizeof(portstr) - 1] = '\0';
 
 virFirewallAddRule(fw, layer,
"--table", "filter",
-   action == ADD ? "--insert" : "--delete", "OUTPUT",
+   action == ADD ? "--insert" : "--delete", 
chainName[chain],
"--out-interface", iface,
"--protocol", tcp ? "tcp" : "udp",
"--destination-port", portstr,
@@ -193,7 +209,7 @@ iptablesAddTcpInput(virFirewallPtr fw,
 const char *iface,
 int port)
 {
-iptablesInput(fw, layer, iface, port, ADD, 1);
+iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, ADD, 1);
 }
 
 /**
@@ -211,7 +227,7 @@ iptablesRemoveTcpInput(virFirewallPtr fw,
const char *iface,
int port)
 {
-iptablesInput(fw, layer, iface, port, REMOVE, 1);
+iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, REMOVE, 
1);
 }
 
 /**
@@ -229,7 +245,7 @@ iptablesAddUdpInput(virFirewallPtr fw,
 const char *iface,
 int port)
 {
-iptablesInput(fw, layer, iface, port, ADD, 0);
+iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, ADD, 0);
 }
 
 /**
@@ -247,7 +263,7 @@ iptablesRemoveUdpInput(virFirewallPtr fw,
const char *iface,
int port)
 {
-return iptablesInput(fw, layer, iface, port, REMOVE, 0);
+return iptablesInput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, 
REMOVE, 0);
 }
 
 /**
@@ -265,7 +281,7 @@ iptablesAddUdpOutput(virFirewallPtr fw,
  const char *iface,
  int port)
 {
-iptablesOutput(fw, layer, iface, port, ADD, 0);
+iptablesOutput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, ADD, 0);
 }
 
 /**
@@ -283,7 +299,7 @@ iptablesRemoveUdpOutput(virFirewallPtr fw,
 const char *iface,
 int port)
 {
-iptablesOutput(fw, layer, iface, port, REMOVE, 0);
+iptablesOutput(fw, layer, VIR_IPTABLES_CHAIN_BUILTIN, iface, port, REMOVE, 
0);
 }
 
 
@@ -323,6 +339,7 @@ static char *iptablesFormatNetwork(virSocketAddr *netaddr,
  */
 static int
 iptablesForwardAllowOut(virFirewallPtr fw,
+int chain,
 virSocketAddr *netaddr,
 unsigned int prefix,
 const char *iface,
@@ -332,6 +349,10 @@ iptablesForwardAllowOut(virFirewallPtr fw,
 VIR_AUTOFREE(char *) networkstr = NULL;
 virFirewallLayer layer = VIR_SOCKET_ADDR_FAMILY(netaddr) == AF_INET ?
 VIR_FIREWALL_LAYER_IPV4 : VIR_FIREWALL_LAYER_IPV6;
+static const char *chainName[VIR_IPTABLES_CHAIN_LAST] = {
+"FORWARD",
+"FWD_libvirt_out",
+};
 
 if (!(networkstr = iptablesFormatNetwork(netaddr, prefix

[libvirt] [PATCH 1/7] util: refactor iptables APIs to share more code

2018-11-01 Thread Daniel P . Berrangé

Most of the iptables APIs share code for the add/delete paths, but a
couple were separated. Merge the remaining APIs to facilitate future
changes.

Signed-off-by: Daniel P. Berrangé 
---
 src/util/viriptables.c | 73 --
 1 file changed, 42 insertions(+), 31 deletions(-)

diff --git a/src/util/viriptables.c b/src/util/viriptables.c
index 5dbea8cf57..f379844d28 100644
--- a/src/util/viriptables.c
+++ b/src/util/viriptables.c
@@ -495,6 +495,21 @@ iptablesRemoveForwardAllowIn(virFirewallPtr fw,
 return iptablesForwardAllowIn(fw, netaddr, prefix, iface, physdev, REMOVE);
 }
 
+static void
+iptablesForwardAllowCross(virFirewallPtr fw,
+  virFirewallLayer layer,
+  const char *iface,
+  int action)
+{
+virFirewallAddRule(fw, layer,
+   "--table", "filter",
+   action == ADD ? "--insert" : "--delete", "FORWARD",
+   "--in-interface", iface,
+   "--out-interface", iface,
+   "--jump", "ACCEPT",
+   NULL);
+}
+
 /**
  * iptablesAddForwardAllowCross:
  * @ctx: pointer to the IP table context
@@ -511,13 +526,7 @@ iptablesAddForwardAllowCross(virFirewallPtr fw,
  virFirewallLayer layer,
  const char *iface)
 {
-virFirewallAddRule(fw, layer,
-   "--table", "filter",
-   "--insert", "FORWARD",
-   "--in-interface", iface,
-   "--out-interface", iface,
-   "--jump", "ACCEPT",
-   NULL);
+iptablesForwardAllowCross(fw, layer, iface, ADD);
 }
 
 /**
@@ -535,13 +544,21 @@ void
 iptablesRemoveForwardAllowCross(virFirewallPtr fw,
 virFirewallLayer layer,
 const char *iface)
+{
+iptablesForwardAllowCross(fw, layer, iface, REMOVE);
+}
+
+static void
+iptablesForwardRejectOut(virFirewallPtr fw,
+ virFirewallLayer layer,
+ const char *iface,
+ int action)
 {
 virFirewallAddRule(fw, layer,
"--table", "filter",
-   "--delete", "FORWARD",
+   action == ADD ? "--insert" : "delete", "FORWARD",
"--in-interface", iface,
-   "--out-interface", iface,
-   "--jump", "ACCEPT",
+   "--jump", "REJECT",
NULL);
 }
 
@@ -560,12 +577,7 @@ iptablesAddForwardRejectOut(virFirewallPtr fw,
 virFirewallLayer layer,
 const char *iface)
 {
-virFirewallAddRule(fw, layer,
-   "--table", "filter",
-   "--insert", "FORWARD",
-   "--in-interface", iface,
-   "--jump", "REJECT",
-   NULL);
+iptablesForwardRejectOut(fw, layer, iface, ADD);
 }
 
 /**
@@ -582,16 +594,25 @@ void
 iptablesRemoveForwardRejectOut(virFirewallPtr fw,
virFirewallLayer layer,
const char *iface)
+{
+iptablesForwardRejectOut(fw, layer, iface, REMOVE);
+}
+
+
+static void
+iptablesForwardRejectIn(virFirewallPtr fw,
+virFirewallLayer layer,
+const char *iface,
+int action)
 {
 virFirewallAddRule(fw, layer,
"--table", "filter",
-   "--delete", "FORWARD",
-   "--in-interface", iface,
+   action == ADD ? "--insert" : "--delete", "FORWARD",
+   "--out-interface", iface,
"--jump", "REJECT",
NULL);
 }
 
-
 /**
  * iptablesAddForwardRejectIn:
  * @ctx: pointer to the IP table context
@@ -607,12 +628,7 @@ iptablesAddForwardRejectIn(virFirewallPtr fw,
virFirewallLayer layer,
const char *iface)
 {
-virFirewallAddRule(fw, layer,
-   "--table", "filter",
-   "--insert", "FORWARD",
-   "--out-interface", iface,
-   "--jump", "REJECT",
-   NULL);
+iptablesForwardRejectIn(fw, layer, iface, ADD);
 }
 
 /**
@@ -630,12 +646,7 @@ iptablesRemoveForwardRejectIn(virFirewallPtr fw,
   virFirewallLayer layer,
   const char *iface)
 {
-virFirewallAddRule(fw, layer,
-   "--table", "filter",
-   "--delete", "FORWARD",
-   "--out-interface", iface,
-   "--jump", "REJECT",
-   NULL);
+iptablesForwardRejectIn(fw, layer, iface

[libvirt] [PATCH 0/7] Restructure firewall rules for virtual networks into private chains

2018-11-01 Thread Daniel P . Berrangé

The virtual networks in NAT mode are supposed to only allow outbound
network access for guests. Unfortunately due to ordering of the firewall
rules libvirt creates, when you have multiple virtual networks, guests
on the more recently created virtual networks can connect to guests on
old virtual networks.

This was reported way back in 2008 but we always thought the fix would
be very complicated to deal with, so we've been putting it off forever.

In parallel with this there's also been a long standing desire since
2009 to move our firewall rules out of the builtin chains, to libvirt
private chains. This is to make it easier for admins to use hook scripts
to setup rules in the builtin chains that take priority over rules
libvirt creates.

In implementing the changes to use private chains, I suddenly realized
that fixing the network to network traffic blocking problem was trivial
if I grouped the forwarding rules into three distinct sets.

So this series finally fixes an annoying 10 year old bug, and implements
a 9 year old RFE.

It may take us a while, but we'll get to your bugs eventually ;-)

Daniel P. Berrangé (7):
  util: refactor iptables APIs to share more code
  util: add iptables API for creating base chains
  util: prepare iptables for putting rules into private chains
  network: setup default iptables chains
  util: switch over to creating rules in private chains
  tests: remove duplicated test case in networkxml2firewalltest
  tests: fix dry run handling in network firewall test

 src/libvirt_private.syms  |   1 +
 src/network/bridge_driver_linux.c |   3 +
 src/util/viriptables.c| 317 ++
 src/util/viriptables.h|   2 +
 .../nat-default-linux.args| 150 -
 .../nat-ipv6-linux.args   | 166 +++--
 .../nat-many-ips-linux.args   | 178 --
 .../nat-no-dhcp-linux.args| 164 +++--
 .../nat-tftp-linux.args   | 152 -
 .../route-default-linux.args  | 140 +++-
 tests/networkxml2firewalltest.c   |  17 +-
 11 files changed, 1107 insertions(+), 183 deletions(-)

-- 
2.19.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH 1/3] qemu: caps: add drive.qcow2.l2-cache-size

2018-11-01 Thread Nikolay Shirokovskiy

Signed-off-by: Nikolay Shirokovskiy 
---
 src/qemu/qemu_capabilities.c   | 3 +++
 src/qemu/qemu_capabilities.h   | 1 +
 tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml | 1 +
 tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.11.0.x86_64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.6.0.aarch64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_2.6.0.ppc64.xml| 1 +
 tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml| 1 +
 tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml| 1 +
 tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml   | 1 +
 tests/qemucapabilitiesdata/caps_2.9.0.ppc64.xml| 1 +
 tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml| 1 +
 tests/qemucapabilitiesdata/caps_2.9.0.x86_64.xml   | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml| 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml  | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml  | 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml| 1 +
 tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml   | 1 +
 28 files changed, 30 insertions(+)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index e228f52..7d42254 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -508,6 +508,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
   /* 315 */
   "vfio-pci.display",
   "blockdev",
+  "drive.qcow2.l2-cache-size",
 );
 
 
@@ -1234,6 +1235,8 @@ static struct virQEMUCapsStringFlags 
virQEMUCapsQMPSchemaQueries[] = {
 { "blockdev-add/arg-type/+vxhs", QEMU_CAPS_VXHS},
 { "blockdev-add/arg-type/+iscsi/password-secret", 
QEMU_CAPS_ISCSI_PASSWORD_SECRET },
 { "blockdev-add/arg-type/+qcow2/encrypt/+luks/key-secret", 
QEMU_CAPS_QCOW2_LUKS },
+{ "blockdev-add/arg-type/options/+qcow2/l2-cache-size", 
QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE},
+{ "blockdev-add/arg-type/+qcow2/l2-cache-size", 
QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE},
 { "nbd-server-start/arg-type/tls-creds", QEMU_CAPS_NBD_TLS },
 { "screendump/arg-type/device", QEMU_CAPS_SCREENDUMP_DEVICE },
 { "block-commit/arg-type/*top",  QEMU_CAPS_ACTIVE_COMMIT },
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index 934620e..eb9b98b 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -492,6 +492,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for 
syntax-check */
 /* 315 */
 QEMU_CAPS_VFIO_PCI_DISPLAY, /* -device vfio-pci.display */
 QEMU_CAPS_BLOCKDEV, /* -blockdev and blockdev-add are supported */
+QEMU_CAPS_DRIVE_QCOW2_L2_CACHE_SIZE, /* -drive l2-cache-size */
 
 QEMU_CAPS_LAST /* this must always be the last item */
 } virQEMUCapsFlags;
diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml 
b/tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml
index b9c4182..1214a48 100644
--- a/tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml
@@ -151,6 +151,7 @@
   
   
   
+  
   201
   0
   305067
diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml 
b/tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml
index 66b2560..defbd6d 100644
--- a/tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml
@@ -150,6 +150,7 @@
   
   
   
+  
   201
   0
   384412
diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml 
b/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml
index e000aac..3dfb5b3 100644
--- a/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml
+++ b/tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml
@@ -113,6 +113,7 @@
   
   
   
+  
   201
   0
   306247
diff --git a/tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml 
b/tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml
index ebc5e77..adf9cd0 100644
--- a/tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml
+++ b/tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml
@@ -192,6 +192,7 @@
   
   
   
+  
   201
   0
   364386
diff --git a/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml 
b/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml
index 4eb8a39..d115424 100644
--- a/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml
+++ b/tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml
@@ -120,6 +120,7 @@
   
   
   
+  
   2011000
   0
   345099
diff --git a/tests/qemucapabilities

[libvirt] [PATCH 2/3] xml: add disk driver metadata_cache_size option

2018-11-01 Thread Nikolay Shirokovskiy

The only possible value is 'maximum' which makes l2_cache_size large
enough to keep all metadata in memory. This will unleash disks
performace for some database workloads and IO benchmarks with random
access to whole disk.

Note that imlementation sets l2-cache-size and not cache-size.
Both *cache-size's is upper limit on cache size value thus instead of
setting precise limit for disk which involves knowing disk size
and disk's cluster size we can just set INT64_MAX. Unfortunately
both old and new versions of qemu fail on setting cache-size to INT64_MAX.
Fortunately both old and new versions works well on such setting for
l2-cache-size. As guest performance depends only l2 cache size
and not refcount cache size (which is documented in recent qemu)
we can set l2 directly.

Signed-off-by: Nikolay Shirokovskiy 
---
 docs/formatdomain.html.in  |  7 
 docs/schemas/domaincommon.rng  | 10 +
 src/conf/domain_conf.c | 17 
 src/conf/domain_conf.h |  9 
 src/qemu/qemu_command.c| 26 
 src/qemu/qemu_domain.c |  1 +
 .../qemuxml2argvdata/disk-metadata_cache_size.args | 34 +++
 .../qemuxml2argvdata/disk-metadata_cache_size.xml  | 42 +++
 tests/qemuxml2argvtest.c   |  2 +
 .../disk-metadata_cache_size.xml   | 48 ++
 tests/qemuxml2xmltest.c|  2 +
 11 files changed, 198 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.args
 create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.xml
 create mode 100644 tests/qemuxml2xmloutdata/disk-metadata_cache_size.xml

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 8189959..93e0009 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -3556,6 +3556,13 @@
 virt queues for virtio-blk. (Since 
3.9.0)
   
   
+The optional metadata_cache_size attribute specifies
+metadata cache size policy. The only possible value is "maximum" to
+keep all metadata in cache, this will help if workload needs access
+to whole disk all the time.  (Since
+4.9.0)
+  
+  
   For virtio disks,
   Virtio-specific options can also be
   set. (Since 3.5.0)
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 099a949..18efa3a 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -1990,6 +1990,9 @@
 
   
   
+
+  
+  
 
   
 
@@ -2090,6 +2093,13 @@
   
 
   
+  
+
+  
+maximum
+  
+
+  
   
 
   
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index e8e0adc..04383f0 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -885,6 +885,11 @@ VIR_ENUM_IMPL(virDomainDiskDetectZeroes, 
VIR_DOMAIN_DISK_DETECT_ZEROES_LAST,
   "on",
   "unmap")
 
+VIR_ENUM_IMPL(virDomainDiskMetadataCacheSize,
+  VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST,
+  "default",
+  "maximum")
+
 VIR_ENUM_IMPL(virDomainDiskMirrorState, VIR_DOMAIN_DISK_MIRROR_STATE_LAST,
   "none",
   "yes",
@@ -9347,6 +9352,14 @@ virDomainDiskDefDriverParseXML(virDomainDiskDefPtr def,
 }
 VIR_FREE(tmp);
 
+if ((tmp = virXMLPropString(cur, "metadata_cache_size")) &&
+(def->metadata_cache_size = 
virDomainDiskMetadataCacheSizeTypeFromString(tmp)) <= 0) {
+virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+   _("unknown driver metadata_cache_size value '%s'"), 
tmp);
+goto cleanup;
+}
+VIR_FREE(tmp);
+
 ret = 0;
 
  cleanup:
@@ -23874,6 +23887,10 @@ virDomainDiskDefFormatDriver(virBufferPtr buf,
 if (disk->queues)
 virBufferAsprintf(&driverBuf, " queues='%u'", disk->queues);
 
+if (disk->metadata_cache_size)
+virBufferAsprintf(&driverBuf, " metadata_cache_size='%s'",
+  
virDomainDiskMetadataCacheSizeTypeToString(disk->metadata_cache_size));
+
 virDomainVirtioOptionsFormat(&driverBuf, disk->virtio);
 
 ret = virXMLFormatElement(buf, "driver", &driverBuf, NULL);
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index e30a4b2..b155058 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -568,6 +568,13 @@ typedef enum {
 VIR_DOMAIN_DISK_DETECT_ZEROES_LAST
 } virDomainDiskDetectZeroes;
 
+typedef enum {
+VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_DEFAULT = 0,
+VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM,
+
+VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_LAST
+} virDomainDiskMetadataCacheSize;
+
 typedef struct _virDomainBlockIoTuneInfo virDomainBlockIoTuneInfo;
 struct _virD

Re: [libvirt] [PATCH] nodedev: Document the udevEventHandleThread

2018-11-01 Thread Erik Skultety

On Fri, Oct 19, 2018 at 08:04:43AM -0400, John Ferlan wrote:
> Add details of the algorithm and why it was used to help future
> readers understand the issues encountered. Based largely off a
> description provided by Erik Skultety .

Since the link to the archive would be missing from the commit message, I feel
like simply blaming commit cdbe13329a7 for lacking documentation/justification
of the algorithm used is the right way of composing the commit message.

I appreciate the effort to document it properly, although I feel it's a bit
overwhelming, so the comments I have below is just a sheer attempt to reduce
the amount of text documenting a single function, you know, if a function needs
a comment like that, it suggests that there's something wrong with that
function...

>
> Signed-off-by: John Ferlan 
> ---
>
>  see: https://www.redhat.com/archives/libvir-list/2018-October/msg00915.html
>
>  src/node_device/node_device_udev.c | 49 ++
>  1 file changed, 49 insertions(+)
>
> diff --git a/src/node_device/node_device_udev.c 
> b/src/node_device/node_device_udev.c
> index 22897591de..8ca81e65ad 100644
> --- a/src/node_device/node_device_udev.c
> +++ b/src/node_device/node_device_udev.c
> @@ -1591,6 +1591,53 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
>  }
>
>
> +/**
> + * udevEventHandleThread
> + * @opaque: unused
> + *
> + * Thread to handle the udevEventHandleCallback processing when udev
> + * tells us there's a device change for us (add, modify, delete, etc).
> + *
> + * Management of the processing udev device notification is a delicate
> + * balance between the udev process, the scheduler, and when exactly the
> + * data from/for the socket is received.

I'd drop ^this sentence, the issue with this unfortunate mix of factors is
explained further down, let's try keeping the amount of commentary down.

> + * The udev processing code notifies
> + * the udevEventHandleCallback that it has data; however, the actual recv()
> + * done in the udev code to fill in the @device structure can be delayed
> + * due to how threads are scheduled.

How about:
"When we get notified by udev that there are data to be processed, the actual
data retrieval by libudev may be delayed due to how threads are scheduled."

> + *
> + * As it turns out, the event loop could be scheduled (and it in fact
> + * was) earlier than the handler thread. What that essentially means is
> + * that by the time the thread actually handled the event and read the
> + * data from the monitor, the event loop fired the very same event, simply
> + * because the data hadn't been retrieved from the socket at that point yet.

^This doesn't need to be a separate paragraph, you can connect it to the
previous one. Also, some words don't feel right for commentary purposes (I know
they're mine :D). How about:

"In fact, the event loop could be scheduled earlier than the handler thread,
thus potentially emitting the very same event the handler thread is currently
trying to process, simply because the data hadn't been retrieved from the
socket at that time yet."

> + *
> + * Thus this algorithm is that once udevEventHandleCallback is notified
> + * there is data,

^this can be dropped...

> + * this loop will process as much data as possible until
> + * udev_monitor_receive_device fails to get the @device. This may be
> + * because we've processed everything or because the scheduling thread
> + * hasn't been able to populate data in the socket. Once we're done
> + * processing everything we can, wait on the next event/notification.

How about:

"Instead of waiting for the next event after retrieving a device data we keep
reading from the udev monitor until we encounter one of EAGAIN or EWOULDBLOCK
errors, only then we'll wait for the next event."

Also, I'd move that bit into the function body, so it's directly tied to
the piece of code responsible for it.

> + * Although this may incur a slight delay if the socket data wasn't
> + * populated, the event/notifications are fairly consistent so there's
> + * no need to use virCondWaitUntil.

^This could be dropped.

> + *
> + * NB: Usage of event based socket algorithm causes some issues with
> + * older platforms such as CentOS 6 in which the data socket is opened
> + * without the NONBLOCK bit set. Still even if the reset of priv->dataReady
> + * were moved to just after the udev_monitor_receive_device in order to
> + * avoid the NONBLOCK issue, the scheduler would still come into play

"Trying to move the reset of the @priv->dataReady flag within the loop wouldn't
help much as the scheduler would still come into play."

Everything written below could go to the commit message itself.


Would you agree to the proposed adjustments?
Erik

> + * especially for environments when multiple devices are added into the
> + * system. Even those environments would still be blocked on the udev
> + * socket recv() call. The algorithm for handling that scenario would
> + * be more complex

[libvirt] [PATCH 0/3] add disk driver metadata_cache_size option

2018-11-01 Thread Nikolay Shirokovskiy

Hi, all.

This is a patch series after offlist agreement on introducing
metadata-cache-size option for disks. The options itself is described in 2nd
patch of the series.

There is a plenty of attempts to add option to set qcow2 metadata cache sizes in
past, see [1] [2] to name recent that has comments I tried to address or has
work I reused to some extent.

I would like to address Peter's comment in [1] that this option is image's
option rather then disk's here. While this makes sense if we set specific cache
value that covers disk only partially, here when we have maximum policy that
covers whole disk it makes sense to set same value for all images. The setted
value is only upper limit on cache size and thus cache for every image will
grow proportionally to image data size "visible from top" eventually I guess.
And these sizes are changing during guest lifetime - thus we need set whole
disk limit for every image for 'maxium' effect.

On the other hand if we add policies different from 'maximum' it may require
per image option. Well I can't imagine name for element for backing chain that
will have cache size attribute better then 'driver'). And driver is already
used for top image so I leave it this way.

  KNOWN ISSUES

1. when using -driver to configure disks in command line cache size does not
   get propagated thru backing chain

2. when making external disk snapshot cache size sneak into backing file
   filename and thus cache size for backing image became remembered there

3. blockcommit after such snapshot will not work because libvirt is not ready
   for backing file name is reported as json sometimes

Looks like 1 can be addressed in qemu. The reason for behaviour in 2
I do not understand honestly. And 3 will be naturally addessed after
blockjobs starts working with blockdev configuration I guess.

  LINKS

[1] [PATCH] qemu: Added support L2 table cache for qcow2 disk
https://www.redhat.com/archives/libvir-list/2018-July/msg00166.html

[2] [PATCH v6 0/2] Add support for qcow2 cache
 https://www.redhat.com/archives/libvir-list/2017-September/msg00553.html

Nikolay Shirokovskiy (3):
  qemu: caps: add drive.qcow2.l2-cache-size
  xml: add disk driver metadata_cache_size option
  qemu: support metadata-cache-size for blockdev

 docs/formatdomain.html.in  |  7 
 docs/schemas/domaincommon.rng  | 10 +
 src/conf/domain_conf.c | 17 
 src/conf/domain_conf.h |  9 
 src/qemu/qemu_block.c  |  5 ++-
 src/qemu/qemu_capabilities.c   |  3 ++
 src/qemu/qemu_capabilities.h   |  1 +
 src/qemu/qemu_command.c| 26 
 src/qemu/qemu_domain.c |  2 +
 src/util/virstoragefile.c  |  1 +
 src/util/virstoragefile.h  |  1 +
 tests/qemucapabilitiesdata/caps_2.10.0.aarch64.xml |  1 +
 tests/qemucapabilitiesdata/caps_2.10.0.ppc64.xml   |  1 +
 tests/qemucapabilitiesdata/caps_2.10.0.s390x.xml   |  1 +
 tests/qemucapabilitiesdata/caps_2.10.0.x86_64.xml  |  1 +
 tests/qemucapabilitiesdata/caps_2.11.0.s390x.xml   |  1 +
 tests/qemucapabilitiesdata/caps_2.11.0.x86_64.xml  |  1 +
 tests/qemucapabilitiesdata/caps_2.12.0.aarch64.xml |  1 +
 tests/qemucapabilitiesdata/caps_2.12.0.ppc64.xml   |  1 +
 tests/qemucapabilitiesdata/caps_2.12.0.s390x.xml   |  1 +
 tests/qemucapabilitiesdata/caps_2.12.0.x86_64.xml  |  1 +
 tests/qemucapabilitiesdata/caps_2.5.0.x86_64.xml   |  1 +
 tests/qemucapabilitiesdata/caps_2.6.0.aarch64.xml  |  1 +
 tests/qemucapabilitiesdata/caps_2.6.0.ppc64.xml|  1 +
 tests/qemucapabilitiesdata/caps_2.6.0.x86_64.xml   |  1 +
 tests/qemucapabilitiesdata/caps_2.7.0.s390x.xml|  1 +
 tests/qemucapabilitiesdata/caps_2.7.0.x86_64.xml   |  1 +
 tests/qemucapabilitiesdata/caps_2.8.0.s390x.xml|  1 +
 tests/qemucapabilitiesdata/caps_2.8.0.x86_64.xml   |  1 +
 tests/qemucapabilitiesdata/caps_2.9.0.ppc64.xml|  1 +
 tests/qemucapabilitiesdata/caps_2.9.0.s390x.xml|  1 +
 tests/qemucapabilitiesdata/caps_2.9.0.x86_64.xml   |  1 +
 tests/qemucapabilitiesdata/caps_3.0.0.ppc64.xml|  1 +
 tests/qemucapabilitiesdata/caps_3.0.0.riscv32.xml  |  1 +
 tests/qemucapabilitiesdata/caps_3.0.0.riscv64.xml  |  1 +
 tests/qemucapabilitiesdata/caps_3.0.0.s390x.xml|  1 +
 tests/qemucapabilitiesdata/caps_3.0.0.x86_64.xml   |  1 +
 .../qemuxml2argvdata/disk-metadata_cache_size.args | 34 +++
 .../qemuxml2argvdata/disk-metadata_cache_size.xml  | 42 +++
 tests/qemuxml2argvtest.c   |  2 +
 .../disk-metadata_cache_size.xml   | 48 ++
 tests/qemuxml2xmltest.c|  2 +
 42 files changed, 235 insertions(+), 1 deletion(-)
 create mode 100644 tests/qemuxml2argvdata/disk-metadata_cache_size.args
 create mode 100644 tests/qemuxml2argvdat

[libvirt] [PATCH 3/3] qemu: support metadata-cache-size for blockdev

2018-11-01 Thread Nikolay Shirokovskiy

Just set l2-cache-size to INT64_MAX for all format nodes of
qcow2 type in block node graph.

AFAIK this is sane because *actual* cache size depends on size
of data being referenced in image and thus the total size of
all cache sizes for all images in disk backing chain will not
exceed the cache size that covers just one full image as in
case of no backing chain.

Signed-off-by: Nikolay Shirokovskiy 
---
 src/qemu/qemu_block.c | 5 -
 src/qemu/qemu_domain.c| 1 +
 src/util/virstoragefile.c | 1 +
 src/util/virstoragefile.h | 1 +
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c
index 5321dda..8771cc1 100644
--- a/src/qemu/qemu_block.c
+++ b/src/qemu/qemu_block.c
@@ -1322,7 +1322,6 @@ 
qemuBlockStorageSourceGetFormatQcow2Props(virStorageSourcePtr src,
  * 'pass-discard-snapshot'
  * 'pass-discard-other'
  * 'overlap-check'
- * 'l2-cache-size'
  * 'l2-cache-entry-size'
  * 'refcount-cache-size'
  * 'cache-clean-interval'
@@ -1331,6 +1330,10 @@ 
qemuBlockStorageSourceGetFormatQcow2Props(virStorageSourcePtr src,
 if (qemuBlockStorageSourceGetFormatQcowGenericProps(src, "qcow2", props) < 
0)
 return -1;
 
+if (src->metadata_cache_size == 
VIR_DOMAIN_DISK_METADATA_CACHE_SIZE_MAXIMUM &&
+virJSONValueObjectAdd(props, "I:l2-cache-size", INT64_MAX, NULL) < 0)
+return -1;
+
 return 0;
 }
 
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 896adf3..f87cfd2 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -13245,6 +13245,7 @@ qemuDomainPrepareDiskSourceData(virDomainDiskDefPtr 
disk,
 src->iomode = disk->iomode;
 src->cachemode = disk->cachemode;
 src->discard = disk->discard;
+src->metadata_cache_size = disk->metadata_cache_size;
 
 if (disk->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY)
 src->floppyimg = true;
diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c
index 94c32d8..9089e2f 100644
--- a/src/util/virstoragefile.c
+++ b/src/util/virstoragefile.c
@@ -2210,6 +2210,7 @@ virStorageSourceCopy(const virStorageSource *src,
 ret->cachemode = src->cachemode;
 ret->discard = src->discard;
 ret->detect_zeroes = src->detect_zeroes;
+ret->metadata_cache_size = src->metadata_cache_size;
 
 /* storage driver metadata are not copied */
 ret->drv = NULL;
diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h
index 3ff6c4f..8b57399 100644
--- a/src/util/virstoragefile.h
+++ b/src/util/virstoragefile.h
@@ -331,6 +331,7 @@ struct _virStorageSource {
 int cachemode; /* enum virDomainDiskCache */
 int discard; /* enum virDomainDiskDiscard */
 int detect_zeroes; /* enum virDomainDiskDetectZeroes */
+int metadata_cache_size; /* enum virDomainDiskImageMetadataCacheSize */
 
 bool floppyimg; /* set to true if the storage source is going to be used
as a source for floppy drive */
-- 
1.8.3.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH libvirt-python] event-test.py: Report ERROR events

2018-11-01 Thread Philipp Hahn

VIR_DOMAIN_EVENT_ID_IO_ERROR and VIR_DOMAIN_EVENT_ID_IO_ERROR_REASON
callbacks receive the same 'action' parameter, so also translate that
numeric action to a descriptive text for the first callback.

Signed-off-by: Philipp Hahn 
---
 examples/event-test.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/event-test.py b/examples/event-test.py
index dabf4b0..709277b 100755
--- a/examples/event-test.py
+++ b/examples/event-test.py
@@ -526,8 +526,8 @@ def myDomainEventWatchdogCallback(conn, dom, action, 
opaque):
 
 
 def myDomainEventIOErrorCallback(conn, dom, srcpath, devalias, action, opaque):
-print("myDomainEventIOErrorCallback: Domain %s(%s) %s %s %d" % (
-dom.name(), dom.ID(), srcpath, devalias, action))
+print("myDomainEventIOErrorCallback: Domain %s(%s) %s %s %s" % (
+dom.name(), dom.ID(), srcpath, devalias, ERROR_EVENTS[action]))
 
 
 def myDomainEventIOErrorReasonCallback(conn, dom, srcpath, devalias, action, 
reason, opaque):
-- 
2.11.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH libvirt-python] event-test.py: Fix ERROR event

2018-11-01 Thread Philipp Hahn

ERROR_EVENTS translates the numeric 'action' argument to a description,
not the 'reason' argument which already contains a descriptive string
like 'enospc'.

> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/libvirt.py", line 4661, in 
> _dispatchDomainEventIOErrorReasonCallback
> reason, opaque)
>   File "libvirt-python/examples/event-test.py", line 536, in 
> myDomainEventIOErrorReasonCallback
> dom.name(), dom.ID(), srcpath, devalias, action, ERROR_EVENTS[reason]))
>   File "libvirt-python/examples/event-test.py", line 474, in __getitem__
> data = self.args[item]
> TypeError: tuple indices must be integers, not str

Fixes: f5928c6711654f1496707ca77f626b3192843d57
Signed-off-by: Philipp Hahn 
---
 examples/event-test.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/event-test.py b/examples/event-test.py
index eeb2777..dabf4b0 100755
--- a/examples/event-test.py
+++ b/examples/event-test.py
@@ -531,8 +531,8 @@ def myDomainEventIOErrorCallback(conn, dom, srcpath, 
devalias, action, opaque):
 
 
 def myDomainEventIOErrorReasonCallback(conn, dom, srcpath, devalias, action, 
reason, opaque):
-print("myDomainEventIOErrorReasonCallback: Domain %s(%s) %s %s %d %s" % (
-dom.name(), dom.ID(), srcpath, devalias, action, ERROR_EVENTS[reason]))
+print("myDomainEventIOErrorReasonCallback: Domain %s(%s) %s %s %s %s" % (
+dom.name(), dom.ID(), srcpath, devalias, ERROR_EVENTS[action], reason))
 
 
 def myDomainEventGraphicsCallback(conn, dom, phase, localAddr, remoteAddr, 
authScheme, subject, opaque):
-- 
2.11.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2] qemu: Introduce caching whether /dev/kvm is accessible

2018-11-01 Thread Daniel P . Berrangé

On Thu, Nov 01, 2018 at 09:07:05AM +0100, Michal Privoznik wrote:
> On 11/01/2018 05:11 AM, Ján Tomko wrote:
> > IIRC the problem was that users with vmx disabled in BIOS setup needed
> > to delete libvirt's qemuCaps cache manually after enabling it even after
> > restarting the system.
> 
> Sounds like it. The commit that introduced the check is d87df9bd39b.
> 
> Honestly, I don't think we need to care about perms - we can assume
> those are set properly (as they don't change often). What we have to

We can't assume permissions are correct. One of the core issues we hit
was that the udev rule which sets the permissions has been in the QEMU
RPM historically. So you can have the kernel module loaded automatically
by systemd when it detects VMX, but its permissions will be 0600 until
the qemu-kvm RPM is installed and udev is triggered to fix the permissions
to 0666.

> care about is inserting/removing kvm module (even though it's not
> necessary these days since virtualbox has learned how to co-exist with
> KVM).

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2] qemu: Introduce caching whether /dev/kvm is accessible

2018-11-01 Thread Daniel P . Berrangé

On Thu, Nov 01, 2018 at 09:31:50AM +0100, Martin Kletzander wrote:
> On Tue, Oct 30, 2018 at 03:07:59PM +, Daniel P. Berrangé wrote:
> > On Tue, Oct 30, 2018 at 03:45:36PM +0100, Michal Privoznik wrote:
> > > On 10/30/2018 02:46 PM, Michal Privoznik wrote:
> > > >
> > > > It is kernel problem. In my testing, the moment I call:
> > > >
> > > >  ioctl(kvm, KVM_CREATE_VM, 0);
> > > 
> > > Okay, I have to retract this claim. 'udevadm monitor' shows some events:
> > > 
> > > KERNEL[3631.129645] change   /devices/virtual/misc/kvm (misc)
> > > UDEV  [3631.130816] change   /devices/virtual/misc/kvm (misc)
> > > 
> > > and stopping udevd leaves all three times untouched. So it is udev after
> > > all. I just don't know how to find the rule that is causing the issue.
> > > Anyway, as for this patch, I think we can merge it in the end, can't we?
> > 
> > Not really. Udev is in use everywhere, so this behaviour makes the
> > patch useless in practice, even though it is technically right in
> > theory :-(
> > 
> 
> Does it?  With this behaviour we still do the "expensive" work after any 
> machine
> has started.  But for one machine starting it still has the effect of running 
> it
> only once.  And we *need* to run it for each machine unless we also cache the
> result per (at least) user:group of that machine as every machine can run 
> under
> different user:group.

Yes, with the current patch it still rechecks it for each VM startup, but
I was going to suggest the caching of this is simply put in a global static
variable as there's no reason to record this device permissions state in
the per VM caps cache.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Report warning if QoS is set for vhostuser interface

2018-11-01 Thread Erik Skultety

On Wed, Oct 31, 2018 at 04:47:45PM +0100, Michal Privoznik wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=1524230
>
> Because of historical reasons, we are not denying starting a
> domain which has QoS set for unsupported type of device. We do
> report just a warning instead. And even though we perhaps used to
> do so for vhostuser it got lost somewhere. Bring it back.

So, if my blame-fu isn't flawed, then I don't think that vhostuser ever
reported a warning during machine startup, have a look at commits 4a74ccdb92f
and 0bce012d7f0 respectively - there always was a goto cleanup statement.

>
> Signed-off-by: Michal Privoznik 
> ---
>  src/qemu/qemu_command.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> index 6e3ff67660..489e8bc689 100644
> --- a/src/qemu/qemu_command.c
> +++ b/src/qemu/qemu_command.c
> @@ -8246,6 +8246,12 @@ qemuBuildVhostuserCommandLine(virQEMUDriverPtr driver,
>  goto cleanup;
>  }
>
> +if (virDomainNetGetActualBandwidth(net)) {
> +VIR_WARN("setting bandwidth on interfaces of "
> + "type '%s' is not implemented yet",
> + virDomainNetTypeToString(VIR_DOMAIN_NET_TYPE_VHOSTUSER));
> +}
> +

We already do this kind of checking on line 8593 in the same source file. It's
just because of the goto cleanup jump after calling
qemuBuildVhostuserCommandLine that causes the logic to never reach the condition
(Not to mention the code is a mess). I'd suggest moving the already existing
check to the top of the qemuBuildInterfaceCommandLine function instead of
duplicating the code.

Erik

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2] qemu: Introduce caching whether /dev/kvm is accessible

2018-11-01 Thread Martin Kletzander


On Mon, Oct 29, 2018 at 06:34:58PM +0100, Marc Hartmayer wrote:

Introduce caching whether /dev/kvm is usable as the QEMU user:QEMU
group. This reduces the overhead of the QEMU capabilities cache
lookup. Before this patch there were many fork() calls used for
checking whether /dev/kvm is accessible. Now we store the result
whether /dev/kvm is accessible or not and we only need to re-run the
virFileAccessibleAs check if the ctime of /dev/kvm has changed.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Marc Hartmayer 
---
src/qemu/qemu_capabilities.c | 54 ++--
1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index e228f52ec0bb..85516954149b 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -3238,6 +3238,10 @@ struct _virQEMUCapsCachePriv {
virArch hostArch;
unsigned int microcodeVersion;
char *kernelVersion;
+
+/* cache whether /dev/kvm is usable as runUid:runGuid */


Even though this doesn't solve all of it (the ctime change with udev etc.) it
makes it better.  However this needs to be checked (or cached) per seclabel of
the domain as it can run under non-default user.


+virTristateBool kvmUsable;
+time_t kvmCtime;
};
typedef struct _virQEMUCapsCachePriv virQEMUCapsCachePriv;
typedef virQEMUCapsCachePriv *virQEMUCapsCachePrivPtr;
@@ -3824,6 +3828,52 @@ virQEMUCapsSaveFile(void *data,
}


+/* Determine whether '/dev/kvm' is usable as QEMU user:QEMU group. */
+static bool
+virQEMUCapsKVMUsable(virQEMUCapsCachePrivPtr priv)
+{
+struct stat sb;
+static const char *kvm_device = "/dev/kvm";
+virTristateBool value;
+virTristateBool cached_value = priv->kvmUsable;
+time_t kvm_ctime;
+time_t cached_kvm_ctime = priv->kvmCtime;
+
+if (stat(kvm_device, &sb) < 0) {
+virReportSystemError(errno,
+ _("Failed to stat %s"), kvm_device);
+return false;
+}
+kvm_ctime = sb.st_ctime;
+
+if (kvm_ctime != cached_kvm_ctime) {
+VIR_DEBUG("%s has changed (%lld vs %lld)", kvm_device,
+  (long long)kvm_ctime, (long long)cached_kvm_ctime);
+cached_value = VIR_TRISTATE_BOOL_ABSENT;
+}
+
+if (cached_value != VIR_TRISTATE_BOOL_ABSENT)
+return cached_value == VIR_TRISTATE_BOOL_YES;
+
+if (virFileAccessibleAs(kvm_device, R_OK | W_OK,
+priv->runUid, priv->runGid) == 0) {
+value = VIR_TRISTATE_BOOL_YES;
+} else {
+value = VIR_TRISTATE_BOOL_NO;
+}
+
+/* There is a race window between 'stat' and
+ * 'virFileAccessibleAs'. However, since we're only interested in
+ * detecting changes *after* the virFileAccessibleAs check, we can
+ * neglect this here.
+ */
+priv->kvmCtime = kvm_ctime;
+priv->kvmUsable = value;
+
+return value == VIR_TRISTATE_BOOL_YES;
+}
+
+
static bool
virQEMUCapsIsValid(void *data,
   void *privData)
@@ -3872,8 +3922,7 @@ virQEMUCapsIsValid(void *data,
return true;
}

-kvmUsable = virFileAccessibleAs("/dev/kvm", R_OK | W_OK,
-priv->runUid, priv->runGid) == 0;
+kvmUsable = virQEMUCapsKVMUsable(priv);

if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_KVM) &&
kvmUsable) {
@@ -4684,6 +4733,7 @@ virQEMUCapsCacheNew(const char *libDir,
priv->runUid = runUid;
priv->runGid = runGid;
priv->microcodeVersion = microcodeVersion;
+priv->kvmUsable = VIR_TRISTATE_BOOL_ABSENT;

if (uname(&uts) == 0 &&
virAsprintf(&priv->kernelVersion, "%s %s", uts.release, uts.version) < 
0)
--
2.17.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


signature.asc
Description: Digital signature
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2] qemu: Introduce caching whether /dev/kvm is accessible

2018-11-01 Thread Martin Kletzander


On Thu, Nov 01, 2018 at 09:31:50AM +0100, Martin Kletzander wrote:

On Tue, Oct 30, 2018 at 03:07:59PM +, Daniel P. Berrangé wrote:

On Tue, Oct 30, 2018 at 03:45:36PM +0100, Michal Privoznik wrote:

On 10/30/2018 02:46 PM, Michal Privoznik wrote:
> On 10/30/2018 01:55 PM, Daniel P. Berrangé wrote:
>> On Tue, Oct 30, 2018 at 10:32:08AM +, Daniel P. Berrangé wrote:
>>> On Tue, Oct 30, 2018 at 11:08:45AM +0100, Michal Privoznik wrote:
 On 10/30/2018 10:35 AM, Daniel P. Berrangé wrote:
> On Tue, Oct 30, 2018 at 09:13:50AM +0100, Michal Privoznik wrote:
>> On 10/29/2018 06:34 PM, Marc Hartmayer wrote:
>>> Introduce caching whether /dev/kvm is usable as the QEMU user:QEMU
>>> group. This reduces the overhead of the QEMU capabilities cache
>>> lookup. Before this patch there were many fork() calls used for
>>> checking whether /dev/kvm is accessible. Now we store the result
>>> whether /dev/kvm is accessible or not and we only need to re-run the
>>> virFileAccessibleAs check if the ctime of /dev/kvm has changed.
>>>
>>> Suggested-by: Daniel P. Berrangé 
>>> Signed-off-by: Marc Hartmayer 
>>> ---
>>>  src/qemu/qemu_capabilities.c | 54 ++--
>>>  1 file changed, 52 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
>>> index e228f52ec0bb..85516954149b 100644
>>> --- a/src/qemu/qemu_capabilities.c
>>> +++ b/src/qemu/qemu_capabilities.c
>>> @@ -3238,6 +3238,10 @@ struct _virQEMUCapsCachePriv {
>>>  virArch hostArch;
>>>  unsigned int microcodeVersion;
>>>  char *kernelVersion;
>>> +
>>> +/* cache whether /dev/kvm is usable as runUid:runGuid */
>>> +virTristateBool kvmUsable;
>>> +time_t kvmCtime;
>>>  };
>>>  typedef struct _virQEMUCapsCachePriv virQEMUCapsCachePriv;
>>>  typedef virQEMUCapsCachePriv *virQEMUCapsCachePrivPtr;
>>> @@ -3824,6 +3828,52 @@ virQEMUCapsSaveFile(void *data,
>>>  }
>>>
>>>
>>> +/* Determine whether '/dev/kvm' is usable as QEMU user:QEMU group. */
>>> +static bool
>>> +virQEMUCapsKVMUsable(virQEMUCapsCachePrivPtr priv)
>>> +{
>>> +struct stat sb;
>>> +static const char *kvm_device = "/dev/kvm";
>>> +virTristateBool value;
>>> +virTristateBool cached_value = priv->kvmUsable;
>>> +time_t kvm_ctime;
>>> +time_t cached_kvm_ctime = priv->kvmCtime;
>>> +
>>> +if (stat(kvm_device, &sb) < 0) {
>>> +virReportSystemError(errno,
>>> + _("Failed to stat %s"), kvm_device);
>>> +return false;
>>> +}
>>> +kvm_ctime = sb.st_ctime;
>>
>> This doesn't feel right. /dev/kvm ctime is changed every time qemu is
>> started or powered off (try running stat over it before and after a
>> domain is started/shut off). So effectively we will fork more often than
>> we would think. Should we cache inode number instead? Because for all
>> that we care is simply if the file is there.
>
> Urgh, that is a bit strange and not the usual semantics for timestamps :-(

 Indeed.

>
> We can't stat the inode - the code was explicitly trying to cope with the
> way /dev/kvm can change permissions when udev rules get applied. We would
> have to compare the user, group, permissions mask and even ACL, or a hash
> of those.

 Well, we can use ctime as suggested and post a patch for kernel to fix
 ctime behaviour. Until the patch is merged our behaviour would be
 suboptimal, but still better than it is now.
>>>
>>> I guess lets talk to KVM team for their input on this and then decide
>>> what todo.
>>
>> Hmm, I wonder if it is not actually a kernel problem, but rather something
>> in userspace genuinely touching the device in a way that caues these
>> timestamps to be updated.
>>
>
> It is kernel problem. In my testing, the moment I call:
>
>  ioctl(kvm, KVM_CREATE_VM, 0);

Okay, I have to retract this claim. 'udevadm monitor' shows some events:

KERNEL[3631.129645] change   /devices/virtual/misc/kvm (misc)
UDEV  [3631.130816] change   /devices/virtual/misc/kvm (misc)

and stopping udevd leaves all three times untouched. So it is udev after
all. I just don't know how to find the rule that is causing the issue.
Anyway, as for this patch, I think we can merge it in the end, can't we?


Not really. Udev is in use everywhere, so this behaviour makes the
patch useless in practice, even though it is technically right in
theory :-(



Does it?  With this behaviour we still do the "expensive" work after any machine
has started.  But for one machine starting it still has the effect of running it
only once.  And we *need* to run it for each machine unless we also cache the
result per (at least) user:group of that machine as every machine can run under
different user

Re: [libvirt] [PATCH v2] vhost-user: define conventions for vhost-user backends

2018-11-01 Thread Marc-André Lureau

Hi

On Wed, Oct 31, 2018 at 8:05 PM Daniel P. Berrangé  wrote:
>
> On Wed, Oct 31, 2018 at 07:44:36PM +0400, Marc-André Lureau wrote:
> > Hi
> >
> > On Tue, Oct 16, 2018 at 4:14 PM Daniel P. Berrangé  
> > wrote:
> > > > > It just reinvents the chardev unix socket syntax, but in a
> > > > > different adhoc manner, which is repeating the mistake we have
> > > > > made time & again in QEMU. Using QAPI we can directly accept
> > > > > the ChardevSocket syntax we already know about. Instead of
> > > > > having --socket-path and --fd and documenting that they are
> > > > > mutually exclusive ChardevSocket QAPI schema provides that
> > > > > representation in a well defined format which is discoverable
> > > > > and QEMU and mgmt apps already understand.
> > > >
> > > > That would require external vhost-user backends to implement QAPI/json
> > > > parsing. Is this necessary for a vhost-user backend? I doubt it.
> > >
> > > They could, but would not be required, to implement QAPI/json parser.
> > >
> > > The QAPI schema defines a standard for how to model & interpret the
> > > non-scalar values for command line arguments. An external impl would
> > > need to ensure that whatever parsing it does for CLI args is semantically
> > > compatible with the parsing rules defined by the QEMU QAPI schema we
> > > define to ensure interoperability of its impl.
> >
> > So you would want to have something like?
> >
> > --chardev '{ "id" : "bar", "backend" : { "type" : "socket", "data" : {
> > "addr" : { "type": "unix", "path": "/tmp/foo.sock" }, "server":
> > "false"  } } }'
>
> I wasn't specificially suggesting json syntax. Just the flattened
> dot separate syntax, whose valid components are mapped to corresponding
> qapi schema defintions. eg closer to what we have already today
>
>   --chardev socket,id=bar,path=/tmp/foo.sock,server
>

ok

> >
> > instead of:
> >
> > --socket-path=/tmp/foo.sock
> >
> > I don't really get what that will help with, for the vhost-user backend 
> > case...
>
> It avoids presuming that we forever want to launch the backend with
> a single socket path and nothing else. Using the chardev, means we
> can choosen between client/server mode. We can choose whether to
> pass an FD to the socket, instead of the socket path. Whether the
> reconnect flag is set or not, and all the other aspects of a chardev
> you can define.
>

We are trying to define a common specification for vhost-user
backends. Where do we stop defining what --chardev should support?

> I don't think we should have to add more & more adhoc CLI args (--socket-path,
> --fd, --reconnect, etc) and then manually parse them & pack their values
> together into a chardev object.

The backends most likely won't use qemu chardev (nor qapi), and be
limited to unix socket.


>
>
> Regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2] qemu: Introduce caching whether /dev/kvm is accessible

2018-11-01 Thread Martin Kletzander


On Tue, Oct 30, 2018 at 03:07:59PM +, Daniel P. Berrangé wrote:

On Tue, Oct 30, 2018 at 03:45:36PM +0100, Michal Privoznik wrote:

On 10/30/2018 02:46 PM, Michal Privoznik wrote:
> On 10/30/2018 01:55 PM, Daniel P. Berrangé wrote:
>> On Tue, Oct 30, 2018 at 10:32:08AM +, Daniel P. Berrangé wrote:
>>> On Tue, Oct 30, 2018 at 11:08:45AM +0100, Michal Privoznik wrote:
 On 10/30/2018 10:35 AM, Daniel P. Berrangé wrote:
> On Tue, Oct 30, 2018 at 09:13:50AM +0100, Michal Privoznik wrote:
>> On 10/29/2018 06:34 PM, Marc Hartmayer wrote:
>>> Introduce caching whether /dev/kvm is usable as the QEMU user:QEMU
>>> group. This reduces the overhead of the QEMU capabilities cache
>>> lookup. Before this patch there were many fork() calls used for
>>> checking whether /dev/kvm is accessible. Now we store the result
>>> whether /dev/kvm is accessible or not and we only need to re-run the
>>> virFileAccessibleAs check if the ctime of /dev/kvm has changed.
>>>
>>> Suggested-by: Daniel P. Berrangé 
>>> Signed-off-by: Marc Hartmayer 
>>> ---
>>>  src/qemu/qemu_capabilities.c | 54 ++--
>>>  1 file changed, 52 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
>>> index e228f52ec0bb..85516954149b 100644
>>> --- a/src/qemu/qemu_capabilities.c
>>> +++ b/src/qemu/qemu_capabilities.c
>>> @@ -3238,6 +3238,10 @@ struct _virQEMUCapsCachePriv {
>>>  virArch hostArch;
>>>  unsigned int microcodeVersion;
>>>  char *kernelVersion;
>>> +
>>> +/* cache whether /dev/kvm is usable as runUid:runGuid */
>>> +virTristateBool kvmUsable;
>>> +time_t kvmCtime;
>>>  };
>>>  typedef struct _virQEMUCapsCachePriv virQEMUCapsCachePriv;
>>>  typedef virQEMUCapsCachePriv *virQEMUCapsCachePrivPtr;
>>> @@ -3824,6 +3828,52 @@ virQEMUCapsSaveFile(void *data,
>>>  }
>>>
>>>
>>> +/* Determine whether '/dev/kvm' is usable as QEMU user:QEMU group. */
>>> +static bool
>>> +virQEMUCapsKVMUsable(virQEMUCapsCachePrivPtr priv)
>>> +{
>>> +struct stat sb;
>>> +static const char *kvm_device = "/dev/kvm";
>>> +virTristateBool value;
>>> +virTristateBool cached_value = priv->kvmUsable;
>>> +time_t kvm_ctime;
>>> +time_t cached_kvm_ctime = priv->kvmCtime;
>>> +
>>> +if (stat(kvm_device, &sb) < 0) {
>>> +virReportSystemError(errno,
>>> + _("Failed to stat %s"), kvm_device);
>>> +return false;
>>> +}
>>> +kvm_ctime = sb.st_ctime;
>>
>> This doesn't feel right. /dev/kvm ctime is changed every time qemu is
>> started or powered off (try running stat over it before and after a
>> domain is started/shut off). So effectively we will fork more often than
>> we would think. Should we cache inode number instead? Because for all
>> that we care is simply if the file is there.
>
> Urgh, that is a bit strange and not the usual semantics for timestamps :-(

 Indeed.

>
> We can't stat the inode - the code was explicitly trying to cope with the
> way /dev/kvm can change permissions when udev rules get applied. We would
> have to compare the user, group, permissions mask and even ACL, or a hash
> of those.

 Well, we can use ctime as suggested and post a patch for kernel to fix
 ctime behaviour. Until the patch is merged our behaviour would be
 suboptimal, but still better than it is now.
>>>
>>> I guess lets talk to KVM team for their input on this and then decide
>>> what todo.
>>
>> Hmm, I wonder if it is not actually a kernel problem, but rather something
>> in userspace genuinely touching the device in a way that caues these
>> timestamps to be updated.
>>
>
> It is kernel problem. In my testing, the moment I call:
>
>  ioctl(kvm, KVM_CREATE_VM, 0);

Okay, I have to retract this claim. 'udevadm monitor' shows some events:

KERNEL[3631.129645] change   /devices/virtual/misc/kvm (misc)
UDEV  [3631.130816] change   /devices/virtual/misc/kvm (misc)

and stopping udevd leaves all three times untouched. So it is udev after
all. I just don't know how to find the rule that is causing the issue.
Anyway, as for this patch, I think we can merge it in the end, can't we?


Not really. Udev is in use everywhere, so this behaviour makes the
patch useless in practice, even though it is technically right in
theory :-(



Does it?  With this behaviour we still do the "expensive" work after any machine
has started.  But for one machine starting it still has the effect of running it
only once.  And we *need* to run it for each machine unless we also cache the
result per (at least) user:group of that machine as every machine can run under
different user:group.

I'll go through the patch again (just skimmed it the first

Re: [libvirt] [PATCH v2] qemu: Introduce caching whether /dev/kvm is accessible

2018-11-01 Thread Michal Privoznik

On 11/01/2018 05:11 AM, Ján Tomko wrote:
> On Tue, Oct 30, 2018 at 03:53:59PM +, Daniel P. Berrangé wrote:
>> On Tue, Oct 30, 2018 at 04:47:32PM +0100, Michal Privoznik wrote:
>>> Well, caching owner + seclabels + ACLs won't help either. What if user
>>> loads some profile into AppArmor or something that denies previously
>>> allowed access to /dev/kvm (or vice versa)? What I am saying is that
>>> there are some security models which base their decisions on something
>>> else than file attributes.
>>
>> The virFileAccessibleAs check for /dev/kvm was put in their to ensure
>> that we correctly report usability of KVM in the capabilities XML
>> according to file permissions/ownership. Essentially KVM is not usable
>> until the udev rule is applied to change permissions to world accessible
>> or to set the kvm group.
>>
> 
> Can libvirt be run sooner than the permissions are set up during regular
> start-up, or this is just for the case of the user randomly touching the
> permissions?

Sure it can. And it is. For instance, I have ConsoleKit installed on my
system and when I log in as a regular user into my system, it adds an
ACL entry to /dev/kvm to allow access to the user. Libvirtd is already
running at that point (well, except not really because I don't run it as
a service since I'm running it from git all the time. But you get the
picture).

> 
> IIRC the problem was that users with vmx disabled in BIOS setup needed
> to delete libvirt's qemuCaps cache manually after enabling it even after
> restarting the system.

Sounds like it. The commit that introduced the check is d87df9bd39b.

Honestly, I don't think we need to care about perms - we can assume
those are set properly (as they don't change often). What we have to
care about is inserting/removing kvm module (even though it's not
necessary these days since virtualbox has learned how to co-exist with
KVM).

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] nwfilter: intantiate filters on loading driver

2018-11-01 Thread Nikolay Shirokovskiy



On 01.11.2018 03:48, John Ferlan wrote:
> 
> 
> On 10/31/18 10:41 AM, Nikolay Shirokovskiy wrote:
>> On 31.10.2018 16:14, John Ferlan wrote:
>>>
>>>
>>> On 10/30/18 3:24 AM, Nikolay Shirokovskiy wrote:


 On 29.10.2018 22:37, John Ferlan wrote:
>
>
> On 10/15/18 4:26 AM, Nikolay Shirokovskiy wrote:
>> Before using filters binding filters instantiation was done by 
>> hypervisors
>> drivers initialization code (qemu was the only such hypervisor). Now qemu
>> reconnection done supposes it should be done by nwfilter driver probably.
>> Let's add this missing step.
>>
>> Signed-off-by: Nikolay Shirokovskiy 
>> ---
>>  src/nwfilter/nwfilter_driver.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>
> If there's research you've done where the instantiation was done before
> introduction of the nwfilter bindings that would be really helpful...
>
> I found that virNWFilterBuildAll was introduced in commit 3df907bfff.
> There were 2 callers for it:
>
>1. virNWFilterTriggerRebuildImpl
>2. nwfilterStateReload
>
> The former called as part of the virNWFilterConfLayerInit callback
> during nwfilterStateInitialize (about 50 lines earlier).
>>>
>>> First off let me say you certainly find a lot of interesting bugs/issues
>>> that are complex and that's good. Often times I wonder how you trip
>>> across things or how to quantify what you've found. Some are simpler
>>> than others. This one on the surface would seem to be simple, but I keep
>>> wondering is there a downside.
>>
>> The issue is related to my recent posts:
>>
>> [1] [RFC] Faster libvirtd restart with nwfilter rules
>> https://www.redhat.com/archives/libvir-list/2018-September/msg01206.html
>> which continues here:
>> https://www.redhat.com/archives/libvir-list/2018-October/msg00657.html
>>
>> In short if there is lots of VMs with filters then libvirtd restart takes a 
>> lot of time
>> during which libvirtd is unresponsive. By the way the issue is found in 
>> libvirt
>> versions 3.9.0 and ealier (don't know of newer versions, Virtuozzo is based 
>> on 3.9.0 now,
>> just like Centos7 :) )
> 
> So many different issues - trying to focus on just the one for this
> particular patch.
> 
>>
>> And attempts to fix it:
>>
>> [2] [PATCH RFC 0/4] nwfilter: don't reinstantiate filters if they are not 
>> changed
>> https://www.redhat.com/archives/libvir-list/2018-October/msg00904.html
>>
>> [3]  [PATCH v2 0/2] nwfilter: don't reinstantiate rules if there is no need 
>> to
>> https://www.redhat.com/archives/libvir-list/2018-October/msg01317.html
>>
>> And the reason I started v2 is Laine mentioned that it is important to
>> reinstantiate rules on restart (v1 do not care if somebody messed tables).
>>
> 
> I've seen the patches and even read some briefly, but still need to take
> this particular patch as separate from those.
> 
>> And I discovered that upstream version stops reinstantiating rules at all
>> after introducing filters bindings. And this functionality is old:
> 
> So the key is "when" did that happen?  That is can you pinpoint or
> bisect a time in history where the filters were instantiated? And by
> instantiated what exactly (call) are you referencing?

The below commit is the one. It adds instantiating filters on
libvirtd restart. By instantiating I mean that firewall rules
correspondent to filters are actually get [re]added to iptables etc.

> 
>>
>> commit cf6f8b9a9720fe5323a84e690de9fbf8ba41f6ac
>> Author: Stefan Berger 
>> Date:   Mon Aug 16 12:59:54 2010 -0400
>>
>> nwfilter: extend nwfilter reload support
>>
> 
> Yes, nwfilter is old, brittle, and in need of attention. Not sure anyone
> really wants to take on the task though! I just realized I never got the
> common object code implemented there. Mostly because of lock issues.
> Suffice to say there's more "interesting" changes in the nwfilter space
> being discussed internally.
> 
>>>
>>> The nwfilter processing is kindly said rather strange, complex, and to a
>>> degree fragile. Thus when patches come along they are met with greater
>>> scrutiny. From just reading the commit message here I don't get a sense
>>> for the problem and why the solution fixes it. So I'm left to try and
>>> research myself and ask a lot of questions.
>>>
>>> BTW, some of the detail provided in this response is really useful as
>>> either part of a cover or after the --- in the single patch. I would
>>> think you'd "know" when you've done lots of research into a problem that
>>> providing reviews more than a mere hint would be useful! For nwfilter
>>> bindings, it's hard to imagine this is one of those - it just happened
>>> type events. There seems to be a very specific sequence that's missing
>>> from the commit description.
>>>

 Right but virNWFilterTriggerRebuildImpl is not actually called, it
 is set as rebuild callback. This rebuild callback can be called in 
 virNWFi

41 matches

Mail list logo