Re: [Qemu-devel] [PATCH v3 0/5] Fix handling of IPv4/IPv6 dual stack

2017-06-10 Thread Vlad Yasevich
On 06/01/2017 04:29 AM, Daniel P. Berrange wrote:
> This is a followup to:
> 
>   v1: https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg05659.html
>   v2: https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04706.html
> 
> This series aims to fix a lot of bugs related to handling of IPv4 / IPv6
> dual stack.
> 
>  - The VNC server mistakenly listened on two separate ports 5900+5901
>when the to= parameter was given
>  - IPv6 sockets are accepting IPv4 clients even when IPv4 is set to
>be disabled
>  - IPv6 sockets are failing to accept IPv4 clients when IPv4 is not set
>to be disabled
>  - The VNC server was loosing the ipv4=/ipv6= settings due to a bug
>in the DNS resolver
> 
> The behaviour of all this is really subtle and hard to get working correctly
> across all the different network backends. Thus, the most important part of
> this patch series is the last patch which adds a test case covering the
> backends for -vnc, -chardev tcp, -net socket, and -incoming socket, with
> a 120 entry matrix.
> 
> IOW, if you think any of the first 4 patches are applying the wrong logic,
> then take a look at the last patch and indicate which test matrix entries
> are believed to be defining wrong behaviour :-)
> 
> Changed in v2:
> 
>  - Clarified error message in commit message
>  - Replace assert with exit (Eric)
>  - Fix typo in test comment (Eric)
>  - Fix wierd copyright line (Eric)
> 
> Daniel P. Berrange (5):
>   sockets: ensure we can bind to both ipv4 & ipv6 separately
>   sockets: don't block IPv4 clients when listening on "::"
>   sockets: ensure we don't accept IPv4 clients when IPv4 is disabled
>   io: preserve ipv4/ipv6 flags when resolving InetSocketAddress
>   tests: add functional test validating ipv4/ipv6 address flag handling
> 
>  io/dns-resolver.c  |   6 +-
>  tests/.gitignore   |   1 +
>  tests/Makefile.include |   4 +
>  tests/test-sockets-proto.c | 855 
> +
>  util/qemu-sockets.c|  71 +++-
>  5 files changed, 916 insertions(+), 21 deletions(-)
>  create mode 100644 tests/test-sockets-proto.c
> 

Series Reviewed-by: Vlad Yasevich 

-vlad



Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-06-01 Thread Vlad Yasevich
On 06/01/2017 03:02 AM, Jason Wang wrote:
> 
> 
> On 2017年05月31日 02:57, Dr. David Alan Gilbert wrote:
>> * Vlad Yasevich (vyase...@redhat.com) wrote:
>>> On 05/26/2017 12:03 AM, Jason Wang wrote:
>>>> On 2017年05月25日 02:05, Vladislav Yasevich wrote:
>>>>> Add parameters that control RARP/GARP announcement timeouts.
>>>>> The parameters structure is added to the QAPI and a qmp command
>>>>> is added to set/get the parameter data.
>>>>>
>>>>> Based on work by "Dr. David Alan Gilbert"
>>>>>
>>>>> Signed-off-by: Vladislav Yasevich
>>>> I think it's better to explain e.g under which condition do we need to 
>>>> tweak such
>>>> parameters.
>>>>
>>>> Thanks
>>>>
>>> OK.  I'll rip off what dgilbert wrote in his original series for the 
>>> description.
>>>
>>> Dave, if you have any text to add as to why migration might need to tweak 
>>> this, I'd
>>> appreciate it.
>> Pretty much what I originally said;  that the existing values
>> are arbitrary and fixed, and for systems with complex/sluggish
>> network reconfiguration systems they can be too slow.
>>
>> Dave
>>
> 
> I agree, but I'm not sure how much it can help in fact unless management can 
> set
> configuration specific parameters. And what we did here is best effort, 
> there's no
> guarantee that G(R)ARP packet can reach the destination.
> 

So, that's what the series allows.  If management knows something new, it can 
set
appropriate parameter values.  Additionally, the management is also free to 
trigger
additional announcements through the new commands.

I am starting to think that just for the sake of migration, exposing 
announce_self
interface to management might be sufficient.  Management, when it deems 
migration
complete, may use the interface to trigger announcements in addition to whatever
best effort QEMU may attempt itself.

Dave, would that be enough, or do the parameters still make sense?

Thanks
-vlad





Re: [Qemu-devel] [PATCH 08/12] announce_timer: Add ability to reset an existing

2017-05-30 Thread Vlad Yasevich
On 05/30/2017 03:35 PM, Dr. David Alan Gilbert wrote:
> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>> It is now potentially possible to issue annouce-self command in a tight
>> loop.  Instead of doing nother, we can reset the timeout pararameters,
>> especially since each instance may provide it's own values.  This
>> allows the user to  extend or cut short currently runnig timer.
>>
>> Signed-off-by: Vladislav Yasevich 
> 
> ah ok, you can ignore my comment on the previous patch then!
> 
>> ---
>>  include/migration/vmstate.h |  1 +
>>  migration/savevm.c  | 41 +++--
>>  2 files changed, 32 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index 689b685..6dfdac3 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -1057,6 +1057,7 @@ void vmstate_register_ram_global(struct MemoryRegion 
>> *memory);
>>  
>>  typedef struct AnnounceTimer {
>>  QEMUTimer *tm;
>> +QemuMutex active_lock;
>>  struct AnnounceTimer **entry;
>>  AnnounceParameters params;
>>  QEMUClockType type;
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index dcba8bd..e43658f 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -220,20 +220,29 @@ static void qemu_announce_self_iter(NICState *nic, 
>> void *opaque)
>>  
>>  AnnounceTimer *announce_timers[QEMU_ANNOUNCE__MAX];
>>  
>> +static void qemu_announce_timer_destroy(AnnounceTimer *timer)
>> +{
>> +timer_del(timer->tm);
>> +timer_free(timer->tm);
>> +qemu_mutex_destroy(&timer->active_lock);
> 
> Can you explain what makes this safe; we're not
> holding the lock at this point are we? Are we guaranteed
> that no one else will try and take it?
> Either way it should be commented to say why it's safe.

Looking at this again, it doesn't look safe.
The problem is the lookup code.  There is needs to be a lock on the
on the global array that needs to be held during creation/modification.

Thanks
-vlad

> 
> Dave
> 
>> +g_free(timer);
>> +}
>> +
>>  static void qemu_announce_self_once(void *opaque)
>>  {
>>  AnnounceTimer *timer = (AnnounceTimer *)opaque;
>>  
>> +qemu_mutex_lock(&timer->active_lock);
>>  qemu_foreach_nic(qemu_announce_self_iter, NULL);
>>  
>> -if (--timer->round) {
>> +if (--timer->round ) {
>>  timer_mod(timer->tm, qemu_clock_get_ms(timer->type) +
>>self_announce_delay(timer));
>> +qemu_mutex_unlock(&timer->active_lock);
>>  } else {
>> -*(timer->entry) = NULL;
>> -timer_del(timer->tm);
>> -timer_free(timer->tm);
>> -g_free(timer);
>> +*(timer->entry) = NULL;
>> +qemu_mutex_unlock(&timer->active_lock);
>> +qemu_announce_timer_destroy(timer);
>>  }
>>  }
>>  
>> @@ -242,6 +251,7 @@ AnnounceTimer 
>> *qemu_announce_timer_new(AnnounceParameters *params,
>>  {
>>  AnnounceTimer *timer = g_new(AnnounceTimer, 1);
>>  
>> +qemu_mutex_init(&timer->active_lock);
>>  timer->params = *params;
>>  timer->round = params->rounds;
>>  timer->type = type;
>> @@ -259,6 +269,21 @@ AnnounceTimer 
>> *qemu_announce_timer_create(AnnounceParameters *params,
>>  return timer;
>>  }
>>  
>> +static void qemu_announce_timer_update(AnnounceTimer *timer,
>> +   AnnounceParameters *params)
>> +{
>> +qemu_mutex_lock(&timer->active_lock);
>> +
>> +/* Update timer paramenter with any new values.
>> + * Reset the number of rounds to run, and stop the current timer.
>> + */
>> +timer->params = *params;
>> +timer->round = params->rounds;
>> +timer_del(timer->tm);
>> +
>> +qemu_mutex_unlock(&timer->active_lock);
>> +}
>> +
>>  void qemu_announce_self(AnnounceParameters *params, AnnounceType type)
>>  {
>>  AnnounceTimer *timer;
>> @@ -270,11 +295,7 @@ void qemu_announce_self(AnnounceParameters *params, 
>> AnnounceType type)
>>  announce_timers[type] = timer;
>>  timer->entry = &announce_timers[type];
>>  } else {
>> -/* For now, don't do anything.  If we want to reset the timer,
>> - * we'll need to add locking to each announce timer to prevent
>> - * races between timeout handling and a reset.
>> - */
>> -return;
>> +qemu_announce_timer_update(timer, params);
>>  }
>>  qemu_announce_self_once(timer);
>>  }
>> -- 
>> 2.7.4
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 




Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer

2017-05-30 Thread Vlad Yasevich
On 05/30/2017 03:19 PM, Dr. David Alan Gilbert wrote:
> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>> Switch qemu_announce_self and virtio annoucements to use
>> the announcement timer framework.  This makes sure that both
>> timers use the same timeouts and number of annoucement attempts
>>
>> Based on work by Dr. David Alan Gilbert 
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  hw/net/virtio-net.c| 28 
>>  include/hw/virtio/virtio-net.h |  3 +--
>>  include/migration/vmstate.h| 17 +++--
>>  include/sysemu/sysemu.h|  2 +-
>>  migration/migration.c  |  2 +-
>>  migration/savevm.c | 28 ++--
>>  6 files changed, 44 insertions(+), 36 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 7d091c9..1c65825 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -25,6 +25,7 @@
>>  #include "qapi/qmp/qjson.h"
>>  #include "qapi-event.h"
>>  #include "hw/virtio/virtio-access.h"
>> +#include "migration/migration.h"
>>  
>>  #define VIRTIO_NET_VM_VERSION11
>>  
>> @@ -115,7 +116,7 @@ static void virtio_net_announce_timer(void *opaque)
>>  VirtIONet *n = opaque;
>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>  
>> -n->announce_counter--;
>> +n->announce_timer->round--;
>>  n->status |= VIRTIO_NET_S_ANNOUNCE;
>>  virtio_notify_config(vdev);
>>  }
>> @@ -427,8 +428,8 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>  n->nobcast = 0;
>>  /* multiqueue is disabled by default */
>>  n->curr_queues = 1;
>> -timer_del(n->announce_timer);
>> -n->announce_counter = 0;
>> +timer_del(n->announce_timer->tm);
>> +n->announce_timer->round = 0;
>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>  
>>  /* Flush any MAC and VLAN filter table state */
>> @@ -872,10 +873,10 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>> uint8_t cmd,
>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> -if (n->announce_counter) {
>> -timer_mod(n->announce_timer,
>> +if (n->announce_timer->round) {
>> +timer_mod(n->announce_timer->tm,
>>qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> -  self_announce_delay(n->announce_counter));
>> +  self_announce_delay(n->announce_timer));
>>  }
>>  return VIRTIO_NET_OK;
>>  } else {
>> @@ -1615,8 +1616,8 @@ static int virtio_net_post_load_device(void *opaque, 
>> int version_id)
>>  
>>  if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>  virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>> -n->announce_counter = SELF_ANNOUNCE_ROUNDS;
>> -timer_mod(n->announce_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>> +n->announce_timer->round = n->announce_timer->params.rounds;
>> +timer_mod(n->announce_timer->tm, 
>> qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
> 
> Do you think it's worth having a 
>   qemu_announce_timer_init(AnnounceTimer *, QEMU_CLOCK_*)
> that would initialise the round and any other values and 
> do the initial timer_mod ?

I had that initially, but ended up with just 1 user (virtio).  So removed.

I think I might put it back.  It's really a announce_timer_start() type thing.
May be it'll convert both places to use that api to make it cleaner.

> 
>>  }
>>  
>>  return 0;
>> @@ -1938,8 +1939,10 @@ static void virtio_net_device_realize(DeviceState 
>> *dev, Error **errp)
>>  qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
>>  memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
>>  n->status = VIRTIO_NET_S_LINK_UP;
>> -n->announce_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> - virtio_net_announce_timer, n);
>> +n->announce_timer = qemu_announce_timer_new(qemu_get_announce_params(),
>> +QEMU_CLOCK_VIRTUAL);
>> +n->announce_timer->tm = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> +  virtio_net_announce_timer, n);
>>  
>>  if (n->netclient_type) {
>>  /*
>> @@ -2001,8 +2004,9 @@ static void virtio_net_device_unrealize(DeviceState 
>> *dev, Error **errp)
>>  virtio_net_del_queue(n, i);
>>  }
>>  
>> -timer_del(n->announce_timer);
>> -timer_free(n->announce_timer);
>> +timer_del(n->announce_timer->tm);
>> +timer_free(n->announce_timer->tm);
>> +g_free(n->announce_timer);
> 
> Hmm, why is this all safe - I guess this is in the BQL?

Well, this is virito's explicit timer.  I didn't check, but it really no 
different then
destroying the virtio-specific QEMU timer created for the devices.

> 
>>  g_free(n->vqs);
>>  qemu_del_nic(n->nic);
>>  virtio_cleanup(vdev);
>> diff --git a/include/hw/virtio/vi

Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-30 Thread Vlad Yasevich
On 05/30/2017 10:14 AM, Daniel P. Berrange wrote:
> On Wed, May 24, 2017 at 02:05:22PM -0400, Vladislav Yasevich wrote:
>> Add a qmp command that can trigger guest announcements.
>>
>> Based on work of Germano Veit Michel 
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  migration/savevm.c | 14 ++
>>  qapi-schema.json   | 19 +++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index a4097c9..b55ce6a 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -265,6 +265,20 @@ void qemu_announce_self(AnnounceParameters *params)
>>  qemu_announce_self_once(timer);
>>  }
>>  
>> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
>> +   Error **errp)
>> +{
>> +AnnounceParameters announce_params;
>> +
>> +memcpy(&announce_params, qemu_get_announce_params(),
>> +   sizeof(announce_params));
>> +
>> +if (has_params)
>> +qemu_set_announce_parameters(&announce_params, params);
>> +
>> +qemu_announce_self(&announce_params);
>> +}
> 
> I'm not a huge fan of the idea of setting announce parameters in a
> global namespace, via a previously issued separate command.
> 
> I realize this is already done for the 'migrate' command, but there
> it has been the cause of a number of bugs in libvirt mgmt of QEMU.
> At least with migrate it makes sense for some parameters which need
> to be tuned 'on the fly' after migrate already started. That doesn't
> apply to this command though.
> 
> So I tend to think it'd be nicer to just make the parameters mandatory
> for this command, so it is self-contanied does and not rely on previously
> set (or potentially unknown/indeterminate) global state.
> 

The reason I didn't do this is that I didn't want to require the user
to know or even think about the timeout values. I did want to provide
an option though for those who may want to tweak things.  In most circumstances,
the timeout values would never really be touched and we'll just use the
qemu defaults.  in rare circumstances, values might need to be changed
during migration.  Here, I can definitely see the need to give migration
it's own values.


> I'd also suggest that instad of adding an 'announce_set_parameters',
> the parameters for the automatic announce at end of migration should
> be set via the 'migrate_set_parameters' command.

I can do that.

> 
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index 2030087..126b09d 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -654,6 +654,25 @@
>>'returns': 'AnnounceParameters' }
>>  
>>  ##
>> +# @announce-self:
>> +#
>> +# Trigger generation of broadcast RARP frames to update network switches.
>> +# This can be useful when network bonds fail-over the active slave.
>> +#
>> +# Arguments: None.
> 
> "None", but it has some arguments listed right below

will fix


Thanks
-vlad

> 
>> +#
>> +# Example:
>> +#
>> +# -> { "execute": "announce-self"
>> +#  "arguments": { "announce-rounds": 5 } }
>> +# <- { "return": {} }
>> +#
>> +# Since: 2.10
>> +##
>> +{ 'command': 'announce-self',
>> +  'data' : {'*params': 'AnnounceParameters'} }
>> +
>> +##
>>  # @MigrationStats:
>>  #
>>  # Detailed migration status.
>> -- 
>> 2.7.4
>>
>>
> 
> Regards,
> Daniel
> 




Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-30 Thread Vlad Yasevich
On 05/30/2017 10:24 AM, Juan Quintela wrote:
> Vlad Yasevich  wrote:
>> On 05/30/2017 06:11 AM, Juan Quintela wrote:
>>> Vladislav Yasevich  wrote:
>>>> Add a qmp command that can trigger guest announcements.
>>>>
>>>> Based on work of Germano Veit Michel 
>>>>
>>>> Signed-off-by: Vladislav Yasevich 
>>>> ---
>>>>  migration/savevm.c | 14 ++
>>>>  qapi-schema.json   | 19 +++
>>>>  2 files changed, 33 insertions(+)
>>>>
>>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>>> index a4097c9..b55ce6a 100644
>>>> --- a/migration/savevm.c
>>>> +++ b/migration/savevm.c
>>>> @@ -265,6 +265,20 @@ void qemu_announce_self(AnnounceParameters *params)
>>>>  qemu_announce_self_once(timer);
>>>>  }
>>>>  
>>>> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
>>>> +   Error **errp)
>>>> +{
>>>> +AnnounceParameters announce_params;
>>>> +
>>>> +memcpy(&announce_params, qemu_get_announce_params(),
>>>> +   sizeof(announce_params));
>>>> +
>>>> +if (has_params)
>>>> +qemu_set_announce_parameters(&announce_params, params);
>>>> +
>>>> +qemu_announce_self(&announce_params);
>>>
>>> Are I missreading qemu_annouce_self()?
>>> My reading is that it passes announce_params to a timer (i.e. async
>>> function), but here announce_params is a local variable here, no?
>>>
>>
>> The AnnounceTimer holds a copy since each timer may have it's own values.
> 
> 
> 
>> AnnounceTimer *qemu_announce_timer_new(AnnounceParameters *params,
>>QEMUClockType type)
>> {
>>AnnounceTimer *timer = g_new(AnnounceTimer, 1);
>>
>>timer->params = *params;
> 
> I have to remomember that C has learn how to copy structures long ago.
> 
> 
> 
> I was missing the "*" on my previous reading, sorry for the noise.

that actually changed, since as Eric Pointed out, shallow copies shouldn't
be used.  In the v2 code, this uses qemu_set_announce_paramters(), but the copy
essentially remains.

-vlad
> 
>>timer->round = params->rounds;
>>timer->type = type;
>>
>>return timer;
>> }
> 
> 




Re: [Qemu-devel] [PATCH 09/12] hmp: add announce paraters info/set

2017-05-30 Thread Vlad Yasevich
On 05/30/2017 06:18 AM, Juan Quintela wrote:
> Vladislav Yasevich  wrote:
>> Add HMP command to control and read annoucment parameters.
>>
>> Signed-off-by: Vladislav Yasevich 
> 
> Reviewed-by: Juan Quintela 
> 
> 
>> + cleanup:
>> +if (err) {
>> +error_report_err(err);
>> +}
>> +}
> 
> 
> My understanding is that this the "cool way" to spell this nowadays is:
> 
> hmp_handle_error(mon, &err);
> 
> Just in case you want to change it.
> 

Thanks. I must have used an older function as base.  I'll check it out.

-vlad



Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer

2017-05-30 Thread Vlad Yasevich
On 05/30/2017 06:10 AM, Juan Quintela wrote:
> Vladislav Yasevich  wrote:
>> Switch qemu_announce_self and virtio annoucements to use
>> the announcement timer framework.  This makes sure that both
>> timers use the same timeouts and number of annoucement attempts
>>
>> Based on work by Dr. David Alan Gilbert 
>>
>> Signed-off-by: Vladislav Yasevich 
> 
>>  static void qemu_announce_self_once(void *opaque)
>>  {
>> -static int count = SELF_ANNOUNCE_ROUNDS;
>> -QEMUTimer *timer = *(QEMUTimer **)opaque;
>> +AnnounceTimer *timer = (AnnounceTimer *)opaque;
> 
> Cast from void * is never needed.
> 

OK

Thanks
-vlad



Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-30 Thread Vlad Yasevich
On 05/30/2017 06:11 AM, Juan Quintela wrote:
> Vladislav Yasevich  wrote:
>> Add a qmp command that can trigger guest announcements.
>>
>> Based on work of Germano Veit Michel 
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  migration/savevm.c | 14 ++
>>  qapi-schema.json   | 19 +++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index a4097c9..b55ce6a 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -265,6 +265,20 @@ void qemu_announce_self(AnnounceParameters *params)
>>  qemu_announce_self_once(timer);
>>  }
>>  
>> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
>> +   Error **errp)
>> +{
>> +AnnounceParameters announce_params;
>> +
>> +memcpy(&announce_params, qemu_get_announce_params(),
>> +   sizeof(announce_params));
>> +
>> +if (has_params)
>> +qemu_set_announce_parameters(&announce_params, params);
>> +
>> +qemu_announce_self(&announce_params);
> 
> Are I missreading qemu_annouce_self()?
> My reading is that it passes announce_params to a timer (i.e. async
> function), but here announce_params is a local variable here, no?
> 

The AnnounceTimer holds a copy since each timer may have it's own values.

Thanks
-vlad



Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-05-30 Thread Vlad Yasevich
On 05/30/2017 05:58 AM, Juan Quintela wrote:
> Vladislav Yasevich  wrote:
>> Add parameters that control RARP/GARP announcement timeouts.
>> The parameters structure is added to the QAPI and a qmp command
>> is added to set/get the parameter data.
>>
>> Based on work by "Dr. David Alan Gilbert" 
>>
>> Signed-off-by: Vladislav Yasevich 
> 
> Hi
> 
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index f5e8194..cee2837 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -78,6 +78,104 @@ static struct mig_cmd_args {
>>  [MIG_CMD_MAX]  = { .len = -1, .name = "MAX" },
>>  };
>>  
> 
> Once that you are touching this, shouldn't it be better to be in
> net/?
> They have nothing to do with migration really.
> 

Yeah, I considered this, but could really find a good slot for them.
I can either put them into net.c directly or pull them out into their
own little file.

> 
>> +#define QEMU_ANNOUNCE_INITIAL50
>> +#define QEMU_ANNOUNCE_MAX   550
>> +#define QEMU_ANNOUNCE_ROUNDS  5
>> +#define QEMU_ANNOUNCE_STEP  100
>> +
>> +AnnounceParameters *qemu_get_announce_params(void)
>> +{
>> +static AnnounceParameters announce = {
>> +.initial = QEMU_ANNOUNCE_INITIAL,
>> +.max = QEMU_ANNOUNCE_MAX,
>> +.rounds = QEMU_ANNOUNCE_ROUNDS,
>> +.step = QEMU_ANNOUNCE_STEP,
>> +};
>> +
>> +return &announce;
>> +}
>> +
>> +void qemu_fill_announce_parameters(AnnounceParameters **to,
>> +   AnnounceParameters *from)
>> +{
>> +AnnounceParameters *params;
>> +
>> +params = *to = g_malloc0(sizeof(*params));
>> +params->has_initial = true;
>> +params->initial = from->initial;
>> +params->has_max = true;
>> +params->max = from->max;
>> +params->has_rounds = true;
>> +params->rounds = from->rounds;
>> +params->has_step = true;
>> +params->step = from->step;
>> +}
>> +
>> +bool qemu_validate_announce_parameters(AnnounceParameters *params, Error 
>> **errp)
>> +{
>> +if (params->has_initial &&
>> +(params->initial < 1 || params->initial > 10)) {
> 
> This is independent of this patch, but we really need a macro:
>   CHECK(field, mininum, maximum)
> 
> We repeat this left and right.
> 
>> +void qemu_set_announce_parameters(AnnounceParameters *announce_params,
>> +  AnnounceParameters *params)
> 
>> +void qmp_announce_set_parameters(AnnounceParameters *params, Error **errp)
> 
> Really similar functions name.  I assume by know that the 1st function
> is used somewhere else in the series.
> 

Yes, the first function is going to be used later.

Thanks
-vlad



Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-26 Thread Vlad Yasevich
On 05/26/2017 09:16 AM, Eric Blake wrote:
> On 05/24/2017 01:05 PM, Vladislav Yasevich wrote:
>> Add a qmp command that can trigger guest announcements.
>>
>> Based on work of Germano Veit Michel 
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  migration/savevm.c | 14 ++
>>  qapi-schema.json   | 19 +++
>>  2 files changed, 33 insertions(+)
>>
> 
>> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
>> +   Error **errp)
>> +{
>> +AnnounceParameters announce_params;
>> +
>> +memcpy(&announce_params, qemu_get_announce_params(),
>> +   sizeof(announce_params));
> 
> Shallow copies of a QAPI type happen to work when the type is all scalar
> (as AnnounceParameters currently is), but it's more robust to use
> QAPI_CLONE() or QAPI_CLONE_MEMBERS() so that any future non-scalar
> additions to the parameters type will be correctly copied without
> introducing memory leaks or double frees.
> 
> Even this might be better:
> AnnounceParameters announce_params = { 0 };
> qemu_set_announce_parameters(&announce_params, qemu_get_announce_params());
> 

Ok.

>> +
>> +if (has_params)
>> +qemu_set_announce_parameters(&announce_params, params);
>> +
>> +qemu_announce_self(&announce_params);
>> +}
> The QMP changes look okay.  Do you have a testsuite covering the new QMP
> command somewhere in the series?
> 

There is basic test in patch 12.

Thanks
-vlad



Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-05-26 Thread Vlad Yasevich
On 05/26/2017 09:08 AM, Eric Blake wrote:
> On 05/24/2017 01:05 PM, Vladislav Yasevich wrote:
>> Add parameters that control RARP/GARP announcement timeouts.
>> The parameters structure is added to the QAPI and a qmp command
>> is added to set/get the parameter data.
>>
>> Based on work by "Dr. David Alan Gilbert" 
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
> 
> Just an interface review for now:
> 
>> +++ b/qapi-schema.json
>> @@ -569,6 +569,90 @@
>>  ##
>>  { 'command': 'query-events', 'returns': ['EventInfo'] }
>>  
>> +
>> +##
>> +# @AnnounceParameter:
>> +#
>> +# @AnnounceParameter enumeration
>> +#
>> +# @initial: Initial delay (in ms) before sending the first GARP/RARP
>> +#   announcement
>> +#
>> +# @max: Maximum delay (in ms) to between GARP/RARP announcemnt packets
> 
> s/announcemnt/announcement/
> 
>> +#
>> +# @rounds: Number of self-announcement attempts
>> +#
>> +# @step: Delay increate (in ms) after each self-announcment attempt
> 
> s/increate/increase/
> s/announcment/announcement/
> 
>> +#
>> +# Since: 2.10
>> +##
>> +{ 'enum' : 'AnnounceParameter',
>> +  'data' : [ 'initial', 'max', 'rounds', 'step' ] }
> 
> Why are we creating an enum here?  Without reading further, it looks
> like you plan on using the enum to delineate members of a union? But
> that feels like it will be overly complicated.  A struct should be
> sufficient (each parameter being an optional member of the struct, where
> you can supply as many or as few on input, but all are reported on output).
> 

I end up using them for the HMP interface.  If it's better, I can move this
definition to the HMP patch.

Thanks
-vlad



Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-05-26 Thread Vlad Yasevich
On 05/26/2017 12:03 AM, Jason Wang wrote:
> 
> On 2017年05月25日 02:05, Vladislav Yasevich wrote:
>> Add parameters that control RARP/GARP announcement timeouts.
>> The parameters structure is added to the QAPI and a qmp command
>> is added to set/get the parameter data.
>>
>> Based on work by "Dr. David Alan Gilbert" 
>>
>> Signed-off-by: Vladislav Yasevich 
> 
> I think it's better to explain e.g under which condition do we need to tweak 
> such parameters.
> 
> Thanks
> 

OK.  I'll rip off what dgilbert wrote in his original series for the 
description.

Dave, if you have any text to add as to why migration might need to tweak this, 
I'd
appreciate it.

Thanks
-vlad



Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer

2017-05-26 Thread Vlad Yasevich
On 05/26/2017 12:16 AM, Jason Wang wrote:
> 
> 
> On 2017年05月25日 02:05, Vladislav Yasevich wrote:
>> Switch qemu_announce_self and virtio annoucements to use
>> the announcement timer framework.  This makes sure that both
>> timers use the same timeouts and number of annoucement attempts
>>
>> Based on work by Dr. David Alan Gilbert 
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>   hw/net/virtio-net.c| 28 
>>   include/hw/virtio/virtio-net.h |  3 +--
>>   include/migration/vmstate.h| 17 +++--
>>   include/sysemu/sysemu.h|  2 +-
>>   migration/migration.c  |  2 +-
>>   migration/savevm.c | 28 ++--
>>   6 files changed, 44 insertions(+), 36 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 7d091c9..1c65825 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -25,6 +25,7 @@
>>   #include "qapi/qmp/qjson.h"
>>   #include "qapi-event.h"
>>   #include "hw/virtio/virtio-access.h"
>> +#include "migration/migration.h"
>> #define VIRTIO_NET_VM_VERSION11
>>   @@ -115,7 +116,7 @@ static void virtio_net_announce_timer(void *opaque)
>>   VirtIONet *n = opaque;
>>   VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>   -n->announce_counter--;
>> +n->announce_timer->round--;
>>   n->status |= VIRTIO_NET_S_ANNOUNCE;
>>   virtio_notify_config(vdev);
>>   }
>> @@ -427,8 +428,8 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>   n->nobcast = 0;
>>   /* multiqueue is disabled by default */
>>   n->curr_queues = 1;
>> -timer_del(n->announce_timer);
>> -n->announce_counter = 0;
>> +timer_del(n->announce_timer->tm);
>> +n->announce_timer->round = 0;
>>   n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> /* Flush any MAC and VLAN filter table state */
>> @@ -872,10 +873,10 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>> uint8_t cmd,
>>   if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>   n->status & VIRTIO_NET_S_ANNOUNCE) {
>>   n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> -if (n->announce_counter) {
>> -timer_mod(n->announce_timer,
>> +if (n->announce_timer->round) {
>> +timer_mod(n->announce_timer->tm,
>> qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> -  self_announce_delay(n->announce_counter));
>> +  self_announce_delay(n->announce_timer));
>>   }
>>   return VIRTIO_NET_OK;
>>   } else {
>> @@ -1615,8 +1616,8 @@ static int virtio_net_post_load_device(void *opaque, 
>> int version_id)
>> if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>   virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>> -n->announce_counter = SELF_ANNOUNCE_ROUNDS;
>> -timer_mod(n->announce_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>> +n->announce_timer->round = n->announce_timer->params.rounds;
>> +timer_mod(n->announce_timer->tm, 
>> qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>>   }
>> return 0;
>> @@ -1938,8 +1939,10 @@ static void virtio_net_device_realize(DeviceState 
>> *dev, Error
>> **errp)
>>   qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
>>   memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
>>   n->status = VIRTIO_NET_S_LINK_UP;
>> -n->announce_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> - virtio_net_announce_timer, n);
>> +n->announce_timer = qemu_announce_timer_new(qemu_get_announce_params(),
>> +QEMU_CLOCK_VIRTUAL);
>> +n->announce_timer->tm = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> +  virtio_net_announce_timer, n);
>> if (n->netclient_type) {
>>   /*
>> @@ -2001,8 +2004,9 @@ static void virtio_net_device_unrealize(DeviceState 
>> *dev, Error
>> **errp)
>>   virtio_net_del_queue(n, i);
>>   }
>>   -timer_del(n->announce_timer);
>> -timer_free(n->announce_timer);
>> +timer_del(n->announce_timer->tm);
>> +timer_free(n->announce_timer->tm);
>> +g_free(n->announce_timer);
>>   g_free(n->vqs);
>>   qemu_del_nic(n->nic);
>>   virtio_cleanup(vdev);
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 1eec9a2..51dd4c3 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -94,8 +94,7 @@ typedef struct VirtIONet {
>>   char *netclient_name;
>>   char *netclient_type;
>>   uint64_t curr_guest_offloads;
>> -QEMUTimer *announce_timer;
>> -int announce_counter;
>> +AnnounceTimer *announce_timer;
>>   bool needs_vnet_hdr_swap;
>>   } VirtIONet;
>>   diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index a6bf84d..f8aed9b 100644
>> --- a/include/migration/vmstate.h
>> +++ b/inclu

Re: [Qemu-devel] [PATCH v2] virtio_net: Bypass backends for MTU feature negotiation

2017-05-23 Thread Vlad Yasevich
On 05/23/2017 08:31 AM, Maxime Coquelin wrote:
> This patch adds a new internal "x-mtu-bypass-backend" property
> to bypass backends for MTU feature negotiation.
> 
> When this property is set, the MTU feature is negotiated as soon
> as supported by the guest and a MTU value is set via the host_mtu
> parameter. In case the backend advertises the feature (e.g. DPDK's
> vhost-user backend), the feature negotiation is propagated down to
> the backend.
> 
> When this property is not set, the backend has to support the MTU
> feature for its negotiation to succeed.
> 
> For compatibility purpose, this property is disabled for machine
> types v2.9 and older.
> 
> Cc: Aaron Conole 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Maxime Coquelin 

Reviewed-by: Vlad Yasevich 

-vlad

> ---
> 
> Tests performed:
> - Vhost-net kernel backendi, host_mtu=2000:
>  * default machine type: guest MTU = 2000
>  * pc-q35-2.9 machine type: Guest MTU = 1500
> 
> - Vhost-net user backend, host_mtu=2000:
>  * DPDK v17.05 (MTU feature supported on backend side)
>   - default machine type: guest MTU = 2000
>   - pc-q35-2.9 machine type: guest MTU = 2000)
>  * DPDK v16.11 (MTU feature not supported on backend side)
>   - default machine type: guest MTU = 2000
>   - pc-q35-2.9 machine type: guest MTU = 1500)
> 
>  hw/net/virtio-net.c| 17 -
>  include/hw/compat.h|  4 
>  include/hw/virtio/virtio-net.h |  1 +
>  include/hw/virtio/virtio.h |  1 +
>  4 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 7d091c9..39c336e 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -589,7 +589,15 @@ static uint64_t virtio_net_get_features(VirtIODevice 
> *vdev, uint64_t features,
>  if (!get_vhost_net(nc->peer)) {
>  return features;
>  }
> -return vhost_net_get_features(get_vhost_net(nc->peer), features);
> +features = vhost_net_get_features(get_vhost_net(nc->peer), features);
> +vdev->backend_features = features;
> +
> +if (n->mtu_bypass_backend &&
> +(n->host_features & 1ULL << VIRTIO_NET_F_MTU)) {
> +features |= (1ULL << VIRTIO_NET_F_MTU);
> +}
> +
> +return features;
>  }
>  
>  static uint64_t virtio_net_bad_features(VirtIODevice *vdev)
> @@ -640,6 +648,11 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
> uint64_t features)
>  VirtIONet *n = VIRTIO_NET(vdev);
>  int i;
>  
> +if (n->mtu_bypass_backend &&
> +!virtio_has_feature(vdev->backend_features, VIRTIO_NET_F_MTU)) {
> +features &= ~(1ULL << VIRTIO_NET_F_MTU);
> +}
> +
>  virtio_net_set_multiqueue(n,
>virtio_has_feature(features, VIRTIO_NET_F_MQ));
>  
> @@ -2090,6 +2103,8 @@ static Property virtio_net_properties[] = {
>  DEFINE_PROP_UINT16("rx_queue_size", VirtIONet, net_conf.rx_queue_size,
> VIRTIO_NET_RX_QUEUE_DEFAULT_SIZE),
>  DEFINE_PROP_UINT16("host_mtu", VirtIONet, net_conf.mtu, 0),
> +DEFINE_PROP_BOOL("x-mtu-bypass-backend", VirtIONet, mtu_bypass_backend,
> + true),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index 55b1765..181f450 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -6,6 +6,10 @@
>  .driver   = "pci-bridge",\
>  .property = "shpc",\
>  .value= "off",\
> +},{\
> +.driver   = "virtio-net-device",\
> +.property = "x-mtu-bypass-backend",\
> +.value= "off",\
>  },
>  
>  #define HW_COMPAT_2_8 \
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 1eec9a2..602b486 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -97,6 +97,7 @@ typedef struct VirtIONet {
>  QEMUTimer *announce_timer;
>  int announce_counter;
>  bool needs_vnet_hdr_swap;
> +bool mtu_bypass_backend;
>  } VirtIONet;
>  
>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7b6edba..80c45c3 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -79,6 +79,7 @@ struct VirtIODevice
>  uint16_t queue_sel;
>  uint64_t guest_features;
>  uint64_t host_features;
> +uint64_t backend_features;
>  size_t config_len;
>  void *config;
>  uint16_t config_vector;
> 




Re: [Qemu-devel] [PATCH V2] migration: expose qemu_announce_self() via qmp

2017-05-12 Thread Vlad Yasevich
On 05/12/2017 03:24 PM, Dr. David Alan Gilbert wrote:
> * Vlad Yasevich (vyase...@redhat.com) wrote:
>> On 02/20/2017 07:16 PM, Germano Veit Michel wrote:
>>> qemu_announce_self() is triggered by qemu at the end of migrations
>>> to update the network regarding the path to the guest l2addr.
>>>
>>> however it is also useful when there is a network change such as
>>> an active bond slave swap. Essentially, it's the same as a migration
>>> from a network perspective - the guest moves to a different point
>>> in the network topology.
>>>
>>> this exposes the function via qmp.
>>>
>>> Signed-off-by: Germano Veit Michel 
>>> ---
>>>  include/migration/vmstate.h |  5 +
>>>  migration/savevm.c  | 30 +++---
>>>  qapi-schema.json| 18 ++
>>>  3 files changed, 42 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>>> index 63e7b02..a08715c 100644
>>> --- a/include/migration/vmstate.h
>>> +++ b/include/migration/vmstate.h
>>> @@ -1042,6 +1042,11 @@ int64_t self_announce_delay(int round)
>>>  return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>>>  }
>>>
>>> +struct AnnounceRound {
>>> +QEMUTimer *timer;
>>> +int count;
>>> +};
>>> +
>>>  void dump_vmstate_json_to_file(FILE *out_fp);
>>>
>>>  #endif
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index 5ecd264..44e196b 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -118,29 +118,37 @@ static void qemu_announce_self_iter(NICState
>>> *nic, void *opaque)
>>>  qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
>>>  }
>>>
>>> -
>>>  static void qemu_announce_self_once(void *opaque)
>>>  {
>>> -static int count = SELF_ANNOUNCE_ROUNDS;
>>> -QEMUTimer *timer = *(QEMUTimer **)opaque;
>>> +struct AnnounceRound *round = opaque;
>>>
>>>  qemu_foreach_nic(qemu_announce_self_iter, NULL);
>>>
>>> -if (--count) {
>>> +round->count--;
>>> +if (round->count) {
>>>  /* delay 50ms, 150ms, 250ms, ... */
>>> -timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>>> -  self_announce_delay(count));
>>> +timer_mod(round->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
>>> +  self_announce_delay(round->count));
>>>  } else {
>>> -timer_del(timer);
>>> -timer_free(timer);
>>> +timer_del(round->timer);
>>> +timer_free(round->timer);
>>> +g_free(round);
>>>  }
>>>  }
>>>
>>>  void qemu_announce_self(void)
>>>  {
>>> -static QEMUTimer *timer;
>>> -timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, 
>>> &timer);
>>> -qemu_announce_self_once(&timer);
>>> +struct AnnounceRound *round = g_malloc(sizeof(struct AnnounceRound));
>>> +if (!round)
>>> +return;
>>> +round->count = SELF_ANNOUNCE_ROUNDS;
>>> +round->timer = timer_new_ms(QEMU_CLOCK_REALTIME,
>>> qemu_announce_self_once, round);
>>> +qemu_announce_self_once(round);
>>> +}
>>
>> So, I've been looking and this code and have been playing with it and with 
>> David's
>> patches and my patches to include virtio self announcements as well.  What 
>> I've discovered
>> is what I think is a possible packet amplification issue here.
>>
>> This creates a new timer every time we do do a announce_self.  With just 
>> migration,
>> this is not an issue since you only migrate once at a time, so there is only 
>> 1 timer.
>> With exposing this as an API, a user can potentially call it in a tight loop
>> and now you have a ton of timers being created.  Add in David's patches 
>> allowing timeouts
>> and retries to be configurable, and you may now have a ton of long lived 
>> timers.
>> Add in the patches I am working on to let virtio do self announcements too 
>> (to really fix
>> bonding issues), and now you add in a possibility of a lot of packets being 
>> sent for
>> each timeout (RARP, GARP, NA, IGMPv4 Reports, IGMPv6 Reports [even worse if 
>> MLD1

Re: [Qemu-devel] [PATCH V2] migration: expose qemu_announce_self() via qmp

2017-05-11 Thread Vlad Yasevich
On 02/20/2017 07:16 PM, Germano Veit Michel wrote:
> qemu_announce_self() is triggered by qemu at the end of migrations
> to update the network regarding the path to the guest l2addr.
> 
> however it is also useful when there is a network change such as
> an active bond slave swap. Essentially, it's the same as a migration
> from a network perspective - the guest moves to a different point
> in the network topology.
> 
> this exposes the function via qmp.
> 
> Signed-off-by: Germano Veit Michel 
> ---
>  include/migration/vmstate.h |  5 +
>  migration/savevm.c  | 30 +++---
>  qapi-schema.json| 18 ++
>  3 files changed, 42 insertions(+), 11 deletions(-)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 63e7b02..a08715c 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -1042,6 +1042,11 @@ int64_t self_announce_delay(int round)
>  return 50 + (SELF_ANNOUNCE_ROUNDS - round - 1) * 100;
>  }
> 
> +struct AnnounceRound {
> +QEMUTimer *timer;
> +int count;
> +};
> +
>  void dump_vmstate_json_to_file(FILE *out_fp);
> 
>  #endif
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 5ecd264..44e196b 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -118,29 +118,37 @@ static void qemu_announce_self_iter(NICState
> *nic, void *opaque)
>  qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
>  }
> 
> -
>  static void qemu_announce_self_once(void *opaque)
>  {
> -static int count = SELF_ANNOUNCE_ROUNDS;
> -QEMUTimer *timer = *(QEMUTimer **)opaque;
> +struct AnnounceRound *round = opaque;
> 
>  qemu_foreach_nic(qemu_announce_self_iter, NULL);
> 
> -if (--count) {
> +round->count--;
> +if (round->count) {
>  /* delay 50ms, 150ms, 250ms, ... */
> -timer_mod(timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> -  self_announce_delay(count));
> +timer_mod(round->timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
> +  self_announce_delay(round->count));
>  } else {
> -timer_del(timer);
> -timer_free(timer);
> +timer_del(round->timer);
> +timer_free(round->timer);
> +g_free(round);
>  }
>  }
> 
>  void qemu_announce_self(void)
>  {
> -static QEMUTimer *timer;
> -timer = timer_new_ms(QEMU_CLOCK_REALTIME, qemu_announce_self_once, 
> &timer);
> -qemu_announce_self_once(&timer);
> +struct AnnounceRound *round = g_malloc(sizeof(struct AnnounceRound));
> +if (!round)
> +return;
> +round->count = SELF_ANNOUNCE_ROUNDS;
> +round->timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> qemu_announce_self_once, round);
> +qemu_announce_self_once(round);
> +}

So, I've been looking and this code and have been playing with it and with 
David's
patches and my patches to include virtio self announcements as well.  What I've 
discovered
is what I think is a possible packet amplification issue here.

This creates a new timer every time we do do a announce_self.  With just 
migration,
this is not an issue since you only migrate once at a time, so there is only 1 
timer.
With exposing this as an API, a user can potentially call it in a tight loop
and now you have a ton of timers being created.  Add in David's patches 
allowing timeouts
and retries to be configurable, and you may now have a ton of long lived timers.
Add in the patches I am working on to let virtio do self announcements too (to 
really fix
bonding issues), and now you add in a possibility of a lot of packets being 
sent for
each timeout (RARP, GARP, NA, IGMPv4 Reports, IGMPv6 Reports [even worse if 
MLD1 is used]).

As you can see, this can get rather ugly...

I think we need timer user here.  Migration and QMP being two to begin with.  
Each
one would get a single timer to play with.  If a given user already has a timer 
running,
we could return an error or just not do anything.

-vlad

> +
> +void qmp_announce_self(Error **errp)
> +{
> +qemu_announce_self();
>  }
> 
>  /***/
> diff --git a/qapi-schema.json b/qapi-schema.json
> index baa0d26..0d9bffd 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -6080,3 +6080,21 @@
>  #
>  ##
>  { 'command': 'query-hotpluggable-cpus', 'returns': ['HotpluggableCPU'] }
> +
> +##
> +# @announce-self:
> +#
> +# Trigger generation of broadcast RARP frames to update network switches.
> +# This can be useful when network bonds fail-over the active slave.
> +#
> +# Arguments: None.
> +#
> +# Example:
> +#
> +# -> { "execute": "announce-self" }
> +# <- { "return": {} }
> +#
> +# Since: 2.9
> +##
> +{ 'command': 'announce-self' }
> +
> 




Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-05-05 Thread Vlad Yasevich
On 05/02/2017 06:41 AM, Dr. David Alan Gilbert wrote:
> * Vlad Yasevich (vyase...@redhat.com) wrote:
>> On 03/30/2017 11:26 AM, Dr. David Alan Gilbert wrote:
>>> * Vlad Yasevich (vyase...@redhat.com) wrote:
>>>> On 03/30/2017 10:53 AM, Dr. David Alan Gilbert wrote:
>>>>> * Vlad Yasevich (vyase...@redhat.com) wrote:
>>>>>> On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote:
>>>>>>> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>>>>>>>> virtio announcements seem to run on its own timer with it's own
>>>>>>>> repetition loop and are invoked separately from qemu_announce_self.
>>>>>>>> This patch consolidates announcements into a single location and
>>>>>>>> allows other nics to provide it's own announcement capabilities, if
>>>>>>>> necessary.
>>>>>>>>
>>>>>>>> This is also usefull in support of exposing qemu_announce_self through
>>>>>>>> qmp/hmp.
>>>>>>>>
>>>>>>>> Signed-off-by: Vladislav Yasevich 
>>>>>>>> ---
>>>>>>>>  hw/net/virtio-net.c| 30 --
>>>>>>>>  include/hw/virtio/virtio-net.h |  2 --
>>>>>>>>  include/net/net.h  |  2 ++
>>>>>>>>  migration/savevm.c |  6 ++
>>>>>>>>  4 files changed, 16 insertions(+), 24 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>>>> index c321680..6e9ce4f 100644
>>>>>>>> --- a/hw/net/virtio-net.c
>>>>>>>> +++ b/hw/net/virtio-net.c
>>>>>>>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, 
>>>>>>>> uint8_t status)
>>>>>>>>  (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> -static void virtio_net_announce_timer(void *opaque)
>>>>>>>> +static void virtio_net_announce(NetClientState *nc)
>>>>>>>>  {
>>>>>>>> -VirtIONet *n = opaque;
>>>>>>>> +VirtIONet *n = qemu_get_nic_opaque(nc);
>>>>>>>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>>>>  
>>>>>>>> -n->announce_counter--;
>>>>>>>> -n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>>>> -virtio_notify_config(vdev);
>>>>>>>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>>>>>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>>>>>>>> +n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>>>> +virtio_notify_config(vdev);
>>>>>>>> +}
>>>>>>>>  }
>>>>>>>>  
>>>>>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>>>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>>>>>  n->nobcast = 0;
>>>>>>>>  /* multiqueue is disabled by default */
>>>>>>>>  n->curr_queues = 1;
>>>>>>>> -timer_del(n->announce_timer);
>>>>>>>> -n->announce_counter = 0;
>>>>>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>>>  
>>>>>>>>  /* Flush any MAC and VLAN filter table state */
>>>>>>>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet 
>>>>>>>> *n, uint8_t cmd,
>>>>>>>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>>>>>>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>>>>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>>> -if (n->announce_counter) {
>>>>>>>> -timer_mod(n->announce_timer,
>>>>>>>> -  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>>>>>>> -  self_announce_delay(n->announce_counter));
>>>>

Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-03-30 Thread Vlad Yasevich
On 03/30/2017 11:26 AM, Dr. David Alan Gilbert wrote:
> * Vlad Yasevich (vyase...@redhat.com) wrote:
>> On 03/30/2017 10:53 AM, Dr. David Alan Gilbert wrote:
>>> * Vlad Yasevich (vyase...@redhat.com) wrote:
>>>> On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote:
>>>>> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>>>>>> virtio announcements seem to run on its own timer with it's own
>>>>>> repetition loop and are invoked separately from qemu_announce_self.
>>>>>> This patch consolidates announcements into a single location and
>>>>>> allows other nics to provide it's own announcement capabilities, if
>>>>>> necessary.
>>>>>>
>>>>>> This is also usefull in support of exposing qemu_announce_self through
>>>>>> qmp/hmp.
>>>>>>
>>>>>> Signed-off-by: Vladislav Yasevich 
>>>>>> ---
>>>>>>  hw/net/virtio-net.c| 30 --
>>>>>>  include/hw/virtio/virtio-net.h |  2 --
>>>>>>  include/net/net.h  |  2 ++
>>>>>>  migration/savevm.c |  6 ++
>>>>>>  4 files changed, 16 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>> index c321680..6e9ce4f 100644
>>>>>> --- a/hw/net/virtio-net.c
>>>>>> +++ b/hw/net/virtio-net.c
>>>>>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, 
>>>>>> uint8_t status)
>>>>>>  (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>>>  }
>>>>>>  
>>>>>> -static void virtio_net_announce_timer(void *opaque)
>>>>>> +static void virtio_net_announce(NetClientState *nc)
>>>>>>  {
>>>>>> -VirtIONet *n = opaque;
>>>>>> +VirtIONet *n = qemu_get_nic_opaque(nc);
>>>>>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>>  
>>>>>> -n->announce_counter--;
>>>>>> -n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>> -virtio_notify_config(vdev);
>>>>>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>>>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>>>>>> +n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>>>> +virtio_notify_config(vdev);
>>>>>> +}
>>>>>>  }
>>>>>>  
>>>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>>>  n->nobcast = 0;
>>>>>>  /* multiqueue is disabled by default */
>>>>>>  n->curr_queues = 1;
>>>>>> -timer_del(n->announce_timer);
>>>>>> -n->announce_counter = 0;
>>>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>>  
>>>>>>  /* Flush any MAC and VLAN filter table state */
>>>>>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>>>>>> uint8_t cmd,
>>>>>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>>>>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>>> -if (n->announce_counter) {
>>>>>> -timer_mod(n->announce_timer,
>>>>>> -  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>>>>> -  self_announce_delay(n->announce_counter));
>>>>>> -}
>>>>>>  return VIRTIO_NET_OK;
>>>>>>  } else {
>>>>>>  return VIRTIO_NET_ERR;
>>>>>> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void 
>>>>>> *opaque, int version_id)
>>>>>>  qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>>>>  }
>>>>>>  
>>>>>> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>>>> -virtio_vdev_has_feature(vdev, VIRTIO_NET_F

Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-03-30 Thread Vlad Yasevich
On 03/30/2017 10:53 AM, Dr. David Alan Gilbert wrote:
> * Vlad Yasevich (vyase...@redhat.com) wrote:
>> On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote:
>>> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>>>> virtio announcements seem to run on its own timer with it's own
>>>> repetition loop and are invoked separately from qemu_announce_self.
>>>> This patch consolidates announcements into a single location and
>>>> allows other nics to provide it's own announcement capabilities, if
>>>> necessary.
>>>>
>>>> This is also usefull in support of exposing qemu_announce_self through
>>>> qmp/hmp.
>>>>
>>>> Signed-off-by: Vladislav Yasevich 
>>>> ---
>>>>  hw/net/virtio-net.c| 30 --
>>>>  include/hw/virtio/virtio-net.h |  2 --
>>>>  include/net/net.h  |  2 ++
>>>>  migration/savevm.c |  6 ++
>>>>  4 files changed, 16 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index c321680..6e9ce4f 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, uint8_t 
>>>> status)
>>>>  (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>>>  }
>>>>  
>>>> -static void virtio_net_announce_timer(void *opaque)
>>>> +static void virtio_net_announce(NetClientState *nc)
>>>>  {
>>>> -VirtIONet *n = opaque;
>>>> +VirtIONet *n = qemu_get_nic_opaque(nc);
>>>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>  
>>>> -n->announce_counter--;
>>>> -n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>> -virtio_notify_config(vdev);
>>>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>>>> +n->status |= VIRTIO_NET_S_ANNOUNCE;
>>>> +virtio_notify_config(vdev);
>>>> +}
>>>>  }
>>>>  
>>>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>>>  n->nobcast = 0;
>>>>  /* multiqueue is disabled by default */
>>>>  n->curr_queues = 1;
>>>> -timer_del(n->announce_timer);
>>>> -n->announce_counter = 0;
>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>>  
>>>>  /* Flush any MAC and VLAN filter table state */
>>>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>>>> uint8_t cmd,
>>>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>>> -if (n->announce_counter) {
>>>> -timer_mod(n->announce_timer,
>>>> -  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>>>> -  self_announce_delay(n->announce_counter));
>>>> -}
>>>>  return VIRTIO_NET_OK;
>>>>  } else {
>>>>  return VIRTIO_NET_ERR;
>>>> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void 
>>>> *opaque, int version_id)
>>>>  qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>>>  }
>>>>  
>>>> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>>> -virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>>>> -n->announce_counter = SELF_ANNOUNCE_ROUNDS;
>>>> -timer_mod(n->announce_timer, 
>>>> qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>>>> -}
>>>> -
>>>>  return 0;
>>>>  }
>>>>  
>>>> @@ -1829,6 +1818,7 @@ static NetClientInfo net_virtio_info = {
>>>>  .receive = virtio_net_receive,
>>>>  .link_status_changed = virtio_net_set_link_status,
>>>>  .query_rx_filter = virtio_net_query_rxfilter,
>>>> +.announce = virtio_net_announce,
>>>>  };
>>>>  
>>>>  stat

Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-03-30 Thread Vlad Yasevich
On 03/29/2017 04:35 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 29, 2017 at 04:19:50PM -0400, Vladislav Yasevich wrote:
>> virtio announcements seem to run on its own timer with it's own
>> repetition loop and are invoked separately from qemu_announce_self.
>> This patch consolidates announcements into a single location and
>> allows other nics to provide it's own
> 
> their own
> 
>> announcement capabilities, if
>> necessary.
>>
>> This is also usefull
> 
> or useful?
> 
>> in support of exposing qemu_announce_self through
>> qmp/hmp.
> 
> I'm guessing this is the real reason for the change.
> How is it helpful for that? Pls clarify in the commit log.

Ok

> 
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  hw/net/virtio-net.c| 30 --
>>  include/hw/virtio/virtio-net.h |  2 --
>>  include/net/net.h  |  2 ++
>>  migration/savevm.c |  6 ++
>>  4 files changed, 16 insertions(+), 24 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index c321680..6e9ce4f 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, uint8_t 
>> status)
>>  (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>  }
>>  
>> -static void virtio_net_announce_timer(void *opaque)
>> +static void virtio_net_announce(NetClientState *nc)
>>  {
>> -VirtIONet *n = opaque;
>> +VirtIONet *n = qemu_get_nic_opaque(nc);
>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>  
>> -n->announce_counter--;
>> -n->status |= VIRTIO_NET_S_ANNOUNCE;
>> -virtio_notify_config(vdev);
>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>> +n->status |= VIRTIO_NET_S_ANNOUNCE;
>> +virtio_notify_config(vdev);
>> +}
>>  }
>>  
>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>  n->nobcast = 0;
>>  /* multiqueue is disabled by default */
>>  n->curr_queues = 1;
>> -timer_del(n->announce_timer);
>> -n->announce_counter = 0;
>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>  
>>  /* Flush any MAC and VLAN filter table state */
>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>> uint8_t cmd,
>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> -if (n->announce_counter) {
>> -timer_mod(n->announce_timer,
>> -  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> -  self_announce_delay(n->announce_counter));
>> -}
>>  return VIRTIO_NET_OK;
>>  } else {
>>  return VIRTIO_NET_ERR;
> 
> you will notice that this runs at VM speeds with QEMU_CLOCK_VIRTUAL,
> while the self announce you are tying this to runs with
> QEMU_CLOCK_REALTIME.
> 
> So this switch seems wrong to me: by the time VM is started
> on destination all timers might have expired.
> 

That explains what I am seeing now.  Hmm... may be combining them this way
isn't the right approach.  I think I'll leave RARPs alone and export an ability
to start the a nic provided announce feature.

Then Germano's qmp command would be able to call this one if available and the 
old
one if not.  That way, on user-triggered announcements, we wouldn't need to do 
the
RARPs, if a better announcement capability is present.

That way current behavior is preserved for migrations.

Thanks
-vlad

> 
>> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void *opaque, 
>> int version_id)
>>  qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>  }
>>  
>> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>> -virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>> -n->announce_counter = SELF_ANNOUNCE_ROUNDS;
>> -timer_mod(n->announce_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>> -}
>> -
>>  return 0;
>>  }
>>  
>> @@ -1829,6 +1818,7 @@ static NetClientInfo net_virtio_info = {
>>  .receive = virtio_net_receive,
>>  .link_status_changed = virtio_net_set_link_status,
>>  .query_rx_filter = virtio_net_query_rxfilter,
>> +.announce = virtio_net_announce,
>>  };
>>  
>>  static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
>> @@ -1934,8 +1924,6 @@ static void virtio_net_device_realize(DeviceState 
>> *dev, Error **errp)
>>  qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
>>  memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
>>  n->status = VIRTIO_NET_S_LINK_UP;
>> -n->announce_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> - virtio_net_announce_timer, n);
>>  
>>  if (n->netclient_type) {
>>  /*
>> @@ -1997,8 +1985,6 @@ stati

Re: [Qemu-devel] [PATCH] virtio-net: consolidate guest announcement with qemu_announce_self

2017-03-30 Thread Vlad Yasevich
On 03/30/2017 04:24 AM, Dr. David Alan Gilbert wrote:
> * Vladislav Yasevich (vyase...@redhat.com) wrote:
>> virtio announcements seem to run on its own timer with it's own
>> repetition loop and are invoked separately from qemu_announce_self.
>> This patch consolidates announcements into a single location and
>> allows other nics to provide it's own announcement capabilities, if
>> necessary.
>>
>> This is also usefull in support of exposing qemu_announce_self through
>> qmp/hmp.
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  hw/net/virtio-net.c| 30 --
>>  include/hw/virtio/virtio-net.h |  2 --
>>  include/net/net.h  |  2 ++
>>  migration/savevm.c |  6 ++
>>  4 files changed, 16 insertions(+), 24 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index c321680..6e9ce4f 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -110,14 +110,16 @@ static bool virtio_net_started(VirtIONet *n, uint8_t 
>> status)
>>  (n->status & VIRTIO_NET_S_LINK_UP) && vdev->vm_running;
>>  }
>>  
>> -static void virtio_net_announce_timer(void *opaque)
>> +static void virtio_net_announce(NetClientState *nc)
>>  {
>> -VirtIONet *n = opaque;
>> +VirtIONet *n = qemu_get_nic_opaque(nc);
>>  VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>  
>> -n->announce_counter--;
>> -n->status |= VIRTIO_NET_S_ANNOUNCE;
>> -virtio_notify_config(vdev);
>> +if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>> +virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>> +n->status |= VIRTIO_NET_S_ANNOUNCE;
>> +virtio_notify_config(vdev);
>> +}
>>  }
>>  
>>  static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>> @@ -427,8 +429,6 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>  n->nobcast = 0;
>>  /* multiqueue is disabled by default */
>>  n->curr_queues = 1;
>> -timer_del(n->announce_timer);
>> -n->announce_counter = 0;
>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>>  
>>  /* Flush any MAC and VLAN filter table state */
>> @@ -868,11 +868,6 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>> uint8_t cmd,
>>  if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>  n->status & VIRTIO_NET_S_ANNOUNCE) {
>>  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> -if (n->announce_counter) {
>> -timer_mod(n->announce_timer,
>> -  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> -  self_announce_delay(n->announce_counter));
>> -}
>>  return VIRTIO_NET_OK;
>>  } else {
>>  return VIRTIO_NET_ERR;
>> @@ -1609,12 +1604,6 @@ static int virtio_net_post_load_device(void *opaque, 
>> int version_id)
>>  qemu_get_subqueue(n->nic, i)->link_down = link_down;
>>  }
>>  
>> -if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>> -virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>> -n->announce_counter = SELF_ANNOUNCE_ROUNDS;
>> -timer_mod(n->announce_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>> -}
>> -
>>  return 0;
>>  }
>>  
>> @@ -1829,6 +1818,7 @@ static NetClientInfo net_virtio_info = {
>>  .receive = virtio_net_receive,
>>  .link_status_changed = virtio_net_set_link_status,
>>  .query_rx_filter = virtio_net_query_rxfilter,
>> +.announce = virtio_net_announce,
>>  };
>>  
>>  static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
>> @@ -1934,8 +1924,6 @@ static void virtio_net_device_realize(DeviceState 
>> *dev, Error **errp)
>>  qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
>>  memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
>>  n->status = VIRTIO_NET_S_LINK_UP;
>> -n->announce_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> - virtio_net_announce_timer, n);
>>  
>>  if (n->netclient_type) {
>>  /*
>> @@ -1997,8 +1985,6 @@ static void virtio_net_device_unrealize(DeviceState 
>> *dev, Error **errp)
>>  virtio_net_del_queue(n, i);
>>  }
>>  
>> -timer_del(n->announce_timer);
>> -timer_free(n->announce_timer);
>>  g_free(n->vqs);
>>  qemu_del_nic(n->nic);
>>  virtio_cleanup(vdev);
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 1eec9a2..0c597f4 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -94,8 +94,6 @@ typedef struct VirtIONet {
>>  char *netclient_name;
>>  char *netclient_type;
>>  uint64_t curr_guest_offloads;
>> -QEMUTimer *announce_timer;
>> -int announce_counter;
>>  bool needs_vnet_hdr_swap;
>>  } VirtIONet;
>>  
>> diff --git a/include/net/net.h b/include/net/net.h
>> index 99b28d5..598f523 100644
>> --- a/include/net/net.h
>> +++ b/include/net/net.h
>> @@ -64,6 +64,7 @@ typedef int (SetVnetLE)(NetClientState *, b

Re: [Qemu-devel] [PATCH v3 2/2] rtl8139: correctly track full receive buffer in standard mode

2015-09-01 Thread Vlad Yasevich
On 08/31/2015 11:15 PM, Jason Wang wrote:
> 
> 
> On 08/31/2015 10:24 PM, Vlad Yasevich wrote:
>> On 08/31/2015 05:59 AM, Jason Wang wrote:
>>>
>>> On 08/28/2015 10:06 PM, Vladislav Yasevich wrote:
>>>> In standard operation mode, when the receive ring buffer
>>>> is full, the buffer actually appears empty to the driver since
>>>> the RxBufAddr (the location we wirte new data to) and RxBufPtr
>>>> (the location guest would stat reading from) are the same.
>>>> As a result, the call to rtl8139_RxBufferEmpty ends up
>>>> returning true indicating that the receive buffer is empty.
>>>> This would result in the next packet overwriting the recevie buffer
>>>> again and stalling receive operations.
>>>>
>>>> This patch tracks the number of unread bytes in the rxbuffer
>>>> using an unused C+ register.  On every read and write, the
>>>> number is adjsted and the special case of a full buffer is also
>>>> trapped.
>>>>
>>>> The C+ register trick is used to simplify migration and not require
>>>> a new machine type.  This register is not used in regular mode
>>>> and C+ mode doesn't have the same issue.
>>>>
>>>> Signed-off-by: Vladislav Yasevich 
>>>> ---
>>>>  hw/net/rtl8139.c | 45 +
>>>>  1 file changed, 41 insertions(+), 4 deletions(-)
>>> I'm not sure this can happen. For example, looks like the following
>>> check in rtl8139_do_receive():
>>>
>>> if (avail != 0 && size + 8 >= avail)
>>> {
>>>
>>> can guarantee there's no overwriting?
>>>
>> The problem is the calculation of avail.  When the buffer is full,
>> avail will be the the size of the receive buffer.  So the test
>> above will be false because the driver thinks there is actually
>> enough room.
>>
>> With his patch, 'avail' will be calculated to 0.
>>
>> -vlad
>>
> 
> If believe the condition size + 8 >= avail can guarantee that the buffer
> won't be full (if we allow size + 8 == avail, buffer will be full)? So
> avail == 0 means the buffer is empty. Or is there anything I miss?
> 

So the issue is that the RxBufAddr is 4 byte aligned, but when we do
availability check above, we don't 4 byte align the size+8 calculation.
That causes the check above to fail when it should succeed and we never
catch the overflow condition.

I'll resubmit with a simple alignment patch that makes this work.

-vlad




Re: [Qemu-devel] [PATCH v3 2/2] rtl8139: correctly track full receive buffer in standard mode

2015-08-31 Thread Vlad Yasevich
On 08/31/2015 05:59 AM, Jason Wang wrote:
> 
> 
> On 08/28/2015 10:06 PM, Vladislav Yasevich wrote:
>> In standard operation mode, when the receive ring buffer
>> is full, the buffer actually appears empty to the driver since
>> the RxBufAddr (the location we wirte new data to) and RxBufPtr
>> (the location guest would stat reading from) are the same.
>> As a result, the call to rtl8139_RxBufferEmpty ends up
>> returning true indicating that the receive buffer is empty.
>> This would result in the next packet overwriting the recevie buffer
>> again and stalling receive operations.
>>
>> This patch tracks the number of unread bytes in the rxbuffer
>> using an unused C+ register.  On every read and write, the
>> number is adjsted and the special case of a full buffer is also
>> trapped.
>>
>> The C+ register trick is used to simplify migration and not require
>> a new machine type.  This register is not used in regular mode
>> and C+ mode doesn't have the same issue.
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  hw/net/rtl8139.c | 45 +
>>  1 file changed, 41 insertions(+), 4 deletions(-)
> 
> I'm not sure this can happen. For example, looks like the following
> check in rtl8139_do_receive():
> 
> if (avail != 0 && size + 8 >= avail)
> {
> 
> can guarantee there's no overwriting?
> 

The problem is the calculation of avail.  When the buffer is full,
avail will be the the size of the receive buffer.  So the test
above will be false because the driver thinks there is actually
enough room.

With his patch, 'avail' will be calculated to 0.

-vlad



Re: [Qemu-devel] [PATCH v2 0/2] rtl8139: Fix buffer overflow in standard mode

2015-08-26 Thread Vlad Yasevich
On 08/26/2015 03:51 PM, Vladislav Yasevich wrote:
> When rtl8139 card is running in standard mode, it is very easy
> to overlflow and the receive buffer and get into a siutation
> where all packets are dropped.  Simply reproduction case is
> to ping the guest from the host with 6500 byte packets.  
> 
> There are actually 2 problems here.
>  1) When the rtl8129 buffer is overflow, the card emulation
> returns the size of the packet back to queue transmission.
> This signals successful reception even though the packet
> has been dropped.  The proper solution is to return 0, so
> that the packet is re-queued and will be resubmitted later.
> 
>  2) When packets are sized such that the fragments end up completely
> filling the receive buffer without overflow, the device thinks
> that the buffer is actually empty (instead of full).  This causes
> next packet to over-write the existing packets.  With the above
> ping reproducer, ever ICMP packet fills the buffer and thus keeps
> overwriting the previous packet and never waking up the guest.
> The solution here is track the number of unread bytes separately
> so we would know if we have anything in buffer to read or not.
> 
> V2: instead of tracking buffer_full condition, changed the code, as
> suggested by Stefan Hajnoczi, to track the number of unread bytes
> instead.  We initialize it to 0 at the start, adjust it on every
> receive from the network and read from the guest and can set
> the number of unread of bytes to full buffer size when the buffer
> full.
> 
> Vladislav Yasevich (2):
>   rtl8139: Do not consume the packet during overflow in standard mode.
>   rtl8139: correctly track full receive buffer in standard mode
> 

Self nack.  The second patch is wrong.  Will resubmit when fixed.

-vlad

>  hw/net/rtl8139.c | 44 +++-
>  1 file changed, 39 insertions(+), 5 deletions(-)
> 




Re: [Qemu-devel] [PATCH 2/2] rtl8139: correctly track full receive buffer in standard mode

2015-08-26 Thread Vlad Yasevich
On 08/26/2015 08:36 AM, Stefan Hajnoczi wrote:
> On Fri, Aug 21, 2015 at 02:59:25PM -0700, Vladislav Yasevich wrote:
>> In standard operation mode, when the receive ring buffer
>> is full, the buffer actually appears empty to the driver since
>> the RxBufAddr (the location we wirte new data to) and RxBufPtr
>> (the location guest would stat reading from) are the same.
>> As a result, the call to rtl8139_RxBufferEmpty ends up
>> returning true indicating that the receive buffer is empty.
>> This would result in the next packet overwriting the recevie buffer
>> again and stalling receive operations.
>>
>> This patch catches the "receive buffer full" condition
>> using an unused C+ register.  This is done to simplify
>> migration and not require a new machine type.
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  hw/net/rtl8139.c | 36 ++--
>>  1 file changed, 34 insertions(+), 2 deletions(-)
> 
> The rtl8139 code duplicates the following expression in several places:
> 
>   MOD2(s->RxBufferSize + s->RxBufAddr - s->RxBufPtr, s->RxBufferSize);
> 
> It may be cleaner to keep a rx_unread_bytes counter so that all these
> users can simply look at that variable.
> 
> That cleanup also eliminates the rx full vs empty problem because then
> we'll know whether rx_unread_bytes == 0 or rx_unread_bytes ==
> s->RxBufferSize.
> 
> The same trick of stashing the value in s->currCPlusRxDesc could be
> used.
> 

Good idea.  I'll give it a try.


>> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
>> index 359e001..3d572ab 100644
>> --- a/hw/net/rtl8139.c
>> +++ b/hw/net/rtl8139.c
>> @@ -816,6 +816,23 @@ static int rtl8139_can_receive(NetClientState *nc)
>>  }
>>  }
>>  
>> +static void rtl8139_set_rxbuf_full(RTL8139State *s, bool full)
>> +{
>> +/* In standard mode, C+ RxDesc isn't used.  Reuse it
>> + * to store the rx_buf_full status.
>> + */
> 
> assert(!s->cplus_enabled)?
> 
>> +s->currCPlusRxDesc = full;
>> +DPRINTF("received: rx buffer full\n");
>> +}
>> +
>> +static bool rtl8139_rxbuf_full(RTL8139State *s)
>> +{
>> +/* In standard mode, C+ RxDesc isn't used.  Reuse it
>> + * to store the rx_buf_full status.
>> + */
> 
> assert(!s->cplus_enabled)?
> 
>> @@ -2601,6 +2630,9 @@ static void rtl8139_RxBufPtr_write(RTL8139State *s, 
>> uint32_t val)
>>  /* this value is off by 16 */
>>  s->RxBufPtr = MOD2(val + 0x10, s->RxBufferSize);
>>  
>> +/* We just read data, clear full buffer state */
>> +rtl8139_set_rxbuf_full(s, false);
>> +
>>  /* more buffer space may be available so try to receive */
>>  qemu_flush_queued_packets(qemu_get_queue(s->nic));
> 
> What if the guest writes this register while we're in C+ mode?
> 

In C+ mode the guest uses a descriptor ring instead of liner buffer so a well 
behaved
C+ guest wouldn't write this value.  Validated this with a linux guest.
I guess a malicious guest could do this, but then it could do a lot of other 
things too.

-vlad




Re: [Qemu-devel] [PATCH 1/2] rtl8139: Do not consume the packet during overflow in standard mode.

2015-08-26 Thread Vlad Yasevich
On 08/26/2015 08:18 AM, Stefan Hajnoczi wrote:
> On Fri, Aug 21, 2015 at 02:59:24PM -0700, Vladislav Yasevich wrote:
>> When operation in standard mode, we currently return the size
>> of packet during buffer overflow.  This consumes the overflow
>> packet.  Return 0 instead so we can re-process the overflow packet
>> when we have room.
>>
>> This fixes issues with lost/dropped fragments of large messages.
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  hw/net/rtl8139.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
>> index edbb61c..359e001 100644
>> --- a/hw/net/rtl8139.c
>> +++ b/hw/net/rtl8139.c
>> @@ -1157,7 +1157,7 @@ static ssize_t rtl8139_do_receive(NetClientState *nc, 
>> const uint8_t *buf, size_t
>>  s->IntrStatus |= RxOverflow;
>>  ++s->RxMissed;
>>  rtl8139_update_irq(s);
>> -return size_;
>> +return 0;
> 
> Every .receive() return 0 must be paired with a
> qemu_flush_queued_packets() call.

Isn't that already there.  A few drivers already return 0 to trigger a requeue. 
 What's
missing?


> 
> Is rtl8139_RxBufPtr_write() guaranteed to be called when the guest
> refills rx buffers?
> 

It calls it on read, once it's consumed a packet, to advance to the next packet 
in the
buffer.

-vlad



Re: [Qemu-devel] [Qemu-discuss] GRO not happening in VM with VxLAN

2015-07-01 Thread Vlad Yasevich
On 07/01/2015 02:50 PM, Santosh R wrote:
> Since the vxlan UDP header checksum is 0, udp_tunnel_gro_complete (called
> via vxlan_gro_complete) is setting SKB_GSO_UDP_TUNNEL in
> skb_shinfo(skb)->gso_type.
> Later when bridge interface tries to forward this packet to tap interface
> (br_dev_queue_push_xmit -> __dev_queue_xmit -> validate_xmit_skb)
> netif_needs_gso will succeed (skb_gso_ok returns 0)
> resulting in packet segmentation (which was earlier aggregated) as tap
> interface doesn't have this feature. Is there a way to enable
> tx-udp_tnl-segmentation for tap interface?

No.  Those features are set by qemu when the guest is created and
mirror any features that are supported by the guest.  So, if the guest
virtio doesn't have that feature, neither will the tap device.

Which kernel version are you using on the host?

-vlad

> 
> # ethtool -K tap0 tx-udp_tnl-segmentation on
> Could not change any device features
> 
> # ethtool -k tap0 | grep -i tnl
> tx-udp_tnl-segmentation: off [fixed]
> 
> On Tue, Jun 30, 2015 at 12:24 PM, Santosh R  wrote:
> 
>>
>> On Mon, Jun 29, 2015 at 9:24 PM, Santosh R  wrote:
>>
>>>
>>> On Mon, Jun 29, 2015 at 9:04 PM, Vlad Yasevich 
>>> wrote:
>>>
>>>> On 06/29/2015 01:46 AM, Santosh R wrote:
>>>>> All,
>>>>>
>>>>>I am testing VxLAN performance in VM. For this I am using below
>>>> command
>>>>> to bring up the VM.
>>>>> # qemu-system-x86_64 -m 4096 -smp 4 -boot c -device
>>>>> virtio-net-pci,netdev=tap0,mac=00:11:22:33:44:55 -netdev
>>>>> tap,id=tap0,script=no,vhost=on  -drive file=/root/vdisk_rhel65.img &
>>>>>
>>>>> This is resulting in lower throughput in VM. On looking further I found
>>>>> that GRO is not happing on the tap interface.
>>>>> Using tcpdump I could see GRO happeing in physcial nic, vxlan and
>>>> bridge
>>>>> interface. but _not_ on tap interface.
>>>>
>>>> I don't think this is GRO.  GRO  only happens at the lowest possible
>>>> layer.
>>>> For packets coming from external sources, that happens just above the nic
>>>> when the packet enters the host for the first time.  From then on, the
>>>> features
>>>> of other devices determine if the large aggregated packet will be
>>>> segmented
>>>> again or not.
>>>>
>>>>
>>>>>
>>>>> So, I thought to use veth pair instead of the tap interface in VM.
>>>> What is
>>>>> the command to do this?
>>>>
>>>> veth pair will not help with this.  you need a tap or a macvtap to have
>>>> VM  talk to host.
>>>>
>>>>> Also,
>>>>> 1. Why is rx-checksumming off for tap interface?
>>>>
>>>> because tap doesn't do checksum validation.  it allows lower level
>>>> devices or
>>>> VM to do so.
>>>>
>>>>> 2. Why is GRO not happening with tap interface?
>>>>>
>>>>
>>>> Again.  I don't think it's GRO.  Try capturing traffic at tap and see if
>>>> you get
>>>> large packets (more the 1 mtu long) on the tap.  If not, then someone
>>>> else is fragmenting
>>>> packets and you need to find who.
>>>>
>>> [Santosh]:  Thanks for the reply.
>>> With VxLAN, aggregated packets are seen on physical, VxLAN and bridge
>>> interface but _not_ on the tap interface.
>>> Without VxLAN, I do see aggregated (GRO) packets all the way upto VM
>>> (i.e. physical, bridge and tap interface).
>>> As you rightly pointed, looks like VxLAN packets are getting fragmented.
>>> Any pointers to look for?
>>>
>>>
>> By the way, I was referring below link for VxLAN testing and ran into the
>> GRO issue.
>> This mentions "The veth0 is suppose to be connected to the VM, while the
>> veth1 is connected to the Linux bridge".
>> https://community.mellanox.com/docs/DOC-1473
>>
> 




Re: [Qemu-devel] [Qemu-discuss] GRO not happening in VM with VxLAN

2015-06-29 Thread Vlad Yasevich
On 06/29/2015 01:46 AM, Santosh R wrote:
> All,
> 
>I am testing VxLAN performance in VM. For this I am using below command
> to bring up the VM.
> # qemu-system-x86_64 -m 4096 -smp 4 -boot c -device
> virtio-net-pci,netdev=tap0,mac=00:11:22:33:44:55 -netdev
> tap,id=tap0,script=no,vhost=on  -drive file=/root/vdisk_rhel65.img &
> 
> This is resulting in lower throughput in VM. On looking further I found
> that GRO is not happing on the tap interface.
> Using tcpdump I could see GRO happeing in physcial nic, vxlan and bridge
> interface. but _not_ on tap interface.

I don't think this is GRO.  GRO  only happens at the lowest possible layer.
For packets coming from external sources, that happens just above the nic
when the packet enters the host for the first time.  From then on, the features
of other devices determine if the large aggregated packet will be segmented
again or not.


> 
> So, I thought to use veth pair instead of the tap interface in VM. What is
> the command to do this?

veth pair will not help with this.  you need a tap or a macvtap to have
VM  talk to host.

> Also,
> 1. Why is rx-checksumming off for tap interface?

because tap doesn't do checksum validation.  it allows lower level devices or
VM to do so.

> 2. Why is GRO not happening with tap interface?
> 

Again.  I don't think it's GRO.  Try capturing traffic at tap and see if you get
large packets (more the 1 mtu long) on the tap.  If not, then someone else is 
fragmenting
packets and you need to find who.

-vlad

> Pls copy me in all the replies as I am not on the list.
> 
> # brctl show
> bridge name bridge id   STP enabled interfaces
> br-vx   8000.323a9146b2d2   no  tap0
> vxlan0
> # ethtool -k eth35
> Features for eth35:
> rx-checksumming: on
> tx-checksumming: on
> tx-checksum-ipv4: on
> tx-checksum-ip-generic: off [fixed]
> tx-checksum-ipv6: on
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
> tx-tcp-segmentation: on
> tx-tcp-ecn-segmentation: on
> tx-tcp6-segmentation: on
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: on
> rx-vlan-filter: off [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: off [fixed]
> tx-ipip-segmentation: off [fixed]
> tx-sit-segmentation: off [fixed]
> tx-udp_tnl-segmentation: on
> tx-mpls-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> busy-poll: on [fixed]
> 
> # ethtool -k vxlan0
> Features for vxlan0:
> rx-checksumming: on
> tx-checksumming: on
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: on
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
> tx-tcp-segmentation: on
> tx-tcp-ecn-segmentation: on
> tx-tcp6-segmentation: on
> udp-fragmentation-offload: on
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: off [fixed]
> tx-vlan-offload: on
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: off [fixed]
> rx-vlan-filter: off [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: on [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: off [fixed]
> tx-ipip-segmentation: off [fixed]
> tx-sit-segmentation: off [fixed]
> tx-udp_tnl-segmentation: off [fixed]
> tx-mpls-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: on
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> busy-poll: off [fixed]
> 
> # ethtool -k tap0
> Features for tap0:
> rx-checksumming: off [fixed]
> tx-checksumming: on
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: on
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> 

Re: [Qemu-devel] [PATCH] net: Fix link state inter-dependencies between NIC and backend

2015-04-02 Thread Vlad Yasevich
On 04/02/2015 05:21 AM, Stefan Hajnoczi wrote:
> On Thu, Apr 02, 2015 at 08:11:15AM +0200, Michael S. Tsirkin wrote:
>> On Wed, Apr 01, 2015 at 07:55:38PM -0400, Vladislav Yasevich wrote:
>>> Commit 02d38fcb2caa4454cf4ed728d5908c3cc9ba47be
>>> net: Update netdev peer on link change
>>>
>>> introduced a link state dependency between the backend
>>> netdev and the nic.  If the admin turned off the link
>>> on the backend, the nic link state was set to down because
>>> the link had no access to the network at all.  This created
>>> some inconsitet behavior for someone who wanted to play
>>> around the links states.
>>>  1) Turning off the nic link and then turning on the backend
>>> link (even if it was already on) would turn on the nic link
>>> again.
>>>  2) Turning off the backend link and then turning on the nic
>>> link would turn on the link in the VM, but would not change
>>> the backend state resulting in a broken/unreachable network.
>>>
>>> This patch attempts to correct these behaviors.  The patch treats
>>> the nic-backend relationship as two ends of a wire.  Each end tracks
>>> the administratively set link state in addition to actual link
>>> state.  Thus is is possible to set link down at each end effectively
>>> emulating plugging/unplugging the wire at either end.  The patch
>>> continues to preserve the old behavior where setting just
>>> nic side down does NOT impact the backend.
>>>
>>> Signed-off-by: Vladislav Yasevich 
>>
>> I never understood the point of
>> 02d38fcb2caa4454cf4ed728d5908c3cc9ba47be.
>>
>> If you want to tell guest link is down,
>> just set the nic link down.
>>
>> How about we just revert it?
>>
>> Then one can change link state on each end
>> independently, with no notification,
>> and no state to track.
> 
> I agree.  The simple solution would be nice, unless you are aware of a
> use case where it causes problems.

It causes problems when the user turns off link on the backend.  Problems range 
from much
longer boot up times on older VMs to Windows networks not working.

The problem is the if you turn off the link on the backend, the link to the VM 
stays
up but can't really reach the network.

Now, you could say "Don't do this" and document it, but it's still ugly and 
causes
issues.

We could disable the link state change on backends and only allow it on NIC 
type devices.
That would also solve this.

-vlad
> 
> Stefan
> 




Re: [Qemu-devel] [Bug 1362755] [NEW] QEmu +2 does not route VLAN tagged packets, that are originated within the Hypervisor itself.

2014-08-29 Thread Vlad Yasevich
[ realized that the bug and reporter were non cc'd, updated cc list]

On 08/28/2014 02:40 PM, Thiago Martins wrote:
> Public bug reported:
> 
> Guys,
> 
> Trusty QEmu 2.0 Hypervisor fails to create a consistent virtual network.
> It does not route tagged VLAN packets.
> 

The have a been a bunch of rather recent changes to the kernel to support
guest VLANs correctly.  The issues have been around TSO/GSO implementation
in the kernel.

Could try disabling TSO/GSO and tx checksums on the vlan devices in the guest
and see if it solves your problem?

If it does, could you try the kernel from
  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
turn the offloads back on and see if the problem is solved?

Thanks
-vlad

> That's it, it is impossible to use Trusty acting as a QEmu 2.0
> Hypervisor (metapakage `ubuntu-virt-server`), to make a basic virtual
> tagged network within itself. QEmu 2.X guest does not route traffic when
> with tagged VLANs!
> 
> So, Trusty QEmu 2.0 Hypervisor cannot be used to host guests acting as
> "firewalls / routers", and it have an easy to reproduce, connectivity
> problem.
> 
> This network problem affects Ubuntu 14.04.1 (Linux-3.13.0-35-generic)
> with QEmu 2.0 (it also affects 14.10, Linux 3.16 - QEmu 2.1).
> 
> I have this very same setup up and running, on about ~100 physical
> servers (others Trusty QEmu 2.0 Hypervisors), and in only a few of them,
> the QEmu Hypervisors dedicated to host "guest acting as routers /
> firewalls", like a "borger gateway" for example, that it does not work
> as expected.
> 
> One interesting thing to note is that, this BUG appear only, and only
> at, the QEmu Hypervisors dedicated to host guests that are used as
> `router / firewalls` (as I said above), others QEmu Hypervisors of my
> network does not suffer from this problem.
> 
> Another interesting point is that it fails to route tagged VLAN packets
> only when these packets are originated from within the Hypervisor
> itself, I mean, packets from both host and other guests (not the
> router/firewall guest itself), suffer from this connectivity problem.
> 
> As a workaroung / fix, Xen-4.4 can be used, instead of QEmu 2.0, as a
> "border hypervisor". So, this proves that there is something wrong with
> QEmu.
> 
> I already tested it with both `openvswitch-switch` and with `bridge-
> utils`, same bad results. So, don't waste your time trying `bridge-
> utils` (optional steps while reproducing it), you can keep OVS bridges
> from original design.
> 
> I think that I'm using the best pratices to build this environment, as
> follows...
> 
> 
> * Topology *
> 
> 
> QEmu 2.0 Hypervisor - (qemu-host-1.domain.com - the "border hypervisor"):
> 
> 1- Physical machine with 3 NICs;
> 2- Minimal Ubuntu 14.04.1 installed and upgraded;
> 3- Packages installed: "ubuntu-virt-server openvswitch-switch rdnssd tcpdump".
> 
> - eth0 connected to the Internet - VLAN tag 10;
> - eth1 connected to the LAN1 - VLAN tag 100;
> - eth2 connected to the LAN2 - VLAN tag 200;
> 
> 
> Guest (guest-fw-1.domain.com - the "border gateway" itself - regular guest 
> acting as a router with iptables/ip6tables):
> 
> 1- Virtual Machine with 3 NICs (VirtIO);
> 2- Minimal Virtual Machine Ubuntu 14.04.1 installed and upgraded;
> 3- Packages installed: "aiccu iptables vlan pv-grub-menu".
> 
> 
> OBS: You'll need `virt-manager` to connect at `qemu-host-1` to install
> `guest-fw-1`. Then, use `guest-fw-1` as a default gateway for your
> (virt-)lab network, including the `qemu-host-1` itself.
> 
> 
> Steps to reproduce
> 
> 
> * Preparing the `qemu-host-1` host:
> 
> - Configure the /etc/network/interfaces with:
> 
> ---
> # The loopback network interface
> auto lo
> iface lo inet loopback
> 
> auto eth0
> iface eth0 inet manual
> up ip link set $IFACE up
> down ip link set $IFACE down
> 
> auto eth1
> iface eth1 inet manual
>   up ip link set dev $IFACE up
>   down ip link set dev $IFACE down
> 
> auto ovsbr1p1
> iface ovsbr1p1 inet6 auto
> 
> iface ovsbr1p1 inet static
>   address 192.168.1.10
>   netmask 24
>   gateway 192.168.1.1
> 
> auto eth2
> iface eth2 inet manual
>   up ip link set $IFACE up
>   down ip link set $IFACE down
> ---
> 
> 
> - Creating the Hypervisor OVS Bridges:
> 
> ovs-vsctl add-br ovsbr0
> ovs-vsctl add-br ovsbr1
> ovs-vsctl add-br ovsbr2
> 
> 
> - Attaching the bridges to the NICs:
> 
> ovs-vsctl add-port ovsbr0 eth0
> ovs-vsctl add-port ovsbr1 eth1
> ovs-vsctl add-port ovsbr2 eth2
> 
> 
> - Creating the OVS internal tagged interface (best practice?), so the QEmu 
> Hypervisor itself can have its own IP (v4 and v6):
> 
> ovs-vsctl add-port ovsbr1 ovsbr1p1 tag=100 -- set interface ovsbr1p1 
> type=internal
> ovs-vsctl set interface ovsbr1p1 mac=\"32:ac:85:72:ab:fe\"
> 
> 
>  NOTE:
> 
>  * I'm fixing the MAC Address of ovsbr1p1 because I like to use IPv6
> with SLAAC, so, it remain fixed across host reboots.
> 
> 
> - Making Libvirt aware of OVS Bridges:
> 
> Creat

Re: [Qemu-devel] [Bug 1362755] [NEW] QEmu +2 does not route VLAN tagged packets, that are originated within the Hypervisor itself.

2014-08-28 Thread Vlad Yasevich
On 08/28/2014 02:40 PM, Thiago Martins wrote:
> Public bug reported:
> 
> Guys,
> 
> Trusty QEmu 2.0 Hypervisor fails to create a consistent virtual network.
> It does not route tagged VLAN packets.
> 

The have a been a bunch of rather recent changes to the kernel to support
guest VLANs correctly.  The issues have been around TSO/GSO implementation
in the kernel.

Could try disabling TSO/GSO and tx checksums on the vlan devices in the guest
and see if it solves your problem?

If it does, could you try the kernel from
  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
turn the offloads back on and see if the problem is solved?

Thanks
-vlad

> That's it, it is impossible to use Trusty acting as a QEmu 2.0
> Hypervisor (metapakage `ubuntu-virt-server`), to make a basic virtual
> tagged network within itself. QEmu 2.X guest does not route traffic when
> with tagged VLANs!
> 
> So, Trusty QEmu 2.0 Hypervisor cannot be used to host guests acting as
> "firewalls / routers", and it have an easy to reproduce, connectivity
> problem.
> 
> This network problem affects Ubuntu 14.04.1 (Linux-3.13.0-35-generic)
> with QEmu 2.0 (it also affects 14.10, Linux 3.16 - QEmu 2.1).
> 
> I have this very same setup up and running, on about ~100 physical
> servers (others Trusty QEmu 2.0 Hypervisors), and in only a few of them,
> the QEmu Hypervisors dedicated to host "guest acting as routers /
> firewalls", like a "borger gateway" for example, that it does not work
> as expected.
> 
> One interesting thing to note is that, this BUG appear only, and only
> at, the QEmu Hypervisors dedicated to host guests that are used as
> `router / firewalls` (as I said above), others QEmu Hypervisors of my
> network does not suffer from this problem.
> 
> Another interesting point is that it fails to route tagged VLAN packets
> only when these packets are originated from within the Hypervisor
> itself, I mean, packets from both host and other guests (not the
> router/firewall guest itself), suffer from this connectivity problem.
> 
> As a workaroung / fix, Xen-4.4 can be used, instead of QEmu 2.0, as a
> "border hypervisor". So, this proves that there is something wrong with
> QEmu.
> 
> I already tested it with both `openvswitch-switch` and with `bridge-
> utils`, same bad results. So, don't waste your time trying `bridge-
> utils` (optional steps while reproducing it), you can keep OVS bridges
> from original design.
> 
> I think that I'm using the best pratices to build this environment, as
> follows...
> 
> 
> * Topology *
> 
> 
> QEmu 2.0 Hypervisor - (qemu-host-1.domain.com - the "border hypervisor"):
> 
> 1- Physical machine with 3 NICs;
> 2- Minimal Ubuntu 14.04.1 installed and upgraded;
> 3- Packages installed: "ubuntu-virt-server openvswitch-switch rdnssd tcpdump".
> 
> - eth0 connected to the Internet - VLAN tag 10;
> - eth1 connected to the LAN1 - VLAN tag 100;
> - eth2 connected to the LAN2 - VLAN tag 200;
> 
> 
> Guest (guest-fw-1.domain.com - the "border gateway" itself - regular guest 
> acting as a router with iptables/ip6tables):
> 
> 1- Virtual Machine with 3 NICs (VirtIO);
> 2- Minimal Virtual Machine Ubuntu 14.04.1 installed and upgraded;
> 3- Packages installed: "aiccu iptables vlan pv-grub-menu".
> 
> 
> OBS: You'll need `virt-manager` to connect at `qemu-host-1` to install
> `guest-fw-1`. Then, use `guest-fw-1` as a default gateway for your
> (virt-)lab network, including the `qemu-host-1` itself.
> 
> 
> Steps to reproduce
> 
> 
> * Preparing the `qemu-host-1` host:
> 
> - Configure the /etc/network/interfaces with:
> 
> ---
> # The loopback network interface
> auto lo
> iface lo inet loopback
> 
> auto eth0
> iface eth0 inet manual
> up ip link set $IFACE up
> down ip link set $IFACE down
> 
> auto eth1
> iface eth1 inet manual
>   up ip link set dev $IFACE up
>   down ip link set dev $IFACE down
> 
> auto ovsbr1p1
> iface ovsbr1p1 inet6 auto
> 
> iface ovsbr1p1 inet static
>   address 192.168.1.10
>   netmask 24
>   gateway 192.168.1.1
> 
> auto eth2
> iface eth2 inet manual
>   up ip link set $IFACE up
>   down ip link set $IFACE down
> ---
> 
> 
> - Creating the Hypervisor OVS Bridges:
> 
> ovs-vsctl add-br ovsbr0
> ovs-vsctl add-br ovsbr1
> ovs-vsctl add-br ovsbr2
> 
> 
> - Attaching the bridges to the NICs:
> 
> ovs-vsctl add-port ovsbr0 eth0
> ovs-vsctl add-port ovsbr1 eth1
> ovs-vsctl add-port ovsbr2 eth2
> 
> 
> - Creating the OVS internal tagged interface (best practice?), so the QEmu 
> Hypervisor itself can have its own IP (v4 and v6):
> 
> ovs-vsctl add-port ovsbr1 ovsbr1p1 tag=100 -- set interface ovsbr1p1 
> type=internal
> ovs-vsctl set interface ovsbr1p1 mac=\"32:ac:85:72:ab:fe\"
> 
> 
>  NOTE:
> 
>  * I'm fixing the MAC Address of ovsbr1p1 because I like to use IPv6
> with SLAAC, so, it remain fixed across host reboots.
> 
> 
> - Making Libvirt aware of OVS Bridges:
> 
> Create 3 files, one for each bridge, like this (ovsbr0.xml, ovsbr1.xml
> an

Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change

2014-04-02 Thread Vlad Yasevich
On 04/02/2014 11:29 AM, Michael S. Tsirkin wrote:
> On Wed, Apr 02, 2014 at 11:25:32AM -0400, Vlad Yasevich wrote:
>> On 04/02/2014 11:03 AM, Michael S. Tsirkin wrote:
>>> On Wed, Apr 02, 2014 at 10:57:08AM -0400, Vlad Yasevich wrote:
>>>> On 04/02/2014 06:46 AM, Yan Vugenfirer wrote:
>>>>>
>>>>> On Apr 2, 2014, at 11:51 AM, Michael S. Tsirkin  wrote:
>>>>>
>>>>>> On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote:
>>>>>>> When a link change occurs on a backend (like tap), we currently do
>>>>>>> not propage such change to the nic.  As a result, when someone turns
>>>>>>> off a link on a tap device, for instance, then a guest doesn't see
>>>>>>> that change and continues to try to send traffic or run DHCP even
>>>>>>> though the lower-layer is disconnected.  This is OK when the network
>>>>>>> is set up as a HUB since the the guest may be connected to other HUB
>>>>>>> ports too, but when it's set up as a netdev, it makes thinkgs worse.
>>>>>>>
>>>>>>> The patch addresses this by setting the peers link down only when the
>>>>>>> peer is not a HUBPORT device.  With this patch, in the following config
>>>>>>>  -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
>>>>>>> when net0 link is turned off, the guest e1000 shows lower-layer link
>>>>>>> down. This allows guests to boot much faster in such configurations.
>>>>>>> With windows guest, it also allows the network to recover properly
>>>>>>> since windows will not configure the link-local IPv4 address, and
>>>>>>> when the link is turned on, the proper address address is configured.
>>>>>>>
>>>>>>> Signed-off-by: Vlad Yasevich 
>>>>>>> ---
>>>>>>> net/net.c | 26 +-
>>>>>>> 1 file changed, 17 insertions(+), 9 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/net.c b/net/net.c
>>>>>>> index 0a88e68..8a084bf 100644
>>>>>>> --- a/net/net.c
>>>>>>> +++ b/net/net.c
>>>>>>> @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, 
>>>>>>> Error **errp)
>>>>>>> nc->info->link_status_changed(nc);
>>>>>>> }
>>>>>>>
>>>>>>> -/* Notify peer. Don't update peer link status: this makes it 
>>>>>>> possible to
>>>>>>> - * disconnect from host network without notifying the guest.
>>>>>>> - * FIXME: is disconnected link status change operation useful?
>>>>>>> - *
>>>>>>> - * Current behaviour is compatible with qemu vlans where there 
>>>>>>> could be
>>>>>>> - * multiple clients that can still communicate with each other in
>>>>>>> - * disconnected mode. For now maintain this compatibility. */
>>>>>>
>>>>>> Hmm this went in without much discussion.
>>>>>>
>>>>>> But before this change went in, it was possible to control link state on 
>>>>>> NIC
>>>>>> and on link separately, without guest noticing.
>>>>>>
>>>>>> Why did we decide to drop this functionality?
>>>>>
>>>>> Not sure there was a real need for the patch.
>>>>>
>>>>> On other hand Windows guest will not receive IP address from DHCP server 
>>>>> if you start VM with tap link down and then set it to up without toggling 
>>>>> link state of the guest device as well.
>>>>
>>>> Same for a linux guest.  It a fault scenario.  So we either handle
>>>> it by propagating the link event, or we forbid it.  Doing neither
>>>> allows for a bad state in common configuration.
>>>
>>> It's how networking works.
>>> If I turn off my router at home I can't get dhcp either.
>>
>> And you device link will show that it has not carrier.
> 
> It doesn't as long as the switch it's connected to is up.
> 
> 
>>>
>>> We had a way to disable networking  at link or at
>>> the router, then removed the ability to turn it
>>> off at th

Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change

2014-04-02 Thread Vlad Yasevich
On 04/02/2014 11:03 AM, Michael S. Tsirkin wrote:
> On Wed, Apr 02, 2014 at 10:57:08AM -0400, Vlad Yasevich wrote:
>> On 04/02/2014 06:46 AM, Yan Vugenfirer wrote:
>>>
>>> On Apr 2, 2014, at 11:51 AM, Michael S. Tsirkin  wrote:
>>>
>>>> On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote:
>>>>> When a link change occurs on a backend (like tap), we currently do
>>>>> not propage such change to the nic.  As a result, when someone turns
>>>>> off a link on a tap device, for instance, then a guest doesn't see
>>>>> that change and continues to try to send traffic or run DHCP even
>>>>> though the lower-layer is disconnected.  This is OK when the network
>>>>> is set up as a HUB since the the guest may be connected to other HUB
>>>>> ports too, but when it's set up as a netdev, it makes thinkgs worse.
>>>>>
>>>>> The patch addresses this by setting the peers link down only when the
>>>>> peer is not a HUBPORT device.  With this patch, in the following config
>>>>>  -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
>>>>> when net0 link is turned off, the guest e1000 shows lower-layer link
>>>>> down. This allows guests to boot much faster in such configurations.
>>>>> With windows guest, it also allows the network to recover properly
>>>>> since windows will not configure the link-local IPv4 address, and
>>>>> when the link is turned on, the proper address address is configured.
>>>>>
>>>>> Signed-off-by: Vlad Yasevich 
>>>>> ---
>>>>> net/net.c | 26 +-
>>>>> 1 file changed, 17 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/net/net.c b/net/net.c
>>>>> index 0a88e68..8a084bf 100644
>>>>> --- a/net/net.c
>>>>> +++ b/net/net.c
>>>>> @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, 
>>>>> Error **errp)
>>>>> nc->info->link_status_changed(nc);
>>>>> }
>>>>>
>>>>> -/* Notify peer. Don't update peer link status: this makes it 
>>>>> possible to
>>>>> - * disconnect from host network without notifying the guest.
>>>>> - * FIXME: is disconnected link status change operation useful?
>>>>> - *
>>>>> - * Current behaviour is compatible with qemu vlans where there could 
>>>>> be
>>>>> - * multiple clients that can still communicate with each other in
>>>>> - * disconnected mode. For now maintain this compatibility. */
>>>>
>>>> Hmm this went in without much discussion.
>>>>
>>>> But before this change went in, it was possible to control link state on 
>>>> NIC
>>>> and on link separately, without guest noticing.
>>>>
>>>> Why did we decide to drop this functionality?
>>>
>>> Not sure there was a real need for the patch.
>>>
>>> On other hand Windows guest will not receive IP address from DHCP server if 
>>> you start VM with tap link down and then set it to up without toggling link 
>>> state of the guest device as well.
>>
>> Same for a linux guest.  It a fault scenario.  So we either handle
>> it by propagating the link event, or we forbid it.  Doing neither
>> allows for a bad state in common configuration.
> 
> It's how networking works.
> If I turn off my router at home I can't get dhcp either.

And you device link will show that it has not carrier.

> 
> We had a way to disable networking  at link or at
> the router, then removed the ability to turn it
> off at the router and I'm asking why.

We didn't remove the ability to turn it off at router/switch.
We just now propagate carrier loss correctly.

In fact, if you have a configuration where the VM is connected
to a hub in qemu, turning off the link on the 'tap' connected
to the same hub doesn't not change guest state.

> 
>> This patch chose to propagate the event under certain configuration.
>>
>> -vlad
> 
> So don't do this then.
> If you want guest to know that link is down,
> put it down on guest side.
> 
> It's that simple.
> 

Sorry, but that's not always possible.  The host administrator
may not always be vm administrator and without this patch vm
administrator has no idea what happened to his

Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change

2014-04-02 Thread Vlad Yasevich
On 04/02/2014 06:46 AM, Yan Vugenfirer wrote:
> 
> On Apr 2, 2014, at 11:51 AM, Michael S. Tsirkin  wrote:
> 
>> On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote:
>>> When a link change occurs on a backend (like tap), we currently do
>>> not propage such change to the nic.  As a result, when someone turns
>>> off a link on a tap device, for instance, then a guest doesn't see
>>> that change and continues to try to send traffic or run DHCP even
>>> though the lower-layer is disconnected.  This is OK when the network
>>> is set up as a HUB since the the guest may be connected to other HUB
>>> ports too, but when it's set up as a netdev, it makes thinkgs worse.
>>>
>>> The patch addresses this by setting the peers link down only when the
>>> peer is not a HUBPORT device.  With this patch, in the following config
>>>  -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
>>> when net0 link is turned off, the guest e1000 shows lower-layer link
>>> down. This allows guests to boot much faster in such configurations.
>>> With windows guest, it also allows the network to recover properly
>>> since windows will not configure the link-local IPv4 address, and
>>> when the link is turned on, the proper address address is configured.
>>>
>>> Signed-off-by: Vlad Yasevich 
>>> ---
>>> net/net.c | 26 +-
>>> 1 file changed, 17 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/net/net.c b/net/net.c
>>> index 0a88e68..8a084bf 100644
>>> --- a/net/net.c
>>> +++ b/net/net.c
>>> @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error 
>>> **errp)
>>> nc->info->link_status_changed(nc);
>>> }
>>>
>>> -/* Notify peer. Don't update peer link status: this makes it possible 
>>> to
>>> - * disconnect from host network without notifying the guest.
>>> - * FIXME: is disconnected link status change operation useful?
>>> - *
>>> - * Current behaviour is compatible with qemu vlans where there could be
>>> - * multiple clients that can still communicate with each other in
>>> - * disconnected mode. For now maintain this compatibility. */
>>
>> Hmm this went in without much discussion.
>>
>> But before this change went in, it was possible to control link state on NIC
>> and on link separately, without guest noticing.
>>
>> Why did we decide to drop this functionality?
> 
> Not sure there was a real need for the patch.
> 
> On other hand Windows guest will not receive IP address from DHCP server if 
> you start VM with tap link down and then set it to up without toggling link 
> state of the guest device as well.

Same for a linux guest.  It a fault scenario.  So we either handle
it by propagating the link event, or we forbid it.  Doing neither
allows for a bad state in common configuration.

This patch chose to propagate the event under certain configuration.

-vlad

> 
> Yan. 
> 
>>
>>
>>
>>> -if (nc->peer && nc->peer->info->link_status_changed) {
>>> -nc->peer->info->link_status_changed(nc->peer);
>>> +if (nc->peer) {
>>> +/* Change peer link only if the peer is NIC and then notify peer.
>>> + * If the peer is a HUBPORT or a backend, we do not change the
>>> + * link status.
>>> + *
>>> + * This behavior is compatible with qemu vlans where there could be
>>> + * multiple clients that can still communicate with each other in
>>> + * disconnected mode. For now maintain this compatibility.
>>> + */
>>> +if (nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
>>> +for (i = 0; i < queues; i++) {
>>> +ncs[i]->peer->link_down = !up;
>>> +}
>>> +}
>>> +if (nc->peer->info->link_status_changed) {
>>> +nc->peer->info->link_status_changed(nc->peer);
>>> +}
>>> }
>>> }
>>>
>>> -- 
>>> 1.8.4.2
>>>
>>
> 




Re: [Qemu-devel] [PATCH] net: Update netdev peer on link change

2014-04-02 Thread Vlad Yasevich
On 04/02/2014 04:51 AM, Michael S. Tsirkin wrote:
> On Thu, Nov 21, 2013 at 09:05:51PM -0500, Vlad Yasevich wrote:
>> When a link change occurs on a backend (like tap), we currently do
>> not propage such change to the nic.  As a result, when someone turns
>> off a link on a tap device, for instance, then a guest doesn't see
>> that change and continues to try to send traffic or run DHCP even
>> though the lower-layer is disconnected.  This is OK when the network
>> is set up as a HUB since the the guest may be connected to other HUB
>> ports too, but when it's set up as a netdev, it makes thinkgs worse.
>>
>> The patch addresses this by setting the peers link down only when the
>> peer is not a HUBPORT device.  With this patch, in the following config
>>   -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
>> when net0 link is turned off, the guest e1000 shows lower-layer link
>> down. This allows guests to boot much faster in such configurations.
>> With windows guest, it also allows the network to recover properly
>> since windows will not configure the link-local IPv4 address, and
>> when the link is turned on, the proper address address is configured.
>>
>> Signed-off-by: Vlad Yasevich 
>> ---
>>  net/net.c | 26 +-
>>  1 file changed, 17 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/net.c b/net/net.c
>> index 0a88e68..8a084bf 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error 
>> **errp)
>>  nc->info->link_status_changed(nc);
>>  }
>>  
>> -/* Notify peer. Don't update peer link status: this makes it possible to
>> - * disconnect from host network without notifying the guest.
>> - * FIXME: is disconnected link status change operation useful?
>> - *
>> - * Current behaviour is compatible with qemu vlans where there could be
>> - * multiple clients that can still communicate with each other in
>> - * disconnected mode. For now maintain this compatibility. */
> 
> Hmm this went in without much discussion.
> 
> But before this change went in, it was possible to control link state on NIC
> and on link separately, without guest noticing.
> 
> Why did we decide to drop this functionality?

This was causing some interesting issues in the non-hub
configurations.  In these configs, the loss of link on the backend
did not propagate to the guest.  The guest would continue to send
traffic and it would just queue up and get dropped.  Eventually the
guest could lose the DHCP lease and there wouldn't be any indication
at all as to why it happened.  From the guest perspective, the link
is working, but it is completely disconnected.
This patch essentially equates the backend to a carrier on the guest
and the loss of a backend is loss of carrier.

This is only done for configurations where you have a 1-to-1
relationship between a hw nic and netdev.  For the old vlan/hubport,
there is no change, since in those case, the guest nic can talk to
other hubports.  In the case of netdev, it can't.

As an example, try booting a VM in which the netdev link is down.
First, it'll take a while to boot if it uses DHCP.  When you
re-enable, the link, the VM will not issue DHCP requests and you'd
have to manually restart the nic in the VM.

With this patch, link detection works properly.

-vlad


> 
> 
> 
>> -if (nc->peer && nc->peer->info->link_status_changed) {
>> -nc->peer->info->link_status_changed(nc->peer);
>> +if (nc->peer) {
>> +/* Change peer link only if the peer is NIC and then notify peer.
>> + * If the peer is a HUBPORT or a backend, we do not change the
>> + * link status.
>> + *
>> + * This behavior is compatible with qemu vlans where there could be
>> + * multiple clients that can still communicate with each other in
>> + * disconnected mode. For now maintain this compatibility.
>> + */
>> +if (nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
>> +for (i = 0; i < queues; i++) {
>> +ncs[i]->peer->link_down = !up;
>> +}
>> +}
>> +if (nc->peer->info->link_status_changed) {
>> +nc->peer->info->link_status_changed(nc->peer);
>> +}
>>  }
>>  }
>>  
>> -- 
>> 1.8.4.2
>>




Re: [Qemu-devel] [PATCH v2 for 2.0] virtio-net: Do not filter VLANs without F_CTRL_VLAN

2014-03-26 Thread Vlad Yasevich
On 03/26/2014 06:29 AM, Amos Kong wrote:
> From: Stefan Fritsch 
> 
> If VIRTIO_NET_F_CTRL_VLAN is not negotiated, do not filter out all
> VLAN-tagged packets but send them to the guest.
> 
> This fixes VLANs with OpenBSD guests (and probably NetBSD, too, because
> the OpenBSD driver started as a port from NetBSD).
> 
> Signed-off-by: Stefan Fritsch 
> Signed-off-by: Amos Kong 
> ---
> V2: it's not good to check guest features at reset time,
> so reset vlan table in setting features
> ---
>  hw/net/virtio-net.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 43b4eda..439477b 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -530,6 +530,12 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
> uint32_t features)
>  }
>  vhost_net_ack_features(tap_get_vhost_net(nc->peer), features);
>  }
> +
> +if ((1 << VIRTIO_NET_F_CTRL_VLAN) & features) {
> +memset(n->vlans, 0, MAX_VLAN >> 3);
> +} else {
> +memset(n->vlans, 0xff, MAX_VLAN >> 3);
> +}
>  }
>  
>  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> 

Reviewed-by: Vlad Yasevich 

-vlad



Re: [Qemu-devel] [PATCH v2] virtio-net: add a field to indicate if vlan table is used

2014-02-20 Thread Vlad Yasevich
On 02/20/2014 11:38 AM, Amos Kong wrote:
> Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
> filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated.
> 
> This patch added a new field to @RxFilterInfo to indicate if management
> uses the vlan table.
> 
> [1] http://lists.nongnu.org/archive/html/qemu-devel/2014-02/msg02604.html
> 
> Signed-off-by: Amos Kong 
> ---
> V2: don't make vlan-table optional, add a flag to indicate
> if vlan table is used by management
> ---
>  hw/net/virtio-net.c | 38 +-
>  qapi-schema.json|  3 +++
>  qmp-commands.hx |  2 ++
>  3 files changed, 30 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 3626608..f591f4e 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -222,13 +222,33 @@ static char *mac_strdup_printf(const uint8_t *mac)
>  mac[1], mac[2], mac[3], mac[4], mac[5]);
>  }
>  
> +static intList *get_vlan_table(VirtIONet *n)
> +{
> +intList *list, *entry;
> +int i, j;
> +
> +list = NULL;
> +for (i = 0; i < MAX_VLAN >> 5; i++) {
> +for (j = 0; n->vlans[i] && j < 0x1f; j++) {
> +if (n->vlans[i] & (1U << j)) {
> +entry = g_malloc0(sizeof(*entry));
> +entry->value = (i << 5) + j;
> +entry->next = list;
> +list = entry;
> +}
> +}
> +}
> +
> +return list;
> +}
> +
>  static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
>  {
>  VirtIONet *n = qemu_get_nic_opaque(nc);
> +VirtIODevice *vdev = VIRTIO_DEVICE(n);
>  RxFilterInfo *info;
>  strList *str_list, *entry;
> -intList *int_list, *int_entry;
> -int i, j;
> +int i;
>  
>  info = g_malloc0(sizeof(*info));
>  info->name = g_strdup(nc->name);
> @@ -273,19 +293,11 @@ static RxFilterInfo 
> *virtio_net_query_rxfilter(NetClientState *nc)
>  str_list = entry;
>  }
>  info->multicast_table = str_list;
> +info->vlan_table = get_vlan_table(n);
>  
> -int_list = NULL;
> -for (i = 0; i < MAX_VLAN >> 5; i++) {
> -for (j = 0; n->vlans[i] && j < 0x1f; j++) {
> -if (n->vlans[i] & (1U << j)) {
> -int_entry = g_malloc0(sizeof(*int_entry));
> -int_entry->value = (i << 5) + j;
> -int_entry->next = int_list;
> -int_list = int_entry;
> -}
> -}
> +if ((1 << VIRTIO_NET_F_CTRL_VLAN) & vdev->guest_features) {
> +info->vlan = true;
>  }

So, in the case that vlan filtering is not supported in the guest
we get:
 "vlan": false,
 "vlan-table": [
 0,
 1,
 2,
 ...
 4095
]
since virtio_net now initializes the table to all 1s.
Seems a bit awkward.  We are providing a lot of data that
is simply going to be ignored.

> -info->vlan_table = int_list;
>  
>  /* enable event notification after query */
>  nc->rxfilter_notify_enabled = 1;
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 7cfb5e5..5b54e94 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -4032,6 +4032,8 @@
>  #
>  # @unicast-overflow: unicast table is overflowed or not
>  #
> +# @vlan: whether management uses the vlan table
> +#

The above description seems a bit confusing to me.  The value
we are returning describes whether or not qemu is performing
vlan filtering.  I am not sure if it has any bearing on what
management may be doing.

I think the idea is that management, in the future, would look at
this value and make some decision about applying provided filter
to the current host configuration.

>  # @main-mac: the main macaddr string
>  #
>  # @vlan-table: a list of active vlan id
> @@ -4052,6 +4054,7 @@
>  'broadcast-allowed':  'bool',
>  'multicast-overflow': 'bool',
>  'unicast-overflow':   'bool',
> +'vlan':   'bool',

Not terribly descriptive.  May be call it vlan-filter?

Thanks
-vlad
>  'main-mac':   'str',
>  'vlan-table': ['int'],
>  'unicast-table':  ['str'],
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index cce6b81..b170c79 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3307,6 +3307,7 @@ Each array entry contains the following:
>  - "broadcast-allowed": allow to receive broadcast (json-bool)
>  - "multicast-overflow": multicast table is overflowed (json-bool)
>  - "unicast-overflow": unicast table is overflowed (json-bool)
> +- "vlan": management uses the vlan table (json-bool)
>  - "main-mac": main macaddr string (json-string)
>  - "vlan-table": a json-array of active vlan id
>  - "unicast-table": a json-array of unicast macaddr string
> @@ -3321,6 +3322,7 @@ Example:
>  "name": "vnet0",
>  "main-mac": "52:54:00:12:34:56",
>  "

Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated

2014-02-18 Thread Vlad Yasevich
On 02/18/2014 05:06 AM, Michael S. Tsirkin wrote:
> On Mon, Feb 17, 2014 at 11:58:11AM -0500, Vlad Yasevich wrote:
>> On 02/17/2014 11:56 AM, Eric Blake wrote:
>>> On 02/17/2014 09:52 AM, Eric Blake wrote:
>>>> On 02/16/2014 07:27 PM, Amos Kong wrote:
>>>>> Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
>>>>> filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated.
>>>>>
>>>>> We should also not send the vlan table to management, this patch makes
>>>>> the vlan-talbe optional.
>>>>
>>>> s/talbe/table/
>>>>
>>>
>>>>> @@ -4053,7 +4053,7 @@
>>>>>  'multicast-overflow': 'bool',
>>>>>  'unicast-overflow':   'bool',
>>>>>  'main-mac':   'str',
>>>>> -'vlan-table': ['int'],
>>>>> +'*vlan-table': ['int'],
>>>>
>>>> Indentation is now off.
>>>>
>>>>>  - "main-mac": main macaddr string (json-string)
>>>>> -- "vlan-table": a json-array of active vlan id
>>>>> +- "vlan-table": a json-array of active vlan id (optoinal)
>>>>
>>>> s/optoinal/optional/
>>>>
>>>> Those fixes are trivial enough, so I'm okay if you correct them then add:
>>>
>>> Scratch that.  I revoke my R-b, on the following grounds:
>>>
>>> On 02/17/2014 08:27 AM, Vlad Yasevich wrote:
>>>> On 02/16/2014 09:27 PM, Amos Kong wrote:
>>>>> Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
>>>>> filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't
>>> negotiated.
>>>>>
>>>>> We should also not send the vlan table to management, this patch makes
>>>>> the vlan-talbe optional.
>>>>   ^^
>>>>table.
>>>>
>>>> One question I have from the API perspective is can we suddenly change
>>>> something to be optional?  If there are any users of this ,
>>>> wouldn't they have to change now to check the existence of this
>>>> list?
>>>
>>> You are correct.  Since the parameter is an output field, older clients
>>> may be depending on it existing.  It is okay to generate an empty array,
>>> but you must not entirely omit the array unless you add further
>>> justification in your commit message that you are 100% positive that
>>> there are no clients of 1.6 that will be broken when the array no longer
>>> appears in the output.
>>>
>>> Can you rework the patch to just leave the array empty in the case where
>>> the bit does not indicate it is used?  Or do we need to add a new bool
>>> field to the output for new enough management to know whether to use the
>>> array?
>>>
>>
>> I think a completely empty array should be sufficient.
>>
>> -vlad
> 
> An empty array could mean either no vlans allowed or
> all vlans allowed. Also it's nice if users can detect an old
> buggy qemu.
> 
> Let's just add an rx state.
> 

Fine with me.

-vlad



Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated

2014-02-17 Thread Vlad Yasevich
On 02/17/2014 11:56 AM, Eric Blake wrote:
> On 02/17/2014 09:52 AM, Eric Blake wrote:
>> On 02/16/2014 07:27 PM, Amos Kong wrote:
>>> Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
>>> filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated.
>>>
>>> We should also not send the vlan table to management, this patch makes
>>> the vlan-talbe optional.
>>
>> s/talbe/table/
>>
> 
>>> @@ -4053,7 +4053,7 @@
>>>  'multicast-overflow': 'bool',
>>>  'unicast-overflow':   'bool',
>>>  'main-mac':   'str',
>>> -'vlan-table': ['int'],
>>> +'*vlan-table': ['int'],
>>
>> Indentation is now off.
>>
>>>  - "main-mac": main macaddr string (json-string)
>>> -- "vlan-table": a json-array of active vlan id
>>> +- "vlan-table": a json-array of active vlan id (optoinal)
>>
>> s/optoinal/optional/
>>
>> Those fixes are trivial enough, so I'm okay if you correct them then add:
> 
> Scratch that.  I revoke my R-b, on the following grounds:
> 
> On 02/17/2014 08:27 AM, Vlad Yasevich wrote:
>> On 02/16/2014 09:27 PM, Amos Kong wrote:
>>> Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
>>> filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't
> negotiated.
>>>
>>> We should also not send the vlan table to management, this patch makes
>>> the vlan-talbe optional.
>>   ^^
>>table.
>>
>> One question I have from the API perspective is can we suddenly change
>> something to be optional?  If there are any users of this ,
>> wouldn't they have to change now to check the existence of this
>> list?
> 
> You are correct.  Since the parameter is an output field, older clients
> may be depending on it existing.  It is okay to generate an empty array,
> but you must not entirely omit the array unless you add further
> justification in your commit message that you are 100% positive that
> there are no clients of 1.6 that will be broken when the array no longer
> appears in the output.
> 
> Can you rework the patch to just leave the array empty in the case where
> the bit does not indicate it is used?  Or do we need to add a new bool
> field to the output for new enough management to know whether to use the
> array?
> 

I think a completely empty array should be sufficient.

-vlad



Re: [Qemu-devel] [PATCH] virtio-net: only output the vlan table when VIRTIO_NET_F_CTRL_VLAN is negotiated

2014-02-17 Thread Vlad Yasevich
On 02/16/2014 09:27 PM, Amos Kong wrote:
> Stefan Fritsch just fixed a virtio-net driver bug [1], virtio-net won't
> filter out VLAN-tagged packets if VIRTIO_NET_F_CTRL_VLAN isn't negotiated.
> 
> We should also not send the vlan table to management, this patch makes
> the vlan-talbe optional.
  ^^
   table.

One question I have from the API perspective is can we suddenly change
something to be optional?  If there are any users of this ,
wouldn't they have to change now to check the existence of this
list?

Thanks
-vlad

> [1] http://lists.nongnu.org/archive/html/qemu-devel/2014-02/msg02604.html
> 
> Signed-off-by: Amos Kong 
> ---
>  hw/net/virtio-net.c | 38 +-
>  qapi-schema.json|  4 ++--
>  qmp-commands.hx |  2 +-
>  3 files changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 3626608..0b32e6a 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -222,13 +222,33 @@ static char *mac_strdup_printf(const uint8_t *mac)
>  mac[1], mac[2], mac[3], mac[4], mac[5]);
>  }
>  
> +static intList *get_vlan_table(VirtIONet *n)
> +{
> +intList *list, *entry;
> +int i, j;
> +
> +list = NULL;
> +for (i = 0; i < MAX_VLAN >> 5; i++) {
> +for (j = 0; n->vlans[i] && j < 0x1f; j++) {
> +if (n->vlans[i] & (1U << j)) {
> +entry = g_malloc0(sizeof(*entry));
> +entry->value = (i << 5) + j;
> +entry->next = list;
> +list = entry;
> +}
> +}
> +}
> +
> +return list;
> +}
> +
>  static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
>  {
>  VirtIONet *n = qemu_get_nic_opaque(nc);
> +VirtIODevice *vdev = VIRTIO_DEVICE(n);
>  RxFilterInfo *info;
>  strList *str_list, *entry;
> -intList *int_list, *int_entry;
> -int i, j;
> +int i;
>  
>  info = g_malloc0(sizeof(*info));
>  info->name = g_strdup(nc->name);
> @@ -274,18 +294,10 @@ static RxFilterInfo 
> *virtio_net_query_rxfilter(NetClientState *nc)
>  }
>  info->multicast_table = str_list;
>  
> -int_list = NULL;
> -for (i = 0; i < MAX_VLAN >> 5; i++) {
> -for (j = 0; n->vlans[i] && j < 0x1f; j++) {
> -if (n->vlans[i] & (1U << j)) {
> -int_entry = g_malloc0(sizeof(*int_entry));
> -int_entry->value = (i << 5) + j;
> -int_entry->next = int_list;
> -int_list = int_entry;
> -}
> -}
> +if ((1 << VIRTIO_NET_F_CTRL_VLAN) & vdev->guest_features) {
> +info->has_vlan_table = true;
> +info->vlan_table = get_vlan_table(n);
>  }
> -info->vlan_table = int_list;

>  
>  /* enable event notification after query */
>  nc->rxfilter_notify_enabled = 1;
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 7cfb5e5..5d48fa9 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -4034,7 +4034,7 @@
>  #
>  # @main-mac: the main macaddr string
>  #
> -# @vlan-table: a list of active vlan id
> +# @vlan-table: #optional a list of active vlan id
>  #
>  # @unicast-table: a list of unicast macaddr string
>  #
> @@ -4053,7 +4053,7 @@
>  'multicast-overflow': 'bool',
>  'unicast-overflow':   'bool',
>  'main-mac':   'str',
> -'vlan-table': ['int'],
> +'*vlan-table': ['int'],
>  'unicast-table':  ['str'],
>  'multicast-table':['str'] }}
>  
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index cce6b81..a1c1dfa 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3308,7 +3308,7 @@ Each array entry contains the following:
>  - "multicast-overflow": multicast table is overflowed (json-bool)
>  - "unicast-overflow": unicast table is overflowed (json-bool)
>  - "main-mac": main macaddr string (json-string)
> -- "vlan-table": a json-array of active vlan id
> +- "vlan-table": a json-array of active vlan id (optoinal)
>  - "unicast-table": a json-array of unicast macaddr string
>  - "multicast-table": a json-array of multicast macaddr string
>  
> 




Re: [Qemu-devel] [RFC PATCH 1/2] e1000: Use Address_Available bit as HW latch

2013-11-22 Thread Vlad Yasevich
On 11/22/2013 04:47 AM, Jason Wang wrote:
> On 11/22/2013 04:04 AM, Vlad Yasevich wrote:
>> e1000 provides a E1000_RAH_AV bit on every complete write
>> to the Receive Address Register.  We can use this bit
>> 2 ways:
>>  1) To trigger HMP notifications.  When the bit is set the
>> mac address is fully set and we can update the HMP.
>>
>>  2) We can turn off he bit on the write to low order bits of
>> the Receive Address Register, so that we would not try
>> to match received traffic to this address when it is
>> not completely set.
>>
>> Signed-off-by: Vlad Yasevich 
>> ---
>>  hw/net/e1000.c | 11 ++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> index ae63591..82978ea 100644
>> --- a/hw/net/e1000.c
>> +++ b/hw/net/e1000.c
>> @@ -1106,10 +1106,19 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>>  
>>  s->mac_reg[index] = val;
>>  
>> -if (index == RA || index == RA + 1) {
>> +switch (index) {
>> +case RA:
>> +/* Mask off AV bit on the write of the low dword.  The write of
>> + * the high dword will set the bit.  This way a half-written
>> + * mac address will not be used to filter on rx.
>> + */
>> +s->mac_reg[RA+1] &= ~E1000_RAH_AV;
> 
> If a stupid driver write high dword first, it won't receive any packets.

I need to ping Intel guys again.  I asked them what happens when only
the low register is set, but haven't heard back.

>> +break;
>> +case (RA + 1):
>>  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
>>  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t 
>> *)macaddr);
> 
> Guest may invalid the mac address by clearing the AV bit through writing
> to high dword. So this may notify a wrong mac address.

In this case, testing for the AV bit would solve this issue.
> 
> Generally, we could teset the AV bit before notification, and try to do
> the this on both high and low dword. This obeys specs and
> receive_filter() above.

This will not really help since the AV bit would already be set from the
prior mac address.  So, if a stupid driver writes just the low word,
the AV bit would already be set.

> 
> If we don't want half-written status, driver should clear AV bit before
> each writing of new mac address. But looks like linux and freebsd does
> not do this.  But the window is really small and harmless.

We can emulate this.  Thanks for the idea.

-vlad

>> +break;
>>  }
>>  }
>>  
> 




[Qemu-devel] [PATCH] net: Update netdev peer on link change

2013-11-21 Thread Vlad Yasevich
When a link change occurs on a backend (like tap), we currently do
not propage such change to the nic.  As a result, when someone turns
off a link on a tap device, for instance, then a guest doesn't see
that change and continues to try to send traffic or run DHCP even
though the lower-layer is disconnected.  This is OK when the network
is set up as a HUB since the the guest may be connected to other HUB
ports too, but when it's set up as a netdev, it makes thinkgs worse.

The patch addresses this by setting the peers link down only when the
peer is not a HUBPORT device.  With this patch, in the following config
  -netdev tap,id=net0 -device e1000,mac=X,netdev=net0
when net0 link is turned off, the guest e1000 shows lower-layer link
down. This allows guests to boot much faster in such configurations.
With windows guest, it also allows the network to recover properly
since windows will not configure the link-local IPv4 address, and
when the link is turned on, the proper address address is configured.

Signed-off-by: Vlad Yasevich 
---
 net/net.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/net/net.c b/net/net.c
index 0a88e68..8a084bf 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error 
**errp)
 nc->info->link_status_changed(nc);
 }
 
-/* Notify peer. Don't update peer link status: this makes it possible to
- * disconnect from host network without notifying the guest.
- * FIXME: is disconnected link status change operation useful?
- *
- * Current behaviour is compatible with qemu vlans where there could be
- * multiple clients that can still communicate with each other in
- * disconnected mode. For now maintain this compatibility. */
-if (nc->peer && nc->peer->info->link_status_changed) {
-nc->peer->info->link_status_changed(nc->peer);
+if (nc->peer) {
+/* Change peer link only if the peer is NIC and then notify peer.
+ * If the peer is a HUBPORT or a backend, we do not change the
+ * link status.
+ *
+ * This behavior is compatible with qemu vlans where there could be
+ * multiple clients that can still communicate with each other in
+ * disconnected mode. For now maintain this compatibility.
+ */
+if (nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
+for (i = 0; i < queues; i++) {
+ncs[i]->peer->link_down = !up;
+}
+}
+if (nc->peer->info->link_status_changed) {
+nc->peer->info->link_status_changed(nc->peer);
+}
 }
 }
 
-- 
1.8.4.2




[Qemu-devel] [RFC PATCH 2/2] rtl8139: update HMP only when the address is fully written

2013-11-21 Thread Vlad Yasevich
rtl8139 hardware requires 9346 config register to be set into
write mode before mac address can be changed even though it is
not documented.  Every driver inspected so far appears to do
this along with comments that this is an undocumented requirement.

We can use this to help us identify when the mac address has been
completely written.  Simple set a flag whenever mac has changed
and at the next transition of 9346 register from Write to Normal
mode, we update the HMP.

Signed-off-by: Vlad Yasevich 
---
 hw/i386/pc_piix.c|  4 
 hw/i386/pc_q35.c |  4 
 hw/net/rtl8139.c | 50 +-
 include/hw/i386/pc.h |  8 
 4 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2daa111..731ae3b 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -372,6 +372,10 @@ static QEMUMachine pc_i440fx_machine_v1_7 = {
 PC_I440FX_1_7_MACHINE_OPTIONS,
 .name = "pc-i440fx-1.7",
 .init = pc_init_pci_1_7,
+.compat_props = (GlobalProperty[]) {
+PC_COMPAT_1_7,
+{ /* end of list */ }
+},
 };
 
 #define PC_I440FX_1_6_MACHINE_OPTIONS PC_I440FX_1_7_MACHINE_OPTIONS
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index e2b8907..7b6aedf 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -292,6 +292,10 @@ static QEMUMachine pc_q35_machine_v1_7 = {
 PC_Q35_1_7_MACHINE_OPTIONS,
 .name = "pc-q35-1.7",
 .init = pc_q35_init_1_7,
+.compat_props = (GlobalProperty[]) {
+PC_COMPAT_1_7,
+{ /* end of list */ }
+},
 };
 
 #define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_1_7_MACHINE_OPTIONS
diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 7f2b4db..5c4caec 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -476,6 +476,7 @@ typedef struct RTL8139State {
 
 uint16_t CpCmd;
 uint8_t  TxThresh;
+bool mac_changed;
 
 NICState *nic;
 NICConf conf;
@@ -515,6 +516,9 @@ typedef struct RTL8139State {
 
 /* Support migration to/from old versions */
 int rtl8139_mmio_io_addr_dummy;
+#define RTL8139_FLAG_MAC_BIT 0
+#define RTL8139_FLAG_MAC_COMPLETE (1 << RTL8139_FLAG_MAC_BIT)
+uint32_tcompat_flags;
 } RTL8139State;
 
 /* Writes tally counters to memory via DMA */
@@ -1215,6 +1219,7 @@ static void rtl8139_reset(DeviceState *d)
 /* restore MAC address */
 memcpy(s->phys, s->conf.macaddr.a, 6);
 qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed = false;
 
 /* reset interrupt mask */
 s->IntrStatus = 0;
@@ -1563,6 +1568,14 @@ static void rtl8139_Cfg9346_write(RTL8139State *s, 
uint32_t val)
 /* Reset.  */
 val = 0;
 rtl8139_reset(d);
+} else if (opmode == Cfg9346_Normal && s->mac_changed) {
+/* Even though it is not documented, it is required to set
+ * opmode to Cfg9346_ConfigWrite when changing the mac address
+ * of the card and to set to Cfg9346_Normal when done.  We
+ * can use this as an idication to kick off the notification event.
+ */
+qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed = false;
 }
 
 s->Cfg9346 = val;
@@ -2743,7 +2756,12 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)
 {
 case MAC0 ... MAC0+5:
 s->phys[addr - MAC0] = val;
-qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+if (s->compat_flags & RTL8139_FLAG_MAC_COMPLETE) {
+s->mac_changed = true;
+} else if (addr == MAC0+5) {
+/* Emulate old style updates on the last write */
+qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+}
 break;
 case MAC0+6 ... MAC0+7:
 /* reserved */
@@ -3256,6 +3274,13 @@ static int rtl8139_post_load(void *opaque, int 
version_id)
  * to link status bit in BasicModeStatus */
 qemu_get_queue(s->nic)->link_down = (s->BasicModeStatus & 0x04) == 0;
 
+/* Emulate old behavior if we don't support mac change completion
+ * tracking
+ */
+if (!(s->compat_flags & RTL8139_FLAG_MAC_COMPLETE)) {
+s->mac_changed = false;
+}
+
 return 0;
 }
 
@@ -3286,6 +3311,24 @@ static void rtl8139_pre_save(void *opaque)
 s->rtl8139_mmio_io_addr_dummy = 0;
 }
 
+static bool rtl8139_mac_state_needed(void *opaque)
+{
+RTL8139State *s = opaque;
+
+return (s->compat_flags & RTL8139_FLAG_MAC_COMPLETE) && s->mac_changed;
+}
+
+static const VMStateDescription vmstate_rtl8139_mac_state ={
+.name = "rtl8139/mac_state",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_BOOL(mac_changed, RTL8139State),
+V

[Qemu-devel] [RFC PATCH 1/2] e1000: Use Address_Available bit as HW latch

2013-11-21 Thread Vlad Yasevich
e1000 provides a E1000_RAH_AV bit on every complete write
to the Receive Address Register.  We can use this bit
2 ways:
 1) To trigger HMP notifications.  When the bit is set the
mac address is fully set and we can update the HMP.

 2) We can turn off he bit on the write to low order bits of
the Receive Address Register, so that we would not try
to match received traffic to this address when it is
not completely set.

Signed-off-by: Vlad Yasevich 
---
 hw/net/e1000.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index ae63591..82978ea 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1106,10 +1106,19 @@ mac_writereg(E1000State *s, int index, uint32_t val)
 
 s->mac_reg[index] = val;
 
-if (index == RA || index == RA + 1) {
+switch (index) {
+case RA:
+/* Mask off AV bit on the write of the low dword.  The write of
+ * the high dword will set the bit.  This way a half-written
+ * mac address will not be used to filter on rx.
+ */
+s->mac_reg[RA+1] &= ~E1000_RAH_AV;
+break;
+case (RA + 1):
 macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
 macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
 qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t *)macaddr);
+break;
 }
 }
 
-- 
1.8.4.2




[Qemu-devel] [RFC PATCH 0/2] Update HMP only upon mac change completion.

2013-11-21 Thread Vlad Yasevich
Recent threads regarding e1000/rtl8139 and mac address change notifications
prompted some research into the respecitive hw data sheets as well as
available drivers.  What I found is that each hw has a mechanism that
can be used by our emulation layer to determine when the mac address
change has completed (when the OS finished writing the mac address),
and we can use these mechanisms to trigger HMP notifications.

This is only an RFC series.  It's been tested and works well.
I've split e1000 and rtl8139 changes as they are sufficiently
different.  e1000 make this very clean and easy, but rtl8139
isn't as nice.

Please take a look and I'd like to hear your comments.

Thanks
-vlad

Vlad Yasevich (2):
  e1000: Use Address_Available bit as HW latch
  rtl8139: update HMP only when the address is fully written

 hw/i386/pc_piix.c|  4 
 hw/i386/pc_q35.c |  4 
 hw/net/e1000.c   | 11 ++-
 hw/net/rtl8139.c | 50 +-
 include/hw/i386/pc.h |  8 
 5 files changed, 75 insertions(+), 2 deletions(-)

-- 
1.8.4.2




Re: [Qemu-devel] [PATCH for-1.7] qdev-properties-system.c: Allow vlan or netdev for -device, not both

2013-11-21 Thread Vlad Yasevich
On 11/11/2013 02:50 AM, Paolo Bonzini wrote:
> Il 11/11/2013 06:18, Jason Wang ha scritto:
>> On 11/08/2013 10:13 AM, Vlad Yasevich wrote:
>>> It is currently possible to specify things like:
>>> -device e1000,netdev=foo,vlan=1
>>> With this usage, whichever argument was specified last (vlan or netdev)
>>> overwrites what was previousely set and results in a non-working
>>> configuration.  Even worse, when used with multiqueue devices,
>>> it causes a segmentation fault on exit in qemu_free_net_client.
>>>
>>> That patch treates the above command line options as invalid and
>>> generates an error at start-up.
>>>
>>> Signed-off-by: Vlad Yasevich 
>>> ---
>>>  hw/core/qdev-properties-system.c | 9 +
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/hw/core/qdev-properties-system.c 
>>> b/hw/core/qdev-properties-system.c
>>> index 0eada32..729efa8 100644
>>> --- a/hw/core/qdev-properties-system.c
>>> +++ b/hw/core/qdev-properties-system.c
>>> @@ -205,6 +205,11 @@ static int parse_netdev(DeviceState *dev, const char 
>>> *str, void **ptr)
>>>  goto err;
>>>  }
>>>  
>>> +if (ncs[i]) {
>>> +ret = -EINVAL;
>>> +goto err;
>>> +}
>>> +
>>>  ncs[i] = peers[i];
>>>  ncs[i]->queue_index = i;
>>>  }
>>> @@ -301,6 +306,10 @@ static void set_vlan(Object *obj, Visitor *v, void 
>>> *opaque,
>>>  *ptr = NULL;
>>>  return;
>>>  }
>>> +if (*ptr) {
>>> +error_set_from_qdev_prop_error(errp, -EINVAL, dev, prop, name);
>>> +return;
>>> +}
>>>  
>>>  hubport = net_hub_port_find(id);
>>>  if (!hubport) {
>>
>> Acked-by: Jason Wang 
> 
> This patch is good for 1.7.
> 
> Paolo
> 

Hi All

Just wondering what's to become of this patch?  It has an ACK, but has
been abandoned.  It does a fix a crash in qemu.

Thanks
-vlad




Re: [Qemu-devel] [PATCH for-1.7] Revert "e1000/rtl8139: update HMP NIC when every bit is written"

2013-11-18 Thread Vlad Yasevich
On 11/18/2013 05:40 PM, Alex Williamson wrote:
> On Mon, 2013-11-18 at 17:07 -0500, Vlad Yasevich wrote:
>> On 11/18/2013 04:33 PM, Alex Williamson wrote:
>>> On Mon, 2013-11-18 at 15:57 -0500, Vlad Yasevich wrote:
>>>> On 11/18/2013 03:33 PM, Alex Williamson wrote:
>>>>> On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote:
>>>>>> On 11/18/2013 02:58 PM, Alex Williamson wrote:
>>>>>>> On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote:
>>>>>>>> This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
>>>>>>>> Digging into hardware specs shows this does not
>>>>>>>> actually make QEMU behave more like hardware.
>>>>>>>> Let's stick to the tried heuristic for 1.7 and
>>>>>>>> possibly revisit for 1.8.
>>>>>>>
>>>>>>> If this is broken, then so are these:
>>>>>>>
>>>>>>> 23c37c37f0280761072c23bf67d3a4f3c0ff25aa
>>>>>>> 7c36507c2b8776266f50c5e2739bd18279953b93
>>>>>>
>>>>>> These aren't really broken.  They just assume that the high order
>>>>>> writes will happen after the low order writes.
>>>>>>
>>>>>> In the case of e1000, this is a little more then an assumption
>>>>>> (particularly in the case of nic initilization).
>>>>>
>>>>> But AIUI there's also a valid bit in that high order byte on e1000, so
>>>>> reverting cd5be582 means we stuff a new mac into qemu less often, but
>>>>> it's still only accurate some of the time.
>>>>
>>>> Yes, there is a slight issue with validity of mac at the time of
>>>> processing packets.  I have an outstanding question on the Intel
>>>> list about this behavior with real HW.  But, with e1000, the validity
>>>> bit provides a much higher guarantee that a guest that will be
>>>> setting the mac address will write the high register second to
>>>> guarantee that when the valid bit is written, the mac is fully
>>>> valid.  As a result we don't really need the e1000 part of the
>>>> cd5be5829.
>>>
>>> But doesn't that valid bit mean that a mac update will start and end
>>> with a write to the high order register?  So we're assuming:
>>>
>>> a) write RA + 1 (invalidate)
>>> b) write RA (write low)
>>> c) write RA + 1 (write high + valid)
>>>
>>
>> No. On update, only steps b and c typically happen.  Thus my question
>> to the on the intel list.
> 
> So perhaps the bit is some kind of data latch bit and the mac address
> fields within those registers are effectively scratch until that bit is
> written?
> 
>>> Without cd5be5829 the only change is that we don't store a new mac into
>>> the monitor at b).  The mac stored in the monitor is still wrong from a)
>>> until c).  So it's ever so slightly less broken without cd5be5829.
>>
>> Since there is really no a), the mac in the monitor is only different
>> after step b).  since it's is incomplete and we expect step c), there
>> is really no point in updating it.
> 
> Great, so I have no argument against reverting, or just fixing, that
> chunk of cd5be5829.  Let's implement the latch bit too.
> 
>>>>>
>>>>>> In the case of RTL nic, it is just an  assumption, but it hasn't
>>>>>> been shown faulty yet.  We do plan to address this a bit more
>>>>>> thoroughly.
>>>>>
>>>>> So how is RTL less broken without cd5be582?  AIUI the valid bit is off
>>>>> in a separate register on RTL, so we have no guarantee about order of
>>>>> updating the mac.  Without cd5be582 the info in the monitor may be
>>>>> permanently broken if the guest uses a write order other than what we
>>>>> assume.
>>>>>
>>>>
>>>> This one is actually not as bad either.  RTL spec requires that
>>>> receive register writes happen as 32 bit word writes.  This is
>>>> what linux and bsd drivers do, so from driver perspective, the
>>>> issue is the same.  What our emulation layer does is turn these
>>>> 32 bit writes into 4 8-bit writes.  This is likely due to some
>>>> very broken and very old drivers, but I am not sure.
>>>>
>>>> So, the information in the monitor will be broken if the

Re: [Qemu-devel] [PATCH for-1.7] Revert "e1000/rtl8139: update HMP NIC when every bit is written"

2013-11-18 Thread Vlad Yasevich
On 11/18/2013 04:33 PM, Alex Williamson wrote:
> On Mon, 2013-11-18 at 15:57 -0500, Vlad Yasevich wrote:
>> On 11/18/2013 03:33 PM, Alex Williamson wrote:
>>> On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote:
>>>> On 11/18/2013 02:58 PM, Alex Williamson wrote:
>>>>> On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote:
>>>>>> This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
>>>>>> Digging into hardware specs shows this does not
>>>>>> actually make QEMU behave more like hardware.
>>>>>> Let's stick to the tried heuristic for 1.7 and
>>>>>> possibly revisit for 1.8.
>>>>>
>>>>> If this is broken, then so are these:
>>>>>
>>>>> 23c37c37f0280761072c23bf67d3a4f3c0ff25aa
>>>>> 7c36507c2b8776266f50c5e2739bd18279953b93
>>>>
>>>> These aren't really broken.  They just assume that the high order
>>>> writes will happen after the low order writes.
>>>>
>>>> In the case of e1000, this is a little more then an assumption
>>>> (particularly in the case of nic initilization).
>>>
>>> But AIUI there's also a valid bit in that high order byte on e1000, so
>>> reverting cd5be582 means we stuff a new mac into qemu less often, but
>>> it's still only accurate some of the time.
>>
>> Yes, there is a slight issue with validity of mac at the time of
>> processing packets.  I have an outstanding question on the Intel
>> list about this behavior with real HW.  But, with e1000, the validity
>> bit provides a much higher guarantee that a guest that will be
>> setting the mac address will write the high register second to
>> guarantee that when the valid bit is written, the mac is fully
>> valid.  As a result we don't really need the e1000 part of the
>> cd5be5829.
> 
> But doesn't that valid bit mean that a mac update will start and end
> with a write to the high order register?  So we're assuming:
> 
> a) write RA + 1 (invalidate)
> b) write RA (write low)
> c) write RA + 1 (write high + valid)
> 

No. On update, only steps b and c typically happen.  Thus my question
to the on the intel list.


> Without cd5be5829 the only change is that we don't store a new mac into
> the monitor at b).  The mac stored in the monitor is still wrong from a)
> until c).  So it's ever so slightly less broken without cd5be5829.

Since there is really no a), the mac in the monitor is only different
after step b).  since it's is incomplete and we expect step c), there
is really no point in updating it.

> 
>>>
>>>> In the case of RTL nic, it is just an  assumption, but it hasn't
>>>> been shown faulty yet.  We do plan to address this a bit more
>>>> thoroughly.
>>>
>>> So how is RTL less broken without cd5be582?  AIUI the valid bit is off
>>> in a separate register on RTL, so we have no guarantee about order of
>>> updating the mac.  Without cd5be582 the info in the monitor may be
>>> permanently broken if the guest uses a write order other than what we
>>> assume.
>>>
>>
>> This one is actually not as bad either.  RTL spec requires that
>> receive register writes happen as 32 bit word writes.  This is
>> what linux and bsd drivers do, so from driver perspective, the
>> issue is the same.  What our emulation layer does is turn these
>> 32 bit writes into 4 8-bit writes.  This is likely due to some
>> very broken and very old drivers, but I am not sure.
>>
>> So, the information in the monitor will be broken if the guest
>> does: write_hi(); write_lo();  A part of me would really like
>> to see a guest that does this :)
> 
> So the argument for reverting here is that it seems unlikely that the
> dwords get written in the hi->lo order and we'd rather the monitor get
> stuck with the wrong mac forever

For how many/which guests?  I know it's not linux or BSD.  I need to
boot windows to see what it does, but I think it does the right thing.

> than it show the wrong mac address for
> a tiny window for everybody else?

Yes, this would happen for everybody.  If you are querying the output,
you might see this and it will show up as 2 changes.

We are talking about 2 "tiny" amounts here:
On the one hand, we __might__ have guests that write mac and reverse
order thus showing wrong address.
On the other hand we have all the guests who will show the wrong address
for a _short_ time.

I have a hard time deciding, but have a slight preference for a small
uncertainty (the 

Re: [Qemu-devel] [PATCH for-1.7] Revert "e1000/rtl8139: update HMP NIC when every bit is written"

2013-11-18 Thread Vlad Yasevich
On 11/18/2013 03:33 PM, Alex Williamson wrote:
> On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote:
>> On 11/18/2013 02:58 PM, Alex Williamson wrote:
>>> On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote:
>>>> This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
>>>> Digging into hardware specs shows this does not
>>>> actually make QEMU behave more like hardware.
>>>> Let's stick to the tried heuristic for 1.7 and
>>>> possibly revisit for 1.8.
>>>
>>> If this is broken, then so are these:
>>>
>>> 23c37c37f0280761072c23bf67d3a4f3c0ff25aa
>>> 7c36507c2b8776266f50c5e2739bd18279953b93
>>
>> These aren't really broken.  They just assume that the high order
>> writes will happen after the low order writes.
>>
>> In the case of e1000, this is a little more then an assumption
>> (particularly in the case of nic initilization).
> 
> But AIUI there's also a valid bit in that high order byte on e1000, so
> reverting cd5be582 means we stuff a new mac into qemu less often, but
> it's still only accurate some of the time.

Yes, there is a slight issue with validity of mac at the time of
processing packets.  I have an outstanding question on the Intel
list about this behavior with real HW.  But, with e1000, the validity
bit provides a much higher guarantee that a guest that will be
setting the mac address will write the high register second to
guarantee that when the valid bit is written, the mac is fully
valid.  As a result we don't really need the e1000 part of the
cd5be5829.


> 
>> In the case of RTL nic, it is just an  assumption, but it hasn't
>> been shown faulty yet.  We do plan to address this a bit more
>> thoroughly.
> 
> So how is RTL less broken without cd5be582?  AIUI the valid bit is off
> in a separate register on RTL, so we have no guarantee about order of
> updating the mac.  Without cd5be582 the info in the monitor may be
> permanently broken if the guest uses a write order other than what we
> assume.
> 

This one is actually not as bad either.  RTL spec requires that
receive register writes happen as 32 bit word writes.  This is
what linux and bsd drivers do, so from driver perspective, the
issue is the same.  What our emulation layer does is turn these
32 bit writes into 4 8-bit writes.  This is likely due to some
very broken and very old drivers, but I am not sure.

So, the information in the monitor will be broken if the guest
does: write_hi(); write_lo();  A part of me would really like
to see a guest that does this :)

The current code isn't perfect either.  It still has a potential
to show the wrong mac address in the monitor.  I doesn't make
a lot of sense to me to replace one sub-optimal solution
with another sub-optimal solution, especially since no-one
complained.

-vlad

>> The patch that was applied was controversial and more then 1 person
>> expressed reservations.
> 
> Understood, but it also raised and addressed a shortcoming in the
> previous patches.  If cd5be582 was controversial because the monitor was
> getting updated with incorrect mac addresses then this simple revert
> doesn't solve that problem.  Thanks,
> 
> Alex
> 
>>>
>>> None of these change the behavior of hardware, they only change when the
>>> monitor gets told about mac address changes.  I'd suggest either add the
>>> emulation described in each spec or revert all of them.  A partial
>>> revert is just noise.  Thanks,
>>>
>>> Alex
>>>
>>>>
>>>> Reported-by: Vlad Yasevich 
>>>> Cc: Amos Kong 
>>>> Cc: Alex Williamson 
>>>> ---
>>>>  hw/net/e1000.c   | 2 +-
>>>>  hw/net/rtl8139.c | 5 -
>>>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>>>> index ae63591..8387443 100644
>>>> --- a/hw/net/e1000.c
>>>> +++ b/hw/net/e1000.c
>>>> @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>>>>  
>>>>  s->mac_reg[index] = val;
>>>>  
>>>> -if (index == RA || index == RA + 1) {
>>>> +if (index == RA + 1) {
>>>>  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
>>>>  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
>>>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t 
>>>> *)macaddr);
>>>> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
>>>> index 7f2b4db..5329f44 100644
>>>> --- a/hw/net/rtl8139.c
>>>> +++ b/hw/net/rtl8139.c
>>>> @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
>>>> addr, uint32_t val)
>>>>  
>>>>  switch (addr)
>>>>  {
>>>> -case MAC0 ... MAC0+5:
>>>> +case MAC0 ... MAC0+4:
>>>> +s->phys[addr - MAC0] = val;
>>>> +break;
>>>> +case MAC0+5:
>>>>  s->phys[addr - MAC0] = val;
>>>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
>>>>  break;
>>>
>>>
>>>
>>
> 
> 
> 




Re: [Qemu-devel] [PATCH for-1.7] Revert "e1000/rtl8139: update HMP NIC when every bit is written"

2013-11-18 Thread Vlad Yasevich
On 11/18/2013 02:58 PM, Alex Williamson wrote:
> On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote:
>> This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
>> Digging into hardware specs shows this does not
>> actually make QEMU behave more like hardware.
>> Let's stick to the tried heuristic for 1.7 and
>> possibly revisit for 1.8.
> 
> If this is broken, then so are these:
> 
> 23c37c37f0280761072c23bf67d3a4f3c0ff25aa
> 7c36507c2b8776266f50c5e2739bd18279953b93

These aren't really broken.  They just assume that the high order
writes will happen after the low order writes.

In the case of e1000, this is a little more then an assumption
(particularly in the case of nic initilization).

In the case of RTL nic, it is just an  assumption, but it hasn't
been shown faulty yet.  We do plan to address this a bit more
thoroughly.

The patch that was applied was controversial and more then 1 person
expressed reservations.

-vlad

> 
> None of these change the behavior of hardware, they only change when the
> monitor gets told about mac address changes.  I'd suggest either add the
> emulation described in each spec or revert all of them.  A partial
> revert is just noise.  Thanks,
> 
> Alex
> 
>>
>> Reported-by: Vlad Yasevich 
>> Cc: Amos Kong 
>> Cc: Alex Williamson 
>> ---
>>  hw/net/e1000.c   | 2 +-
>>  hw/net/rtl8139.c | 5 -
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> index ae63591..8387443 100644
>> --- a/hw/net/e1000.c
>> +++ b/hw/net/e1000.c
>> @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>>  
>>  s->mac_reg[index] = val;
>>  
>> -if (index == RA || index == RA + 1) {
>> +if (index == RA + 1) {
>>  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
>>  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t 
>> *)macaddr);
>> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
>> index 7f2b4db..5329f44 100644
>> --- a/hw/net/rtl8139.c
>> +++ b/hw/net/rtl8139.c
>> @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
>> addr, uint32_t val)
>>  
>>  switch (addr)
>>  {
>> -case MAC0 ... MAC0+5:
>> +case MAC0 ... MAC0+4:
>> +s->phys[addr - MAC0] = val;
>> +break;
>> +case MAC0+5:
>>  s->phys[addr - MAC0] = val;
>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
>>  break;
> 
> 
> 




Re: [Qemu-devel] [PATCH for-1.7] Revert "e1000/rtl8139: update HMP NIC when every bit is written"

2013-11-18 Thread Vlad Yasevich
On 11/18/2013 02:47 PM, Michael S. Tsirkin wrote:
> This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
> Digging into hardware specs shows this does not
> actually make QEMU behave more like hardware.
> Let's stick to the tried heuristic for 1.7 and
> possibly revisit for 1.8.
> 
> Reported-by: Vlad Yasevich 
> Cc: Amos Kong 
> Cc: Alex Williamson 

Reviewed-by: Vlad Yasevich  ---
>  hw/net/e1000.c   | 2 +-
>  hw/net/rtl8139.c | 5 -
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index ae63591..8387443 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -1106,7 +1106,7 @@ mac_writereg(E1000State *s, int index, uint32_t val)
>  
>  s->mac_reg[index] = val;
>  
> -if (index == RA || index == RA + 1) {
> +if (index == RA + 1) {
>  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
>  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
>  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t *)macaddr);
> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
> index 7f2b4db..5329f44 100644
> --- a/hw/net/rtl8139.c
> +++ b/hw/net/rtl8139.c
> @@ -2741,7 +2741,10 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
> addr, uint32_t val)
>  
>  switch (addr)
>  {
> -case MAC0 ... MAC0+5:
> +case MAC0 ... MAC0+4:
> +s->phys[addr - MAC0] = val;
> +break;
> +case MAC0+5:
>  s->phys[addr - MAC0] = val;
>  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
>  break;
> 




Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-18 Thread Vlad Yasevich
On 11/18/2013 02:42 PM, Michael S. Tsirkin wrote:
> On Mon, Nov 18, 2013 at 12:33:20PM -0500, Vlad Yasevich wrote:
>> On 11/18/2013 10:02 AM, Michael S. Tsirkin wrote:
>>> On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote:
>>>> We currently just update the HMP NIC info when the last bit of macaddr
>>>> is written. This assumes that guest driver will write all the macaddr
>>>> from bit 0 to bit 5 when it changes the macaddr, this is the current
>>>> behavior of linux driver (e1000/rtl8139cp), but we can't do this
>>>> assumption.
>>>>
>>>> The macaddr that is used for rx-filter will be updated when every bit
>>>> is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
>>>> info when every bit is changed. It will be same as virtio-net.
>>>>
>>>> Signed-off-by: Amos Kong 
>>>
>>> Vlad here told me he did some research and this
>>> does not actually match hardware behaviour
>>> for either e1000 or rtl8139.
>>>
>>> Vlad, would you like to elaborate on-list?
>>
>> Sure.
>>
>> I decided to dig into the hardware data-sheets and the OS drivers
>> that use the HW to see what's really going on and how the
>> hw expects this data to be programmed.
>>
>> Here is what I've found so far:
>> E1000:
>>e1000 stores each mac address in 2 registers:
>>RAL - receive register low
>>RAH - receive register high
>>The highest order bit of RAH (bit 31) is called the address available
>>bit.  When this bit is set the HW will attempt to use the address for
>>packet matching.  So, when the mac address is initially programmed
>>into HW, that AV bit is not set until RAH is written.  Thus drivers
>>really should do writes in RAL+RAH order, otherwise AV bit would be
>>set on a partially set address.
>>There is a slight issue when the receive address registers already
>>have a value.  Since the address is written in 2 separate writes,
>>there is a very small window of time when the RAL is set to the new
>>value and RAH is set to the old value with AV still set.  I am
>>talking to Intel guys about this now.
>>
>>So from the POV of notifying libvirt/management that address is
>>changed, it should be done when RAH is set.
>>
>> RTL8139:
>>Realtek devices have a 9346CR Command Register that gates write
>>access to certain configuration regions on the HW.  It is placed
>>into "Configuration register write enabled" mode before driver can
>>write to one of a set of configuration spaces.  Even though
>>the data sheet doesn't mention this, it appears that this must
>>also must be used to guard write access to receive address register
>>of the card.  All variants of BSD and linux drivers that I've found
>>use this along with comment that say that this is an undocumented
>>requirement.
> 
> What does a windows driver do BTW?

I'll boot windows and find out.

-vlad

> 
>>I am not sure what the HW does to incoming frames when
>>the command register is to this mode.
>>I see 2 things that we might be able to do here:
>>  1) A low-impact change might be to only notify the management when
>> we've detected an address change and currently exiting
>>  "config write" mode.
>>  2) A more invasive change might be to disable rx_handling while in
>> "config wirte" mode.  This would prevent attempting to match
>>  packets to a partially written mac address.
>>
>>I have a patch that does (1) above.
>>
>>
>> Thoughts?
>> -vlad
> 
> Let's start by reverting cd5be5829c1ce87aa6b3a7806524fac07ac9a757.
> 
>>>
>>> I think we should revert this for 1.8 and
>>> look at emulating actual hardware behaviour.
>>>
>>>> ---
>>>>  hw/net/e1000.c   | 2 +-
>>>>  hw/net/rtl8139.c | 5 +
>>>>  2 files changed, 2 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>>>> index ec8ecd7..2d60639 100644
>>>> --- a/hw/net/e1000.c
>>>> +++ b/hw/net/e1000.c
>>>> @@ -1110,7 +1110,7 @@ mac_writereg(E1000State *s, int index, uint32_t
>> val)
>>>>
>>>>  s->mac_reg[index] = val;
>>>>
>>>> -if (index == RA + 1) {
>>>> +if (index == RA || index == RA + 1) {
>>>>  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
>>>>  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
>>>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t
>> *)macaddr);
>>>> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
>>>> index 5329f44..7f2b4db 100644
>>>> --- a/hw/net/rtl8139.c
>>>> +++ b/hw/net/rtl8139.c
>>>> @@ -2741,10 +2741,7 @@ static void rtl8139_io_writeb(void *opaque,
>> uint8_t addr, uint32_t val)
>>>>
>>>>  switch (addr)
>>>>  {
>>>> -case MAC0 ... MAC0+4:
>>>> -s->phys[addr - MAC0] = val;
>>>> -break;
>>>> -case MAC0+5:
>>>> +case MAC0 ... MAC0+5:
>>>>  s->phys[addr - MAC0] = val;
>>>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
>>>>  break;
>>>> --
>>>> 1.8.3.1
>>>>




Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-18 Thread Vlad Yasevich
On 11/18/2013 10:02 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote:
>> We currently just update the HMP NIC info when the last bit of macaddr
>> is written. This assumes that guest driver will write all the macaddr
>> from bit 0 to bit 5 when it changes the macaddr, this is the current
>> behavior of linux driver (e1000/rtl8139cp), but we can't do this
>> assumption.
>>
>> The macaddr that is used for rx-filter will be updated when every bit
>> is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
>> info when every bit is changed. It will be same as virtio-net.
>>
>> Signed-off-by: Amos Kong 
>
> Vlad here told me he did some research and this
> does not actually match hardware behaviour
> for either e1000 or rtl8139.
>
> Vlad, would you like to elaborate on-list?

Sure.

I decided to dig into the hardware data-sheets and the OS drivers
that use the HW to see what's really going on and how the
hw expects this data to be programmed.

Here is what I've found so far:
E1000:
   e1000 stores each mac address in 2 registers:
   RAL - receive register low
   RAH - receive register high
   The highest order bit of RAH (bit 31) is called the address available
   bit.  When this bit is set the HW will attempt to use the address for
   packet matching.  So, when the mac address is initially programmed
   into HW, that AV bit is not set until RAH is written.  Thus drivers
   really should do writes in RAL+RAH order, otherwise AV bit would be
   set on a partially set address.
   There is a slight issue when the receive address registers already
   have a value.  Since the address is written in 2 separate writes,
   there is a very small window of time when the RAL is set to the new
   value and RAH is set to the old value with AV still set.  I am
   talking to Intel guys about this now.

   So from the POV of notifying libvirt/management that address is
   changed, it should be done when RAH is set.

RTL8139:
   Realtek devices have a 9346CR Command Register that gates write
   access to certain configuration regions on the HW.  It is placed
   into "Configuration register write enabled" mode before driver can
   write to one of a set of configuration spaces.  Even though
   the data sheet doesn't mention this, it appears that this must
   also must be used to guard write access to receive address register
   of the card.  All variants of BSD and linux drivers that I've found
   use this along with comment that say that this is an undocumented
   requirement.  I am not sure what the HW does to incoming frames when
   the command register is to this mode.
   I see 2 things that we might be able to do here:
 1) A low-impact change might be to only notify the management when
we've detected an address change and currently exiting
"config write" mode.
 2) A more invasive change might be to disable rx_handling while in
"config wirte" mode.  This would prevent attempting to match
packets to a partially written mac address.

   I have a patch that does (1) above.


Thoughts?
-vlad

>
> I think we should revert this for 1.8 and
> look at emulating actual hardware behaviour.
>
>> ---
>>  hw/net/e1000.c   | 2 +-
>>  hw/net/rtl8139.c | 5 +
>>  2 files changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
>> index ec8ecd7..2d60639 100644
>> --- a/hw/net/e1000.c
>> +++ b/hw/net/e1000.c
>> @@ -1110,7 +1110,7 @@ mac_writereg(E1000State *s, int index, uint32_t
val)
>>
>>  s->mac_reg[index] = val;
>>
>> -if (index == RA + 1) {
>> +if (index == RA || index == RA + 1) {
>>  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
>>  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t
*)macaddr);
>> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
>> index 5329f44..7f2b4db 100644
>> --- a/hw/net/rtl8139.c
>> +++ b/hw/net/rtl8139.c
>> @@ -2741,10 +2741,7 @@ static void rtl8139_io_writeb(void *opaque,
uint8_t addr, uint32_t val)
>>
>>  switch (addr)
>>  {
>> -case MAC0 ... MAC0+4:
>> -s->phys[addr - MAC0] = val;
>> -break;
>> -case MAC0+5:
>> +case MAC0 ... MAC0+5:
>>  s->phys[addr - MAC0] = val;
>>  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
>>  break;
>> --
>> 1.8.3.1
>>




Re: [Qemu-devel] [PATCH v2] virtio-net: fix the memory leak in rxfilter_notify()

2013-11-18 Thread Vlad Yasevich
On 11/18/2013 10:32 AM, Amos Kong wrote:
> object_get_canonical_path() returns a gchar*, it should be freeed by the
> caller.
> 
> Signed-off-by: Amos Kong 

Reviewed-by: Vlad Yasevich 

-vlad

> ---
> v2: put gchar *path inside rxfilter_notify_enabled block
> ---
>  hw/net/virtio-net.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 613f144..b75c753 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -200,16 +200,16 @@ static void rxfilter_notify(NetClientState *nc)
>  VirtIONet *n = qemu_get_nic_opaque(nc);
>  
>  if (nc->rxfilter_notify_enabled) {
> +gchar *path = object_get_canonical_path(OBJECT(n->qdev));
>  if (n->netclient_name) {
>  event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
> -n->netclient_name,
> -
> object_get_canonical_path(OBJECT(n->qdev)));
> +n->netclient_name, path);
>  } else {
> -event_data = qobject_from_jsonf("{ 'path': %s }",
> -
> object_get_canonical_path(OBJECT(n->qdev)));
> +event_data = qobject_from_jsonf("{ 'path': %s }", path);
>  }
>  monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
>  qobject_decref(event_data);
> +g_free(path);
>  
>  /* disable event notification to avoid events flooding */
>  nc->rxfilter_notify_enabled = 0;
> 




Re: [Qemu-devel] [PATCH] virtio-net: fix the memory leak in rxfilter_notify()

2013-11-18 Thread Vlad Yasevich
On 11/18/2013 08:47 AM, Amos Kong wrote:
> object_get_canonical_path() returns a gchar*, it should be freeed by the
> caller.
> 
> Signed-off-by: Amos Kong 

Reviewed-by: Vlad Yasevich 

-vlad

> ---
>  hw/net/virtio-net.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 613f144..2b2fb57 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -198,15 +198,14 @@ static void rxfilter_notify(NetClientState *nc)
>  {
>  QObject *event_data;
>  VirtIONet *n = qemu_get_nic_opaque(nc);
> +gchar *path = object_get_canonical_path(OBJECT(n->qdev));
>  
>  if (nc->rxfilter_notify_enabled) {
>  if (n->netclient_name) {
>  event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
> -n->netclient_name,
> -
> object_get_canonical_path(OBJECT(n->qdev)));
> +n->netclient_name, path);
>  } else {
> -event_data = qobject_from_jsonf("{ 'path': %s }",
> -
> object_get_canonical_path(OBJECT(n->qdev)));
> +event_data = qobject_from_jsonf("{ 'path': %s }", path);
>  }
>  monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
>  qobject_decref(event_data);
> @@ -214,6 +213,7 @@ static void rxfilter_notify(NetClientState *nc)
>  /* disable event notification to avoid events flooding */
>  nc->rxfilter_notify_enabled = 0;
>  }
> +g_free(path);
>  }
>  
>  static char *mac_strdup_printf(const uint8_t *mac)
> 




[Qemu-devel] [PATCH v2] qom: Fix memory leak in object_property_set_link()

2013-11-15 Thread Vlad Yasevich
Save the result of the call to object_get_cannonical_path()
so we can free it.

Signed-off-by: Vlad Yasevich 
---
v1->v2:  Builds and works :)

 qom/object.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index b617f26..fc19cf6 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -838,8 +838,9 @@ char *object_property_get_str(Object *obj, const char *name,
 void object_property_set_link(Object *obj, Object *value,
   const char *name, Error **errp)
 {
-object_property_set_str(obj, object_get_canonical_path(value),
-name, errp);
+gchar *path = object_get_canonical_path(value);
+object_property_set_str(obj, path, name, errp);
+g_free(path);
 }
 
 Object *object_property_get_link(Object *obj, const char *name,
-- 
1.8.4.2




Re: [Qemu-devel] [PATCH] qom: Fix memory leak in object_property_set_link()

2013-11-15 Thread Vlad Yasevich

On 11/15/2013 12:06 PM, Vlad Yasevich wrote:

Save the result of the call to object_get_cannonical_path()
so we can free it.

Signed-off-by: Vlad Yasevich 
---
  qom/object.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index b617f26..21b6f87 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -838,8 +838,9 @@ char *object_property_get_str(Object *obj, const char *name,
  void object_property_set_link(Object *obj, Object *value,
const char *name, Error **errp)
  {
-object_property_set_str(obj, object_get_canonical_path(value),
-name, errp);
+gchar *path = object_get_cannonical_path(value);
+object_property_set_str(obj, path, name, errp);
+gfree(path);
  }

  Object *object_property_get_link(Object *obj, const char *name,



Self-nack.  Forgot to commit the changes that actually make it build and 
work :/


-vlad



[Qemu-devel] [PATCH] qom: Fix memory leak in object_property_set_link()

2013-11-15 Thread Vlad Yasevich
Save the result of the call to object_get_cannonical_path()
so we can free it.

Signed-off-by: Vlad Yasevich 
---
 qom/object.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index b617f26..21b6f87 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -838,8 +838,9 @@ char *object_property_get_str(Object *obj, const char *name,
 void object_property_set_link(Object *obj, Object *value,
   const char *name, Error **errp)
 {
-object_property_set_str(obj, object_get_canonical_path(value),
-name, errp);
+gchar *path = object_get_cannonical_path(value);
+object_property_set_str(obj, path, name, errp);
+gfree(path);
 }
 
 Object *object_property_get_link(Object *obj, const char *name,
-- 
1.8.4.2




Re: [Qemu-devel] [PATCH] net: move rxfilter_notify() to net.c

2013-11-15 Thread Vlad Yasevich

On 11/15/2013 10:27 AM, Stefan Hajnoczi wrote:

On Tue, Nov 05, 2013 at 07:00:48PM +0800, Amos Kong wrote:

@@ -545,7 +523,7 @@ static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t 
cmd,
  return VIRTIO_NET_ERR;
  }

-rxfilter_notify(nc);
+rxfilter_notify(nc, object_get_canonical_path(OBJECT(n->qdev)));

  return VIRTIO_NET_OK;
  }

[...]

diff --git a/net/net.c b/net/net.c
index c330c9a..f41a457 100644
--- a/net/net.c
+++ b/net/net.c
@@ -40,6 +40,7 @@
  #include "qapi-visit.h"
  #include "qapi/opts-visitor.h"
  #include "qapi/dealloc-visitor.h"
+#include "qapi/qmp/qjson.h"

  /* Net bridge is currently not supported for W32. */
  #if !defined(_WIN32)
@@ -962,6 +963,25 @@ void print_net_client(Monitor *mon, NetClientState *nc)
 nc->info_str);
  }

+void rxfilter_notify(NetClientState *nc, const char *path)
+{
+QObject *event_data;
+
+if (nc->rxfilter_notify_enabled) {
+if (nc->name) {
+event_data = qobject_from_jsonf("{ 'name': %s, 'path': %s }",
+   nc->name, path);
+} else {
+event_data = qobject_from_jsonf("{ 'path': %s }", path);
+}
+monitor_protocol_event(QEVENT_NIC_RX_FILTER_CHANGED, event_data);
+qobject_decref(event_data);
+
+/* disable event notification to avoid events flooding */
+nc->rxfilter_notify_enabled = 0;
+}
+}


Please fix the memory leak:
object_get_canonical_path() returns a gchar* that the caller must free.



Wow, this memory leak is all over the place and not just in this patch.

-vlad




Re: [Qemu-devel] [PATCH v2] pc: add 1.8 machine type

2013-11-15 Thread Vlad Yasevich

On 11/14/2013 05:37 AM, Michael S. Tsirkin wrote:



-#define PC_Q35_1_7_MACHINE_OPTIONS PC_Q35_MACHINE_OPTIONS
+#define PC_Q35_1_8_MACHINE_OPTIONS PC_Q35_MACHINE_OPTIONS
+
+static QEMUMachine pc_q35_machine_v1_8 = {
+PC_Q35_1_8_MACHINE_OPTIONS,
+.name = "pc-q35-1.8",
+.alias = "q35",
+.init = pc_q35_init,
+};
+
+#define PC_Q35_1_7_MACHINE_OPTIONS PC_Q35_1_8_MACHINE_OPTIONS

  static QEMUMachine pc_q35_machine_v1_7 = {
  PC_Q35_1_7_MACHINE_OPTIONS,
  .name = "pc-q35-1.7",
  .alias = "q35",
-.init = pc_q35_init,
+.init = pc_q35_init_1_7,
  };


Hi Michael

Shouldn't the '.alias' be removed from the 1.7 machine?

Thanks
-vlad


-#define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_MACHINE_OPTIONS
+#define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_1_7_MACHINE_OPTIONS

  static QEMUMachine pc_q35_machine_v1_6 = {
  PC_Q35_1_6_MACHINE_OPTIONS,
@@ -313,6 +333,7 @@ static QEMUMachine pc_q35_machine_v1_4 = {

  static void pc_q35_machine_init(void)
  {
+qemu_register_machine(&pc_q35_machine_v1_8);
  qemu_register_machine(&pc_q35_machine_v1_7);
  qemu_register_machine(&pc_q35_machine_v1_6);
  qemu_register_machine(&pc_q35_machine_v1_5);






Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-13 Thread Vlad Yasevich

On 11/13/2013 03:00 PM, Michael S. Tsirkin wrote:

On Wed, Nov 13, 2013 at 11:21:18AM -0500, Vlad Yasevich wrote:

On 11/10/2013 07:11 AM, Michael S. Tsirkin wrote:

On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote:

What about this approach?  This only updates the monitory when all the
bits have been written to.

-vlad



Thanks!
Some comments below.


-- >8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr

>from bit 0 to bit 5 when it changes the macaddr, this is the current

behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.


I would rather say "it seems better not to make this assumption".
This does look somewhat safer than what Amos proposed.



The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich 
---
  hw/net/e1000.c   | 17 -
  hw/net/rtl8139.c | 14 +-
  2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
  #define E1000_FLAG_AUTONEG (1 << E1000_FLAG_AUTONEG_BIT)
  #define E1000_FLAG_MIT (1 << E1000_FLAG_MIT_BIT)
  uint32_t compat_flags;
+uint32_t mac_changed;


Hmm why uint32_t? uint8_t is enough here isn't?

This new state has to be migrated then, and
we need to fallback to old behaviour if migrating to/from
an old version (see compat_flags for one way to
detect this compatibility mode).



Hi Michael

I started looking at migrating this thing and I now starting to question
the whole approach.

The only reason to migrate this is if we can migrate between writes to
the mac address registers.


Absolutely. For some reason below you only discuss cross version
migration but it needs to be migrated for same version migration too.


  We can fairly easily solve the issue of
migrating from net to old versions.


I'm not sure it's easier, but it needs to happen anymore.


  The more interesting question
is migrating from old to new versions.

If someone is migrating from an older version (without this feature)
to a newer version and doing so between writes, the bitmap state will
have no idea that a partial write has already happened.  The completing
write will just set one of the bits and notifications that we are
looking for do not happen.

-vlad


Yep, that's a problem too.


For 1.8 just send the bitmap across.


Right.  That's simple enough.



For cross version migration I would say we should just detect -M 1.7
and older and just emulate old behaviour, disregard the bitmap
completely.
Don't do special tricks just for migration.



The compat flags allow for simple migration to 1.7.  However, it's the
migration from 1.7 to 1.8 that's the issue.

There is a version number on the E1000 state and I am thinking of maybe
bumping that so that we can catch older state that doesn't support mac
sate change.  Thoughts?

-vlad




Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-13 Thread Vlad Yasevich

On 11/10/2013 07:11 AM, Michael S. Tsirkin wrote:

On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote:

What about this approach?  This only updates the monitory when all the
bits have been written to.

-vlad



Thanks!
Some comments below.


-- >8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.


I would rather say "it seems better not to make this assumption".
This does look somewhat safer than what Amos proposed.



The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich 
---
  hw/net/e1000.c   | 17 -
  hw/net/rtl8139.c | 14 +-
  2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
  #define E1000_FLAG_AUTONEG (1 << E1000_FLAG_AUTONEG_BIT)
  #define E1000_FLAG_MIT (1 << E1000_FLAG_MIT_BIT)
  uint32_t compat_flags;
+uint32_t mac_changed;


Hmm why uint32_t? uint8_t is enough here isn't?

This new state has to be migrated then, and
we need to fallback to old behaviour if migrating to/from
an old version (see compat_flags for one way to
detect this compatibility mode).



Hi Michael

I started looking at migrating this thing and I now starting to question
the whole approach.

The only reason to migrate this is if we can migrate between writes to
the mac address registers.  We can fairly easily solve the issue of
migrating from net to old versions.  The more interesting question
is migrating from old to new versions.

If someone is migrating from an older version (without this feature)
to a newer version and doing so between writes, the bitmap state will
have no idea that a partial write has already happened.  The completing
write will just set one of the bits and notifications that we are
looking for do not happen.

-vlad




Re: [Qemu-devel] [PATCH] virtio-net: don't update mac_table in error state

2013-11-12 Thread Vlad Yasevich

On 11/10/2013 10:48 PM, Amos Kong wrote:

mac_table was always cleaned up first in handling
VIRTIO_NET_CTRL_MAC_TABLE_SET command, and we din't recover
mac_table content in error state, it's not correct.

This patch makes all the changes in temporal variables,
only update the real mac_table if everything is ok.
We won't change mac_table in error state, so rxfilter
notification isn't needed.

This patch also fixed same problame in
  http://lists.nongnu.org/archive/html/qemu-devel/2013-11/msg01188.html
  (not merge)

I will send patch for virtio spec to clarifying this change.

Signed-off-by: Amos Kong 
---
  hw/net/virtio-net.c | 34 --
  1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 22dbd05..880fa4a 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -610,11 +610,11 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t 
cmd,
  return VIRTIO_NET_ERR;
  }

-n->mac_table.in_use = 0;
-n->mac_table.first_multi = 0;
-n->mac_table.uni_overflow = 0;
-n->mac_table.multi_overflow = 0;
-memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
+int in_use = 0;
+int first_multi = 0;
+uint8_t uni_overflow = 0;
+uint8_t multi_overflow = 0;
+uint8_t *macs = g_malloc0(MAC_TABLE_ENTRIES * ETH_ALEN);


I am not sure what the practice is for mixing declarations with code in 
qemu.  Can't find anything in the docs.


Otherwise, looks good to me.

Reviewed-by: Vlad Yasevich 

-vlad



  s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries,
 sizeof(mac_data.entries));
@@ -629,19 +629,19 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t 
cmd,
  }

  if (mac_data.entries <= MAC_TABLE_ENTRIES) {
-s = iov_to_buf(iov, iov_cnt, 0, n->mac_table.macs,
+s = iov_to_buf(iov, iov_cnt, 0, macs,
 mac_data.entries * ETH_ALEN);
  if (s != mac_data.entries * ETH_ALEN) {
  goto error;
  }
-n->mac_table.in_use += mac_data.entries;
+in_use += mac_data.entries;
  } else {
-n->mac_table.uni_overflow = 1;
+uni_overflow = 1;
  }

  iov_discard_front(&iov, &iov_cnt, mac_data.entries * ETH_ALEN);

-n->mac_table.first_multi = n->mac_table.in_use;
+first_multi = in_use;

  s = iov_to_buf(iov, iov_cnt, 0, &mac_data.entries,
 sizeof(mac_data.entries));
@@ -656,23 +656,29 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t 
cmd,
  goto error;
  }

-if (n->mac_table.in_use + mac_data.entries <= MAC_TABLE_ENTRIES) {
-s = iov_to_buf(iov, iov_cnt, 0, n->mac_table.macs,
+if (in_use + mac_data.entries <= MAC_TABLE_ENTRIES) {
+s = iov_to_buf(iov, iov_cnt, 0, &macs[in_use * ETH_ALEN],
 mac_data.entries * ETH_ALEN);
  if (s != mac_data.entries * ETH_ALEN) {
  goto error;
  }
-n->mac_table.in_use += mac_data.entries;
+in_use += mac_data.entries;
  } else {
-n->mac_table.multi_overflow = 1;
+multi_overflow = 1;
  }

+n->mac_table.in_use = in_use;
+n->mac_table.first_multi = first_multi;
+n->mac_table.uni_overflow = uni_overflow;
+n->mac_table.multi_overflow = multi_overflow;
+memcpy(n->mac_table.macs, macs, MAC_TABLE_ENTRIES * ETH_ALEN);
+g_free(macs);
  rxfilter_notify(nc);

  return VIRTIO_NET_OK;

  error:
-rxfilter_notify(nc);
+g_free(macs);
  return VIRTIO_NET_ERR;
  }







Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-12 Thread Vlad Yasevich

On 11/08/2013 10:43 PM, Amos Kong wrote:

On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote:

What about this approach?  This only updates the monitory when all the
bits have been written to.


Hi Vlad,

Looks good to me.

Using this patch, we don't need to care the writing order.
If we add event notify in future, it can reduce the noise
event.

BTW, can we use a tmp buffer to record the new mac (changing is unfinished),
and write the new mac to registers until all bits is written?


I've thought about this as well, but I am not sure what it would buy us
and what's the best way to do it.

At first I thought that we could have a function local static that we
could accumulate into.  But then Michael mentioned migration and what
happens if we migrate in middle of the mac change?

So we put into the device, but then it isn't much different then what
we have now with the possible exception that the mac change happens
slightly "more atomically". :)  For this, it might make more sense to
temporarily stop the link when the mac address change is happening.

-vlad



Amos


-vlad

-- >8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written


Subject: [PATCH] e1000/rtl8139: update HMP NIC when all bits are written


We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.

The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich 
---
  hw/net/e1000.c   | 17 -
  hw/net/rtl8139.c | 14 +-
  2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
  #define E1000_FLAG_AUTONEG (1 << E1000_FLAG_AUTONEG_BIT)
  #define E1000_FLAG_MIT (1 << E1000_FLAG_MIT_BIT)
  uint32_t compat_flags;
+uint32_t mac_changed;
+#define E1000_RA0_CHANGED 0
+#define E1000_RA1_CHANGED 1
+#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED)
  } E1000State;

  #define TYPE_E1000 "e1000"
@@ -402,6 +406,7 @@ static void e1000_reset(void *opaque)
  d->mac_reg[RA + 1] |= (i < 2) ? macaddr[i + 4] << (8 * i) : 0;
  }
  qemu_format_nic_info_str(qemu_get_queue(d->nic), macaddr);
+d->mac_changed = 0;
  }

  static void
@@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val)

  s->mac_reg[index] = val;

-if (index == RA + 1) {
+switch (index) {
+case RA:
+s->mac_changed |= E1000_RA0_CHANGED;
+break;
+case (RA + 1):
+s->mac_changed |= E1000_RA1_CHANGED;
+break;
+}
+
+if (s->mac_changed ==  E1000_RA_ALL_CHANGED) {
  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t *)macaddr);
+   s->mac_changed = 0;
  }
  }

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 5329f44..6dac10c 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -476,6 +476,8 @@ typedef struct RTL8139State {

  uint16_t CpCmd;
  uint8_t  TxThresh;
+uint8_t  mac_changed;
+#define RTL8139_MAC_CHANGED_ALL 0x3F

  NICState *nic;
  NICConf conf;
@@ -1215,6 +1217,7 @@ static void rtl8139_reset(DeviceState *d)
  /* restore MAC address */
  memcpy(s->phys, s->conf.macaddr.a, 6);
  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed = 0;

  /* reset interrupt mask */
  s->IntrStatus = 0;
@@ -2741,12 +2744,13 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)

  switch (addr)
  {
-case MAC0 ... MAC0+4:
-s->phys[addr - MAC0] = val;
-break;
-case MAC0+5:
+case MAC0 ... MAC0+5:
  s->phys[addr - MAC0] = val;
-qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed |= (1 << (addr - MAC0));
+if (s->mac_changed == RTL8139_MAC_CHANGED_ALL) {
+qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed = 0;
+}
  break;
  case MAC0+6 ... MAC0+7:
  /* reserved */
--
1.8.4.2





Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-12 Thread Vlad Yasevich

On 11/10/2013 07:11 AM, Michael S. Tsirkin wrote:

On Fri, Nov 08, 2013 at 02:42:27PM -0500, Vlad Yasevich wrote:

What about this approach?  This only updates the monitory when all the
bits have been written to.

-vlad



Thanks!
Some comments below.


-- >8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.


I would rather say "it seems better not to make this assumption".
This does look somewhat safer than what Amos proposed.



The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich 
---
  hw/net/e1000.c   | 17 -
  hw/net/rtl8139.c | 14 +-
  2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
  #define E1000_FLAG_AUTONEG (1 << E1000_FLAG_AUTONEG_BIT)
  #define E1000_FLAG_MIT (1 << E1000_FLAG_MIT_BIT)
  uint32_t compat_flags;
+uint32_t mac_changed;


Hmm why uint32_t? uint8_t is enough here isn't?


Yes, that should be fine.  I'll fix and find a better spot to
put it. (may be a hole in the struct).



This new state has to be migrated then, and
we need to fallback to old behaviour if migrating to/from
an old version (see compat_flags for one way to
detect this compatibility mode).


Good point.  Didn't think of that...




+#define E1000_RA0_CHANGED 0
+#define E1000_RA1_CHANGED 1
+#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED)


I don't get it. So E1000_RA_ALL_CHANGED is 0 | 1 == 1.
it looks like the trigger is when E1000_RA1_CHANGED
so that's more or less equivalent.


Goofed on the bits.  That should have been (1<<0) and (1<<1).




  } E1000State;

  #define TYPE_E1000 "e1000"
@@ -402,6 +406,7 @@ static void e1000_reset(void *opaque)
  d->mac_reg[RA + 1] |= (i < 2) ? macaddr[i + 4] << (8 * i) : 0;
  }
  qemu_format_nic_info_str(qemu_get_queue(d->nic), macaddr);
+d->mac_changed = 0;
  }

  static void
@@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val)

  s->mac_reg[index] = val;

-if (index == RA + 1) {
+switch (index) {
+case RA:
+s->mac_changed |= E1000_RA0_CHANGED;
+break;
+case (RA + 1):
+s->mac_changed |= E1000_RA1_CHANGED;
+break;
+}
+
+if (s->mac_changed ==  E1000_RA_ALL_CHANGED) {


Some whitespace damage here.


  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t *)macaddr);
+   s->mac_changed = 0;


Need to use spaces for indent in qemu.


  }
  }

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c


Best to split out in a separate commit.


OK




index 5329f44..6dac10c 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -476,6 +476,8 @@ typedef struct RTL8139State {

  uint16_t CpCmd;
  uint8_t  TxThresh;
+uint8_t  mac_changed;


This new state has to be migrated then, and
we need to fallback to old behaviour if migrating to/from
an old version.


+#define RTL8139_MAC_CHANGED_ALL 0x3F

  NICState *nic;
  NICConf conf;
@@ -1215,6 +1217,7 @@ static void rtl8139_reset(DeviceState *d)
  /* restore MAC address */
  memcpy(s->phys, s->conf.macaddr.a, 6);
  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed = 0;

  /* reset interrupt mask */
  s->IntrStatus = 0;
@@ -2741,12 +2744,13 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)

  switch (addr)
  {
-case MAC0 ... MAC0+4:
-s->phys[addr - MAC0] = val;
-break;
-case MAC0+5:
+case MAC0 ... MAC0+5:
  s->phys[addr - MAC0] = val;
-qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed |= (1 << (addr - MAC0));


Better drop the external () here otherwise it starts looking like lisp :)


Heh.  OK.

Thanks
-vlad




+if (s->mac_changed == RTL8139_MAC_CHANGED_ALL) {
+qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed = 0;
+}
  break;
  case MAC0+6 ... MAC0+7:
  /* reserved */
--
1.8.4.2





[Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-08 Thread Vlad Yasevich
What about this approach?  This only updates the monitory when all the
bits have been written to.

-vlad

-- >8 --
Subject: [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.

The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit has been changed. It will be same as virtio-net.

Signed-off-by: Vlad Yasevich 
---
 hw/net/e1000.c   | 17 -
 hw/net/rtl8139.c | 14 +-
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..a5967ed 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
 #define E1000_FLAG_AUTONEG (1 << E1000_FLAG_AUTONEG_BIT)
 #define E1000_FLAG_MIT (1 << E1000_FLAG_MIT_BIT)
 uint32_t compat_flags;
+uint32_t mac_changed;
+#define E1000_RA0_CHANGED 0
+#define E1000_RA1_CHANGED 1
+#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED)
 } E1000State;
 
 #define TYPE_E1000 "e1000"
@@ -402,6 +406,7 @@ static void e1000_reset(void *opaque)
 d->mac_reg[RA + 1] |= (i < 2) ? macaddr[i + 4] << (8 * i) : 0;
 }
 qemu_format_nic_info_str(qemu_get_queue(d->nic), macaddr);
+d->mac_changed = 0;
 }
 
 static void
@@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val)
 
 s->mac_reg[index] = val;
 
-if (index == RA + 1) {
+switch (index) {
+case RA:
+s->mac_changed |= E1000_RA0_CHANGED;
+break;
+case (RA + 1):
+s->mac_changed |= E1000_RA1_CHANGED;
+break;
+}
+
+if (s->mac_changed ==  E1000_RA_ALL_CHANGED) {
 macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
 macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
 qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t *)macaddr);
+   s->mac_changed = 0;
 }
 }
 
diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 5329f44..6dac10c 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -476,6 +476,8 @@ typedef struct RTL8139State {
 
 uint16_t CpCmd;
 uint8_t  TxThresh;
+uint8_t  mac_changed;
+#define RTL8139_MAC_CHANGED_ALL 0x3F
 
 NICState *nic;
 NICConf conf;
@@ -1215,6 +1217,7 @@ static void rtl8139_reset(DeviceState *d)
 /* restore MAC address */
 memcpy(s->phys, s->conf.macaddr.a, 6);
 qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed = 0;
 
 /* reset interrupt mask */
 s->IntrStatus = 0;
@@ -2741,12 +2744,13 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)
 
 switch (addr)
 {
-case MAC0 ... MAC0+4:
-s->phys[addr - MAC0] = val;
-break;
-case MAC0+5:
+case MAC0 ... MAC0+5:
 s->phys[addr - MAC0] = val;
-qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed |= (1 << (addr - MAC0));
+if (s->mac_changed == RTL8139_MAC_CHANGED_ALL) {
+qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
+s->mac_changed = 0;
+}
 break;
 case MAC0+6 ... MAC0+7:
 /* reserved */
-- 
1.8.4.2




Re: [Qemu-devel] [PATCH] virtio-net: Correctly store multicast filter entries

2013-11-08 Thread Vlad Yasevich

On 11/08/2013 11:07 AM, Vlad Yasevich wrote:

Commit 921ac5d0f3a0df869db5ce4edf752f51d8b1596a
 virtio-net: remove layout assumptions for ctrl vq
introduced a regression where the multicast address filter
entries are written to the beginning of the mac table array,
thus overwriting any unicast addresses that may have been
programmed in the filter.
The multicast addresses should be written after all the
unicast addresses.

Signed-off-by: Vlad Yasevich 


Please ignore.  Just saw pull pull request with similar patch.

-vlad


---
  hw/net/virtio-net.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 22dbd05..94e8b68 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -657,7 +657,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
  }

  if (n->mac_table.in_use + mac_data.entries <= MAC_TABLE_ENTRIES) {
-s = iov_to_buf(iov, iov_cnt, 0, n->mac_table.macs,
+s = iov_to_buf(iov, iov_cnt, 0,
+   n->mac_table.macs + (n->mac_table.in_use * ETH_ALEN),
 mac_data.entries * ETH_ALEN);
  if (s != mac_data.entries * ETH_ALEN) {
  goto error;






[Qemu-devel] [PATCH] virtio-net: Correctly store multicast filter entries

2013-11-08 Thread Vlad Yasevich
Commit 921ac5d0f3a0df869db5ce4edf752f51d8b1596a
 virtio-net: remove layout assumptions for ctrl vq
introduced a regression where the multicast address filter
entries are written to the beginning of the mac table array,
thus overwriting any unicast addresses that may have been
programmed in the filter.
The multicast addresses should be written after all the
unicast addresses.

Signed-off-by: Vlad Yasevich 
---
 hw/net/virtio-net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 22dbd05..94e8b68 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -657,7 +657,8 @@ static int virtio_net_handle_mac(VirtIONet *n, uint8_t cmd,
 }
 
 if (n->mac_table.in_use + mac_data.entries <= MAC_TABLE_ENTRIES) {
-s = iov_to_buf(iov, iov_cnt, 0, n->mac_table.macs,
+s = iov_to_buf(iov, iov_cnt, 0,
+   n->mac_table.macs + (n->mac_table.in_use * ETH_ALEN),
mac_data.entries * ETH_ALEN);
 if (s != mac_data.entries * ETH_ALEN) {
 goto error;
-- 
1.8.4.2




[Qemu-devel] [PATCH] qdev-properties-system.c: Allow vlan or netdev for -device, not both

2013-11-07 Thread Vlad Yasevich
It is currently possible to specify things like:
-device e1000,netdev=foo,vlan=1
With this usage, whichever argument was specified last (vlan or netdev)
overwrites what was previousely set and results in a non-working
configuration.  Even worse, when used with multiqueue devices,
it causes a segmentation fault on exit in qemu_free_net_client.

That patch treates the above command line options as invalid and
generates an error at start-up.

Signed-off-by: Vlad Yasevich 
---
 hw/core/qdev-properties-system.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 0eada32..729efa8 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -205,6 +205,11 @@ static int parse_netdev(DeviceState *dev, const char *str, 
void **ptr)
 goto err;
 }
 
+if (ncs[i]) {
+ret = -EINVAL;
+goto err;
+}
+
 ncs[i] = peers[i];
 ncs[i]->queue_index = i;
 }
@@ -301,6 +306,10 @@ static void set_vlan(Object *obj, Visitor *v, void *opaque,
 *ptr = NULL;
 return;
 }
+if (*ptr) {
+error_set_from_qdev_prop_error(errp, -EINVAL, dev, prop, name);
+return;
+}
 
 hubport = net_hub_port_find(id);
 if (!hubport) {
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-07 Thread Vlad Yasevich

On 11/07/2013 09:33 AM, Alex Williamson wrote:

On Thu, 2013-11-07 at 12:26 +0200, Michael S. Tsirkin wrote:

On Thu, Nov 07, 2013 at 03:32:29PM +0800, Amos Kong wrote:

On Thu, Nov 07, 2013 at 08:59:22AM +0200, Michael S. Tsirkin wrote:

On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote:

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.

The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit is changed. It will be same as virtio-net.

Signed-off-by: Amos Kong 


I'm not sure I buy this.

If we actually implement e.g. mac change notifications,
sending them on writes of random bytes will confuse
the host.


This patch just effects the monitor display of macaddr.
During each writing, the macaddr is used for rx-filter is really
changed.

In the real hardware, it supports to just write part of bits,
the rx-filtering is effected by every bit writing.


Yes but again, the window can just be too small to matter
on real hardware.

Our emulation is not perfect, fixing this to be just like real
hardware just might expose other bugs we can't fix
that easily.


If we were to implement mac change notification, then every partial
update would send a notify and the host.  Is that a problem?  It seems
no different than if the device had an atomic mac update procedure and
the guest admin changed the mac several times.


Yes, but the issue is exercebated in the non-atomic case.  RTL8139
is the worst since it has byte oriented writes so that for ever mac
change, we have the worst case potential to generate 6 notificates
(assuming libvirt is so fast as to pick up ever change).

Additionally, once libvirt is updated, this would cause rather serious
churn on the host as for ever update, libvirt is going to push the
address down to the physical host nic and the fewer of these updates
we do the better.



The problem with assuming that a given byte is always written last is
that unless the hardware spec identifies an order, we're just basing our
code on examples where we know what the guest driver does, either by
code inspection or access tracing.  If there are examples of guests that
write the last byte first, then the host will never know the correct mac
address.  Maybe there are no guests that use the wrong order, but that's
a pretty exhaustive search.

The patch doesn't change anything about how the NIC operates, only when
mac changes are updated.  During an update the mac is in a transitory
state and we can't go back in time to make it atomic on this hardware
design to avoid a window where the wrong mac may be seen.  I think the
patch is the right thing to do.  Thanks,



Reporting half-complete state is not the right thing, IMO.  Right now,
it's doesn't have much impact, but if we start writing these addresses
to the host nic, then this will have a much bigger impact as I said above.

-vlad


Alex


I would say let's leave e1000/rtl8139 well alone unless
we see guests that actually write mac without touching
the last byte.


At least, linux rtl8139cp/e1000 writes macaddr from bit 0 to bit 5.
It works to just watch the last bit.

Thanks, Amos


Then think of ways to detect when mac change is done
for these.


---
  hw/net/e1000.c   | 2 +-
  hw/net/rtl8139.c | 5 +
  2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index ec8ecd7..2d60639 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1110,7 +1110,7 @@ mac_writereg(E1000State *s, int index, uint32_t val)

  s->mac_reg[index] = val;

-if (index == RA + 1) {
+if (index == RA || index == RA + 1) {
  macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
  macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
  qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t *)macaddr);
diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 5329f44..7f2b4db 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -2741,10 +2741,7 @@ static void rtl8139_io_writeb(void *opaque, uint8_t 
addr, uint32_t val)

  switch (addr)
  {
-case MAC0 ... MAC0+4:
-s->phys[addr - MAC0] = val;
-break;
-case MAC0+5:
+case MAC0 ... MAC0+5:
  s->phys[addr - MAC0] = val;
  qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
  break;
--
1.8.3.1












Re: [Qemu-devel] [PATCH] e1000/rtl8139: update HMP NIC when every bit is written

2013-11-07 Thread Vlad Yasevich

On 11/07/2013 10:27 AM, Michael S. Tsirkin wrote:

On Thu, Nov 07, 2013 at 07:33:57AM -0700, Alex Williamson wrote:

On Thu, 2013-11-07 at 12:26 +0200, Michael S. Tsirkin wrote:

On Thu, Nov 07, 2013 at 03:32:29PM +0800, Amos Kong wrote:

On Thu, Nov 07, 2013 at 08:59:22AM +0200, Michael S. Tsirkin wrote:

On Tue, Nov 05, 2013 at 07:17:18PM +0800, Amos Kong wrote:

We currently just update the HMP NIC info when the last bit of macaddr
is written. This assumes that guest driver will write all the macaddr
from bit 0 to bit 5 when it changes the macaddr, this is the current
behavior of linux driver (e1000/rtl8139cp), but we can't do this
assumption.

The macaddr that is used for rx-filter will be updated when every bit
is changed. This patch updates the e1000/rtl8139 nic to update HMP NIC
info when every bit is changed. It will be same as virtio-net.

Signed-off-by: Amos Kong 


I'm not sure I buy this.

If we actually implement e.g. mac change notifications,
sending them on writes of random bytes will confuse
the host.


This patch just effects the monitor display of macaddr.
During each writing, the macaddr is used for rx-filter is really
changed.

In the real hardware, it supports to just write part of bits,
the rx-filtering is effected by every bit writing.


Yes but again, the window can just be too small to matter
on real hardware.

Our emulation is not perfect, fixing this to be just like real
hardware just might expose other bugs we can't fix
that easily.


If we were to implement mac change notification, then every partial
update would send a notify and the host.  Is that a problem?  It seems
no different than if the device had an atomic mac update procedure and
the guest admin changed the mac several times.


I think modern nics make address updates atomic.
Problem is, we are emulating an ancient one,
so to make host configuration of a modern one
reasonable we need to resort to tricks.


The problem with assuming that a given byte is always written last is
that unless the hardware spec identifies an order, we're just basing our
code on examples where we know what the guest driver does, either by
code inspection or access tracing.  If there are examples of guests that
write the last byte first, then the host will never know the correct mac
address.  Maybe there are no guests that use the wrong order, but that's
a pretty exhaustive search.


I agree what we have is a hack. Maybe we need some better hacks.
For example, maybe we should update mac at link state change
(I think  link is always down when mac is updated?).
Needs some thought.


I thought this too, but checking recent linux kernel, e1000 and rtl8139 
seem to allow live mac change so link is up.


So here is a stupid, untested patch for e1000 to notify only once:
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8387443..b99eba4 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -149,6 +149,10 @@ typedef struct E1000State_st {
 #define E1000_FLAG_AUTONEG (1 << E1000_FLAG_AUTONEG_BIT)
 #define E1000_FLAG_MIT (1 << E1000_FLAG_MIT_BIT)
 uint32_t compat_flags;
+uint32_t mac_change_flags;
+#define E1000_RA0_CHANGED 0
+#define E1000_RA1_CHANGED 1
+#define E1000_RA_ALL_CHANGED (E1000_RA0_CHANGED|E1000_RA1_CHANGED)
 } E1000State;

 #define TYPE_E1000 "e1000"
@@ -402,6 +406,7 @@ static void e1000_reset(void *opaque)
 d->mac_reg[RA + 1] |= (i < 2) ? macaddr[i + 4] << (8 * i) : 0;
 }
 qemu_format_nic_info_str(qemu_get_queue(d->nic), macaddr);
+d->mac_change_flags = 0;
 }

 static void
@@ -1106,10 +,20 @@ mac_writereg(E1000State *s, int index, uint32_t val)

 s->mac_reg[index] = val;

-if (index == RA + 1) {
+switch (index) {
+   case RA:
+   s->mac_change_flags |= E1000_RA0_CHANGED;
+   break;
+   case (RA + 1):
+   s->mac_change_flags |= E1000_RA1_CHANGED;
+   break;
+}
+
+if (!(s->mac_change_flags ^ E1000_RA_ALL_CHANGED)) {
 macaddr[0] = cpu_to_le32(s->mac_reg[RA]);
 macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]);
 qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t 
*)macaddr);

+s->mac_change_flags = 0;
 }
 }

Any thoughts?

-vlad