Re: [PATCH 1/6] Drivers: hv: kvp: fix IP Failover

2016-03-29 Thread Vitaly Kuznetsov
"K. Y. Srinivasan" <k...@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
>
> Hyper-V VMs can be replicated to another hosts and there is a feature to
> set different IP for replicas, it is called 'Failover TCP/IP'. When
> such guest starts Hyper-V host sends it KVP_OP_SET_IP_INFO message as soon
> as we finish negotiation procedure. The problem is that it can happen (and
> it actually happens) before userspace daemon connects and we reply with
> HV_E_FAIL to the message. As there are no repetitions we fail to set the
> requested IP.
>
> Solve the issue by postponing our reply to the negotiation message till
> userspace daemon is connected. We can't wait too long as there is a
> host-side timeout (cca. 75 seconds) and if we fail to reply in this time
> frame the whole KVP service will become inactive. The solution is not
> ideal - if it takes userspace daemon more than 60 seconds to connect
> IP Failover will still fail but I don't see a solution with our current
> separation between kernel and userspace parts.
>
> Other two modules (VSS and FCOPY) don't require such delay, leave them
> untouched.
>
> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>

An issue was found with this patch: we don't cancel
kvp_host_handshake_work on module unload and if the work is still
pending we end up crashing. As far as I can see this series didn't make
it to char-misc tree so instead of sending a patch to this patch I'll
send v2 for the original issue.

Sorry for the inconvenience.

> ---
>  drivers/hv/hv_kvp.c   |   30 ++
>  drivers/hv/hyperv_vmbus.h |5 +
>  2 files changed, 35 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
> index 9b9b370..0d3fcd6 100644
> --- a/drivers/hv/hv_kvp.c
> +++ b/drivers/hv/hv_kvp.c
> @@ -78,9 +78,11 @@ static void kvp_send_key(struct work_struct *dummy);
>
>  static void kvp_respond_to_host(struct hv_kvp_msg *msg, int error);
>  static void kvp_timeout_func(struct work_struct *dummy);
> +static void kvp_host_handshake_func(struct work_struct *dummy);
>  static void kvp_register(int);
>
>  static DECLARE_DELAYED_WORK(kvp_timeout_work, kvp_timeout_func);
> +static DECLARE_DELAYED_WORK(kvp_host_handshake_work, 
> kvp_host_handshake_func);
>  static DECLARE_WORK(kvp_sendkey_work, kvp_send_key);
>
>  static const char kvp_devname[] = "vmbus/hv_kvp";
> @@ -130,6 +132,11 @@ static void kvp_timeout_func(struct work_struct *dummy)
>   hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
>  }
>
> +static void kvp_host_handshake_func(struct work_struct *dummy)
> +{
> + hv_poll_channel(kvp_transaction.recv_channel, hv_kvp_onchannelcallback);
> +}
> +
>  static int kvp_handle_handshake(struct hv_kvp_msg *msg)
>  {
>   switch (msg->kvp_hdr.operation) {
> @@ -154,6 +161,12 @@ static int kvp_handle_handshake(struct hv_kvp_msg *msg)
>   pr_debug("KVP: userspace daemon ver. %d registered\n",
>KVP_OP_REGISTER);
>   kvp_register(dm_reg_value);
> +
> + /*
> +  * If we're still negotiating with the host cancel the timeout
> +  * work to not poll the channel twice.
> +  */
> + cancel_delayed_work_sync(_host_handshake_work);
>   hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
>
>   return 0;
> @@ -594,7 +607,22 @@ void hv_kvp_onchannelcallback(void *context)
>   struct icmsg_negotiate *negop = NULL;
>   int util_fw_version;
>   int kvp_srv_version;
> + static enum {NEGO_NOT_STARTED,
> +  NEGO_IN_PROGRESS,
> +  NEGO_FINISHED} host_negotiatied = NEGO_NOT_STARTED;
>
> + if (host_negotiatied == NEGO_NOT_STARTED &&
> + kvp_transaction.state < HVUTIL_READY) {
> + /*
> +  * If userspace daemon is not connected and host is asking
> +  * us to negotiate we need to delay to not lose messages.
> +  * This is important for Failover IP setting.
> +  */
> + host_negotiatied = NEGO_IN_PROGRESS;
> + schedule_delayed_work(_host_handshake_work,
> +   HV_UTIL_NEGO_TIMEOUT * HZ);
> + return;
> + }
>   if (kvp_transaction.state > HVUTIL_READY)
>   return;
>
> @@ -672,6 +700,8 @@ void hv_kvp_onchannelcallback(void *context)
>   vmbus_sendpacket(channel, recv_buffer,
>  recvlen, requestid,
>  VM_PKT_DATA_INBAND, 0);
> +
> + ho

Re: KVP daemon/module race on module unloading

2016-03-24 Thread Vitaly Kuznetsov
Dexuan Cui  writes:

> Hi Vitaly,
> We got the below warning once -- it's pretty hard to repro it.
>
> I think the cause may be: before the daemon is terminated, we can successfully
> "rmmod hv_utils" -- as a result all the code/data segments of the module
> shouldn't be accessed any more,  but in the kernel we still has a dangling
> pointer set up by hvt_op_poll() -> poll_wait(file, >outmsg_q, wait);
>
> Finally, when the daemon is terminated, the call trace could happen.
>
> I think the module should never be unloaded when there is a fd opened
> on /dev/vmbus/hv_kvp?

Yes, I was pretty sure .owner = THIS_MODULE does that.

There was a fix from Alex a month ago:

commit e66853b09017a788dc384dadce9323396dae3293
Author: Alex Ng 
Date:   Fri Feb 26 15:13:20 2016 -0800

Drivers: hv: utils: Remove util transport handler from list if
registration fails

Do you have it in your kernel? (though I think it only applies if
transport registration fails...)

I'll take a closer look anyway, thanks for the report!

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2] Drivers: hv: vmbus: handle various crash scenarios

2016-03-22 Thread Vitaly Kuznetsov
Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
delivered to the CPU which was used for initial contact or to CPU0
depending on host version. vmbus_wait_for_unload() doesn't account for
the fact that in case we're crashing on some other CPU we won't get the
CHANNELMSG_UNLOAD_RESPONSE message and our wait on the current CPU will
never end.

Do the following:
1) Check for completion_done() in the loop. In case interrupt handler is
   still alive we'll get the confirmation we need.

2) Read message pages for all CPUs message page as we're unsure where
   CHANNELMSG_UNLOAD_RESPONSE is going to be delivered to. We can race with
   still-alive interrupt handler doing the same, add cmpxchg() to
   vmbus_signal_eom() to not lose CHANNELMSG_UNLOAD_RESPONSE message.

3) Cleanup message pages on all CPUs. This is required (at least for the
   current CPU as we're clearing CPU0 messages now but we may want to bring
   up additional CPUs on crash) as new messages won't be delivered till we
   consume what's pending. On boot we'll place message pages somewhere else
   and we won't be able to read stale messages.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/channel_mgmt.c | 58 +++
 drivers/hv/hyperv_vmbus.h | 16 +++--
 drivers/hv/vmbus_drv.c|  7 +++---
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 38b682ba..b6c1211 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -597,27 +597,55 @@ static void init_vp_index(struct vmbus_channel *channel, 
u16 dev_type)
 
 static void vmbus_wait_for_unload(void)
 {
-   int cpu = smp_processor_id();
-   void *page_addr = hv_context.synic_message_page[cpu];
-   struct hv_message *msg = (struct hv_message *)page_addr +
- VMBUS_MESSAGE_SINT;
+   int cpu;
+   void *page_addr;
+   struct hv_message *msg;
struct vmbus_channel_message_header *hdr;
-   bool unloaded = false;
+   u32 message_type;
 
+   /*
+* CHANNELMSG_UNLOAD_RESPONSE is always delivered to the CPU which was
+* used for initial contact or to CPU0 depending on host version. When
+* we're crashing on a different CPU let's hope that IRQ handler on
+* the cpu which receives CHANNELMSG_UNLOAD_RESPONSE is still
+* functional and vmbus_unload_response() will complete
+* vmbus_connection.unload_event. If not, the last thing we can do is
+* read message pages for all CPUs directly.
+*/
while (1) {
-   if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
-   mdelay(10);
-   continue;
-   }
+   if (completion_done(_connection.unload_event))
+   break;
 
-   hdr = (struct vmbus_channel_message_header *)msg->u.payload;
-   if (hdr->msgtype == CHANNELMSG_UNLOAD_RESPONSE)
-   unloaded = true;
+   for_each_online_cpu(cpu) {
+   page_addr = hv_context.synic_message_page[cpu];
+   msg = (struct hv_message *)page_addr +
+   VMBUS_MESSAGE_SINT;
 
-   vmbus_signal_eom(msg);
+   message_type = READ_ONCE(msg->header.message_type);
+   if (message_type == HVMSG_NONE)
+   continue;
 
-   if (unloaded)
-   break;
+   hdr = (struct vmbus_channel_message_header *)
+   msg->u.payload;
+
+   if (hdr->msgtype == CHANNELMSG_UNLOAD_RESPONSE)
+   complete(_connection.unload_event);
+
+   vmbus_signal_eom(msg, message_type);
+   }
+
+   mdelay(10);
+   }
+
+   /*
+* We're crashing and already got the UNLOAD_RESPONSE, cleanup all
+* maybe-pending messages on all CPUs to be able to receive new
+* messages after we reconnect.
+*/
+   for_each_online_cpu(cpu) {
+   page_addr = hv_context.synic_message_page[cpu];
+   msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
+   msg->header.message_type = HVMSG_NONE;
}
 }
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 12321b9..113a8f5 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -620,9 +620,21 @@ extern struct vmbus_channel_message_table_entry
channel_message_table[CHANNELMSG_COUNT];
 
 /* Free the message slot and signal end-of-message if required */
-static inline void vmbus_signal_eom(struct hv_message *msg)
+static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 {
-   msg->he

Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

2016-03-22 Thread Vitaly Kuznetsov
KY Srinivasan <k...@microsoft.com> writes:

>> -Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> Sent: Monday, March 21, 2016 12:52 AM
>> To: KY Srinivasan <k...@microsoft.com>
>> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; Haiyang
>> Zhang <haiya...@microsoft.com>; Alex Ng (LIS) <ale...@microsoft.com>;
>> Radim Krcmar <rkrc...@redhat.com>; Cathy Avery <cav...@redhat.com>
>> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>> 
>> KY Srinivasan <k...@microsoft.com> writes:
>> 
>> >> -Original Message-
>> >> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> >> Sent: Friday, March 18, 2016 5:33 AM
>> >> To: de...@linuxdriverproject.org
>> >> Cc: linux-ker...@vger.kernel.org; KY Srinivasan <k...@microsoft.com>;
>> >> Haiyang Zhang <haiya...@microsoft.com>; Alex Ng (LIS)
>> >> <ale...@microsoft.com>; Radim Krcmar <rkrc...@redhat.com>; Cathy
>> >> Avery <cav...@redhat.com>
>> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>> >>
>> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is
>> always
>> >> delivered to CPU0 regardless of what CPU we're sending
>> >> CHANNELMSG_UNLOAD
>> >> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> >> we're crashing on some other CPU and CPU0 is still alive and operational
>> >> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> >> vmbus_connection.unload_event, our wait on the current CPU will never
>> >> end.
>> >
>> > What was the host you were testing on?
>> >
>> 
>> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
>> by forcing crash on a secondary CPU, e.g.:
>
> Prior to 2012R2, all messages would be delivered on CPU0 and this includes 
> CHANNELMSG_UNLOAD_RESPONSE.
> For this reason we don't support kexec on pre-2012 R2 hosts. On 2012. From 
> 2012 R2 on, all vmbus 
> messages (responses) will be delivered on  the CPU that we initially set up - 
> look at the code in
> vmbus_negotiate_version(). So on post 2012 R2 hosts, the response to 
> CHANNELMSG_UNLOAD_RESPONSE
> will be delivered on the CPU where we initiate the contact with the
> host - CHANNELMSG_INITIATE_CONTACT message.

Unfortunatelly there is a descrepancy between WS2012R2 and WS2016TP4. On
WS2012R2 what you're saying is true and all messages including
CHANNELMSG_UNLOAD_RESPONSE are delivered to the CPU we used for initial
contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a special
case and it is always delivered to CPU0, no matter which CPU we used for
initial contact. This can be a host bug. You can use the attached patch
to see the issue.

For now I can suggest we check message pages for all CPUs from
vmbus_wait_for_unload(). We can race with other CPUs again but we don't
care as we're checking for completion_done() in the loop as well. I'll
try this approach.

-- 
  Vitaly

>From 27170c1bb8f21f7b20c1716c1df65e4812b421f8 Mon Sep 17 00:00:00 2001
From: Vitaly Kuznetsov <vkuzn...@redhat.com>
Date: Thu, 17 Mar 2016 14:41:07 +0100
Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
delivered to the cpu which was used to initiate contact regardless of what
CPU we're sending CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't
account for the fact that in case we're crashing on some other CPU and the
CPU which was used to initate contact is still alive and operational
CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
vmbus_connection.unload_event, our wait on the current CPU will never
end.

Do the following:
1) Remember the CPU we used to initiate contact in vmbus_connection.

1) Check for completion_done() in the loop. In case interrupt handler is
   still alive we'll get the confirmation we need.

2) Always read the init_cpu's message page as CHANNELMSG_UNLOAD_RESPONSE
   will be delivered there. We can race with still-alive interrupt handler
   doing the same but we don't care as we're checking completion_done()
   now.

3) Cleanup message pages on all CPUs. This is required (at least for the
   current CPU as we're clearing some other CPU's messages now but we may
   want to bring up additional CPUs on crash) as new messages won't be
   delivered till we consume what's pending. On boot we'll place message
   pages somewhere else and we won't be able to read stale messages.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Changes since v1:
- Use ini

Re: [PATCH 3/5] Drivers: hv: vmbus: Fix a bug in hv_need_to_signal_on_read()

2016-03-22 Thread Vitaly Kuznetsov
KY Srinivasan <k...@microsoft.com> writes:

>> -Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> Sent: Monday, March 21, 2016 1:19 AM
>> To: KY Srinivasan <k...@microsoft.com>
>> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
>> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
>> jasow...@redhat.com; sta...@vger.kernel.org
>> Subject: Re: [PATCH 3/5] Drivers: hv: vmbus: Fix a bug in
>> hv_need_to_signal_on_read()
>> 
>> "K. Y. Srinivasan" <k...@microsoft.com> writes:
>> 
>> > We need to issue a full memory barrier prior making a signalling
>> > decision.
>> 
>> Any reason this should be mb()? This is pretty strong and will probably
>> lead to performace regression ... and, btw, we have another mb() in
>> hv_ringbuffer_read().
>> 
>> Could you please describe the scenarion you're trying to protect against
>> so we could search for a better solution?
>
> If the reading of the pend_sz (in the function hv_need_to_signal_on_read)
> were to be reordered and read before we commit the new read index we could 
> have a problem.


If these are two reads we can add a lightweight barrier just preventing
compiler from reordering (e.g. smp_rmb()), right?

> If the host were to set the pending_sz after we have sampled pending_sz
> and go to sleep before we commit the read index, we could miss sending
> the interrupt.

so write and then we read and we need to prevent reordering... not sure
how to get rid on mb() then ...

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

2016-03-22 Thread Vitaly Kuznetsov
KY Srinivasan <k...@microsoft.com> writes:

>> -Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> Sent: Monday, March 21, 2016 12:52 AM
>> To: KY Srinivasan <k...@microsoft.com>
>> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; Haiyang
>> Zhang <haiya...@microsoft.com>; Alex Ng (LIS) <ale...@microsoft.com>;
>> Radim Krcmar <rkrc...@redhat.com>; Cathy Avery <cav...@redhat.com>
>> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>> 
>> KY Srinivasan <k...@microsoft.com> writes:
>> 
>> >> -Original Message-
>> >> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> >> Sent: Friday, March 18, 2016 5:33 AM
>> >> To: de...@linuxdriverproject.org
>> >> Cc: linux-ker...@vger.kernel.org; KY Srinivasan <k...@microsoft.com>;
>> >> Haiyang Zhang <haiya...@microsoft.com>; Alex Ng (LIS)
>> >> <ale...@microsoft.com>; Radim Krcmar <rkrc...@redhat.com>; Cathy
>> >> Avery <cav...@redhat.com>
>> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>> >>
>> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is
>> always
>> >> delivered to CPU0 regardless of what CPU we're sending
>> >> CHANNELMSG_UNLOAD
>> >> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> >> we're crashing on some other CPU and CPU0 is still alive and operational
>> >> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> >> vmbus_connection.unload_event, our wait on the current CPU will never
>> >> end.
>> >
>> > What was the host you were testing on?
>> >
>> 
>> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
>> by forcing crash on a secondary CPU, e.g.:
>
> Prior to 2012R2, all messages would be delivered on CPU0 and this includes 
> CHANNELMSG_UNLOAD_RESPONSE.
> For this reason we don't support kexec on pre-2012 R2 hosts. On 2012. From 
> 2012 R2 on, all vmbus 
> messages (responses) will be delivered on  the CPU that we initially set up - 
> look at the code in
> vmbus_negotiate_version().

Ok, missed that. In that case we need to remember which CPU it was --
I'll add this in v2.

> So on post 2012 R2 hosts, the response to CHANNELMSG_UNLOAD_RESPONSE
> will be delivered on the CPU where we initiate the contact with the host - 
> CHANNELMSG_INITIATE_CONTACT message.
> So, maybe we can stash away the CPU on which we made the initial contact and 
> poll the state on that CPU
> to make forward progress in the case of crash.

Yes, we can't have any expectation about other CPUs on crash as they can
be in any state (crashing also, hanging on some mutex/spinlock/...,) so
we need to use current CPU only. I'll fix and resend.

Thanks!

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 5/5] Drivers: hv: vmbus: Implement copy-free read APIs

2016-03-21 Thread Vitaly Kuznetsov
"K. Y. Srinivasan"  writes:

> Implement copy-free read APIs.
>
> Signed-off-by: K. Y. Srinivasan 
> ---
>  drivers/hv/ring_buffer.c |   55 
> ++
>  include/linux/hyperv.h   |6 +
>  2 files changed, 61 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index c2c2b2e..c80e1f3 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -444,3 +444,58 @@ int hv_ringbuffer_read(struct hv_ring_buffer_info 
> *inring_info,
>
>   return ret;
>  }
> +
> +/*
> + * In-place read functions.
> + */
> +bool get_next_pkt_raw(struct vmbus_channel *channel,
> +   struct vmpacket_descriptor **desc)

I suggest we make these functions return pointers and use
ERR_PTR()/IS_ERR() when we need to return an error.

> +{
> + struct hv_ring_buffer_info *ring_info = >inbound;
> + u32 read_loc = ring_info->ring_buffer->read_index;
> + void *ring_buffer = hv_get_ring_buffer(ring_info);
> + struct vmpacket_descriptor *cur_desc;
> + u32 packetlen;
> + u32 dsize = ring_info->ring_datasize;
> + u32 bytes_avail_toread = hv_get_bytes_to_read(ring_info);
> +
> + if (bytes_avail_toread < sizeof(struct vmpacket_descriptor))
> + return false;
> +
> + if ((read_loc + sizeof(*desc)) > dsize)
> + return false;
> +
> + cur_desc = ring_buffer + read_loc;
> + packetlen = cur_desc->len8 << 3;
> +
> + if ((read_loc + packetlen + 8) > (dsize - 1))

Could you please add a define for this '8'?


> + return false;
> +
> + *desc = cur_desc;
> + return true;
> +}
> +EXPORT_SYMBOL_GPL(get_next_pkt_raw);
> +
> +void put_pkt_raw(struct vmbus_channel *channel,
> +  struct vmpacket_descriptor *desc)
> +{
> + struct hv_ring_buffer_info *ring_info = >inbound;
> + u32 read_loc = ring_info->ring_buffer->read_index;
> + u32 packetlen = desc->len8 << 3;
> + u32 dsize = ring_info->ring_datasize;
> +
> + if ((read_loc + packetlen + 8) > dsize)
> + BUG();
> +
> + /*
> +  * Make sure all reads are done before we update the read index since
> +  * the writer may start writing to the read area once the read index
> +  * is updated.
> +  */
> + virt_mb();
> + ring_info->ring_buffer->read_index += packetlen + 8;
> +
> + if (hv_need_to_signal_on_read(ring_info))
> + vmbus_set_event(channel);
> +}
> +EXPORT_SYMBOL_GPL(put_pkt_raw);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index a6b053c..455f3f0 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1035,6 +1035,12 @@ extern int vmbus_recvpacket_raw(struct vmbus_channel 
> *channel,
>u32 *buffer_actual_len,
>u64 *requestid);
>
> +bool get_next_pkt_raw(struct vmbus_channel *channel,
> +   struct vmpacket_descriptor **desc);
> +
> +void put_pkt_raw(struct vmbus_channel *channel,
> +  struct vmpacket_descriptor *desc);
> +
>
>  extern void vmbus_ontimer(unsigned long data);

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 3/5] Drivers: hv: vmbus: Fix a bug in hv_need_to_signal_on_read()

2016-03-21 Thread Vitaly Kuznetsov
"K. Y. Srinivasan"  writes:

> We need to issue a full memory barrier prior making a signalling
> decision.

Any reason this should be mb()? This is pretty strong and will probably
lead to performace regression ... and, btw, we have another mb() in
hv_ringbuffer_read().

Could you please describe the scenarion you're trying to protect against
so we could search for a better solution?

>
> Signed-off-by: K. Y. Srinivasan 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/hv/ring_buffer.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 2919395..67dc245 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -104,6 +104,7 @@ static bool hv_need_to_signal_on_read(struct 
> hv_ring_buffer_info *rbi)
>   u32 cur_write_sz;
>   u32 pending_sz;
>
> + mb();
>   pending_sz = READ_ONCE(rbi->ring_buffer->pending_send_sz);
>   /* If the other end is not blocked on write don't bother. */
>   if (pending_sz == 0)

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

2016-03-21 Thread Vitaly Kuznetsov
KY Srinivasan <k...@microsoft.com> writes:

>> -Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> Sent: Friday, March 18, 2016 5:33 AM
>> To: de...@linuxdriverproject.org
>> Cc: linux-ker...@vger.kernel.org; KY Srinivasan <k...@microsoft.com>;
>> Haiyang Zhang <haiya...@microsoft.com>; Alex Ng (LIS)
>> <ale...@microsoft.com>; Radim Krcmar <rkrc...@redhat.com>; Cathy
>> Avery <cav...@redhat.com>
>> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>> 
>> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
>> delivered to CPU0 regardless of what CPU we're sending
>> CHANNELMSG_UNLOAD
>> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> we're crashing on some other CPU and CPU0 is still alive and operational
>> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> vmbus_connection.unload_event, our wait on the current CPU will never
>> end.
>
> What was the host you were testing on?
>

I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
by forcing crash on a secondary CPU, e.g.:

# cat crash.sh
#! /bin/sh
echo c > /proc/sysrq-trigger

# taskset -c 1 ./crash.sh

>> 
>> Do the following:
>> 1) Check for completion_done() in the loop. In case interrupt handler is
>>still alive we'll get the confirmation we need.
>> 
>> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE
>> will be
>>delivered there. We can race with still-alive interrupt handler doing
>>the same but we don't care as we're checking completion_done() now.
>> 
>> 3) Cleanup message pages on all CPUs. This is required (at least for the
>>current CPU as we're clearing CPU0 messages now but we may want to
>> bring
>>up additional CPUs on crash) as new messages won't be delivered till we
>>consume what's pending. On boot we'll place message pages somewhere
>> else
>>and we won't be able to read stale messages.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> ---
>>  drivers/hv/channel_mgmt.c | 30 +-
>>  1 file changed, 25 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index b10e8f74..5f37057 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel
>> *channel, const uuid_le *type_gui
>> 
>>  static void vmbus_wait_for_unload(void)
>>  {
>> -int cpu = smp_processor_id();
>> -void *page_addr = hv_context.synic_message_page[cpu];
>> +int cpu;
>> +void *page_addr = hv_context.synic_message_page[0];
>>  struct hv_message *msg = (struct hv_message *)page_addr +
>>VMBUS_MESSAGE_SINT;
>>  struct vmbus_channel_message_header *hdr;
>>  bool unloaded = false;
>> 
>> -while (1) {
>> +/*
>> + * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0.
>> When we're
>> + * crashing on a different CPU let's hope that IRQ handler on CPU0 is
>> + * still functional and vmbus_unload_response() will complete
>> + * vmbus_connection.unload_event. If not, the last thing we can do
>> is
>> + * read message page for CPU0 regardless of what CPU we're on.
>> + */
>> +while (!unloaded) {
>> +if (completion_done(_connection.unload_event)) {
>> +unloaded = true;
>> +break;
>> +}
>> +
>>  if (READ_ONCE(msg->header.message_type) ==
>> HVMSG_NONE) {
>>  mdelay(10);
>>  continue;
>> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>>  unloaded = true;
>> 
>>  vmbus_signal_eom(msg);
>> +}
>> 
>> -if (unloaded)
>> -break;
>> +/*
>> + * We're crashing and already got the UNLOAD_RESPONSE, cleanup
>> all
>> + * maybe-pending messages on all CPUs to be able to receive new
>> + * messages after we reconnect.
>> + */
>> +for_each_online_cpu(cpu) {
>> +page_addr = hv_context.synic_message_page[cpu];
>> +msg = (struct hv_message *)page_addr +
>> VMBUS_MESSAGE_SINT;
>> +msg->header.message_type = HVMSG_NONE;
>>  }
>>  }
>> 
>> --
>> 2.5.0

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

2016-03-20 Thread Vitaly Kuznetsov
Radim Krcmar <rkrc...@redhat.com> writes:

> 2016-03-18 13:33+0100, Vitaly Kuznetsov:
>> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
>> delivered to CPU0 regardless of what CPU we're sending CHANNELMSG_UNLOAD
>> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> we're crashing on some other CPU and CPU0 is still alive and operational
>> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> vmbus_connection.unload_event, our wait on the current CPU will never
>> end.
>
> (Any chance of learning about this behavior from the spec?)
>
>> Do the following:
>> 1) Check for completion_done() in the loop. In case interrupt handler is
>>still alive we'll get the confirmation we need.
>> 
>> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE will be
>>delivered there. We can race with still-alive interrupt handler doing
>>the same but we don't care as we're checking completion_done() now.
>
> (Yeah, seems better than hv_setup_vmbus_irq(NULL) or other hacks.)
>
>> 3) Cleanup message pages on all CPUs. This is required (at least for the
>>current CPU as we're clearing CPU0 messages now but we may want to bring
>>up additional CPUs on crash) as new messages won't be delivered till we
>>consume what's pending. On boot we'll place message pages somewhere else
>>and we won't be able to read stale messages.
>
> What if HV already set the pending message bit on current message,
> do we get any guarantees that clearing once after UNLOAD_RESPONSE is
> enough?

I think so but I'd like to get a confirmation from K.Y./Alex/Haiyang.

>
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> ---
>
> I had a question about NULL below.  (Parenthesised rants aren't related
> to r-b tag. ;)
>
>>  drivers/hv/channel_mgmt.c | 30 +-
>>  1 file changed, 25 insertions(+), 5 deletions(-)
>> 
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index b10e8f74..5f37057 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel 
>> *channel, const uuid_le *type_gui
>>  
>>  static void vmbus_wait_for_unload(void)
>>  {
>> -int cpu = smp_processor_id();
>> -void *page_addr = hv_context.synic_message_page[cpu];
>> +int cpu;
>> +void *page_addr = hv_context.synic_message_page[0];
>>  struct hv_message *msg = (struct hv_message *)page_addr +
>>VMBUS_MESSAGE_SINT;
>>  struct vmbus_channel_message_header *hdr;
>>  bool unloaded = false;
>>  
>> -while (1) {
>> +/*
>> + * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
>> + * crashing on a different CPU let's hope that IRQ handler on CPU0 is
>> + * still functional and vmbus_unload_response() will complete
>> + * vmbus_connection.unload_event. If not, the last thing we can do is
>> + * read message page for CPU0 regardless of what CPU we're on.
>> + */
>> +while (!unloaded) {
>
> (I'd feel a bit safer if this was bounded by some timeout, but all
>  scenarios where this would make a difference are unplausible ...
>  queue_work() not working while the rest is fine is the best one.)
>
>> +if (completion_done(_connection.unload_event)) {
>> +unloaded = true;
>
> (No need to set unloaded when you break.)
>
>> +break;
>> +}
>> +
>>  if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
>>  mdelay(10);
>>  continue;
>> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>
> (I'm not a huge fan of the unloaded variable; what about remembering the
>  header/msgtype here ...
>
>>  unloaded = true;
>>  
>>  vmbus_signal_eom(msg);
>
>  and checking its value here?)
>

Sure, but we'll have to use a variable for that ... why would it be
better than 'unloaded'?

>> +}
>>  
>> -if (unloaded)
>> -break;
>> +/*
>> + * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
>> + * maybe-pending messages on all CPUs to be able to receive new
>> + * messages after we reconnect.
>> + */
>> +for_each_online_cpu(cpu) {
>> +page_addr = hv_context.synic_

[PATCH] Drivers: hv: vmbus: handle various crash scenarios

2016-03-18 Thread Vitaly Kuznetsov
Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
delivered to CPU0 regardless of what CPU we're sending CHANNELMSG_UNLOAD
from. vmbus_wait_for_unload() doesn't account for the fact that in case
we're crashing on some other CPU and CPU0 is still alive and operational
CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
vmbus_connection.unload_event, our wait on the current CPU will never
end.

Do the following:
1) Check for completion_done() in the loop. In case interrupt handler is
   still alive we'll get the confirmation we need.

2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE will be
   delivered there. We can race with still-alive interrupt handler doing
   the same but we don't care as we're checking completion_done() now.

3) Cleanup message pages on all CPUs. This is required (at least for the
   current CPU as we're clearing CPU0 messages now but we may want to bring
   up additional CPUs on crash) as new messages won't be delivered till we
   consume what's pending. On boot we'll place message pages somewhere else
   and we won't be able to read stale messages.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/channel_mgmt.c | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index b10e8f74..5f37057 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, 
const uuid_le *type_gui
 
 static void vmbus_wait_for_unload(void)
 {
-   int cpu = smp_processor_id();
-   void *page_addr = hv_context.synic_message_page[cpu];
+   int cpu;
+   void *page_addr = hv_context.synic_message_page[0];
struct hv_message *msg = (struct hv_message *)page_addr +
  VMBUS_MESSAGE_SINT;
struct vmbus_channel_message_header *hdr;
bool unloaded = false;
 
-   while (1) {
+   /*
+* CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
+* crashing on a different CPU let's hope that IRQ handler on CPU0 is
+* still functional and vmbus_unload_response() will complete
+* vmbus_connection.unload_event. If not, the last thing we can do is
+* read message page for CPU0 regardless of what CPU we're on.
+*/
+   while (!unloaded) {
+   if (completion_done(_connection.unload_event)) {
+   unloaded = true;
+   break;
+   }
+
if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
mdelay(10);
continue;
@@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
unloaded = true;
 
vmbus_signal_eom(msg);
+   }
 
-   if (unloaded)
-   break;
+   /*
+* We're crashing and already got the UNLOAD_RESPONSE, cleanup all
+* maybe-pending messages on all CPUs to be able to receive new
+* messages after we reconnect.
+*/
+   for_each_online_cpu(cpu) {
+   page_addr = hv_context.synic_message_page[cpu];
+   msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
+   msg->header.message_type = HVMSG_NONE;
}
 }
 
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] scsi: storvsc: fix SRB_STATUS_ABORTED handling

2016-03-07 Thread Vitaly Kuznetsov
Commit 3209f9d780d1 ("scsi: storvsc: Fix a bug in the handling of SRB
status flags") filtered SRB_STATUS_AUTOSENSE_VALID out effectively making
the (SRB_STATUS_ABORTED | SRB_STATUS_AUTOSENSE_VALID) case a dead code. The
logic from this branch (e.g. storvsc_device_scan() call) is still required,
fix the check.

Fixes: 3209f9d780d1 ("scsi: storvsc: Fix a bug in the handling of SRB status 
flags")
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/scsi/storvsc_drv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 292c04e..3ddcabb 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -914,8 +914,9 @@ static void storvsc_handle_error(struct vmscsi_request 
*vm_srb,
do_work = true;
process_err_fn = storvsc_remove_lun;
break;
-   case (SRB_STATUS_ABORTED | SRB_STATUS_AUTOSENSE_VALID):
-   if ((asc == 0x2a) && (ascq == 0x9)) {
+   case SRB_STATUS_ABORTED:
+   if (vm_srb->srb_status & SRB_STATUS_AUTOSENSE_VALID &&
+   (asc == 0x2a) && (ascq == 0x9)) {
do_work = true;
process_err_fn = storvsc_device_scan;
/*
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] Drivers: hv: kvp: fix IP Failover

2016-03-02 Thread Vitaly Kuznetsov
Hyper-V VMs can be replicated to another hosts and there is a feature to
set different IP for replicas, it is called 'Failover TCP/IP'. When
such guest starts Hyper-V host sends it KVP_OP_SET_IP_INFO message as soon
as we finish negotiation procedure. The problem is that it can happen (and
it actually happens) before userspace daemon connects and we reply with
HV_E_FAIL to the message. As there are no repetitions we fail to set the
requested IP.

Solve the issue by postponing our reply to the negotiation message till
userspace daemon is connected. We can't wait too long as there is a
host-side timeout (cca. 75 seconds) and if we fail to reply in this time
frame the whole KVP service will become inactive. The solution is not
ideal - if it takes userspace daemon more than 60 seconds to connect
IP Failover will still fail but I don't see a solution with our current
separation between kernel and userspace parts.

Other two modules (VSS and FCOPY) don't require such delay, leave them
untouched.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_kvp.c   | 30 ++
 drivers/hv/hyperv_vmbus.h |  5 +
 2 files changed, 35 insertions(+)

diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index 9b9b370..0d3fcd6 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -78,9 +78,11 @@ static void kvp_send_key(struct work_struct *dummy);
 
 static void kvp_respond_to_host(struct hv_kvp_msg *msg, int error);
 static void kvp_timeout_func(struct work_struct *dummy);
+static void kvp_host_handshake_func(struct work_struct *dummy);
 static void kvp_register(int);
 
 static DECLARE_DELAYED_WORK(kvp_timeout_work, kvp_timeout_func);
+static DECLARE_DELAYED_WORK(kvp_host_handshake_work, kvp_host_handshake_func);
 static DECLARE_WORK(kvp_sendkey_work, kvp_send_key);
 
 static const char kvp_devname[] = "vmbus/hv_kvp";
@@ -130,6 +132,11 @@ static void kvp_timeout_func(struct work_struct *dummy)
hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
 }
 
+static void kvp_host_handshake_func(struct work_struct *dummy)
+{
+   hv_poll_channel(kvp_transaction.recv_channel, hv_kvp_onchannelcallback);
+}
+
 static int kvp_handle_handshake(struct hv_kvp_msg *msg)
 {
switch (msg->kvp_hdr.operation) {
@@ -154,6 +161,12 @@ static int kvp_handle_handshake(struct hv_kvp_msg *msg)
pr_debug("KVP: userspace daemon ver. %d registered\n",
 KVP_OP_REGISTER);
kvp_register(dm_reg_value);
+
+   /*
+* If we're still negotiating with the host cancel the timeout
+* work to not poll the channel twice.
+*/
+   cancel_delayed_work_sync(_host_handshake_work);
hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
 
return 0;
@@ -594,7 +607,22 @@ void hv_kvp_onchannelcallback(void *context)
struct icmsg_negotiate *negop = NULL;
int util_fw_version;
int kvp_srv_version;
+   static enum {NEGO_NOT_STARTED,
+NEGO_IN_PROGRESS,
+NEGO_FINISHED} host_negotiatied = NEGO_NOT_STARTED;
 
+   if (host_negotiatied == NEGO_NOT_STARTED &&
+   kvp_transaction.state < HVUTIL_READY) {
+   /*
+* If userspace daemon is not connected and host is asking
+* us to negotiate we need to delay to not lose messages.
+* This is important for Failover IP setting.
+*/
+   host_negotiatied = NEGO_IN_PROGRESS;
+   schedule_delayed_work(_host_handshake_work,
+ HV_UTIL_NEGO_TIMEOUT * HZ);
+   return;
+   }
if (kvp_transaction.state > HVUTIL_READY)
return;
 
@@ -672,6 +700,8 @@ void hv_kvp_onchannelcallback(void *context)
vmbus_sendpacket(channel, recv_buffer,
   recvlen, requestid,
   VM_PKT_DATA_INBAND, 0);
+
+   host_negotiatied = NEGO_FINISHED;
}
 
 }
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 4ebc796..5e70632 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -36,6 +36,11 @@
 #define HV_UTIL_TIMEOUT 30
 
 /*
+ * Timeout for guest-host handshake for services.
+ */
+#define HV_UTIL_NEGO_TIMEOUT 60
+
+/*
  * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
  * is set by CPUID(HVCPUID_VERSION_FEATURES).
  */
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: util: fix a race with daemons startup

2016-02-29 Thread Vitaly Kuznetsov
KY Srinivasan <k...@microsoft.com> writes:

>> -Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> Sent: Thursday, February 18, 2016 5:35 AM
>> To: de...@linuxdriverproject.org
>> Cc: KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
>> <haiya...@microsoft.com>; Olaf Hering <o...@aepfle.de>; Cathy Avery
>> <cav...@redhat.com>
>> Subject: [PATCH] Drivers: hv: util: fix a race with daemons startup
>> 
>> Commit 3cace4a61610 ("Drivers: hv: utils: run polling callback always in
>> interrupt context") removed direct *_transaction.state = HVUTIL_READY
>> assignments from *_handle_handshake() functions introducing the following
>> race: if a userspace daemon connects before we get first non-negotiation
>> request from the server hv_poll_channel() won't set transaction state to
>> HVUTIL_READY as (!channel) condition will fail, we set it to non-NULL on
>> the first real request from the server. Solve the issue by transferring
>> the transaction state to HVUTIL_READY directly from all handshake
>> functions.
>> 
>> Fixes: 3cace4a61610 ("Drivers: hv: utils: run polling callback always in
>> interrupt context")
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> ---
>>  drivers/hv/hv_fcopy.c| 1 +
>>  drivers/hv/hv_kvp.c  | 1 +
>>  drivers/hv/hv_snapshot.c | 1 +
>>  3 files changed, 3 insertions(+)
>> 
>> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
>> index c37a71e..e18b85b 100644
>> --- a/drivers/hv/hv_fcopy.c
>> +++ b/drivers/hv/hv_fcopy.c
>> @@ -108,6 +108,7 @@ static int fcopy_handle_handshake(u32 version)
>>  return -EINVAL;
>>  }
>>  pr_debug("FCP: userspace daemon ver. %d registered\n", version);
>> +fcopy_transaction.state = HVUTIL_READY;
>>  hv_poll_channel(fcopy_transaction.recv_channel,
>> fcopy_poll_wrapper);
>>  return 0;
>>  }
>> diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
>> index d4ab81b..1162afb 100644
>> --- a/drivers/hv/hv_kvp.c
>> +++ b/drivers/hv/hv_kvp.c
>> @@ -154,6 +154,7 @@ static int kvp_handle_handshake(struct hv_kvp_msg
>> *msg)
>>  pr_debug("KVP: userspace daemon ver. %d registered\n",
>>   KVP_OP_REGISTER);
>>  kvp_register(dm_reg_value);
>> +kvp_transaction.state = HVUTIL_READY;
>>  hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
>> 
>>  return 0;
>> diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
>> index 67def4a..489203b 100644
>> --- a/drivers/hv/hv_snapshot.c
>> +++ b/drivers/hv/hv_snapshot.c
>> @@ -113,6 +113,7 @@ static int vss_handle_handshake(struct hv_vss_msg
>> *vss_msg)
>>  default:
>>  return -EINVAL;
>>  }
>> +vss_transaction.state = HVUTIL_READY;
>>  hv_poll_channel(vss_transaction.recv_channel, vss_poll_wrapper);
>>  pr_debug("VSS: userspace daemon ver. %d registered\n",
>> dm_reg_value);
>>  return 0;
>
> Vitaly, there is a simpler and a safer fix to this issue - stash away the 
> channel pointer early on and so we don't
> need to set it on each transaction. This would address the issue on
> hand here.

Yes, thanks, this is also an option.

> This would also eliminate
> setting the state only in the interrupt handler. I am putting together a 
> batch of patches to send and if it is
> ok with you, I will drop this patch and include a patch on the lines 
> described above.
>

Thanks!

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] Drivers: hv: util: fix a race with daemons startup

2016-02-18 Thread Vitaly Kuznetsov
Commit 3cace4a61610 ("Drivers: hv: utils: run polling callback always in
interrupt context") removed direct *_transaction.state = HVUTIL_READY
assignments from *_handle_handshake() functions introducing the following
race: if a userspace daemon connects before we get first non-negotiation
request from the server hv_poll_channel() won't set transaction state to
HVUTIL_READY as (!channel) condition will fail, we set it to non-NULL on
the first real request from the server. Solve the issue by transferring
the transaction state to HVUTIL_READY directly from all handshake
functions.

Fixes: 3cace4a61610 ("Drivers: hv: utils: run polling callback always in
interrupt context")
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_fcopy.c| 1 +
 drivers/hv/hv_kvp.c  | 1 +
 drivers/hv/hv_snapshot.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index c37a71e..e18b85b 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -108,6 +108,7 @@ static int fcopy_handle_handshake(u32 version)
return -EINVAL;
}
pr_debug("FCP: userspace daemon ver. %d registered\n", version);
+   fcopy_transaction.state = HVUTIL_READY;
hv_poll_channel(fcopy_transaction.recv_channel, fcopy_poll_wrapper);
return 0;
 }
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c
index d4ab81b..1162afb 100644
--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -154,6 +154,7 @@ static int kvp_handle_handshake(struct hv_kvp_msg *msg)
pr_debug("KVP: userspace daemon ver. %d registered\n",
 KVP_OP_REGISTER);
kvp_register(dm_reg_value);
+   kvp_transaction.state = HVUTIL_READY;
hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
 
return 0;
diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
index 67def4a..489203b 100644
--- a/drivers/hv/hv_snapshot.c
+++ b/drivers/hv/hv_snapshot.c
@@ -113,6 +113,7 @@ static int vss_handle_handshake(struct hv_vss_msg *vss_msg)
default:
return -EINVAL;
}
+   vss_transaction.state = HVUTIL_READY;
hv_poll_channel(vss_transaction.recv_channel, vss_poll_wrapper);
pr_debug("VSS: userspace daemon ver. %d registered\n", dm_reg_value);
return 0;
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH net-next] hv_netvsc: Increase delay for RNDIS_STATUS_NETWORK_CHANGE

2016-02-17 Thread Vitaly Kuznetsov
David Miller  writes:

> From: Haiyang Zhang 
> Date: Tue, 9 Feb 2016 15:31:34 +
>
>> 1) I share your concern as well. Is there a universal way to immediately 
>> trigger 
>> DHCP renew of all current and future daemons with a single event from 
>> kernel? 
>> If not, can we put the delay (RNDIS_STATUS_NETWORK_CHANGE only) into a 
>> tunable variable of this driver?
>> 
>> 2) We used to have the call_usermodehelper "/etc/init.d/network restart" to 
>> trigger DHCP renew. In commit 27a70af3f4, Vitaly has replaced it with the 
>> current 
>> code that updates the link status with at least 2 seconds interval, so that 
>> the 
>> "link_watch infrastructure" can send notification out. link_watch 
>> infrastructure 
>> only sends one notification per second.
>
> If the daemon is waiting for the link state change properly, there should be
> no delay necessary at all.

The daemon won't get 2 state change notifications if they happen within
1 second, it will get the last state only so our link will transition
from 'UP' to 'UP'. Why is that? To signal link state change we call
netif_carrier_on()/netif_carrier_off() from the driver. These functions
do linkwatch_fire_event() which has the following code for non-urgent events:

if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, >state)) {
linkwatch_add_event(dev);
} else if (!urgent)
return;

linkwatch_schedule_work(urgent);

So we'll add just one event (because of test_and_set_bit) for a pair of
consequent netif_carrier_off()/netif_carrier_on() calls.

linkwatch_schedule_work() does the following:
unsigned long delay = linkwatch_nextevent - jiffies;



/* If we wrap around we'll delay it by at most HZ. */
if (delay > HZ)
delay = 0;

so here is where mandatory ' > 1s' wait comes from.

schedule_delayed_work(_work, delay);

linkwatch_work is linkwatch_event() which calls __linkwatch_run_queue()
which does linkwatch_do_dev() for the list of events we have. But
linkwatch_do_dev() checks current carrier status (which in our case if
'UP' if we didn't wait for > 1s before doing /netif_carrier_on()).

Hyper-V driver is not the only one which has this delay. e1000 driver,
for example, has the following:

...
 * Need to wait a few seconds after link up to get diagnostic information from
 * the phy
 */
static void e1000_update_phy_info_task(struct work_struct *work)
...

which we schedule with
  schedule_delayed_work(>phy_info_task, 2 * HZ);

To my understanding this code serves the same purpose, so even if you're
super fast with unplugging and plugging back your cable you'll have 2
seconds between 'down' and 'up'. I don't think there is something wrong
with linkwatch, the '1s' protection against drivers going mad is fine so
'2s' workarounds in drivers seem legit.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 2/4] Drivers: hv: vmbus: avoid wait_for_completion() on crash

2016-02-12 Thread Vitaly Kuznetsov
wait_for_completion() may sleep, it enables interrupts and this
is something we really want to avoid on crashes because interrupt
handlers can cause other crashes. Switch to the recently introduced
vmbus_wait_for_unload() doing busy wait instead.

Reported-by: Radim Krcmar <rkrc...@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/channel_mgmt.c | 4 ++--
 drivers/hv/connection.c   | 2 +-
 drivers/hv/hyperv_vmbus.h | 2 +-
 drivers/hv/vmbus_drv.c| 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 763d0c1..bd60207 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -556,7 +556,7 @@ static void vmbus_unload_response(struct 
vmbus_channel_message_header *hdr)
complete(_connection.unload_event);
 }
 
-void vmbus_initiate_unload(void)
+void vmbus_initiate_unload(bool crash)
 {
struct vmbus_channel_message_header hdr;
 
@@ -573,7 +573,7 @@ void vmbus_initiate_unload(void)
 * vmbus_initiate_unload() is also called on crash and the crash can be
 * happening in an interrupt context, where scheduling is impossible.
 */
-   if (!in_interrupt())
+   if (!crash)
wait_for_completion(_connection.unload_event);
else
vmbus_wait_for_unload();
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 3dc5a9c..fb63d30 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -236,7 +236,7 @@ void vmbus_disconnect(void)
/*
 * First send the unload request to the host.
 */
-   vmbus_initiate_unload();
+   vmbus_initiate_unload(false);
 
if (vmbus_connection.work_queue) {
drain_workqueue(vmbus_connection.work_queue);
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 4ebc796..e6160a0 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -665,7 +665,7 @@ void hv_vss_onchannelcallback(void *);
 int hv_fcopy_init(struct hv_util_service *);
 void hv_fcopy_deinit(void);
 void hv_fcopy_onchannelcallback(void *);
-void vmbus_initiate_unload(void);
+void vmbus_initiate_unload(bool crash);
 
 static inline void hv_poll_channel(struct vmbus_channel *channel,
   void (*cb)(void *))
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 895eeed..08f98a1 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1266,7 +1266,7 @@ static void hv_kexec_handler(void)
int cpu;
 
hv_synic_clockevents_cleanup();
-   vmbus_initiate_unload();
+   vmbus_initiate_unload(false);
for_each_online_cpu(cpu)
smp_call_function_single(cpu, hv_synic_cleanup, NULL, 1);
hv_cleanup();
@@ -1274,7 +1274,7 @@ static void hv_kexec_handler(void)
 
 static void hv_crash_handler(struct pt_regs *regs)
 {
-   vmbus_initiate_unload();
+   vmbus_initiate_unload(true);
/*
 * In crash handler we can't schedule synic cleanup for all CPUs,
 * doing the cleanup for current CPU only. This should be sufficient
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 0/4] Drivers: hv: vmbus: bugfix and improvements for message handling

2016-02-12 Thread Vitaly Kuznetsov
First patch of the series addresses a potentially important issue,
second patch tries to make crash path more stable, the rest is just
a cleanup.

Please review.

Vitaly Kuznetsov (4):
  Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages
  Drivers: hv: vmbus: avoid wait_for_completion() on crash
  Drivers: hv: vmbus: remove code duplication in message handling
  Drivers: hv: vmbus: avoid unneeded compiler optimizations in
vmbus_wait_for_unload()

 drivers/hv/channel_mgmt.c | 16 +++--
 drivers/hv/connection.c   |  2 +-
 drivers/hv/hyperv_vmbus.h | 26 ++-
 drivers/hv/vmbus_drv.c| 82 +--
 4 files changed, 52 insertions(+), 74 deletions(-)

-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 3/4] Drivers: hv: vmbus: remove code duplication in message handling

2016-02-12 Thread Vitaly Kuznetsov
We have 3 functions dealing with messages and they all implement
the same logic to finalize reads, move it to vmbus_signal_eom().

Suggested-by: Radim Krcmar <rkrc...@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/channel_mgmt.c | 10 +-
 drivers/hv/hyperv_vmbus.h | 24 
 drivers/hv/vmbus_drv.c| 40 ++--
 3 files changed, 27 insertions(+), 47 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index bd60207..1e43967 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -529,15 +529,7 @@ static void vmbus_wait_for_unload(void)
if (hdr->msgtype == CHANNELMSG_UNLOAD_RESPONSE)
unloaded = true;
 
-   msg->header.message_type = HVMSG_NONE;
-   /*
-* header.message_type needs to be written before we do
-* wrmsrl() below.
-*/
-   mb();
-
-   if (msg->header.message_flags.msg_pending)
-   wrmsrl(HV_X64_MSR_EOM, 0);
+   vmbus_signal_eom(msg);
 
if (unloaded)
break;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index e6160a0..86049a3 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -626,6 +626,30 @@ struct vmbus_channel_message_table_entry {
 extern struct vmbus_channel_message_table_entry
channel_message_table[CHANNELMSG_COUNT];
 
+/* Free the message slot and signal end-of-message if required */
+static inline void vmbus_signal_eom(struct hv_message *msg)
+{
+   msg->header.message_type = HVMSG_NONE;
+
+   /*
+* Make sure the write to MessageType (ie set to
+* HVMSG_NONE) happens before we read the
+* MessagePending and EOMing. Otherwise, the EOMing
+* will not deliver any more messages since there is
+* no empty slot
+*/
+   mb();
+
+   if (msg->header.message_flags.msg_pending) {
+   /*
+* This will cause message queue rescan to
+* possibly deliver another msg from the
+* hypervisor
+*/
+   wrmsrl(HV_X64_MSR_EOM, 0);
+   }
+}
+
 /* General vmbus interface */
 
 struct hv_device *vmbus_device_create(const uuid_le *type,
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 08f98a1..c9204c0 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -685,25 +685,7 @@ static void hv_process_timer_expiration(struct hv_message 
*msg, int cpu)
if (dev->event_handler)
dev->event_handler(dev);
 
-   msg->header.message_type = HVMSG_NONE;
-
-   /*
-* Make sure the write to MessageType (ie set to
-* HVMSG_NONE) happens before we read the
-* MessagePending and EOMing. Otherwise, the EOMing
-* will not deliver any more messages since there is
-* no empty slot
-*/
-   mb();
-
-   if (msg->header.message_flags.msg_pending) {
-   /*
-* This will cause message queue rescan to
-* possibly deliver another msg from the
-* hypervisor
-*/
-   wrmsrl(HV_X64_MSR_EOM, 0);
-   }
+   vmbus_signal_eom(msg);
 }
 
 static void vmbus_on_msg_dpc(unsigned long data)
@@ -741,25 +723,7 @@ static void vmbus_on_msg_dpc(unsigned long data)
entry->message_handler(hdr);
 
 msg_handled:
-   msg->header.message_type = HVMSG_NONE;
-
-   /*
-* Make sure the write to MessageType (ie set to
-* HVMSG_NONE) happens before we read the
-* MessagePending and EOMing. Otherwise, the EOMing
-* will not deliver any more messages since there is
-* no empty slot
-*/
-   mb();
-
-   if (msg->header.message_flags.msg_pending) {
-   /*
-* This will cause message queue rescan to
-* possibly deliver another msg from the
-* hypervisor
-*/
-   wrmsrl(HV_X64_MSR_EOM, 0);
-   }
+   vmbus_signal_eom(msg);
 }
 
 static void vmbus_isr(void)
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 4/4] Drivers: hv: vmbus: avoid unneeded compiler optimizations in vmbus_wait_for_unload()

2016-02-12 Thread Vitaly Kuznetsov
Message header is modified by the hypervisor and we read it in a loop,
we need to prevent compilers from optimizing accesses. There are no such
optimizations at this moment, this is just a future proof.

Suggested-by: Radim Krcmar <rkrc...@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/channel_mgmt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 1e43967..b10e8f74 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -520,7 +520,7 @@ static void vmbus_wait_for_unload(void)
bool unloaded = false;
 
while (1) {
-   if (msg->header.message_type == HVMSG_NONE) {
+   if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
mdelay(10);
continue;
}
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 1/4] Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages

2016-02-12 Thread Vitaly Kuznetsov
We must handle HVMSG_TIMER_EXPIRED messages in the interrupt context
and we offload all the rest to vmbus_on_msg_dpc() tasklet. This functions
loops to see if there are new messages pending. In case we'll ever see
HVMSG_TIMER_EXPIRED message there we're going to lose it as we can't
handle it from there. Avoid looping in vmbus_on_msg_dpc(), we're OK
with handling one message per interrupt.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/vmbus_drv.c | 68 --
 1 file changed, 33 insertions(+), 35 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 328e4c3..895eeed 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -716,51 +716,49 @@ static void vmbus_on_msg_dpc(unsigned long data)
struct vmbus_channel_message_table_entry *entry;
struct onmessage_work_context *ctx;
 
-   while (1) {
-   if (msg->header.message_type == HVMSG_NONE)
-   /* no msg */
-   break;
+   if (msg->header.message_type == HVMSG_NONE)
+   /* no msg */
+   return;
 
-   hdr = (struct vmbus_channel_message_header *)msg->u.payload;
+   hdr = (struct vmbus_channel_message_header *)msg->u.payload;
 
-   if (hdr->msgtype >= CHANNELMSG_COUNT) {
-   WARN_ONCE(1, "unknown msgtype=%d\n", hdr->msgtype);
-   goto msg_handled;
-   }
+   if (hdr->msgtype >= CHANNELMSG_COUNT) {
+   WARN_ONCE(1, "unknown msgtype=%d\n", hdr->msgtype);
+   goto msg_handled;
+   }
 
-   entry = _message_table[hdr->msgtype];
-   if (entry->handler_type == VMHT_BLOCKING) {
-   ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);
-   if (ctx == NULL)
-   continue;
+   entry = _message_table[hdr->msgtype];
+   if (entry->handler_type == VMHT_BLOCKING) {
+   ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);
+   if (ctx == NULL)
+   return;
 
-   INIT_WORK(>work, vmbus_onmessage_work);
-   memcpy(>msg, msg, sizeof(*msg));
+   INIT_WORK(>work, vmbus_onmessage_work);
+   memcpy(>msg, msg, sizeof(*msg));
 
-   queue_work(vmbus_connection.work_queue, >work);
-   } else
-   entry->message_handler(hdr);
+   queue_work(vmbus_connection.work_queue, >work);
+   } else
+   entry->message_handler(hdr);
 
 msg_handled:
-   msg->header.message_type = HVMSG_NONE;
+   msg->header.message_type = HVMSG_NONE;
+
+   /*
+* Make sure the write to MessageType (ie set to
+* HVMSG_NONE) happens before we read the
+* MessagePending and EOMing. Otherwise, the EOMing
+* will not deliver any more messages since there is
+* no empty slot
+*/
+   mb();
 
+   if (msg->header.message_flags.msg_pending) {
/*
-* Make sure the write to MessageType (ie set to
-* HVMSG_NONE) happens before we read the
-* MessagePending and EOMing. Otherwise, the EOMing
-* will not deliver any more messages since there is
-* no empty slot
+* This will cause message queue rescan to
+* possibly deliver another msg from the
+* hypervisor
 */
-   mb();
-
-   if (msg->header.message_flags.msg_pending) {
-   /*
-* This will cause message queue rescan to
-* possibly deliver another msg from the
-* hypervisor
-*/
-   wrmsrl(HV_X64_MSR_EOM, 0);
-   }
+   wrmsrl(HV_X64_MSR_EOM, 0);
}
 }
 
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH net] hv_netvsc: Restore needed_headroom request

2016-02-10 Thread Vitaly Kuznetsov
Vitaly Kuznetsov <vkuzn...@redhat.com> writes:

> Commit c0eb454034aa ("hv_netvsc: Don't ask for additional head room in the
> skb") got rid of needed_headroom setting for the driver. With the change I
> hit the following issue trying to use ptkgen module:
>
> [   57.522021] kernel BUG at net/core/skbuff.c:1128!
> [   57.522021] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
> ...
> [   58.721068] Call Trace:
> [   58.721068]  [] netvsc_start_xmit+0x4c6/0x8e0 [hv_netvsc]
> ...
> [   58.721068]  [] ? pktgen_finalize_skb+0x25c/0x2a0 
> [pktgen]
> [   58.721068]  [] ? __netdev_alloc_skb+0xc0/0x100
> [   58.721068]  [] pktgen_thread_worker+0x257/0x1920 
> [pktgen]
>
> Basically, we're calling skb_cow_head(skb, RNDIS_AND_PPI_SIZE) and crash on
> if (skb_shared(skb))
> BUG();
>
> We probably need to restore needed_headroom setting (but shrunk to
> RNDIS_AND_PPI_SIZE as we don't need more) to request the required headroom
> space. In theory, it should not give us performance penalty.

I'm sorry for the ping but this is kind of a regression and it would be
nice to have it fixed in 4.5.

Here is a script to trigger kernel crash:
#! /bin/sh
modprobe pktgen

echo "rem_device_all" > /proc/net/pktgen/kpktgend_0
echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
echo "pkt_size 60" > /proc/net/pktgen/eth0
echo "clone_skb 100" > /proc/net/pktgen/eth0
echo "count 10" > /proc/net/pktgen/eth0
echo "dst " > /proc/net/pktgen/eth0
echo "dst_mac " > /proc/net/pktgen/eth0
echo "start" > /proc/net/pktgen/pgctrl
cat /proc/net/pktgen/eth0

Please review.

>
> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
> ---
>  drivers/net/hyperv/netvsc_drv.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 1d3a665..98e34fe 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -1089,6 +1089,9 @@ static int netvsc_probe(struct hv_device *dev,
>   net->ethtool_ops = _ops;
>   SET_NETDEV_DEV(net, >device);
>
> + /* We always need headroom for rndis header */
> + net->needed_headroom = RNDIS_AND_PPI_SIZE;
> +
>   /* Notify the netvsc driver of the new device */
>   memset(_info, 0, sizeof(device_info));
>   device_info.ring_size = ring_size;

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH net] hv_netvsc: Restore needed_headroom request

2016-02-05 Thread Vitaly Kuznetsov
Commit c0eb454034aa ("hv_netvsc: Don't ask for additional head room in the
skb") got rid of needed_headroom setting for the driver. With the change I
hit the following issue trying to use ptkgen module:

[   57.522021] kernel BUG at net/core/skbuff.c:1128!
[   57.522021] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
...
[   58.721068] Call Trace:
[   58.721068]  [] netvsc_start_xmit+0x4c6/0x8e0 [hv_netvsc]
...
[   58.721068]  [] ? pktgen_finalize_skb+0x25c/0x2a0 [pktgen]
[   58.721068]  [] ? __netdev_alloc_skb+0xc0/0x100
[   58.721068]  [] pktgen_thread_worker+0x257/0x1920 [pktgen]

Basically, we're calling skb_cow_head(skb, RNDIS_AND_PPI_SIZE) and crash on
if (skb_shared(skb))
BUG();

We probably need to restore needed_headroom setting (but shrunk to
RNDIS_AND_PPI_SIZE as we don't need more) to request the required headroom
space. In theory, it should not give us performance penalty.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 1d3a665..98e34fe 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1089,6 +1089,9 @@ static int netvsc_probe(struct hv_device *dev,
net->ethtool_ops = _ops;
SET_NETDEV_DEV(net, >device);
 
+   /* We always need headroom for rndis header */
+   net->needed_headroom = RNDIS_AND_PPI_SIZE;
+
/* Notify the netvsc driver of the new device */
memset(_info, 0, sizeof(device_info));
device_info.ring_size = ring_size;
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH net-next] hv_netvsc: Increase delay for RNDIS_STATUS_NETWORK_CHANGE

2016-02-03 Thread Vitaly Kuznetsov
Haiyang Zhang  writes:

> We simulates a link down period for RNDIS_STATUS_NETWORK_CHANGE message to
> trigger DHCP renew. User daemons may need multiple seconds to trigger the
> link down event. (e.g. ifplugd: 5sec, network-manager: 4sec.) So update
> this link down period to 10 sec to properly trigger DHCP renew.
>

I probably don't follow: why do we need sucha a delay? If (with real
hardware) you plug network cable out and in one second you plug it in
you'll get DHCP renewed, right?

When I introduced RNDIS_STATUS_NETWORK_CHANGE handling by emulating a
pair of up/down events I put 2 second delay to make link_watch happy (as
we only handle 1 event per second there) but 10 seconds sounds to much
to me.

> Signed-off-by: Haiyang Zhang 
> ---
>  drivers/net/hyperv/netvsc_drv.c |   10 --
>  1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 1d3a665..6f23973 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -43,6 +43,8 @@
>
>  #define RING_SIZE_MIN 64
>  #define LINKCHANGE_INT (2 * HZ)
> +/* Extra delay for RNDIS_STATUS_NETWORK_CHANGE: */
> +#define LINKCHANGE_DELAY (8 * HZ)
>  static int ring_size = 128;
>  module_param(ring_size, int, S_IRUGO);
>  MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
> @@ -964,6 +966,7 @@ static void netvsc_link_change(struct work_struct *w)
>   return;
>   }
>   ndev_ctx->last_reconfig = jiffies;
> + delay = LINKCHANGE_INT;
>
>   spin_lock_irqsave(_ctx->lock, flags);
>   if (!list_empty(_ctx->reconfig_events)) {
> @@ -1009,8 +1012,11 @@ static void netvsc_link_change(struct work_struct *w)
>   netif_tx_stop_all_queues(net);
>   event->event = RNDIS_STATUS_MEDIA_CONNECT;
>   spin_lock_irqsave(_ctx->lock, flags);
> - list_add_tail(>list, _ctx->reconfig_events);
> + list_add(>list, _ctx->reconfig_events);

Why? Adding to tail was here to not screw the order of events...

>   spin_unlock_irqrestore(_ctx->lock, flags);
> +
> + ndev_ctx->last_reconfig += LINKCHANGE_DELAY;
> + delay = LINKCHANGE_INT + LINKCHANGE_DELAY;
>   reschedule = true;
>   }
>   break;
> @@ -1025,7 +1031,7 @@ static void netvsc_link_change(struct work_struct *w)
>* second, handle next reconfig event in 2 seconds.
>*/
>   if (reschedule)
> - schedule_delayed_work(_ctx->dwork, LINKCHANGE_INT);
> + schedule_delayed_work(_ctx->dwork, delay);
>  }
>
>  static void netvsc_free_netdev(struct net_device *netdev)

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH net-next] hv_netvsc: Increase delay for RNDIS_STATUS_NETWORK_CHANGE

2016-02-03 Thread Vitaly Kuznetsov
Haiyang Zhang <haiya...@microsoft.com> writes:

>> -Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> Sent: Wednesday, February 3, 2016 8:06 AM
>> To: Haiyang Zhang <haiya...@microsoft.com>
>> Cc: da...@davemloft.net; net...@vger.kernel.org; KY Srinivasan
>> <k...@microsoft.com>; o...@aepfle.de; linux-ker...@vger.kernel.org;
>> driverdev-devel@linuxdriverproject.org
>> Subject: Re: [PATCH net-next] hv_netvsc: Increase delay for
>> RNDIS_STATUS_NETWORK_CHANGE
>> 
>> Haiyang Zhang <haiya...@microsoft.com> writes:
>> 
>> > We simulates a link down period for RNDIS_STATUS_NETWORK_CHANGE
>> message to
>> > trigger DHCP renew. User daemons may need multiple seconds to trigger
>> the
>> > link down event. (e.g. ifplugd: 5sec, network-manager: 4sec.) So update
>> > this link down period to 10 sec to properly trigger DHCP renew.
>> >
>> 
>> I probably don't follow: why do we need sucha a delay? If (with real
>> hardware) you plug network cable out and in one second you plug it in
>> you'll get DHCP renewed, right?
>> 
>> When I introduced RNDIS_STATUS_NETWORK_CHANGE handling by
>> emulating a
>> pair of up/down events I put 2 second delay to make link_watch happy (as
>> we only handle 1 event per second there) but 10 seconds sounds to much
>> to me.
>
> In the test on Hyper-V, our software on host side  wants to trigger DHCP 
> renew by sending only a RNDIS_STATUS_NETWORK_CHANGE message to 
> a guest without physically unplug the cable. But, this message didn't trigger 
> DHCP renew within 2 seconds. The VM is Centos 7.1 using Networkmanager, 
> which needs 4 seconds after link down to renew IP. Some daemon, like 
> ifplugd, needs 5 sec to renew. That's why we increase the simulated link 
> down time for RNDIS_STATUS_NETWORK_CHANGE message.

Yes, I understand the motivation but sorry if I was unclear with my
question. I meant to say that with physical network adapters it's
possible to trigger same two events by plugging your cable out and then
plugging it back in and it is certailnly doable in less than 10
seconds. NetworkManager or whoever is supposed to handle these events
and we don't really care how fast -- I think that 4 or 5 seconds
mentioned above is just an observation.

>
>> > @@ -1009,8 +1012,11 @@ static void netvsc_link_change(struct
>> work_struct *w)
>> >netif_tx_stop_all_queues(net);
>> >event->event = RNDIS_STATUS_MEDIA_CONNECT;
>> >spin_lock_irqsave(_ctx->lock, flags);
>> > -  list_add_tail(>list, _ctx-
>> >reconfig_events);
>> > +  list_add(>list, _ctx->reconfig_events);
>> 
>> Why? Adding to tail was here to not screw the order of events...
>
> The RNDIS_STATUS_MEDIA_CONNECT message triggers two "half" events -- 
> link down & up. After "link down", we want the paired "link up" to be the 
> immediate next event. Since the function picks the next event from the list 
> head, so it should be inserted to list head. Otherwise, the possible existing 
> events in the list will be processed in the middle of these two "half events" 
> -- link down & up.

Ah, yes, you're probably right.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH net-next] hv_netvsc: use skb_get_hash() instead of a homegrown implementation

2016-01-25 Thread Vitaly Kuznetsov
Recent changes to 'struct flow_keys' (e.g commit d34af823ff40 ("net: Add
VLAN ID to flow_keys")) introduced a performance regression in netvsc
driver. Is problem is, however, not the above mentioned commit but the
fact that netvsc_set_hash() function did some assumptions on the struct
flow_keys data layout and this is wrong.

Get rid of netvsc_set_hash() by switching to skb_get_hash(). This change
will also imply switching to Jenkins hash from the currently used Toeplitz
but it seems there is no good excuse for Toeplitz to stay.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
- This patch is an alternative to the previosely sent "hv_netvsc: don't make
 assumptions on struct flow_keys layout" and Haiyang's "hv_netvsc: Use simple
 parser for IPv4 and v6 headers".
---
 drivers/net/hyperv/netvsc_drv.c | 67 ++---
 1 file changed, 3 insertions(+), 64 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 1c8db9a..1d3a665 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -196,65 +196,6 @@ static void *init_ppi_data(struct rndis_message *msg, u32 
ppi_size,
return ppi;
 }
 
-union sub_key {
-   u64 k;
-   struct {
-   u8 pad[3];
-   u8 kb;
-   u32 ka;
-   };
-};
-
-/* Toeplitz hash function
- * data: network byte order
- * return: host byte order
- */
-static u32 comp_hash(u8 *key, int klen, void *data, int dlen)
-{
-   union sub_key subk;
-   int k_next = 4;
-   u8 dt;
-   int i, j;
-   u32 ret = 0;
-
-   subk.k = 0;
-   subk.ka = ntohl(*(u32 *)key);
-
-   for (i = 0; i < dlen; i++) {
-   subk.kb = key[k_next];
-   k_next = (k_next + 1) % klen;
-   dt = ((u8 *)data)[i];
-   for (j = 0; j < 8; j++) {
-   if (dt & 0x80)
-   ret ^= subk.ka;
-   dt <<= 1;
-   subk.k <<= 1;
-   }
-   }
-
-   return ret;
-}
-
-static bool netvsc_set_hash(u32 *hash, struct sk_buff *skb)
-{
-   struct flow_keys flow;
-   int data_len;
-
-   if (!skb_flow_dissect_flow_keys(skb, , 0) ||
-   !(flow.basic.n_proto == htons(ETH_P_IP) ||
- flow.basic.n_proto == htons(ETH_P_IPV6)))
-   return false;
-
-   if (flow.basic.ip_proto == IPPROTO_TCP)
-   data_len = 12;
-   else
-   data_len = 8;
-
-   *hash = comp_hash(netvsc_hash_key, HASH_KEYLEN, , data_len);
-
-   return true;
-}
-
 static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
void *accel_priv, select_queue_fallback_t fallback)
 {
@@ -267,11 +208,9 @@ static u16 netvsc_select_queue(struct net_device *ndev, 
struct sk_buff *skb,
if (nvsc_dev == NULL || ndev->real_num_tx_queues <= 1)
return 0;
 
-   if (netvsc_set_hash(, skb)) {
-   q_idx = nvsc_dev->send_table[hash % VRSS_SEND_TAB_SIZE] %
-   ndev->real_num_tx_queues;
-   skb_set_hash(skb, hash, PKT_HASH_TYPE_L3);
-   }
+   hash = skb_get_hash(skb);
+   q_idx = nvsc_dev->send_table[hash % VRSS_SEND_TAB_SIZE] %
+   ndev->real_num_tx_queues;
 
if (!nvsc_dev->chn_table[q_idx])
q_idx = 0;
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 3/3] Drivers: hv: vmbus: don't manipulate with clocksources on crash

2016-01-18 Thread Vitaly Kuznetsov
clocksource_change_rating() involves mutex usage and can't be called
in interrupt context. It also makes sense to avoid doing redundant work
on crash.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 11bca51..257de21 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -295,8 +295,14 @@ void hv_cleanup(void)
 * Cleanup the TSC page based CS.
 */
if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
-   clocksource_change_rating(_cs_tsc, 10);
-   clocksource_unregister(_cs_tsc);
+   /*
+* Crash can happen in an interrupt context and unregistering
+* a clocksource is impossible and redundant in this case.
+*/
+   if (!oops_in_progress) {
+   clocksource_change_rating(_cs_tsc, 10);
+   clocksource_unregister(_cs_tsc);
+   }
 
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_REFERENCE_TSC, hypercall_msr.as_uint64);
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 2/3] Drivers: hv: vmbus: avoid scheduling in interrupt context in vmbus_initiate_unload()

2016-01-18 Thread Vitaly Kuznetsov
We have to call vmbus_initiate_unload() on crash to make kdump work but
the crash can also be happening in interrupt (e.g. Sysrq + c results in
such) where we can't schedule or the following will happen:

[  314.905786] bad: scheduling from the idle thread!

Just skipping the wait (and even adding some random wait here) won't help:
to make host-side magic working we're supposed to receive CHANNELMSG_UNLOAD
(and actually confirm the fact that we received it) but we can't use
interrupt-base path (vmbus_isr()-> vmbus_on_msg_dpc()). Implement a simple
busy wait ignoring all the other messages and use it if we're in an
interrupt context.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/channel_mgmt.c | 44 +++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 82c6860..763d0c1 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "hyperv_vmbus.h"
@@ -509,6 +510,40 @@ static void init_vp_index(struct vmbus_channel *channel, 
const uuid_le *type_gui
channel->target_vp = hv_context.vp_index[cur_cpu];
 }
 
+static void vmbus_wait_for_unload(void)
+{
+   int cpu = smp_processor_id();
+   void *page_addr = hv_context.synic_message_page[cpu];
+   struct hv_message *msg = (struct hv_message *)page_addr +
+ VMBUS_MESSAGE_SINT;
+   struct vmbus_channel_message_header *hdr;
+   bool unloaded = false;
+
+   while (1) {
+   if (msg->header.message_type == HVMSG_NONE) {
+   mdelay(10);
+   continue;
+   }
+
+   hdr = (struct vmbus_channel_message_header *)msg->u.payload;
+   if (hdr->msgtype == CHANNELMSG_UNLOAD_RESPONSE)
+   unloaded = true;
+
+   msg->header.message_type = HVMSG_NONE;
+   /*
+* header.message_type needs to be written before we do
+* wrmsrl() below.
+*/
+   mb();
+
+   if (msg->header.message_flags.msg_pending)
+   wrmsrl(HV_X64_MSR_EOM, 0);
+
+   if (unloaded)
+   break;
+   }
+}
+
 /*
  * vmbus_unload_response - Handler for the unload response.
  */
@@ -534,7 +569,14 @@ void vmbus_initiate_unload(void)
hdr.msgtype = CHANNELMSG_UNLOAD;
vmbus_post_msg(, sizeof(struct vmbus_channel_message_header));
 
-   wait_for_completion(_connection.unload_event);
+   /*
+* vmbus_initiate_unload() is also called on crash and the crash can be
+* happening in an interrupt context, where scheduling is impossible.
+*/
+   if (!in_interrupt())
+   wait_for_completion(_connection.unload_event);
+   else
+   vmbus_wait_for_unload();
 }
 
 /*
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 1/3] Drivers: hv: vmbus: avoid infinite loop in init_vp_index()

2016-01-18 Thread Vitaly Kuznetsov
When we pick a CPU to use for a new subchannel we try find a non-used one
on the appropriate NUMA node, we keep track of them with the
primary->alloced_cpus_in_node mask. Under normal circumstances we don't run
out of available CPUs but it is possible when we we don't initialize some
cpus in Linux, e.g. when we boot with 'nr_cpus=' limitation.

Avoid the infinite loop in init_vp_index() by checking that we still have
non-used CPUs in the alloced_cpus_in_node mask and resetting it in case
we don't.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/channel_mgmt.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 1c1ad47..82c6860 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -469,6 +469,17 @@ static void init_vp_index(struct vmbus_channel *channel, 
const uuid_le *type_gui
cpumask_of_node(primary->numa_node));
 
cur_cpu = -1;
+
+   /*
+* Normally Hyper-V host doesn't create more subchannels than there
+* are VCPUs on the node but it is possible when not all present VCPUs
+* on the node are initialized by guest. Clear the alloced_cpus_in_node
+* to start over.
+*/
+   if (cpumask_equal(>alloced_cpus_in_node,
+ cpumask_of_node(primary->numa_node)))
+   cpumask_clear(>alloced_cpus_in_node);
+
while (true) {
cur_cpu = cpumask_next(cur_cpu, _mask);
if (cur_cpu >= nr_cpu_ids) {
-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 0/3] Drivers: hv: fix kdump (again) and more

2016-01-18 Thread Vitaly Kuznetsov
Kdump on crash is reportedly broken for Hyper-V guests and I was able to
find two reasons:
1) Inability to boot with nr_cpus=1 (PATCH 1)
2) Sleeping in interrupt context on crash (PATCH 2 and 3)

Vitaly Kuznetsov (3):
  Drivers: hv: vmbus: avoid infinite loop in init_vp_index()
  Drivers: hv: vmbus: avoid scheduling in interrupt context in
vmbus_initiate_unload()
  Drivers: hv: vmbus: don't manipulate with clocksources on crash

 drivers/hv/channel_mgmt.c | 55 ++-
 drivers/hv/hv.c   | 10 +++--
 2 files changed, 62 insertions(+), 3 deletions(-)

-- 
2.5.0

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH V5 9/9] hvsock: introduce Hyper-V VM Sockets feature

2016-01-05 Thread Vitaly Kuznetsov
Dexuan Cui  writes:

Just some minor nitpicks below -- I have to admit I didn't test the feature.

[..skip..] 

> +
> + if (sk->sk_err) {
> + ret = -sk->sk_err;
> + goto out_wait_error;
> + } else {
> + ret = 0;
> + }
> +
> +out_wait:
> + finish_wait(sk_sleep(sk), );
> +out:
> + release_sock(sk);
> + return ret;
> +
> +out_wait_error:
> + sk->sk_state = SS_UNCONNECTED;
> + sock->state = SS_UNCONNECTED;
> + goto out_wait;
> +}

Why not just place out_wait_error label before out_wait (and do 'goto
out_wait' in ret = 0 case instead of 'goto out_wait_error' in the error
case)?

[..skip..]

> +
> +static int __init hvsock_init(void)
> +{
> + int ret;
> +
> + /* Hyper-V socket requires at least VMBus 4.0 */
> + if ((vmbus_proto_version >> 16) < 4) {
> + pr_err("failed to load: VMBus 4 or later is required\n");
> + return -ENODEV;

(Let me pretend I'm Dan :-) So here we return ...

> + }
> +
> + ret = vmbus_driver_register(_drv);
> + if (ret) {
> + pr_err("failed to register hv_sock driver\n");
> + goto out;

... and here we goto where we just return. I suggest we bring some
consistency by directly returning ret here and eliminating 'out' label. 

> + }
> +
> + ret = proto_register(_proto, 0);
> + if (ret) {
> + pr_err("failed to register protocol\n");
> + goto unreg_hvsock_drv;
> + }
> +
> + ret = sock_register(_family_ops);
> + if (ret) {
> + pr_err("failed to register address family\n");
> + goto unreg_proto;
> + }
> +
> + return 0;
> +
> +unreg_proto:
> + proto_unregister(_proto);
> +unreg_hvsock_drv:
> + vmbus_driver_unregister(_drv);
> +out:
> + return ret;
> +}
> +
> +static void __exit hvsock_exit(void)
> +{
> + sock_unregister(AF_HYPERV);
> + proto_unregister(_proto);
> + vmbus_driver_unregister(_drv);
> +}
> +
> +module_init(hvsock_init);
> +module_exit(hvsock_exit);
> +
> +MODULE_DESCRIPTION("Microsoft Hyper-V Virtual Socket Family");
> +MODULE_VERSION("0.1");

Do we really need it? When the driver is commited we won't probably be
updating it with v0.2 as a whole, we'll be sending patches addressing
issues and there always will be a question when to swtich to 0.2, 0.3,
... And we don't have MODULE_VERSION for other Hyper-V drivers.

> +MODULE_LICENSE("Dual BSD/GPL");

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH V5 7/9] Drivers: hv: vmbus: add a mechanism to pass hvsock events to the hvsock driver

2016-01-05 Thread Vitaly Kuznetsov
Dexuan Cui  writes:

> For now only 1 event is defined: HVSOCK_RESCIND_CHANNEL.
> We'll have more events in the future.
>
> Signed-off-by: Dexuan Cui 
> ---
>  drivers/hv/channel_mgmt.c | 18 ++
>  include/linux/hyperv.h| 17 +
>  2 files changed, 35 insertions(+)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 4611b50..87fc7d2 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -608,6 +608,16 @@ static void vmbus_onoffer_rescind(struct 
> vmbus_channel_message_header *hdr)
>   spin_unlock_irqrestore(>lock, flags);
>
>   if (channel->device_obj) {
> + if (is_hvsock_channel(channel) &&
> + channel->hvsock_event_callback) {
> + channel->hvsock_event_callback(channel,
> +HVSOCK_RESCIND_CHANNEL);
> + /*
> +  * We can't invoke vmbus_device_unregister()
> +  * until the socket fd is closed.
> +  */
> + return;
> + }
>   /*
>* We will have to unregister this device from the
>* driver core.
> @@ -977,3 +987,11 @@ bool vmbus_are_subchannels_present(struct vmbus_channel 
> *primary)
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(vmbus_are_subchannels_present);
> +
> +void vmbus_set_hvsock_event_callback(struct vmbus_channel *channel,
> + void (*hvsock_event_callback)(struct vmbus_channel *,
> +   enum hvsock_event))
> +{
> + channel->hvsock_event_callback = hvsock_event_callback;
> +}
> +EXPORT_SYMBOL_GPL(vmbus_set_hvsock_event_callback);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index b4cc44c..7e507bb 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -645,6 +645,12 @@ enum hv_signal_policy {
>   HV_SIGNAL_POLICY_EXPLICIT,
>  };
>
> +/* hvsock related definitions */
> +enum hvsock_event {
> + /* The host application is close()-ing the connection */
> + HVSOCK_RESCIND_CHANNEL,
> +};
> +
>  struct vmbus_channel {
>   /* Unique channel id */
>   int id;
> @@ -740,6 +746,13 @@ struct vmbus_channel {
>   void (*sc_creation_callback)(struct vmbus_channel *new_sc);
>
>   /*
> +  * hvsock event callback.
> +  * For now only 1 event is defined: HVSOCK_RESCIND_CHANNEL.
> +  */
> + void (*hvsock_event_callback)(struct vmbus_channel *channel,
> +   enum hvsock_event event);

Would it make sense to rename it to something more general,
e.g. sc_rescind_callback and call it for all drivers (even if we don't
need it now) intead of introducing enum hvsock_event? When new events
arrive we'll just add new callbacks (or, alternatively, we could unify
it to 'channel_event_callback' and merging with sc_creation_callback()
but I'd say it is uglier).

> +
> + /*
>* The spinlock to protect the structure. It is being used to protect
>* test-and-set access to various attributes of the structure as well
>* as all sc_list operations.
> @@ -825,6 +838,10 @@ int vmbus_request_offers(void);
>  void vmbus_set_sc_create_callback(struct vmbus_channel *primary_channel,
>   void (*sc_cr_cb)(struct vmbus_channel *new_sc));
>
> +void vmbus_set_hvsock_event_callback(struct vmbus_channel *channel,
> + void (*hvsock_event_callback)(struct vmbus_channel *,
> +   enum hvsock_event));
> +
>  /*
>   * Retrieve the (sub) channel on which to send an outgoing request.
>   * When a primary channel has multiple sub-channels, we choose a

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH V5 4/9] Drivers: hv: ring_buffer: enhance hv_ringbuffer_read() to support hvsock

2016-01-05 Thread Vitaly Kuznetsov
Dexuan Cui <de...@microsoft.com> writes:

> To get the payload of hvsock, we need raw=0 to skip the level-1 header
> (i.e., struct vmpacket_descriptor desc) and we also need to skip the
> level-2 header (i.e., struct vmpipe_proto_header pipe_hdr).
>
> NB: if the length of the hvsock payload is not aligned with the 8-byte
> boundeary, at most 7 padding bytes are appended, so the real hvsock
> payload's length must be retrieved by the pipe_hdr.data_size field.
>
> I 'upgrade' the 'raw' parameter of hv_ringbuffer_read() to a
> 'read_flags', trying to share the logic of the function.

When I was touching this code last time I was actually thinking about
eliminating 'raw' flag by making all ring reads raw and moving this
header filtering job to the upper layer (as we already have
vmbus_recvpacket()/vmbus_recvpacket_raw()) but for some reason I didn't
do it. I believe you have more or less the same reasoing for introducing
new read type instead of parsing this at a higher level. Some comments
below ...

>
> This patch is required by the next patch, which will introduce the hvsock
> send/recv APIs.
>
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> ---
>  drivers/hv/channel.c  | 10 +
>  drivers/hv/hyperv_vmbus.h | 13 +++-
>  drivers/hv/ring_buffer.c  | 54 
> ---
>  include/linux/hyperv.h| 12 +++
>  4 files changed, 72 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index eaaa066..cc49966 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -940,13 +940,14 @@ EXPORT_SYMBOL_GPL(vmbus_sendpacket_multipagebuffer);
>  static inline int
>  __vmbus_recvpacket(struct vmbus_channel *channel, void *buffer,
>  u32 bufferlen, u32 *buffer_actual_len, u64 *requestid,
> -bool raw)
> +u32 read_flags)
>  {
>   int ret;
>   bool signal = false;
>
>   ret = hv_ringbuffer_read(>inbound, buffer, bufferlen,
> -  buffer_actual_len, requestid, , raw);
> +  buffer_actual_len, requestid, ,
> +  read_flags);
>
>   if (signal)
>   vmbus_setevent(channel);
> @@ -959,7 +960,7 @@ int vmbus_recvpacket(struct vmbus_channel *channel, void 
> *buffer,
>u64 *requestid)
>  {
>   return __vmbus_recvpacket(channel, buffer, bufferlen,
> -   buffer_actual_len, requestid, false);
> +   buffer_actual_len, requestid, 0);
>  }
>  EXPORT_SYMBOL(vmbus_recvpacket);
>
> @@ -971,6 +972,7 @@ int vmbus_recvpacket_raw(struct vmbus_channel *channel, 
> void *buffer,
> u64 *requestid)
>  {
>   return __vmbus_recvpacket(channel, buffer, bufferlen,
> -   buffer_actual_len, requestid, true);
> +   buffer_actual_len, requestid,
> +   HV_RINGBUFFER_READ_FLAG_RAW);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_recvpacket_raw);
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 0411b7b..46206b6 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -619,9 +619,20 @@ int hv_ringbuffer_write(struct hv_ring_buffer_info 
> *ring_info,
>   struct kvec *kv_list,
>   u32 kv_count, bool *signal);
>
> +/*
> + * By default, a read_flags of 0 means: the payload offset is
> + * sizeof(struct vmpacket_descriptor).
> + *
> + * If HV_RINGBUFFER_READ_FLAG_RAW is used, the payload offset is 0.
> + *
> + * If HV_RINGBUFFER_READ_FLAG_HVSOCK is used, the payload offset is
> + * sizeof(struct vmpacket_descriptor) + sizeof(struct
> vmpipe_proto_header).

So these are mutually exclusive, right? Should we introduce 'int
payload_offset' parameter instead of flags? 

> + */
> +#define HV_RINGBUFFER_READ_FLAG_RAW  (1 << 0)
> +#define HV_RINGBUFFER_READ_FLAG_HVSOCK   (1 << 1)
>  int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
>  void *buffer, u32 buflen, u32 *buffer_actual_len,
> -u64 *requestid, bool *signal, bool raw);
> +u64 *requestid, bool *signal, u32 read_flags);
>
>  void hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
>   struct hv_ring_buffer_debug_info *debug_info);
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index b53702c..03a509c 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c

Re: [PATCH V5 5/9] Drivers: hv: vmbus: add APIs to send/recv hvsock packets

2016-01-05 Thread Vitaly Kuznetsov
Dexuan Cui  writes:

> This will be used by the coming net/hvsock driver.
>
> Signed-off-by: Dexuan Cui 
> ---
>  drivers/hv/channel.c   | 59 
> ++
>  include/linux/hyperv.h |  9 
>  2 files changed, 68 insertions(+)
>
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index cc49966..ce1b885 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -924,6 +924,52 @@ int vmbus_sendpacket_multipagebuffer(struct 
> vmbus_channel *channel,
>  }
>  EXPORT_SYMBOL_GPL(vmbus_sendpacket_multipagebuffer);
>
> +/*
> + * vmbus_sendpacket_hvsock - Send the hvsock payload 'buf' of a length 'len'
> + */
> +int vmbus_sendpacket_hvsock(struct vmbus_channel *channel, void *buf, u32 
> len)
> +{
> + struct vmpipe_proto_header pipe_hdr;
> + struct vmpacket_descriptor desc;
> + struct kvec bufferlist[4];
> + u32 packetlen_aligned;
> + u32 packetlen;
> + u64 aligned_data = 0;
> + bool signal = false;
> + int ret;
> +
> + packetlen = HVSOCK_HEADER_LEN + len;
> + packetlen_aligned = ALIGN(packetlen, sizeof(u64));
> +
> + /* Setup the descriptor */
> + desc.type = VM_PKT_DATA_INBAND;
> + /* in 8-bytes granularity */
> + desc.offset8 = sizeof(struct vmpacket_descriptor) >> 3;
> + desc.len8 = (u16)(packetlen_aligned >> 3);
> + desc.flags = 0;
> + desc.trans_id = 0;
> +
> + pipe_hdr.pkt_type = 1;
> + pipe_hdr.data_size = len;
> +
> + bufferlist[0].iov_base = 
> + bufferlist[0].iov_len  = sizeof(struct vmpacket_descriptor);
> + bufferlist[1].iov_base = _hdr;
> + bufferlist[1].iov_len  = sizeof(struct vmpipe_proto_header);
> + bufferlist[2].iov_base = buf;
> + bufferlist[2].iov_len  = len;
> + bufferlist[3].iov_base = _data;
> + bufferlist[3].iov_len  = packetlen_aligned - packetlen;
> +
> + ret = hv_ringbuffer_write(>outbound, bufferlist, 4,
> );

Using ARRAY_SIZE(bufferlist) instead of 4 would allow us to keep this
line untouched when we decide to add something (and compiler will
optimize it to 4 anyway).

> +
> + if (ret == 0 && signal)
> + vmbus_setevent(channel);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(vmbus_sendpacket_hvsock);
> +
>  /**
>   * vmbus_recvpacket() - Retrieve the user packet on the specified channel
>   * @channel: Pointer to vmbus_channel structure.
> @@ -976,3 +1022,16 @@ int vmbus_recvpacket_raw(struct vmbus_channel *channel, 
> void *buffer,
> HV_RINGBUFFER_READ_FLAG_RAW);
>  }
>  EXPORT_SYMBOL_GPL(vmbus_recvpacket_raw);
> +
> +/*
> + * vmbus_recvpacket_hvsock - Receive the hvsock payload from the vmbus
> + * ringbuffer into the 'buffer'.
> + */
> +int vmbus_recvpacket_hvsock(struct vmbus_channel *channel, void *buffer,
> + u32 bufferlen, u32 *buffer_actual_len)
> +{
> + return __vmbus_recvpacket(channel, buffer, bufferlen,
> +   buffer_actual_len, NULL,
> +   HV_RINGBUFFER_READ_FLAG_HVSOCK);
> +}
> +EXPORT_SYMBOL_GPL(vmbus_recvpacket_hvsock);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index e005223..646c20d 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -908,6 +908,9 @@ extern int vmbus_sendpacket_ctl(struct vmbus_channel 
> *channel,
> u32 flags,
> bool kick_q);
>
> +extern int vmbus_sendpacket_hvsock(struct vmbus_channel *channel,
> +void *buf, u32 len);
> +
>  extern int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
>   struct hv_page_buffer pagebuffers[],
>   u32 pagecount,
> @@ -958,6 +961,9 @@ extern int vmbus_recvpacket_raw(struct vmbus_channel 
> *channel,
>u64 *requestid);
>
> +extern int vmbus_recvpacket_hvsock(struct vmbus_channel *channel, void 
> *buffer,
> +u32 bufferlen, u32 *buffer_actual_len);
> +
>  extern void vmbus_ontimer(unsigned long data);
>
>  /* Base driver object */
> @@ -1280,4 +1286,7 @@ struct vmpipe_proto_header {
>   };
>  } __packed;
>
> +#define HVSOCK_HEADER_LEN(sizeof(struct vmpacket_descriptor) + \
> +  sizeof(struct vmpipe_proto_header))
> +
>  #endif /* _HYPERV_H */

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/3] Drivers: hv: utils: fix hvt_op_poll() return value on transport destroy

2015-12-16 Thread Vitaly Kuznetsov
Dexuan Cui <de...@microsoft.com> writes:

>> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On Behalf
>> Of K. Y. Srinivasan
>> Sent: Wednesday, December 16, 2015 8:27
>> To: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
>> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
>> vkuzn...@redhat.com; jasow...@redhat.com
>> Subject: [PATCH 1/3] Drivers: hv: utils: fix hvt_op_poll() return value on 
>> transport
>> destroy
>> 
>> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> 
>> The return type of hvt_op_poll() is unsigned int and -EBADF is
>> inappropriate, poll functions return POLL* statuses.
>> 
>> Reported-by: Dexuan Cui <de...@microsoft.com>
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
>> ---
>>  drivers/hv/hv_utils_transport.c |2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>> 
>> diff --git a/drivers/hv/hv_utils_transport.c 
>> b/drivers/hv/hv_utils_transport.c
>> index ee20b50..4f42c0e 100644
>> --- a/drivers/hv/hv_utils_transport.c
>> +++ b/drivers/hv/hv_utils_transport.c
>> @@ -109,7 +109,7 @@ static unsigned int hvt_op_poll(struct file *file,
>> poll_table *wait)
>>  poll_wait(file, >outmsg_q, wait);
>> 
>>  if (hvt->mode == HVUTIL_TRANSPORT_DESTROY)
>> -return -EBADF;
>> +return POLLERR | POLLHUP;
>> 
>>  if (hvt->outmsg_len > 0)
>>  return POLLIN | POLLRDNORM;
>> --
>
> Hi Vitaly,
> The daemon only polls on POLLIN.
> I'm not sure returning "POLLERR | POLLHUP" here can wake up the daemon or not.
>

I tested this patch with hv_kvp_daemon which does poll() and it
works: we wake up all pollers from hvutil_transport_destroy(). Here
we just need to return proper value. Actually, the return value doesn't
really matter -- we do read() after poll() and get -EBADF there. But
let's be consistent.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 3/9] Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode

2015-12-15 Thread Vitaly Kuznetsov
Dexuan Cui <de...@microsoft.com> writes:

>> -Original Message-
>> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On Behalf
>> Of K. Y. Srinivasan
>> Sent: Tuesday, December 15, 2015 11:02
>> To: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
>> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
>> vkuzn...@redhat.com; jasow...@redhat.com
>> Subject: [PATCH 3/9] Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY
>> mode
>> 
>> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> 
>> When Hyper-V host asks us to remove some util driver by closing the
>> appropriate channel there is no easy way to force the current file
>> descriptor holder to hang up but we can start to respond -EBADF to all
>> operations asking it to exit gracefully.
>> 
>> As we're setting hvt->mode from two separate contexts now we need to use
>> a proper locking.
>> 
>> ...
>> @@ -99,6 +107,10 @@ static unsigned int hvt_op_poll(struct file *file,
>> poll_table *wait)
>>  hvt = container_of(file->f_op, struct hvutil_transport, fops);
>> 
>>  poll_wait(file, >outmsg_q, wait);
>> +
>> +if (hvt->mode == HVUTIL_TRANSPORT_DESTROY)
>> +return -EBADF;
>> +
>>  if (hvt->outmsg_len > 0)
>>  return POLLIN | POLLRDNORM;
>
> Hi Vitaly, 
> Should hvt_op_poll() return -EBADF -- I think it probably
> should return POLLERR or POLLHUP?

Oh, sorry, my bad -- hvt_op_poll() returns unsigned int and -EBADF is
definitely inappropriate. I see this patch was already merged to
char-misc-testing so I'll send a follow-up patch to fix things up.

Thanks!

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] Drivers: hv: utils: fix hvt_op_poll() return value on transport destroy

2015-12-15 Thread Vitaly Kuznetsov
The return type of hvt_op_poll() is unsigned int and -EBADF is
inappropriate, poll functions return POLL* statuses.

Reported-by: Dexuan Cui <de...@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
- This is a follow-up to the previously sent 'Drivers: hv: utils: introduce
  HVUTIL_TRANSPORT_DESTROY mode' patch which is already in
  char-misc-testing tree (a15025660d47).
---
 drivers/hv/hv_utils_transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index ee20b50..4f42c0e 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -109,7 +109,7 @@ static unsigned int hvt_op_poll(struct file *file, 
poll_table *wait)
poll_wait(file, >outmsg_q, wait);
 
if (hvt->mode == HVUTIL_TRANSPORT_DESTROY)
-   return -EBADF;
+   return POLLERR | POLLHUP;
 
if (hvt->outmsg_len > 0)
return POLLIN | POLLRDNORM;
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 RESEND 2/4] Drivers: hv: utils: rename outmsg_lock

2015-12-14 Thread Vitaly Kuznetsov
As a preparation to reusing outmsg_lock to protect test-and-set openrations
on 'mode' rename it the more general 'lock'.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_utils_transport.c | 14 +++---
 drivers/hv/hv_utils_transport.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 40abe44..59c6f3d 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -27,11 +27,11 @@ static struct list_head hvt_list = LIST_HEAD_INIT(hvt_list);
 
 static void hvt_reset(struct hvutil_transport *hvt)
 {
-   mutex_lock(>outmsg_lock);
+   mutex_lock(>lock);
kfree(hvt->outmsg);
hvt->outmsg = NULL;
hvt->outmsg_len = 0;
-   mutex_unlock(>outmsg_lock);
+   mutex_unlock(>lock);
if (hvt->on_reset)
hvt->on_reset();
 }
@@ -47,7 +47,7 @@ static ssize_t hvt_op_read(struct file *file, char __user 
*buf,
if (wait_event_interruptible(hvt->outmsg_q, hvt->outmsg_len > 0))
return -EINTR;
 
-   mutex_lock(>outmsg_lock);
+   mutex_lock(>lock);
if (!hvt->outmsg) {
ret = -EAGAIN;
goto out_unlock;
@@ -68,7 +68,7 @@ static ssize_t hvt_op_read(struct file *file, char __user 
*buf,
hvt->outmsg_len = 0;
 
 out_unlock:
-   mutex_unlock(>outmsg_lock);
+   mutex_unlock(>lock);
return ret;
 }
 
@@ -197,7 +197,7 @@ int hvutil_transport_send(struct hvutil_transport *hvt, 
void *msg, int len)
return ret;
}
/* HVUTIL_TRANSPORT_CHARDEV */
-   mutex_lock(>outmsg_lock);
+   mutex_lock(>lock);
if (hvt->outmsg) {
/* Previous message wasn't received */
ret = -EFAULT;
@@ -211,7 +211,7 @@ int hvutil_transport_send(struct hvutil_transport *hvt, 
void *msg, int len)
} else
ret = -ENOMEM;
 out_unlock:
-   mutex_unlock(>outmsg_lock);
+   mutex_unlock(>lock);
return ret;
 }
 
@@ -242,7 +242,7 @@ struct hvutil_transport *hvutil_transport_init(const char 
*name,
hvt->mdev.fops = >fops;
 
init_waitqueue_head(>outmsg_q);
-   mutex_init(>outmsg_lock);
+   mutex_init(>lock);
 
spin_lock(_list_lock);
list_add(>list, _list);
diff --git a/drivers/hv/hv_utils_transport.h b/drivers/hv/hv_utils_transport.h
index 314c76c..bff4c92 100644
--- a/drivers/hv/hv_utils_transport.h
+++ b/drivers/hv/hv_utils_transport.h
@@ -38,7 +38,7 @@ struct hvutil_transport {
u8 *outmsg; /* message to the userspace */
int outmsg_len; /* its length */
wait_queue_head_t outmsg_q; /* poll/read wait queue */
-   struct mutex outmsg_lock;   /* protects outmsg */
+   struct mutex lock;  /* protects struct members */
 };
 
 struct hvutil_transport *hvutil_transport_init(const char *name,
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 RESEND 0/4] Drivers: hv: utils: prevent crash when a utility driver is disabled host side

2015-12-14 Thread Vitaly Kuznetsov
Changes since v2:
- No changes, this is just a resend.

Changes since v1:
- Don't re-introduce memory leak in hvt_op_write(), preserve the error
  code from on_msg() [Dan Carpenter]

Original description:

I'm observing a crash when a utility driver is disabled host side (e.g.
'Guest services' is disabled live) when we have userspace daemon
connected:

[   90.244859] general protection fault:  [#1] SMP
...
[   90.800082] CPU: 3 PID: 473 Comm: hypervfcopyd Not tainted 
4.3.0-rc7_netvsc_noalloc+ #1053
...
[   90.800082] Call Trace:
[   90.800082]  [] __fput+0xc8/0x1f0
[   90.800082]  [] fput+0xe/0x10
[   90.800082]  [] task_work_run+0x73/0x90
[   90.800082]  [] do_exit+0x335/0xa90
[   90.800082]  [] do_group_exit+0x3f/0xc0
[   90.800082]  [] get_signal+0x25e/0x5e0
[   90.800082]  [] do_signal+0x28/0x580
[   90.800082]  [] ? finish_task_switch+0xa6/0x180
[   90.800082]  [] ? __schedule+0x28f/0x870
[   90.800082]  [] ? hvt_op_read+0x12a/0x140 [hv_utils]
[   90.800082]  [] ? wake_atomic_t_function+0x70/0x70
...
[   90.800082] RIP  [] module_put+0x16/0x90
[   90.800082]  RSP 
[   95.734261] ---[ end trace 0e09af6a6294a668 ]---

The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While module unload path is protected by .owner in
struct file_operations util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.

In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure
asking the daemon to exit gracefully by responding -EBADF to all
operations on unload.

Patch 1 fixes an unrelated issue I spotted, patch 2 renames outmsg_lock to
'lock' as we're gonna use it to protect test-and-set operations on 'mode',
patch 3 introduces HVUTIL_TRANSPORT_DESTROY mode of operation, patch 4
fixes the issue itself.

Patches are rebased on previously sent Olaf's fixes.

Vitaly Kuznetsov (4):
  Drivers: hv: utils: fix memory leak on on_msg() failure
  Drivers: hv: utils: rename outmsg_lock
  Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode
  Drivers: hv: utils: fix crash when device is removed from host side

 drivers/hv/hv_utils_transport.c | 112 +++-
 drivers/hv/hv_utils_transport.h |   3 +-
 2 files changed, 89 insertions(+), 26 deletions(-)

-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 RESEND 1/4] Drivers: hv: utils: fix memory leak on on_msg() failure

2015-12-14 Thread Vitaly Kuznetsov
inmsg should be freed in case of on_msg() failure to avoid memory leak.
Preserve the error code from on_msg().

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Changes since v1:
- Preserve the error code from on_msg(). [Dan Carpenter]
---
 drivers/hv/hv_utils_transport.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 24b2766..40abe44 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -77,6 +77,7 @@ static ssize_t hvt_op_write(struct file *file, const char 
__user *buf,
 {
struct hvutil_transport *hvt;
u8 *inmsg;
+   int ret;
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
@@ -84,11 +85,11 @@ static ssize_t hvt_op_write(struct file *file, const char 
__user *buf,
if (IS_ERR(inmsg))
return PTR_ERR(inmsg);
 
-   if (hvt->on_msg(inmsg, count))
-   return -EFAULT;
+   ret = hvt->on_msg(inmsg, count);
+
kfree(inmsg);
 
-   return count;
+   return ret ? ret : count;
 }
 
 static unsigned int hvt_op_poll(struct file *file, poll_table *wait)
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 RESEND 3/4] Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode

2015-12-14 Thread Vitaly Kuznetsov
When Hyper-V host asks us to remove some util driver by closing the
appropriate channel there is no easy way to force the current file
descriptor holder to hang up but we can start to respond -EBADF to all
operations asking it to exit gracefully.

As we're setting hvt->mode from two separate contexts now we need to use
a proper locking.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Changes since v1:
- Don't re-introduce memory leak in hvt_op_write() [Dan Carpenter]
---
 drivers/hv/hv_utils_transport.c | 71 -
 drivers/hv/hv_utils_transport.h |  1 +
 2 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 59c6f3d..31c2f86 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -27,11 +27,9 @@ static struct list_head hvt_list = LIST_HEAD_INIT(hvt_list);
 
 static void hvt_reset(struct hvutil_transport *hvt)
 {
-   mutex_lock(>lock);
kfree(hvt->outmsg);
hvt->outmsg = NULL;
hvt->outmsg_len = 0;
-   mutex_unlock(>lock);
if (hvt->on_reset)
hvt->on_reset();
 }
@@ -44,10 +42,17 @@ static ssize_t hvt_op_read(struct file *file, char __user 
*buf,
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   if (wait_event_interruptible(hvt->outmsg_q, hvt->outmsg_len > 0))
+   if (wait_event_interruptible(hvt->outmsg_q, hvt->outmsg_len > 0 ||
+hvt->mode != HVUTIL_TRANSPORT_CHARDEV))
return -EINTR;
 
mutex_lock(>lock);
+
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY) {
+   ret = -EBADF;
+   goto out_unlock;
+   }
+
if (!hvt->outmsg) {
ret = -EAGAIN;
goto out_unlock;
@@ -85,7 +90,10 @@ static ssize_t hvt_op_write(struct file *file, const char 
__user *buf,
if (IS_ERR(inmsg))
return PTR_ERR(inmsg);
 
-   ret = hvt->on_msg(inmsg, count);
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY)
+   ret = -EBADF;
+   else
+   ret = hvt->on_msg(inmsg, count);
 
kfree(inmsg);
 
@@ -99,6 +107,10 @@ static unsigned int hvt_op_poll(struct file *file, 
poll_table *wait)
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
poll_wait(file, >outmsg_q, wait);
+
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY)
+   return -EBADF;
+
if (hvt->outmsg_len > 0)
return POLLIN | POLLRDNORM;
 
@@ -108,26 +120,39 @@ static unsigned int hvt_op_poll(struct file *file, 
poll_table *wait)
 static int hvt_op_open(struct inode *inode, struct file *file)
 {
struct hvutil_transport *hvt;
+   int ret = 0;
+   bool issue_reset = false;
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   /*
-* Switching to CHARDEV mode. We switch bach to INIT when device
-* gets released.
-*/
-   if (hvt->mode == HVUTIL_TRANSPORT_INIT)
+   mutex_lock(>lock);
+
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY) {
+   ret = -EBADF;
+   } else if (hvt->mode == HVUTIL_TRANSPORT_INIT) {
+   /*
+* Switching to CHARDEV mode. We switch bach to INIT when
+* device gets released.
+*/
hvt->mode = HVUTIL_TRANSPORT_CHARDEV;
+   }
else if (hvt->mode == HVUTIL_TRANSPORT_NETLINK) {
/*
 * We're switching from netlink communication to using char
 * device. Issue the reset first.
 */
-   hvt_reset(hvt);
+   issue_reset = true;
hvt->mode = HVUTIL_TRANSPORT_CHARDEV;
-   } else
-   return -EBUSY;
+   } else {
+   ret = -EBUSY;
+   }
 
-   return 0;
+   if (issue_reset)
+   hvt_reset(hvt);
+
+   mutex_unlock(>lock);
+
+   return ret;
 }
 
 static int hvt_op_release(struct inode *inode, struct file *file)
@@ -136,12 +161,15 @@ static int hvt_op_release(struct inode *inode, struct 
file *file)
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   hvt->mode = HVUTIL_TRANSPORT_INIT;
+   mutex_lock(>lock);
+   if (hvt->mode != HVUTIL_TRANSPORT_DESTROY)
+   hvt->mode = HVUTIL_TRANSPORT_INIT;
/*
 * Cleanup message buffers to avoid spurious messages when the daemon
 * connects back.
 */
hvt_reset(hvt);
+   mutex_unlock(>lock);
 
return 0;
 }
@@ -168,6 +196,7 @@ static void hvt_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
 * Switching to NETLINK mode. Switching to CHARDEV happens when someone
 

[PATCH v2 RESEND 4/4] Drivers: hv: utils: fix crash when device is removed from host side

2015-12-14 Thread Vitaly Kuznetsov
The crash is observed when a service is being disabled host side while
userspace daemon is connected to the device:

[   90.244859] general protection fault:  [#1] SMP
...
[   90.800082] Call Trace:
[   90.800082]  [] __fput+0xc8/0x1f0
[   90.800082]  [] fput+0xe/0x10
...
[   90.800082]  [] do_signal+0x28/0x580
[   90.800082]  [] ? finish_task_switch+0xa6/0x180
[   90.800082]  [] ? __schedule+0x28f/0x870
[   90.800082]  [] ? hvt_op_read+0x12a/0x140 [hv_utils]
...

The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While module unload path is protected by .owner in
struct file_operations util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.

In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_utils_transport.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 31c2f86..ee20b50 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -155,13 +155,22 @@ static int hvt_op_open(struct inode *inode, struct file 
*file)
return ret;
 }
 
+static void hvt_transport_free(struct hvutil_transport *hvt)
+{
+   misc_deregister(>mdev);
+   kfree(hvt->outmsg);
+   kfree(hvt);
+}
+
 static int hvt_op_release(struct inode *inode, struct file *file)
 {
struct hvutil_transport *hvt;
+   int mode_old;
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
mutex_lock(>lock);
+   mode_old = hvt->mode;
if (hvt->mode != HVUTIL_TRANSPORT_DESTROY)
hvt->mode = HVUTIL_TRANSPORT_INIT;
/*
@@ -171,6 +180,9 @@ static int hvt_op_release(struct inode *inode, struct file 
*file)
hvt_reset(hvt);
mutex_unlock(>lock);
 
+   if (mode_old == HVUTIL_TRANSPORT_DESTROY)
+   hvt_transport_free(hvt);
+
return 0;
 }
 
@@ -304,17 +316,25 @@ err_free_hvt:
 
 void hvutil_transport_destroy(struct hvutil_transport *hvt)
 {
+   int mode_old;
+
mutex_lock(>lock);
+   mode_old = hvt->mode;
hvt->mode = HVUTIL_TRANSPORT_DESTROY;
wake_up_interruptible(>outmsg_q);
mutex_unlock(>lock);
 
+   /*
+* In case we were in 'chardev' mode we still have an open fd so we
+* have to defer freeing the device. Netlink interface can be freed
+* now.
+*/
spin_lock(_list_lock);
list_del(>list);
spin_unlock(_list_lock);
if (hvt->cn_id.idx > 0 && hvt->cn_id.val > 0)
cn_del_callback(>cn_id);
-   misc_deregister(>mdev);
-   kfree(hvt->outmsg);
-   kfree(hvt);
+
+   if (mode_old != HVUTIL_TRANSPORT_CHARDEV)
+   hvt_transport_free(hvt);
 }
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH net-next] hv_netvsc: Fix race condition on Multi-Send Data field

2015-12-11 Thread Vitaly Kuznetsov
Haiyang Zhang <haiya...@microsoft.com> writes:

> In commit 2a04ae8acb14 ("hv_netvsc: remove locking in netvsc_send()"), the
> locking for MSD (Multi-Send Data) field was removed. This could cause a
> race condition between RNDIS control messages and data packets processing,
> because these two types of traffic are not synchronized.
> This patch fixes this issue by sending control messages out directly
> without reading MSD field.
>
> Signed-off-by: Haiyang Zhang <haiya...@microsoft.com>
> Reviewed-by: K. Y. Srinivasan <k...@microsoft.com>
> ---
>  drivers/net/hyperv/netvsc.c |9 +
>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 02bab9a..059fc52 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -867,6 +867,14 @@ int netvsc_send(struct hv_device *device,
>   packet->send_buf_index = NETVSC_INVALID_INDEX;
>   packet->cp_partial = false;
>  
> + /* Send control message directly without accessing msd (Multi-Send
> +  * Data) field which may be changed during data packet processing.
> +  */
> + if (!skb) {
> + cur_send = packet;
> + goto send_now;
> + }
> +

Is is supposed to be applied on top of some other patches? It doesn't
compile on top of current net-next:

drivers/net/hyperv/netvsc.c: In function ‘netvsc_send’:
drivers/net/hyperv/netvsc.c:865:7: error: ‘skb’ undeclared (first use in
this function)
  if (!skb) {
 ^
 
Did you mean to check rndis_msg instead (as skb is not defined here)?

>   msdp = _device->msd[q_idx];
>  
>   /* batch packets in send buffer if possible */
> @@ -939,6 +947,7 @@ int netvsc_send(struct hv_device *device,
>   }
>   }
>  
> +send_now:
>   if (cur_send)
>   ret = netvsc_send_pkt(cur_send, net_device, pb, skb);

I suppose we untangle these two pathes completely: let
rndis_filter_send_request() call netvsc_send_pkt() directly. Please see
my patch attached (note: it should be split in 3 patches if
submitted). If you like the idea I can send it.

-- 
  Vitaly

commit b2784f827d2b7b19d3af6025bbe8be5b1450b88c
Author: Vitaly Kuznetsov <vkuzn...@redhat.com>
Date:   Fri Dec 11 13:29:40 2015 +0100

netvsc: untangle rndis_filter_send_request and netvsc_send

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index f5b2145..03da20c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -189,6 +189,8 @@ int netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct hv_device *device,
 		struct hv_netvsc_packet *packet,
 		struct rndis_message *rndis_msg);
+int netvsc_send_pkt(struct hv_netvsc_packet *packet,
+		struct netvsc_device *net_device);
 void netvsc_linkstatus_callback(struct hv_device *device_obj,
 struct rndis_message *resp);
 void netvsc_xmit_completion(void *context);
@@ -211,6 +213,8 @@ int rndis_filter_receive(struct hv_device *dev,
 int rndis_filter_set_packet_filter(struct rndis_device *dev, u32 new_filter);
 int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac);
 
+struct netvsc_device *netvsc_outbound_net_device(struct hv_device *device);
+
 #define NVSP_INVALID_PROTOCOL_VERSION	((u32)0x)
 
 #define NVSP_PROTOCOL_VERSION_1		2
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 081f14f..eadbb03 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -67,7 +67,7 @@ static void free_netvsc_device(struct netvsc_device *nvdev)
 	kfree(nvdev);
 }
 
-static struct netvsc_device *get_outbound_net_device(struct hv_device *device)
+struct netvsc_device *netvsc_outbound_net_device(struct hv_device *device)
 {
 	struct netvsc_device *net_device;
 
@@ -226,7 +226,7 @@ static int netvsc_init_buf(struct hv_device *device)
 	struct net_device *ndev;
 	int node;
 
-	net_device = get_outbound_net_device(device);
+	net_device = netvsc_outbound_net_device(device);
 	if (!net_device)
 		return -ENODEV;
 	ndev = net_device->ndev;
@@ -478,7 +478,7 @@ static int netvsc_connect_vsp(struct hv_device *device)
 		NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5 };
 	int i, num_ver = 4; /* number of different NVSP versions */
 
-	net_device = get_outbound_net_device(device);
+	net_device = netvsc_outbound_net_device(device);
 	if (!net_device)
 		return -ENODEV;
 	ndev = net_device->ndev;
@@ -740,7 +740,7 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
 	return msg_size;
 }
 
-static inline int netvsc_send_pkt(
+int netvsc_send_pkt(
 	struct hv_netvsc_packet *packet,
 	struct netvsc_device *net_device)
 {
@@ -850,7 +850,7 @@ int netvsc_s

Re: [PATCH net-next] hv_netvsc: Fix race condition on Multi-Send Data field

2015-12-11 Thread Vitaly Kuznetsov
Vitaly Kuznetsov <vkuzn...@redhat.com> writes:

> Haiyang Zhang <haiya...@microsoft.com> writes:
>
>> In commit 2a04ae8acb14 ("hv_netvsc: remove locking in netvsc_send()"), the
>> locking for MSD (Multi-Send Data) field was removed. This could cause a
>> race condition between RNDIS control messages and data packets processing,
>> because these two types of traffic are not synchronized.
>> This patch fixes this issue by sending control messages out directly
>> without reading MSD field.
>>
>> Signed-off-by: Haiyang Zhang <haiya...@microsoft.com>
>> Reviewed-by: K. Y. Srinivasan <k...@microsoft.com>
>> ---
>>  drivers/net/hyperv/netvsc.c |9 +
>>  1 files changed, 9 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
>> index 02bab9a..059fc52 100644
>> --- a/drivers/net/hyperv/netvsc.c
>> +++ b/drivers/net/hyperv/netvsc.c
>> @@ -867,6 +867,14 @@ int netvsc_send(struct hv_device *device,
>>  packet->send_buf_index = NETVSC_INVALID_INDEX;
>>  packet->cp_partial = false;
>>  
>> +/* Send control message directly without accessing msd (Multi-Send
>> + * Data) field which may be changed during data packet processing.
>> + */
>> +if (!skb) {
>> +cur_send = packet;
>> +goto send_now;
>> +}
>> +
>
> Is is supposed to be applied on top of some other patches? It doesn't
> compile on top of current net-next:
>
> drivers/net/hyperv/netvsc.c: In function ‘netvsc_send’:
> drivers/net/hyperv/netvsc.c:865:7: error: ‘skb’ undeclared (first use in
> this function)
>   if (!skb) {
>  ^
>
> Did you mean to check rndis_msg instead (as skb is not defined here)?

Oh, my bad, it was me who was looking at the wrong branch... Sorry for
the noise.

>
>>  msdp = _device->msd[q_idx];
>>  
>>  /* batch packets in send buffer if possible */
>> @@ -939,6 +947,7 @@ int netvsc_send(struct hv_device *device,
>>  }
>>  }
>>  
>> +send_now:
>>  if (cur_send)
>>  ret = netvsc_send_pkt(cur_send, net_device, pb, skb);
>
> I suppose we untangle these two pathes completely: let
> rndis_filter_send_request() call netvsc_send_pkt() directly. Please see
> my patch attached (note: it should be split in 3 patches if
> submitted). If you like the idea I can send it.

This patch will need some changes but the suggestion still stands: let's
untangle sending data and control messages.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 1/5] Drivers: hv: ring_buffer.c: fix comment style

2015-12-08 Thread Vitaly Kuznetsov
Convert 6+-string comments repeating function names to normal kernel-style
comments and fix a couple of other comment style issues. No textual or
functional changes intended.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/ring_buffer.c | 135 +--
 1 file changed, 26 insertions(+), 109 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 70a1a9a..7bca513 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -112,9 +112,7 @@ static bool hv_need_to_signal_on_read(u32 prev_write_sz,
u32 read_loc = rbi->ring_buffer->read_index;
u32 pending_sz = rbi->ring_buffer->pending_send_sz;
 
-   /*
-* If the other end is not blocked on write don't bother.
-*/
+   /* If the other end is not blocked on write don't bother. */
if (pending_sz == 0)
return false;
 
@@ -128,12 +126,7 @@ static bool hv_need_to_signal_on_read(u32 prev_write_sz,
return false;
 }
 
-/*
- * hv_get_next_write_location()
- *
- * Get the next write location for the specified ring buffer
- *
- */
+/* Get the next write location for the specified ring buffer. */
 static inline u32
 hv_get_next_write_location(struct hv_ring_buffer_info *ring_info)
 {
@@ -142,12 +135,7 @@ hv_get_next_write_location(struct hv_ring_buffer_info 
*ring_info)
return next;
 }
 
-/*
- * hv_set_next_write_location()
- *
- * Set the next write location for the specified ring buffer
- *
- */
+/* Set the next write location for the specified ring buffer. */
 static inline void
 hv_set_next_write_location(struct hv_ring_buffer_info *ring_info,
 u32 next_write_location)
@@ -155,11 +143,7 @@ hv_set_next_write_location(struct hv_ring_buffer_info 
*ring_info,
ring_info->ring_buffer->write_index = next_write_location;
 }
 
-/*
- * hv_get_next_read_location()
- *
- * Get the next read location for the specified ring buffer
- */
+/* Get the next read location for the specified ring buffer. */
 static inline u32
 hv_get_next_read_location(struct hv_ring_buffer_info *ring_info)
 {
@@ -169,10 +153,8 @@ hv_get_next_read_location(struct hv_ring_buffer_info 
*ring_info)
 }
 
 /*
- * hv_get_next_readlocation_withoffset()
- *
  * Get the next read location + offset for the specified ring buffer.
- * This allows the caller to skip
+ * This allows the caller to skip.
  */
 static inline u32
 hv_get_next_readlocation_withoffset(struct hv_ring_buffer_info *ring_info,
@@ -186,13 +168,7 @@ hv_get_next_readlocation_withoffset(struct 
hv_ring_buffer_info *ring_info,
return next;
 }
 
-/*
- *
- * hv_set_next_read_location()
- *
- * Set the next read location for the specified ring buffer
- *
- */
+/* Set the next read location for the specified ring buffer. */
 static inline void
 hv_set_next_read_location(struct hv_ring_buffer_info *ring_info,
u32 next_read_location)
@@ -201,12 +177,7 @@ hv_set_next_read_location(struct hv_ring_buffer_info 
*ring_info,
 }
 
 
-/*
- *
- * hv_get_ring_buffer()
- *
- * Get the start of the ring buffer
- */
+/* Get the start of the ring buffer. */
 static inline void *
 hv_get_ring_buffer(struct hv_ring_buffer_info *ring_info)
 {
@@ -214,25 +185,14 @@ hv_get_ring_buffer(struct hv_ring_buffer_info *ring_info)
 }
 
 
-/*
- *
- * hv_get_ring_buffersize()
- *
- * Get the size of the ring buffer
- */
+/* Get the size of the ring buffer. */
 static inline u32
 hv_get_ring_buffersize(struct hv_ring_buffer_info *ring_info)
 {
return ring_info->ring_datasize;
 }
 
-/*
- *
- * hv_get_ring_bufferindices()
- *
- * Get the read and write indices as u64 of the specified ring buffer
- *
- */
+/* Get the read and write indices as u64 of the specified ring buffer. */
 static inline u64
 hv_get_ring_bufferindices(struct hv_ring_buffer_info *ring_info)
 {
@@ -240,12 +200,8 @@ hv_get_ring_bufferindices(struct hv_ring_buffer_info 
*ring_info)
 }
 
 /*
- *
- * hv_copyfrom_ringbuffer()
- *
  * Helper routine to copy to source from ring buffer.
  * Assume there is enough room. Handles wrap-around in src case only!!
- *
  */
 static u32 hv_copyfrom_ringbuffer(
struct hv_ring_buffer_info  *ring_info,
@@ -277,12 +233,8 @@ static u32 hv_copyfrom_ringbuffer(
 
 
 /*
- *
- * hv_copyto_ringbuffer()
- *
  * Helper routine to copy from source to ring buffer.
  * Assume there is enough room. Handles wrap-around in dest case only!!
- *
  */
 static u32 hv_copyto_ringbuffer(
struct hv_ring_buffer_info  *ring_info,
@@ -308,13 +260,7 @@ static u32 hv_copyto_ringbuffer(
return start_write_offset;
 }
 
-/*
- *
- * hv_ringbuffer_get_debuginfo()
- *
- * Get various debug metrics for the specified ring buffer
- *
- */
+/* Get various debug metrics for the specified ring buffer. */
 void hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
struct hv_ring_bu

[PATCH v2 5/5] Drivers: hv: ring_buffer: eliminate hv_ringbuffer_peek()

2015-12-08 Thread Vitaly Kuznetsov
Currently, there is only one user for hv_ringbuffer_read()/
hv_ringbuffer_peak() functions and the usage of these functions is:
- insecure as we drop ring_lock between them, someone else (in theory
  only) can acquire it in between;
- non-optimal as we do a number of things (acquire/release the above
  mentioned lock, calculate available space on the ring, ...) twice and
  this path is performance-critical.

Remove hv_ringbuffer_peek() moving the logic from __vmbus_recvpacket() to
hv_ringbuffer_read().

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/channel.c  | 30 ++
 drivers/hv/hyperv_vmbus.h | 11 +++-
 drivers/hv/ring_buffer.c  | 65 +++
 3 files changed, 42 insertions(+), 64 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index c72271d..a4c800d 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -886,37 +886,11 @@ __vmbus_recvpacket(struct vmbus_channel *channel, void 
*buffer,
   u32 bufferlen, u32 *buffer_actual_len, u64 *requestid,
   bool raw)
 {
-   struct vmpacket_descriptor desc;
-   u32 packetlen;
-   u32 userlen;
int ret;
bool signal = false;
 
-   *buffer_actual_len = 0;
-   *requestid = 0;
-
-
-   ret = hv_ringbuffer_peek(>inbound, ,
-sizeof(struct vmpacket_descriptor));
-   if (ret != 0)
-   return 0;
-
-   packetlen = desc.len8 << 3;
-   if (!raw)
-   userlen = packetlen - (desc.offset8 << 3);
-   else
-   userlen = packetlen;
-
-   *buffer_actual_len = userlen;
-
-   if (userlen > bufferlen)
-   return -ENOBUFS;
-
-   *requestid = desc.trans_id;
-
-   /* Copy over the packet to the user buffer */
-   ret = hv_ringbuffer_read(>inbound, buffer, userlen,
-raw ? 0 : desc.offset8 << 3, );
+   ret = hv_ringbuffer_read(>inbound, buffer, bufferlen,
+buffer_actual_len, requestid, , raw);
 
if (signal)
vmbus_setevent(channel);
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 3782636..d5ee081 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -614,14 +614,9 @@ int hv_ringbuffer_write(struct hv_ring_buffer_info 
*ring_info,
struct kvec *kv_list,
u32 kv_count, bool *signal);
 
-int hv_ringbuffer_peek(struct hv_ring_buffer_info *ring_info, void *buffer,
-  u32 buflen);
-
-int hv_ringbuffer_read(struct hv_ring_buffer_info *ring_info,
-  void *buffer,
-  u32 buflen,
-  u32 offset, bool *signal);
-
+int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
+  void *buffer, u32 buflen, u32 *buffer_actual_len,
+  u64 *requestid, bool *signal, bool raw);
 
 void hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
struct hv_ring_buffer_debug_info *debug_info);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 07f9408..b53702c 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -380,30 +380,59 @@ int hv_ringbuffer_write(struct hv_ring_buffer_info 
*outring_info,
return 0;
 }
 
-static inline int __hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
-  void *buffer, u32 buflen, u32 offset,
-  bool *signal, bool advance)
+int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
+  void *buffer, u32 buflen, u32 *buffer_actual_len,
+  u64 *requestid, bool *signal, bool raw)
 {
u32 bytes_avail_towrite;
u32 bytes_avail_toread;
u32 next_read_location = 0;
u64 prev_indices = 0;
unsigned long flags;
+   struct vmpacket_descriptor desc;
+   u32 offset;
+   u32 packetlen;
+   int ret = 0;
 
if (buflen <= 0)
return -EINVAL;
 
spin_lock_irqsave(_info->ring_lock, flags);
 
+   *buffer_actual_len = 0;
+   *requestid = 0;
+
hv_get_ringbuffer_availbytes(inring_info,
_avail_toread,
_avail_towrite);
 
/* Make sure there is something to read */
-   if (bytes_avail_toread < buflen) {
-   spin_unlock_irqrestore(_info->ring_lock, flags);
+   if (bytes_avail_toread < sizeof(desc)) {
+   /*
+* No error is set when there is even no header, drivers are
+* supposed to analyze buffer_actual_len.
+*/
+   goto out_unlock;
+   }
 
-   return -EAGAIN;
+   next_read_location = hv_get_next_rea

[PATCH v2 3/5] Drivers: hv: ring_buffer: remove code duplication from hv_ringbuffer_peek/read()

2015-12-08 Thread Vitaly Kuznetsov
hv_ringbuffer_peek() does the same as hv_ringbuffer_read() without
advancing the read index. The only functional change this patch brings
is moving hv_need_to_signal_on_read() call under the ring_lock but this
function is just a couple of comparisons.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/ring_buffer.c | 68 ++--
 1 file changed, 25 insertions(+), 43 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 7bca513..07f9408 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -380,47 +380,9 @@ int hv_ringbuffer_write(struct hv_ring_buffer_info 
*outring_info,
return 0;
 }
 
-
-/* Read without advancing the read index. */
-int hv_ringbuffer_peek(struct hv_ring_buffer_info *Inring_info,
-  void *Buffer, u32 buflen)
-{
-   u32 bytes_avail_towrite;
-   u32 bytes_avail_toread;
-   u32 next_read_location = 0;
-   unsigned long flags;
-
-   spin_lock_irqsave(_info->ring_lock, flags);
-
-   hv_get_ringbuffer_availbytes(Inring_info,
-   _avail_toread,
-   _avail_towrite);
-
-   /* Make sure there is something to read */
-   if (bytes_avail_toread < buflen) {
-
-   spin_unlock_irqrestore(_info->ring_lock, flags);
-
-   return -EAGAIN;
-   }
-
-   /* Convert to byte offset */
-   next_read_location = hv_get_next_read_location(Inring_info);
-
-   next_read_location = hv_copyfrom_ringbuffer(Inring_info,
-   Buffer,
-   buflen,
-   next_read_location);
-
-   spin_unlock_irqrestore(_info->ring_lock, flags);
-
-   return 0;
-}
-
-
-/* Read and advance the read index. */
-int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info, void *buffer,
-  u32 buflen, u32 offset, bool *signal)
+static inline int __hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
+  void *buffer, u32 buflen, u32 offset,
+  bool *signal, bool advance)
 {
u32 bytes_avail_towrite;
u32 bytes_avail_toread;
@@ -452,6 +414,9 @@ int hv_ringbuffer_read(struct hv_ring_buffer_info 
*inring_info, void *buffer,
buflen,
next_read_location);
 
+   if (!advance)
+   goto out_unlock;
+
next_read_location = hv_copyfrom_ringbuffer(inring_info,
_indices,
sizeof(u64),
@@ -467,9 +432,26 @@ int hv_ringbuffer_read(struct hv_ring_buffer_info 
*inring_info, void *buffer,
/* Update the read index */
hv_set_next_read_location(inring_info, next_read_location);
 
-   spin_unlock_irqrestore(_info->ring_lock, flags);
-
*signal = hv_need_to_signal_on_read(bytes_avail_towrite, inring_info);
 
+out_unlock:
+   spin_unlock_irqrestore(_info->ring_lock, flags);
return 0;
 }
+
+/* Read from ring buffer without advancing the read index. */
+int hv_ringbuffer_peek(struct hv_ring_buffer_info *inring_info,
+  void *buffer, u32 buflen)
+{
+   return __hv_ringbuffer_read(inring_info, buffer, buflen,
+   0, NULL, false);
+}
+
+/* Read from ring buffer and advance the read index. */
+int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
+  void *buffer, u32 buflen, u32 offset,
+  bool *signal)
+{
+   return __hv_ringbuffer_read(inring_info, buffer, buflen,
+   offset, signal, true);
+}
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 4/5] Drivers: hv: remove code duplication between vmbus_recvpacket()/vmbus_recvpacket_raw()

2015-12-08 Thread Vitaly Kuznetsov
vmbus_recvpacket() and vmbus_recvpacket_raw() are almost identical but
there are two discrepancies:
1) vmbus_recvpacket() doesn't propagate errors from hv_ringbuffer_read()
   which looks like it is not desired.
2) There is an error message printed in packetlen > bufferlen case in
   vmbus_recvpacket(). I'm removing it as it is usless for users to see
   such messages and /vmbus_recvpacket_raw() doesn't have it.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/channel.c | 65 ++--
 1 file changed, 22 insertions(+), 43 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index c4dcab0..c72271d 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -881,8 +881,10 @@ EXPORT_SYMBOL_GPL(vmbus_sendpacket_multipagebuffer);
  *
  * Mainly used by Hyper-V drivers.
  */
-int vmbus_recvpacket(struct vmbus_channel *channel, void *buffer,
-   u32 bufferlen, u32 *buffer_actual_len, u64 *requestid)
+static inline int
+__vmbus_recvpacket(struct vmbus_channel *channel, void *buffer,
+  u32 bufferlen, u32 *buffer_actual_len, u64 *requestid,
+  bool raw)
 {
struct vmpacket_descriptor desc;
u32 packetlen;
@@ -900,27 +902,34 @@ int vmbus_recvpacket(struct vmbus_channel *channel, void 
*buffer,
return 0;
 
packetlen = desc.len8 << 3;
-   userlen = packetlen - (desc.offset8 << 3);
+   if (!raw)
+   userlen = packetlen - (desc.offset8 << 3);
+   else
+   userlen = packetlen;
 
*buffer_actual_len = userlen;
 
-   if (userlen > bufferlen) {
-
-   pr_err("Buffer too small - got %d needs %d\n",
-  bufferlen, userlen);
-   return -ETOOSMALL;
-   }
+   if (userlen > bufferlen)
+   return -ENOBUFS;
 
*requestid = desc.trans_id;
 
/* Copy over the packet to the user buffer */
ret = hv_ringbuffer_read(>inbound, buffer, userlen,
-(desc.offset8 << 3), );
+raw ? 0 : desc.offset8 << 3, );
 
if (signal)
vmbus_setevent(channel);
 
-   return 0;
+   return ret;
+}
+
+int vmbus_recvpacket(struct vmbus_channel *channel, void *buffer,
+u32 bufferlen, u32 *buffer_actual_len,
+u64 *requestid)
+{
+   return __vmbus_recvpacket(channel, buffer, bufferlen,
+ buffer_actual_len, requestid, false);
 }
 EXPORT_SYMBOL(vmbus_recvpacket);
 
@@ -931,37 +940,7 @@ int vmbus_recvpacket_raw(struct vmbus_channel *channel, 
void *buffer,
  u32 bufferlen, u32 *buffer_actual_len,
  u64 *requestid)
 {
-   struct vmpacket_descriptor desc;
-   u32 packetlen;
-   int ret;
-   bool signal = false;
-
-   *buffer_actual_len = 0;
-   *requestid = 0;
-
-
-   ret = hv_ringbuffer_peek(>inbound, ,
-sizeof(struct vmpacket_descriptor));
-   if (ret != 0)
-   return 0;
-
-
-   packetlen = desc.len8 << 3;
-
-   *buffer_actual_len = packetlen;
-
-   if (packetlen > bufferlen)
-   return -ENOBUFS;
-
-   *requestid = desc.trans_id;
-
-   /* Copy over the entire packet to the user buffer */
-   ret = hv_ringbuffer_read(>inbound, buffer, packetlen, 0,
-);
-
-   if (signal)
-   vmbus_setevent(channel);
-
-   return ret;
+   return __vmbus_recvpacket(channel, buffer, bufferlen,
+ buffer_actual_len, requestid, true);
 }
 EXPORT_SYMBOL_GPL(vmbus_recvpacket_raw);
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 2/5] Drivers: hv: ring_buffer: remove stray smp_read_barrier_depends()

2015-12-08 Thread Vitaly Kuznetsov
smp_read_barrier_depends() does nothing on almost all arcitectures
including x86 and having it in the beginning of
hv_get_ringbuffer_availbytes() does not provide any guarantees anyway.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 include/linux/hyperv.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 8fdc17b..1777dc8 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -141,8 +141,6 @@ hv_get_ringbuffer_availbytes(struct hv_ring_buffer_info 
*rbi,
 {
u32 read_loc, write_loc, dsize;
 
-   smp_read_barrier_depends();
-
/* Capture the read/write indices before they changed */
read_loc = rbi->ring_buffer->read_index;
write_loc = rbi->ring_buffer->write_index;
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 0/5] Drivers: hv: ring_buffer: do some code cleanup and optimization

2015-12-08 Thread Vitaly Kuznetsov
Changes since v1: added PATCH 4/5 and 5/5.

1) Fix comment style in ring_buffer.c
2) Remove unneeded smp_read_barrier_depends() from hyperv.h
3) Unify hv_ringbuffer_peek/read() functions
4) Unify vmbus_recvpacket()/vmbus_recvpacket_raw() functions
5) Eliminate hv_ringbuffer_peek().

Vitaly Kuznetsov (5):
  Drivers: hv: ring_buffer.c: fix comment style
  Drivers: hv: ring_buffer: remove stray smp_read_barrier_depends()
  Drivers: hv: ring_buffer: remove code duplication from
hv_ringbuffer_peek/read()
  Drivers: hv: remove code duplication between
vmbus_recvpacket()/vmbus_recvpacket_raw()
  Drivers: hv: ring_buffer: eliminate hv_ringbuffer_peek()

 drivers/hv/channel.c  |  81 -
 drivers/hv/hyperv_vmbus.h |  11 +--
 drivers/hv/ring_buffer.c  | 218 ++
 include/linux/hyperv.h|   2 -
 4 files changed, 83 insertions(+), 229 deletions(-)

-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] storvsc: add more logging for error and warning messages

2015-12-04 Thread Vitaly Kuznetsov
Long Li  writes:

> Introduce a logging level for storvsc to log certain error/warning
> messages. Those messages are helpful in some environments,
> e.g. Microsoft Azure, for customer support and troubleshooting
> purposes.

I have an alternative suggestion: let's use dynamic debug! Basically, we
need to convert all non-error logging to using dev_dbg() and this can be
enabled dynamically when needed, even reboot won't be required.

>
> Signed-off-by: Long Li 
> ---
>  drivers/scsi/storvsc_drv.c | 30 +-
>  1 file changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 40c43ae..afa1647 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -164,6 +164,21 @@ static int sense_buffer_size = 
> PRE_WIN8_STORVSC_SENSE_BUFFER_SIZE;
>  */
>  static int vmstor_proto_version;
>
> +#define STORVSC_LOGGING_NONE 0
> +#define STORVSC_LOGGING_ERROR1
> +#define STORVSC_LOGGING_WARN 2
> +
> +static int logging_level = STORVSC_LOGGING_ERROR;
> +module_param(logging_level, int, S_IRUGO|S_IWUSR);
> +MODULE_PARM_DESC(logging_level,
> + "Logging level, 0 - None, 1 - Error (default), 2 - Warning.");
> +
> +static inline bool do_logging(int level)
> +{
> + return (logging_level >= level) ? true : false;
> +}
> +
> +
>  struct vmscsi_win8_extension {
>   /*
>* The following were added in Windows 8
> @@ -1183,7 +1198,7 @@ static void storvsc_command_completion(struct 
> storvsc_cmd_request *cmd_request)
>
>   scmnd->result = vm_srb->scsi_status;
>
> - if (scmnd->result) {
> + if (scmnd->result && do_logging(STORVSC_LOGGING_ERROR)) {
>   if (scsi_normalize_sense(scmnd->sense_buffer,
>   SCSI_SENSE_BUFFERSIZE, _hdr))
>   scsi_print_sense_hdr(scmnd->device, "storvsc",
> @@ -1239,12 +1254,25 @@ static void storvsc_on_io_completion(struct hv_device 
> *device,
>   stor_pkt->vm_srb.sense_info_length =
>   vstor_packet->vm_srb.sense_info_length;
>
> + if (vstor_packet->vm_srb.scsi_status != 0 ||
> + vstor_packet->vm_srb.srb_status != SRB_STATUS_SUCCESS)
> + if (do_logging(STORVSC_LOGGING_WARN))
> + dev_warn(>device,
> + "cmd 0x%x scsi status 0x%x srb status 0x%x\n",
> + stor_pkt->vm_srb.cdb[0],
> + vstor_packet->vm_srb.scsi_status,
> + vstor_packet->vm_srb.srb_status);
>
>   if ((vstor_packet->vm_srb.scsi_status & 0xFF) == 0x02) {
>   /* CHECK_CONDITION */
>   if (vstor_packet->vm_srb.srb_status &
>   SRB_STATUS_AUTOSENSE_VALID) {
>   /* autosense data available */
> + if (do_logging(STORVSC_LOGGING_WARN))
> + dev_warn(>device,
> + "stor pkt %p autosense data valid - len 
> %d\n",
> + request,
> + vstor_packet->vm_srb.sense_info_length);
>
>   memcpy(request->cmd->sense_buffer,
>  vstor_packet->vm_srb.sense_data,

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH net-next] hv_netvsc: rework link status change handling

2015-11-27 Thread Vitaly Kuznetsov
There are several issues in hv_netvsc driver with regards to link status
change handling:
- RNDIS_STATUS_NETWORK_CHANGE results in calling userspace helper doing
  '/etc/init.d/network restart' and this is inappropriate and broken for
  many reasons.
- link_watch infrastructure only sends one notification per second and
  in case of e.g. paired disconnect/connect events we get only one
  notification with last status. This makes it impossible to handle such
  situations in userspace.

Redo link status changes handling in the following way:
- Create a list of reconfig events in network device context.
- On a reconfig event add it to the list of events and schedule
  netvsc_link_change().
- In netvsc_link_change() ensure 2-second delay between link status
  changes.
- Handle RNDIS_STATUS_NETWORK_CHANGE as a paired disconnect/connect event.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h |  14 +++-
 drivers/net/hyperv/netvsc_drv.c | 142 +++-
 2 files changed, 108 insertions(+), 48 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 5fa98f5..7661a12 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -177,7 +177,6 @@ struct rndis_device {
 
enum rndis_device_state state;
bool link_state;
-   bool link_change;
atomic_t new_req_id;
 
spinlock_t request_lock;
@@ -644,11 +643,24 @@ struct netvsc_stats {
struct u64_stats_sync syncp;
 };
 
+struct netvsc_reconfig {
+   struct list_head list;
+   u32 event;
+};
+
 /* The context of the netvsc device  */
 struct net_device_context {
/* point back to our device context */
struct hv_device *device_ctx;
+   /* reconfigure work */
struct delayed_work dwork;
+   /* last reconfig time */
+   unsigned long last_reconfig;
+   /* reconfig events */
+   struct list_head reconfig_events;
+   /* list protection */
+   spinlock_t lock;
+
struct work_struct work;
u32 msg_enable; /* debug level */
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 409b48e..268a058 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -42,6 +42,7 @@
 
 
 #define RING_SIZE_MIN 64
+#define LINKCHANGE_INT (2 * HZ)
 static int ring_size = 128;
 module_param(ring_size, int, S_IRUGO);
 MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
@@ -647,37 +648,33 @@ void netvsc_linkstatus_callback(struct hv_device 
*device_obj,
struct net_device *net;
struct net_device_context *ndev_ctx;
struct netvsc_device *net_device;
-   struct rndis_device *rdev;
-
-   net_device = hv_get_drvdata(device_obj);
-   rdev = net_device->extension;
+   struct netvsc_reconfig *event;
+   unsigned long flags;
 
-   switch (indicate->status) {
-   case RNDIS_STATUS_MEDIA_CONNECT:
-   rdev->link_state = false;
-   break;
-   case RNDIS_STATUS_MEDIA_DISCONNECT:
-   rdev->link_state = true;
-   break;
-   case RNDIS_STATUS_NETWORK_CHANGE:
-   rdev->link_change = true;
-   break;
-   default:
+   /* Handle link change statuses only */
+   if (indicate->status != RNDIS_STATUS_NETWORK_CHANGE &&
+   indicate->status != RNDIS_STATUS_MEDIA_CONNECT &&
+   indicate->status != RNDIS_STATUS_MEDIA_DISCONNECT)
return;
-   }
 
+   net_device = hv_get_drvdata(device_obj);
net = net_device->ndev;
 
if (!net || net->reg_state != NETREG_REGISTERED)
return;
 
ndev_ctx = netdev_priv(net);
-   if (!rdev->link_state) {
-   schedule_delayed_work(_ctx->dwork, 0);
-   schedule_delayed_work(_ctx->dwork, msecs_to_jiffies(20));
-   } else {
-   schedule_delayed_work(_ctx->dwork, 0);
-   }
+
+   event = kzalloc(sizeof(*event), GFP_ATOMIC);
+   if (!event)
+   return;
+   event->event = indicate->status;
+
+   spin_lock_irqsave(_ctx->lock, flags);
+   list_add_tail(>list, _ctx->reconfig_events);
+   spin_unlock_irqrestore(_ctx->lock, flags);
+
+   schedule_delayed_work(_ctx->dwork, 0);
 }
 
 /*
@@ -1009,12 +1006,9 @@ static const struct net_device_ops device_ops = {
 };
 
 /*
- * Send GARP packet to network peers after migrations.
- * After Quick Migration, the network is not immediately operational in the
- * current context when receiving RNDIS_STATUS_MEDIA_CONNECT event. So, add
- * another netif_notify_peers() into a delayed work, otherwise GARP packet
- * will not be sent after quick migration, and cause network disconnection.
- * Also, we update the carrier status here.
+ * Handle 

Re: [PATCH net-next 01/10] hv_netvsc: Resize some of the variables in hv_netvsc_packet

2015-11-24 Thread Vitaly Kuznetsov
"K. Y. Srinivasan"  writes:

> As part of reducing the size of the hv_netvsc_packet, resize some of the
> variables based on their usage.
>
> Signed-off-by: K. Y. Srinivasan 
> Reviewed-by: Haiyang Zhang 
> ---
>  drivers/net/hyperv/hyperv_net.h |   14 +++---
>  1 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 5fa98f5..972e562 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -127,11 +127,11 @@ struct ndis_tcp_ip_checksum_info;
>   */
>  struct hv_netvsc_packet {
>   /* Bookkeeping stuff */
> - u32 status;
> + u8 status;
>
> - bool is_data_pkt;
> - bool xmit_more; /* from skb */
> - bool cp_partial; /* partial copy into send buffer */
> + u8 is_data_pkt;
> + u8 xmit_more; /* from skb */
> + u8 cp_partial; /* partial copy into send buffer */

These are flags so we can make them bitfields and save even more space.

>
>   u16 vlan_tci;
>
> @@ -147,13 +147,13 @@ struct hv_netvsc_packet {
>   /* This points to the memory after page_buf */
>   struct rndis_message *rndis_msg;
>
> - u32 rmsg_size; /* RNDIS header and PPI size */
> - u32 rmsg_pgcnt; /* page count of RNDIS header and PPI */
> + u8 rmsg_size; /* RNDIS header and PPI size */
> + u8 rmsg_pgcnt; /* page count of RNDIS header and PPI */
>
>   u32 total_data_buflen;
>   /* Points to the send/receive buffer where the ethernet frame is */
>   void *data;
> - u32 page_buf_cnt;
> + u8 page_buf_cnt;
>   struct hv_page_buffer *page_buf;
>  };

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH net-next 08/10] hv_netvsc: Don't ask for additional head room in the skb

2015-11-24 Thread Vitaly Kuznetsov
"K. Y. Srinivasan"  writes:

> The rndis header is 116 bytes big and can be placed in the default
> head room that will be available in the skb.

We have the following in include/linux/netdevice.h:

#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
# if defined(CONFIG_MAC80211_MESH)
#  define LL_MAX_HEADER 128
# else
#  define LL_MAX_HEADER 96
# endif
#else
# define LL_MAX_HEADER 32
#endif

In case someone disables MAC80211_MESH in his kernel config we're in
troubles again. I suggest we add additional patch here making sure it is
128 bytes for Hyper-V:

#if defined(CONFIG_HYPERV_NET)
# define LL_MAX_HEADER 128
#elseif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
...

> Since the netvsc packet
> is less than 48 bytes, we can use the skb control buffer
> for the netvsc packet. With these changes we don't need to
> ask for additional head room.
>
> Signed-off-by: K. Y. Srinivasan 
> Reviewed-by: Haiyang Zhang 
> ---
>  drivers/net/hyperv/hyperv_net.h |3 +++
>  drivers/net/hyperv/netvsc_drv.c |   28 +---
>  2 files changed, 12 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 9504ca9..e15dc2c 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -124,6 +124,9 @@ struct ndis_tcp_ip_checksum_info;
>  /*
>   * Represent netvsc packet which contains 1 RNDIS and 1 ethernet frame
>   * within the RNDIS
> + *
> + * The size of this structure is less than 48 bytes and we can now
> + * place this structure in the skb->cb field.
>   */
>  struct hv_netvsc_packet {
>   /* Bookkeeping stuff */
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 947b778..9b6c507 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -432,7 +432,6 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct 
> net_device *net)
>   u32 net_trans_info;
>   u32 hash;
>   u32 skb_length;
> - u32 pkt_sz;
>   struct hv_page_buffer page_buf[MAX_PAGE_BUFFER_COUNT];
>   struct netvsc_stats *tx_stats = this_cpu_ptr(net_device_ctx->tx_stats);
>
> @@ -460,16 +459,19 @@ check_size:
>   goto check_size;
>   }
>
> - pkt_sz = sizeof(struct hv_netvsc_packet) + RNDIS_AND_PPI_SIZE;
> -
> - ret = skb_cow_head(skb, pkt_sz);
> + /*
> +  * Place the rndis header in the skb head room and
> +  * the skb->cb will be used for hv_netvsc_packet
> +  * structure.
> +  */
> + ret = skb_cow_head(skb, RNDIS_AND_PPI_SIZE);
>   if (ret) {
>   netdev_err(net, "unable to alloc hv_netvsc_packet\n");
>   ret = -ENOMEM;
>   goto drop;
>   }
> - /* Use the headroom for building up the packet */
> - packet = (struct hv_netvsc_packet *)skb->head;
> + /* Use the skb control buffer for building up the packet */
> + packet = (struct hv_netvsc_packet *)skb->cb;
>
>   packet->status = 0;
>   packet->xmit_more = skb->xmit_more;
> @@ -482,8 +484,7 @@ check_size:
>   packet->is_data_pkt = true;
>   packet->total_data_buflen = skb->len;
>
> - rndis_msg = (struct rndis_message *)((unsigned long)packet +
> - sizeof(struct hv_netvsc_packet));
> + rndis_msg = (struct rndis_message *)skb->head;
>
>   memset(rndis_msg, 0, RNDIS_AND_PPI_SIZE);
>
> @@ -1071,16 +1072,12 @@ static int netvsc_probe(struct hv_device *dev,
>   struct netvsc_device_info device_info;
>   struct netvsc_device *nvdev;
>   int ret;
> - u32 max_needed_headroom;
>
>   net = alloc_etherdev_mq(sizeof(struct net_device_context),
>   num_online_cpus());
>   if (!net)
>   return -ENOMEM;
>
> - max_needed_headroom = sizeof(struct hv_netvsc_packet) +
> -   RNDIS_AND_PPI_SIZE;
> -
>   netif_carrier_off(net);
>
>   net_device_ctx = netdev_priv(net);
> @@ -1116,13 +1113,6 @@ static int netvsc_probe(struct hv_device *dev,
>   net->ethtool_ops = _ops;
>   SET_NETDEV_DEV(net, >device);
>
> - /*
> -  * Request additional head room in the skb.
> -  * We will use this space to build the rndis
> -  * heaser and other state we need to maintain.
> -  */
> - net->needed_headroom = max_needed_headroom;
> -
>   /* Notify the netvsc driver of the new device */
>   memset(_info, 0, sizeof(device_info));
>   device_info.ring_size = ring_size;

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 3/3] Drivers: hv: ring_buffer: remove code duplication from hv_ringbuffer_peek/read()

2015-11-23 Thread Vitaly Kuznetsov
hv_ringbuffer_peek() does the same as hv_ringbuffer_read() without
advancing the read index. The only functional change this patch brings
is moving hv_need_to_signal_on_read() call under the ring_lock but this
function is just a couple of comparisons.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/ring_buffer.c | 68 ++--
 1 file changed, 25 insertions(+), 43 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 7bca513..07f9408 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -380,47 +380,9 @@ int hv_ringbuffer_write(struct hv_ring_buffer_info 
*outring_info,
return 0;
 }
 
-
-/* Read without advancing the read index. */
-int hv_ringbuffer_peek(struct hv_ring_buffer_info *Inring_info,
-  void *Buffer, u32 buflen)
-{
-   u32 bytes_avail_towrite;
-   u32 bytes_avail_toread;
-   u32 next_read_location = 0;
-   unsigned long flags;
-
-   spin_lock_irqsave(_info->ring_lock, flags);
-
-   hv_get_ringbuffer_availbytes(Inring_info,
-   _avail_toread,
-   _avail_towrite);
-
-   /* Make sure there is something to read */
-   if (bytes_avail_toread < buflen) {
-
-   spin_unlock_irqrestore(_info->ring_lock, flags);
-
-   return -EAGAIN;
-   }
-
-   /* Convert to byte offset */
-   next_read_location = hv_get_next_read_location(Inring_info);
-
-   next_read_location = hv_copyfrom_ringbuffer(Inring_info,
-   Buffer,
-   buflen,
-   next_read_location);
-
-   spin_unlock_irqrestore(_info->ring_lock, flags);
-
-   return 0;
-}
-
-
-/* Read and advance the read index. */
-int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info, void *buffer,
-  u32 buflen, u32 offset, bool *signal)
+static inline int __hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
+  void *buffer, u32 buflen, u32 offset,
+  bool *signal, bool advance)
 {
u32 bytes_avail_towrite;
u32 bytes_avail_toread;
@@ -452,6 +414,9 @@ int hv_ringbuffer_read(struct hv_ring_buffer_info 
*inring_info, void *buffer,
buflen,
next_read_location);
 
+   if (!advance)
+   goto out_unlock;
+
next_read_location = hv_copyfrom_ringbuffer(inring_info,
_indices,
sizeof(u64),
@@ -467,9 +432,26 @@ int hv_ringbuffer_read(struct hv_ring_buffer_info 
*inring_info, void *buffer,
/* Update the read index */
hv_set_next_read_location(inring_info, next_read_location);
 
-   spin_unlock_irqrestore(_info->ring_lock, flags);
-
*signal = hv_need_to_signal_on_read(bytes_avail_towrite, inring_info);
 
+out_unlock:
+   spin_unlock_irqrestore(_info->ring_lock, flags);
return 0;
 }
+
+/* Read from ring buffer without advancing the read index. */
+int hv_ringbuffer_peek(struct hv_ring_buffer_info *inring_info,
+  void *buffer, u32 buflen)
+{
+   return __hv_ringbuffer_read(inring_info, buffer, buflen,
+   0, NULL, false);
+}
+
+/* Read from ring buffer and advance the read index. */
+int hv_ringbuffer_read(struct hv_ring_buffer_info *inring_info,
+  void *buffer, u32 buflen, u32 offset,
+  bool *signal)
+{
+   return __hv_ringbuffer_read(inring_info, buffer, buflen,
+   offset, signal, true);
+}
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 0/3] Drivers: hv: ring_buffer: do some code cleanup

2015-11-23 Thread Vitaly Kuznetsov
1) Fix comment style in ring_buffer.c
2) Remove unneeded smp_read_barrier_depends() from hyperv.h
3) Unify hv_ringbuffer_peek/read() functions.

Vitaly Kuznetsov (3):
  Drivers: hv: ring_buffer.c: fix comment style
  Drivers: hv: ring_buffer: remove stray smp_read_barrier_depends()
  Drivers: hv: ring_buffer: remove code duplication from
hv_ringbuffer_peek/read()

 drivers/hv/ring_buffer.c | 199 ---
 include/linux/hyperv.h   |   2 -
 2 files changed, 49 insertions(+), 152 deletions(-)

-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 1/3] Drivers: hv: ring_buffer.c: fix comment style

2015-11-23 Thread Vitaly Kuznetsov
Convert 6+-string comments repeating function names to normal kernel-style
comments and fix a couple of other comment style issues. No textual or
functional changes intended.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/ring_buffer.c | 135 +--
 1 file changed, 26 insertions(+), 109 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 70a1a9a..7bca513 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -112,9 +112,7 @@ static bool hv_need_to_signal_on_read(u32 prev_write_sz,
u32 read_loc = rbi->ring_buffer->read_index;
u32 pending_sz = rbi->ring_buffer->pending_send_sz;
 
-   /*
-* If the other end is not blocked on write don't bother.
-*/
+   /* If the other end is not blocked on write don't bother. */
if (pending_sz == 0)
return false;
 
@@ -128,12 +126,7 @@ static bool hv_need_to_signal_on_read(u32 prev_write_sz,
return false;
 }
 
-/*
- * hv_get_next_write_location()
- *
- * Get the next write location for the specified ring buffer
- *
- */
+/* Get the next write location for the specified ring buffer. */
 static inline u32
 hv_get_next_write_location(struct hv_ring_buffer_info *ring_info)
 {
@@ -142,12 +135,7 @@ hv_get_next_write_location(struct hv_ring_buffer_info 
*ring_info)
return next;
 }
 
-/*
- * hv_set_next_write_location()
- *
- * Set the next write location for the specified ring buffer
- *
- */
+/* Set the next write location for the specified ring buffer. */
 static inline void
 hv_set_next_write_location(struct hv_ring_buffer_info *ring_info,
 u32 next_write_location)
@@ -155,11 +143,7 @@ hv_set_next_write_location(struct hv_ring_buffer_info 
*ring_info,
ring_info->ring_buffer->write_index = next_write_location;
 }
 
-/*
- * hv_get_next_read_location()
- *
- * Get the next read location for the specified ring buffer
- */
+/* Get the next read location for the specified ring buffer. */
 static inline u32
 hv_get_next_read_location(struct hv_ring_buffer_info *ring_info)
 {
@@ -169,10 +153,8 @@ hv_get_next_read_location(struct hv_ring_buffer_info 
*ring_info)
 }
 
 /*
- * hv_get_next_readlocation_withoffset()
- *
  * Get the next read location + offset for the specified ring buffer.
- * This allows the caller to skip
+ * This allows the caller to skip.
  */
 static inline u32
 hv_get_next_readlocation_withoffset(struct hv_ring_buffer_info *ring_info,
@@ -186,13 +168,7 @@ hv_get_next_readlocation_withoffset(struct 
hv_ring_buffer_info *ring_info,
return next;
 }
 
-/*
- *
- * hv_set_next_read_location()
- *
- * Set the next read location for the specified ring buffer
- *
- */
+/* Set the next read location for the specified ring buffer. */
 static inline void
 hv_set_next_read_location(struct hv_ring_buffer_info *ring_info,
u32 next_read_location)
@@ -201,12 +177,7 @@ hv_set_next_read_location(struct hv_ring_buffer_info 
*ring_info,
 }
 
 
-/*
- *
- * hv_get_ring_buffer()
- *
- * Get the start of the ring buffer
- */
+/* Get the start of the ring buffer. */
 static inline void *
 hv_get_ring_buffer(struct hv_ring_buffer_info *ring_info)
 {
@@ -214,25 +185,14 @@ hv_get_ring_buffer(struct hv_ring_buffer_info *ring_info)
 }
 
 
-/*
- *
- * hv_get_ring_buffersize()
- *
- * Get the size of the ring buffer
- */
+/* Get the size of the ring buffer. */
 static inline u32
 hv_get_ring_buffersize(struct hv_ring_buffer_info *ring_info)
 {
return ring_info->ring_datasize;
 }
 
-/*
- *
- * hv_get_ring_bufferindices()
- *
- * Get the read and write indices as u64 of the specified ring buffer
- *
- */
+/* Get the read and write indices as u64 of the specified ring buffer. */
 static inline u64
 hv_get_ring_bufferindices(struct hv_ring_buffer_info *ring_info)
 {
@@ -240,12 +200,8 @@ hv_get_ring_bufferindices(struct hv_ring_buffer_info 
*ring_info)
 }
 
 /*
- *
- * hv_copyfrom_ringbuffer()
- *
  * Helper routine to copy to source from ring buffer.
  * Assume there is enough room. Handles wrap-around in src case only!!
- *
  */
 static u32 hv_copyfrom_ringbuffer(
struct hv_ring_buffer_info  *ring_info,
@@ -277,12 +233,8 @@ static u32 hv_copyfrom_ringbuffer(
 
 
 /*
- *
- * hv_copyto_ringbuffer()
- *
  * Helper routine to copy from source to ring buffer.
  * Assume there is enough room. Handles wrap-around in dest case only!!
- *
  */
 static u32 hv_copyto_ringbuffer(
struct hv_ring_buffer_info  *ring_info,
@@ -308,13 +260,7 @@ static u32 hv_copyto_ringbuffer(
return start_write_offset;
 }
 
-/*
- *
- * hv_ringbuffer_get_debuginfo()
- *
- * Get various debug metrics for the specified ring buffer
- *
- */
+/* Get various debug metrics for the specified ring buffer. */
 void hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
struct hv_ring_bu

Re: [PATCH 1/2] Drivers: hv: vmbus: Fix a Host signaling bug

2015-11-19 Thread Vitaly Kuznetsov
"K. Y. Srinivasan"  writes:

> Currently we have two policies for deciding when to signal the host:
> One based on the ring buffer state and the other based on what the
> VMBUS client driver wants to do. Consider the case when the client
> wants to explicitly control when to signal the host. In this case,
> if the client were to defer signaling, we will not be able to signal
> the host subsequently when the client does want to signal since the
> ring buffer state will prevent the signaling. Implement logic to
> have only one signaling policy in force for a given channel.
>
> Signed-off-by: K. Y. Srinivasan 
> Reviewed-by: Haiyang Zhang 
> Tested-by: Haiyang Zhang 
> Cc:  # v4.2+
> ---
>  drivers/hv/channel.c   |   18 ++
>  include/linux/hyperv.h |   12 
>  2 files changed, 30 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 77d2579..c6278c7 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -653,10 +653,19 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, 
> void *buffer,
>*on the ring. We will not signal if more data is
>*to be placed.
>*
> +  * Based on the channel signal state, we will decide
> +  * which signaling policy will be applied.
> +  *
>* If we cannot write to the ring-buffer; signal the host
>* even if we may not have written anything. This is a rare
>* enough condition that it should not matter.
>*/
> +
> + if (channel->signal_state)
> + signal = true;
> + else
> + kick_q = true;
> +
>   if (((ret == 0) && kick_q && signal) || (ret))
>   vmbus_setevent(channel);
>
> @@ -756,10 +765,19 @@ int vmbus_sendpacket_pagebuffer_ctl(struct 
> vmbus_channel *channel,
>*on the ring. We will not signal if more data is
>*to be placed.
>*
> +  * Based on the channel signal state, we will decide
> +  * which signaling policy will be applied.
> +  *
>* If we cannot write to the ring-buffer; signal the host
>* even if we may not have written anything. This is a rare
>* enough condition that it should not matter.
>*/
> +
> + if (channel->signal_state)
> + signal = true;
> + else
> + kick_q = true;
> +
>   if (((ret == 0) && kick_q && signal) || (ret))
>   vmbus_setevent(channel);
>
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 437c9c8..7b1af52 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -756,8 +756,20 @@ struct vmbus_channel {
>* link up channels based on their CPU affinity.
>*/
>   struct list_head percpu_list;
> + /*
> +  * Host signaling policy: The default policy will be
> +  * based on the ring buffer state. We will also support
> +  * a policy where the client driver can have explicit
> +  * signaling control.
> +  */
> + bool signal_state;

Making policy selector 'bool' is counter-intuitive: I suggest we rename
this to somethink like 'signal_explicit' if we're sure there won't be
new policies or (even better in my opinion) introduce a new enum with
these policies, e.g:

   enum hv_channel_policy signal_policy;

where

enum hv_channel_policy {
  HV_CHANNELPOLICY_DEFAULT,
  HV_CHANNELPOLICY_EXPLICIT,
};

>  };
>
> +static inline void set_channel_signal_state(struct vmbus_channel *c, bool 
> state)
> +{
> + c->signal_state = state;
> +}
> +
>  static inline void set_channel_read_state(struct vmbus_channel *c, bool 
> state)
>  {
>   c->batched_reading = state;

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 2/4] Drivers: hv: utils: rename outmsg_lock

2015-11-18 Thread Vitaly Kuznetsov
As a preparation to reusing outmsg_lock to protect test-and-set openrations
on 'mode' rename it the more general 'lock'.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_utils_transport.c | 14 +++---
 drivers/hv/hv_utils_transport.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 40abe44..59c6f3d 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -27,11 +27,11 @@ static struct list_head hvt_list = LIST_HEAD_INIT(hvt_list);
 
 static void hvt_reset(struct hvutil_transport *hvt)
 {
-   mutex_lock(>outmsg_lock);
+   mutex_lock(>lock);
kfree(hvt->outmsg);
hvt->outmsg = NULL;
hvt->outmsg_len = 0;
-   mutex_unlock(>outmsg_lock);
+   mutex_unlock(>lock);
if (hvt->on_reset)
hvt->on_reset();
 }
@@ -47,7 +47,7 @@ static ssize_t hvt_op_read(struct file *file, char __user 
*buf,
if (wait_event_interruptible(hvt->outmsg_q, hvt->outmsg_len > 0))
return -EINTR;
 
-   mutex_lock(>outmsg_lock);
+   mutex_lock(>lock);
if (!hvt->outmsg) {
ret = -EAGAIN;
goto out_unlock;
@@ -68,7 +68,7 @@ static ssize_t hvt_op_read(struct file *file, char __user 
*buf,
hvt->outmsg_len = 0;
 
 out_unlock:
-   mutex_unlock(>outmsg_lock);
+   mutex_unlock(>lock);
return ret;
 }
 
@@ -197,7 +197,7 @@ int hvutil_transport_send(struct hvutil_transport *hvt, 
void *msg, int len)
return ret;
}
/* HVUTIL_TRANSPORT_CHARDEV */
-   mutex_lock(>outmsg_lock);
+   mutex_lock(>lock);
if (hvt->outmsg) {
/* Previous message wasn't received */
ret = -EFAULT;
@@ -211,7 +211,7 @@ int hvutil_transport_send(struct hvutil_transport *hvt, 
void *msg, int len)
} else
ret = -ENOMEM;
 out_unlock:
-   mutex_unlock(>outmsg_lock);
+   mutex_unlock(>lock);
return ret;
 }
 
@@ -242,7 +242,7 @@ struct hvutil_transport *hvutil_transport_init(const char 
*name,
hvt->mdev.fops = >fops;
 
init_waitqueue_head(>outmsg_q);
-   mutex_init(>outmsg_lock);
+   mutex_init(>lock);
 
spin_lock(_list_lock);
list_add(>list, _list);
diff --git a/drivers/hv/hv_utils_transport.h b/drivers/hv/hv_utils_transport.h
index 314c76c..bff4c92 100644
--- a/drivers/hv/hv_utils_transport.h
+++ b/drivers/hv/hv_utils_transport.h
@@ -38,7 +38,7 @@ struct hvutil_transport {
u8 *outmsg; /* message to the userspace */
int outmsg_len; /* its length */
wait_queue_head_t outmsg_q; /* poll/read wait queue */
-   struct mutex outmsg_lock;   /* protects outmsg */
+   struct mutex lock;  /* protects struct members */
 };
 
 struct hvutil_transport *hvutil_transport_init(const char *name,
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 4/4] Drivers: hv: utils: fix crash when device is removed from host side

2015-11-18 Thread Vitaly Kuznetsov
The crash is observed when a service is being disabled host side while
userspace daemon is connected to the device:

[   90.244859] general protection fault:  [#1] SMP
...
[   90.800082] Call Trace:
[   90.800082]  [] __fput+0xc8/0x1f0
[   90.800082]  [] fput+0xe/0x10
...
[   90.800082]  [] do_signal+0x28/0x580
[   90.800082]  [] ? finish_task_switch+0xa6/0x180
[   90.800082]  [] ? __schedule+0x28f/0x870
[   90.800082]  [] ? hvt_op_read+0x12a/0x140 [hv_utils]
...

The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While module unload path is protected by .owner in
struct file_operations util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.

In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_utils_transport.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 31c2f86..ee20b50 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -155,13 +155,22 @@ static int hvt_op_open(struct inode *inode, struct file 
*file)
return ret;
 }
 
+static void hvt_transport_free(struct hvutil_transport *hvt)
+{
+   misc_deregister(>mdev);
+   kfree(hvt->outmsg);
+   kfree(hvt);
+}
+
 static int hvt_op_release(struct inode *inode, struct file *file)
 {
struct hvutil_transport *hvt;
+   int mode_old;
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
mutex_lock(>lock);
+   mode_old = hvt->mode;
if (hvt->mode != HVUTIL_TRANSPORT_DESTROY)
hvt->mode = HVUTIL_TRANSPORT_INIT;
/*
@@ -171,6 +180,9 @@ static int hvt_op_release(struct inode *inode, struct file 
*file)
hvt_reset(hvt);
mutex_unlock(>lock);
 
+   if (mode_old == HVUTIL_TRANSPORT_DESTROY)
+   hvt_transport_free(hvt);
+
return 0;
 }
 
@@ -304,17 +316,25 @@ err_free_hvt:
 
 void hvutil_transport_destroy(struct hvutil_transport *hvt)
 {
+   int mode_old;
+
mutex_lock(>lock);
+   mode_old = hvt->mode;
hvt->mode = HVUTIL_TRANSPORT_DESTROY;
wake_up_interruptible(>outmsg_q);
mutex_unlock(>lock);
 
+   /*
+* In case we were in 'chardev' mode we still have an open fd so we
+* have to defer freeing the device. Netlink interface can be freed
+* now.
+*/
spin_lock(_list_lock);
list_del(>list);
spin_unlock(_list_lock);
if (hvt->cn_id.idx > 0 && hvt->cn_id.val > 0)
cn_del_callback(>cn_id);
-   misc_deregister(>mdev);
-   kfree(hvt->outmsg);
-   kfree(hvt);
+
+   if (mode_old != HVUTIL_TRANSPORT_CHARDEV)
+   hvt_transport_free(hvt);
 }
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 0/4] Drivers: hv: utils: prevent crash when a utility driver is disabled host side

2015-11-18 Thread Vitaly Kuznetsov
Changes since v1:
- Don't re-introduce memory leak in hvt_op_write(), preserve the error
  code from on_msg() [Dan Carpenter]

Original description:

I'm observing a crash when a utility driver is disabled host side (e.g.
'Guest services' is disabled live) when we have userspace daemon
connected:

[   90.244859] general protection fault:  [#1] SMP
...
[   90.800082] CPU: 3 PID: 473 Comm: hypervfcopyd Not tainted 
4.3.0-rc7_netvsc_noalloc+ #1053
...
[   90.800082] Call Trace:
[   90.800082]  [] __fput+0xc8/0x1f0
[   90.800082]  [] fput+0xe/0x10
[   90.800082]  [] task_work_run+0x73/0x90
[   90.800082]  [] do_exit+0x335/0xa90
[   90.800082]  [] do_group_exit+0x3f/0xc0
[   90.800082]  [] get_signal+0x25e/0x5e0
[   90.800082]  [] do_signal+0x28/0x580
[   90.800082]  [] ? finish_task_switch+0xa6/0x180
[   90.800082]  [] ? __schedule+0x28f/0x870
[   90.800082]  [] ? hvt_op_read+0x12a/0x140 [hv_utils]
[   90.800082]  [] ? wake_atomic_t_function+0x70/0x70
...
[   90.800082] RIP  [] module_put+0x16/0x90
[   90.800082]  RSP 
[   95.734261] ---[ end trace 0e09af6a6294a668 ]---

The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While module unload path is protected by .owner in
struct file_operations util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.

In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure
asking the daemon to exit gracefully by responding -EBADF to all
operations on unload.

Patch 1 fixes an unrelated issue I spotted, patch 2 renames outmsg_lock to
'lock' as we're gonna use it to protect test-and-set operations on 'mode',
patch 3 introduces HVUTIL_TRANSPORT_DESTROY mode of operation, patch 4
fixes the issue itself.

Patches are rebased on previously sent Olaf's fixes.

Vitaly Kuznetsov (4):
  Drivers: hv: utils: fix memory leak on on_msg() failure
  Drivers: hv: utils: rename outmsg_lock
  Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode
  Drivers: hv: utils: fix crash when device is removed from host side

 drivers/hv/hv_utils_transport.c | 112 +++-
 drivers/hv/hv_utils_transport.h |   3 +-
 2 files changed, 89 insertions(+), 26 deletions(-)

-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2 3/4] Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode

2015-11-18 Thread Vitaly Kuznetsov
When Hyper-V host asks us to remove some util driver by closing the
appropriate channel there is no easy way to force the current file
descriptor holder to hang up but we can start to respond -EBADF to all
operations asking it to exit gracefully.

As we're setting hvt->mode from two separate contexts now we need to use
a proper locking.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Changes since v1:
- Don't re-introduce memory leak in hvt_op_write() [Dan Carpenter]
---
 drivers/hv/hv_utils_transport.c | 71 -
 drivers/hv/hv_utils_transport.h |  1 +
 2 files changed, 57 insertions(+), 15 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 59c6f3d..31c2f86 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -27,11 +27,9 @@ static struct list_head hvt_list = LIST_HEAD_INIT(hvt_list);
 
 static void hvt_reset(struct hvutil_transport *hvt)
 {
-   mutex_lock(>lock);
kfree(hvt->outmsg);
hvt->outmsg = NULL;
hvt->outmsg_len = 0;
-   mutex_unlock(>lock);
if (hvt->on_reset)
hvt->on_reset();
 }
@@ -44,10 +42,17 @@ static ssize_t hvt_op_read(struct file *file, char __user 
*buf,
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   if (wait_event_interruptible(hvt->outmsg_q, hvt->outmsg_len > 0))
+   if (wait_event_interruptible(hvt->outmsg_q, hvt->outmsg_len > 0 ||
+hvt->mode != HVUTIL_TRANSPORT_CHARDEV))
return -EINTR;
 
mutex_lock(>lock);
+
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY) {
+   ret = -EBADF;
+   goto out_unlock;
+   }
+
if (!hvt->outmsg) {
ret = -EAGAIN;
goto out_unlock;
@@ -85,7 +90,10 @@ static ssize_t hvt_op_write(struct file *file, const char 
__user *buf,
if (IS_ERR(inmsg))
return PTR_ERR(inmsg);
 
-   ret = hvt->on_msg(inmsg, count);
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY)
+   ret = -EBADF;
+   else
+   ret = hvt->on_msg(inmsg, count);
 
kfree(inmsg);
 
@@ -99,6 +107,10 @@ static unsigned int hvt_op_poll(struct file *file, 
poll_table *wait)
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
poll_wait(file, >outmsg_q, wait);
+
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY)
+   return -EBADF;
+
if (hvt->outmsg_len > 0)
return POLLIN | POLLRDNORM;
 
@@ -108,26 +120,39 @@ static unsigned int hvt_op_poll(struct file *file, 
poll_table *wait)
 static int hvt_op_open(struct inode *inode, struct file *file)
 {
struct hvutil_transport *hvt;
+   int ret = 0;
+   bool issue_reset = false;
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   /*
-* Switching to CHARDEV mode. We switch bach to INIT when device
-* gets released.
-*/
-   if (hvt->mode == HVUTIL_TRANSPORT_INIT)
+   mutex_lock(>lock);
+
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY) {
+   ret = -EBADF;
+   } else if (hvt->mode == HVUTIL_TRANSPORT_INIT) {
+   /*
+* Switching to CHARDEV mode. We switch bach to INIT when
+* device gets released.
+*/
hvt->mode = HVUTIL_TRANSPORT_CHARDEV;
+   }
else if (hvt->mode == HVUTIL_TRANSPORT_NETLINK) {
/*
 * We're switching from netlink communication to using char
 * device. Issue the reset first.
 */
-   hvt_reset(hvt);
+   issue_reset = true;
hvt->mode = HVUTIL_TRANSPORT_CHARDEV;
-   } else
-   return -EBUSY;
+   } else {
+   ret = -EBUSY;
+   }
 
-   return 0;
+   if (issue_reset)
+   hvt_reset(hvt);
+
+   mutex_unlock(>lock);
+
+   return ret;
 }
 
 static int hvt_op_release(struct inode *inode, struct file *file)
@@ -136,12 +161,15 @@ static int hvt_op_release(struct inode *inode, struct 
file *file)
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   hvt->mode = HVUTIL_TRANSPORT_INIT;
+   mutex_lock(>lock);
+   if (hvt->mode != HVUTIL_TRANSPORT_DESTROY)
+   hvt->mode = HVUTIL_TRANSPORT_INIT;
/*
 * Cleanup message buffers to avoid spurious messages when the daemon
 * connects back.
 */
hvt_reset(hvt);
+   mutex_unlock(>lock);
 
return 0;
 }
@@ -168,6 +196,7 @@ static void hvt_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
 * Switching to NETLINK mode. Switching to CHARDEV happens when someone
 

[PATCH v2 1/4] Drivers: hv: utils: fix memory leak on on_msg() failure

2015-11-18 Thread Vitaly Kuznetsov
inmsg should be freed in case of on_msg() failure to avoid memory leak.
Preserve the error code from on_msg().

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Changes since v1:
- Preserve the error code from on_msg(). [Dan Carpenter]
---
 drivers/hv/hv_utils_transport.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 24b2766..40abe44 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -77,6 +77,7 @@ static ssize_t hvt_op_write(struct file *file, const char 
__user *buf,
 {
struct hvutil_transport *hvt;
u8 *inmsg;
+   int ret;
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
@@ -84,11 +85,11 @@ static ssize_t hvt_op_write(struct file *file, const char 
__user *buf,
if (IS_ERR(inmsg))
return PTR_ERR(inmsg);
 
-   if (hvt->on_msg(inmsg, count))
-   return -EFAULT;
+   ret = hvt->on_msg(inmsg, count);
+
kfree(inmsg);
 
-   return count;
+   return ret ? ret : count;
 }
 
 static unsigned int hvt_op_poll(struct file *file, poll_table *wait)
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/4] Drivers: hv: utils: fix memory leak on on_msg() failure

2015-11-16 Thread Vitaly Kuznetsov
Dan Carpenter <dan.carpen...@oracle.com> writes:

> On Thu, Nov 12, 2015 at 12:32:13PM +0100, Vitaly Kuznetsov wrote:
>> @@ -85,10 +86,10 @@ static ssize_t hvt_op_write(struct file *file, const 
>> char __user *buf,
>>  return PTR_ERR(inmsg);
>>  
>>  if (hvt->on_msg(inmsg, count))
>> -return -EFAULT;
>> +ret = -EFAULT;
>
> You fix this leak and then re-introduce it again directly in patch
> 3/4.

Such a shame ... Thanks for noticing!

> Also it might be nice to preserve the error code.
>
>   ret = hvt->on_msg(inmsg, count);
>
>   kfree(inmsg);
>
>   return ret ? ret : count;

It seems on_msg() hooks can only return -EINVAL now but why not
... let's do it in v2.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 0/4] Drivers: hv: utils: prevent crash when a utility driver is disabled host side

2015-11-12 Thread Vitaly Kuznetsov
I'm observing a crash when a utility driver is disabled host side (e.g.
'Guest services' is disabled live) when we have userspace daemon
connected:

[   90.244859] general protection fault:  [#1] SMP
...
[   90.800082] CPU: 3 PID: 473 Comm: hypervfcopyd Not tainted 
4.3.0-rc7_netvsc_noalloc+ #1053
...
[   90.800082] Call Trace:
[   90.800082]  [] __fput+0xc8/0x1f0
[   90.800082]  [] fput+0xe/0x10
[   90.800082]  [] task_work_run+0x73/0x90
[   90.800082]  [] do_exit+0x335/0xa90
[   90.800082]  [] do_group_exit+0x3f/0xc0
[   90.800082]  [] get_signal+0x25e/0x5e0
[   90.800082]  [] do_signal+0x28/0x580
[   90.800082]  [] ? finish_task_switch+0xa6/0x180
[   90.800082]  [] ? __schedule+0x28f/0x870
[   90.800082]  [] ? hvt_op_read+0x12a/0x140 [hv_utils]
[   90.800082]  [] ? wake_atomic_t_function+0x70/0x70
...
[   90.800082] RIP  [] module_put+0x16/0x90
[   90.800082]  RSP 
[   95.734261] ---[ end trace 0e09af6a6294a668 ]---

The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While module unload path is protected by .owner in
struct file_operations util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.

In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure
asking the daemon to exit gracefully by responding -EBADF to all
operations on unload.

Patch 1 fixes an unrelated issue I spotted, patch 2 renames outmsg_lock to
'lock' as we're gonna use it to protect test-and-set operations on 'mode',
patch 3 introduces HVUTIL_TRANSPORT_DESTROY mode of operation, patch 4
fixes the issue itself.

Patches are rebased on previously sent Olaf's fixes.

Vitaly Kuznetsov (4):
  Drivers: hv: utils: fix memory leak on on_msg() failure
  Drivers: hv: utils: rename outmsg_lock
  Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode
  Drivers: hv: utils: fix crash when device is removed from host side

 drivers/hv/hv_utils_transport.c | 110 +++-
 drivers/hv/hv_utils_transport.h |   3 +-
 2 files changed, 88 insertions(+), 25 deletions(-)

-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 2/4] Drivers: hv: utils: rename outmsg_lock

2015-11-12 Thread Vitaly Kuznetsov
As a preparation to reusing outmsg_lock to protect test-and-set openrations
on 'mode' rename it the more general 'lock'.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_utils_transport.c | 14 +++---
 drivers/hv/hv_utils_transport.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 1d20451..5cce2d2 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -27,11 +27,11 @@ static struct list_head hvt_list = LIST_HEAD_INIT(hvt_list);
 
 static void hvt_reset(struct hvutil_transport *hvt)
 {
-   mutex_lock(>outmsg_lock);
+   mutex_lock(>lock);
kfree(hvt->outmsg);
hvt->outmsg = NULL;
hvt->outmsg_len = 0;
-   mutex_unlock(>outmsg_lock);
+   mutex_unlock(>lock);
if (hvt->on_reset)
hvt->on_reset();
 }
@@ -47,7 +47,7 @@ static ssize_t hvt_op_read(struct file *file, char __user 
*buf,
if (wait_event_interruptible(hvt->outmsg_q, hvt->outmsg_len > 0))
return -EINTR;
 
-   mutex_lock(>outmsg_lock);
+   mutex_lock(>lock);
if (!hvt->outmsg) {
ret = -EAGAIN;
goto out_unlock;
@@ -68,7 +68,7 @@ static ssize_t hvt_op_read(struct file *file, char __user 
*buf,
hvt->outmsg_len = 0;
 
 out_unlock:
-   mutex_unlock(>outmsg_lock);
+   mutex_unlock(>lock);
return ret;
 }
 
@@ -197,7 +197,7 @@ int hvutil_transport_send(struct hvutil_transport *hvt, 
void *msg, int len)
return ret;
}
/* HVUTIL_TRANSPORT_CHARDEV */
-   mutex_lock(>outmsg_lock);
+   mutex_lock(>lock);
if (hvt->outmsg) {
/* Previous message wasn't received */
ret = -EFAULT;
@@ -211,7 +211,7 @@ int hvutil_transport_send(struct hvutil_transport *hvt, 
void *msg, int len)
} else
ret = -ENOMEM;
 out_unlock:
-   mutex_unlock(>outmsg_lock);
+   mutex_unlock(>lock);
return ret;
 }
 
@@ -242,7 +242,7 @@ struct hvutil_transport *hvutil_transport_init(const char 
*name,
hvt->mdev.fops = >fops;
 
init_waitqueue_head(>outmsg_q);
-   mutex_init(>outmsg_lock);
+   mutex_init(>lock);
 
spin_lock(_list_lock);
list_add(>list, _list);
diff --git a/drivers/hv/hv_utils_transport.h b/drivers/hv/hv_utils_transport.h
index 314c76c..bff4c92 100644
--- a/drivers/hv/hv_utils_transport.h
+++ b/drivers/hv/hv_utils_transport.h
@@ -38,7 +38,7 @@ struct hvutil_transport {
u8 *outmsg; /* message to the userspace */
int outmsg_len; /* its length */
wait_queue_head_t outmsg_q; /* poll/read wait queue */
-   struct mutex outmsg_lock;   /* protects outmsg */
+   struct mutex lock;  /* protects struct members */
 };
 
 struct hvutil_transport *hvutil_transport_init(const char *name,
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 3/4] Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode

2015-11-12 Thread Vitaly Kuznetsov
When Hyper-V host asks us to remove some util driver by closing the
appropriate channel there is no easy way to force the current file
descriptor holder to hang up but we can start to respond -EBADF to all
operations asking it to exit gracefully.

As we're setting hvt->mode from two separate contexts now we need to use
a proper locking.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_utils_transport.c | 69 -
 drivers/hv/hv_utils_transport.h |  1 +
 2 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 5cce2d2..984cba5 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -27,11 +27,9 @@ static struct list_head hvt_list = LIST_HEAD_INIT(hvt_list);
 
 static void hvt_reset(struct hvutil_transport *hvt)
 {
-   mutex_lock(>lock);
kfree(hvt->outmsg);
hvt->outmsg = NULL;
hvt->outmsg_len = 0;
-   mutex_unlock(>lock);
if (hvt->on_reset)
hvt->on_reset();
 }
@@ -44,10 +42,17 @@ static ssize_t hvt_op_read(struct file *file, char __user 
*buf,
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   if (wait_event_interruptible(hvt->outmsg_q, hvt->outmsg_len > 0))
+   if (wait_event_interruptible(hvt->outmsg_q, hvt->outmsg_len > 0 ||
+hvt->mode != HVUTIL_TRANSPORT_CHARDEV))
return -EINTR;
 
mutex_lock(>lock);
+
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY) {
+   ret = -EBADF;
+   goto out_unlock;
+   }
+
if (!hvt->outmsg) {
ret = -EAGAIN;
goto out_unlock;
@@ -85,6 +90,9 @@ static ssize_t hvt_op_write(struct file *file, const char 
__user *buf,
if (IS_ERR(inmsg))
return PTR_ERR(inmsg);
 
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY)
+   return -EBADF;
+
if (hvt->on_msg(inmsg, count))
ret = -EFAULT;
kfree(inmsg);
@@ -99,6 +107,10 @@ static unsigned int hvt_op_poll(struct file *file, 
poll_table *wait)
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
poll_wait(file, >outmsg_q, wait);
+
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY)
+   return -EBADF;
+
if (hvt->outmsg_len > 0)
return POLLIN | POLLRDNORM;
 
@@ -108,26 +120,39 @@ static unsigned int hvt_op_poll(struct file *file, 
poll_table *wait)
 static int hvt_op_open(struct inode *inode, struct file *file)
 {
struct hvutil_transport *hvt;
+   int ret = 0;
+   bool issue_reset = false;
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   /*
-* Switching to CHARDEV mode. We switch bach to INIT when device
-* gets released.
-*/
-   if (hvt->mode == HVUTIL_TRANSPORT_INIT)
+   mutex_lock(>lock);
+
+   if (hvt->mode == HVUTIL_TRANSPORT_DESTROY) {
+   ret = -EBADF;
+   } else if (hvt->mode == HVUTIL_TRANSPORT_INIT) {
+   /*
+* Switching to CHARDEV mode. We switch bach to INIT when
+* device gets released.
+*/
hvt->mode = HVUTIL_TRANSPORT_CHARDEV;
+   }
else if (hvt->mode == HVUTIL_TRANSPORT_NETLINK) {
/*
 * We're switching from netlink communication to using char
 * device. Issue the reset first.
 */
-   hvt_reset(hvt);
+   issue_reset = true;
hvt->mode = HVUTIL_TRANSPORT_CHARDEV;
-   } else
-   return -EBUSY;
+   } else {
+   ret = -EBUSY;
+   }
 
-   return 0;
+   if (issue_reset)
+   hvt_reset(hvt);
+
+   mutex_unlock(>lock);
+
+   return ret;
 }
 
 static int hvt_op_release(struct inode *inode, struct file *file)
@@ -136,12 +161,15 @@ static int hvt_op_release(struct inode *inode, struct 
file *file)
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
-   hvt->mode = HVUTIL_TRANSPORT_INIT;
+   mutex_lock(>lock);
+   if (hvt->mode != HVUTIL_TRANSPORT_DESTROY)
+   hvt->mode = HVUTIL_TRANSPORT_INIT;
/*
 * Cleanup message buffers to avoid spurious messages when the daemon
 * connects back.
 */
hvt_reset(hvt);
+   mutex_unlock(>lock);
 
return 0;
 }
@@ -168,6 +196,7 @@ static void hvt_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
 * Switching to NETLINK mode. Switching to CHARDEV happens when someone
 * opens the device.
 */
+   mutex_lock(>lock);
if (hvt->mode == HVUTIL_TRANSPORT_INIT)
hvt

[PATCH 1/4] Drivers: hv: utils: fix memory leak on on_msg() failure

2015-11-12 Thread Vitaly Kuznetsov
inmsg should be freed in case of on_msg() failure to avoid memory leak.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_utils_transport.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 24b2766..1d20451 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -77,6 +77,7 @@ static ssize_t hvt_op_write(struct file *file, const char 
__user *buf,
 {
struct hvutil_transport *hvt;
u8 *inmsg;
+   ssize_t ret = count;
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
@@ -85,10 +86,10 @@ static ssize_t hvt_op_write(struct file *file, const char 
__user *buf,
return PTR_ERR(inmsg);
 
if (hvt->on_msg(inmsg, count))
-   return -EFAULT;
+   ret = -EFAULT;
kfree(inmsg);
 
-   return count;
+   return ret;
 }
 
 static unsigned int hvt_op_poll(struct file *file, poll_table *wait)
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 4/4] Drivers: hv: utils: fix crash when device is removed from host side

2015-11-12 Thread Vitaly Kuznetsov
The crash is observed when a service is being disabled host side while
userspace daemon is connected to the device:

[   90.244859] general protection fault:  [#1] SMP
...
[   90.800082] Call Trace:
[   90.800082]  [] __fput+0xc8/0x1f0
[   90.800082]  [] fput+0xe/0x10
...
[   90.800082]  [] do_signal+0x28/0x580
[   90.800082]  [] ? finish_task_switch+0xa6/0x180
[   90.800082]  [] ? __schedule+0x28f/0x870
[   90.800082]  [] ? hvt_op_read+0x12a/0x140 [hv_utils]
...

The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While module unload path is protected by .owner in
struct file_operations util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.

In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/hv/hv_utils_transport.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
index 984cba5..174c72a 100644
--- a/drivers/hv/hv_utils_transport.c
+++ b/drivers/hv/hv_utils_transport.c
@@ -155,13 +155,22 @@ static int hvt_op_open(struct inode *inode, struct file 
*file)
return ret;
 }
 
+static void hvt_transport_free(struct hvutil_transport *hvt)
+{
+   misc_deregister(>mdev);
+   kfree(hvt->outmsg);
+   kfree(hvt);
+}
+
 static int hvt_op_release(struct inode *inode, struct file *file)
 {
struct hvutil_transport *hvt;
+   int mode_old;
 
hvt = container_of(file->f_op, struct hvutil_transport, fops);
 
mutex_lock(>lock);
+   mode_old = hvt->mode;
if (hvt->mode != HVUTIL_TRANSPORT_DESTROY)
hvt->mode = HVUTIL_TRANSPORT_INIT;
/*
@@ -171,6 +180,9 @@ static int hvt_op_release(struct inode *inode, struct file 
*file)
hvt_reset(hvt);
mutex_unlock(>lock);
 
+   if (mode_old == HVUTIL_TRANSPORT_DESTROY)
+   hvt_transport_free(hvt);
+
return 0;
 }
 
@@ -304,17 +316,25 @@ err_free_hvt:
 
 void hvutil_transport_destroy(struct hvutil_transport *hvt)
 {
+   int mode_old;
+
mutex_lock(>lock);
+   mode_old = hvt->mode;
hvt->mode = HVUTIL_TRANSPORT_DESTROY;
wake_up_interruptible(>outmsg_q);
mutex_unlock(>lock);
 
+   /*
+* In case we were in 'chardev' mode we still have an open fd so we
+* have to defer freeing the device. Netlink interface can be freed
+* now.
+*/
spin_lock(_list_lock);
list_del(>list);
spin_unlock(_list_lock);
if (hvt->cn_id.idx > 0 && hvt->cn_id.val > 0)
cn_del_callback(>cn_id);
-   misc_deregister(>mdev);
-   kfree(hvt->outmsg);
-   kfree(hvt);
+
+   if (mode_old != HVUTIL_TRANSPORT_CHARDEV)
+   hvt_transport_free(hvt);
 }
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH V2 10/12] Drivers: hv: vmbus: channge vmbus_connection.channel_lock to mutex

2015-11-12 Thread Vitaly Kuznetsov
"K. Y. Srinivasan"  writes:

> From: Dexuan Cui 
>
> spinlock is unnecessary here.
> mutex is enough.

Hm, mutex is usually required when we need to sleep and a spinlock is
enough otherwise :-)

Or are you trying to say we don't need to disable interrupts here? In
that can maybe we can just switch to spin_lock()/spin_unlock()?

>
> Signed-off-by: Dexuan Cui 
> Signed-off-by: K. Y. Srinivasan 
> ---
>  drivers/hv/channel_mgmt.c |   12 ++--
>  drivers/hv/connection.c   |7 +++
>  drivers/hv/hyperv_vmbus.h |2 +-
>  3 files changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 9c9da3a..d013171 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -206,9 +206,9 @@ void hv_process_channel_removal(struct vmbus_channel 
> *channel, u32 relid)
>   }
>
>   if (channel->primary_channel == NULL) {
> - spin_lock_irqsave(_connection.channel_lock, flags);
> + mutex_lock(_connection.channel_mutex);
>   list_del(>listentry);
> - spin_unlock_irqrestore(_connection.channel_lock, flags);
> + mutex_unlock(_connection.channel_mutex);
>
>   primary_channel = channel;
>   } else {
> @@ -253,7 +253,7 @@ static void vmbus_process_offer(struct vmbus_channel 
> *newchannel)
>   unsigned long flags;
>
>   /* Make sure this is a new offer */
> - spin_lock_irqsave(_connection.channel_lock, flags);
> + mutex_lock(_connection.channel_mutex);
>
>   list_for_each_entry(channel, _connection.chn_list, listentry) {
>   if (!uuid_le_cmp(channel->offermsg.offer.if_type,
> @@ -269,7 +269,7 @@ static void vmbus_process_offer(struct vmbus_channel 
> *newchannel)
>   list_add_tail(>listentry,
> _connection.chn_list);
>
> - spin_unlock_irqrestore(_connection.channel_lock, flags);
> + mutex_unlock(_connection.channel_mutex);
>
>   if (!fnew) {
>   /*
> @@ -341,9 +341,9 @@ static void vmbus_process_offer(struct vmbus_channel 
> *newchannel)
>  err_deq_chan:
>   vmbus_release_relid(newchannel->offermsg.child_relid);
>
> - spin_lock_irqsave(_connection.channel_lock, flags);
> + mutex_lock(_connection.channel_mutex);
>   list_del(>listentry);
> - spin_unlock_irqrestore(_connection.channel_lock, flags);
> + mutex_unlock(_connection.channel_mutex);
>
>   if (newchannel->target_cpu != get_cpu()) {
>   put_cpu();
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 4fc2e88..521f48e 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -146,7 +146,7 @@ int vmbus_connect(void)
>   spin_lock_init(_connection.channelmsg_lock);
>
>   INIT_LIST_HEAD(_connection.chn_list);
> - spin_lock_init(_connection.channel_lock);
> + mutex_init(_connection.channel_mutex);
>
>   /*
>* Setup the vmbus event connection for channel interrupt
> @@ -282,11 +282,10 @@ struct vmbus_channel *relid2channel(u32 relid)
>  {
>   struct vmbus_channel *channel;
>   struct vmbus_channel *found_channel  = NULL;
> - unsigned long flags;
>   struct list_head *cur, *tmp;
>   struct vmbus_channel *cur_sc;
>
> - spin_lock_irqsave(_connection.channel_lock, flags);
> + mutex_lock(_connection.channel_mutex);
>   list_for_each_entry(channel, _connection.chn_list, listentry) {
>   if (channel->offermsg.child_relid == relid) {
>   found_channel = channel;
> @@ -305,7 +304,7 @@ struct vmbus_channel *relid2channel(u32 relid)
>   }
>   }
>   }
> - spin_unlock_irqrestore(_connection.channel_lock, flags);
> + mutex_unlock(_connection.channel_mutex);
>
>   return found_channel;
>  }
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 64950d8..0af6dce 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -688,7 +688,7 @@ struct vmbus_connection {
>
>   /* List of channels */
>   struct list_head chn_list;
> - spinlock_t channel_lock;
> + struct mutex channel_mutex;
>
>   struct workqueue_struct *work_queue;
>  };

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH net-next] hyperv: Add handler for RNDIS_STATUS_NETWORK_CHANGE event

2015-10-30 Thread Vitaly Kuznetsov
Haiyang Zhang  writes:

>> -Original Message-
>> From: Richard Weinberger [mailto:richard.weinber...@gmail.com]
>> Sent: Tuesday, October 27, 2015 6:36 PM
>> To: David Miller 
>> Cc: Haiyang Zhang ; o...@aepfle.de; Greg Kroah-
>> Hartman ; net...@vger.kernel.org; jasow...@redhat.com;
>> driverdev-devel@linuxdriverproject.org; LKML > ker...@vger.kernel.org>
>> Subject: Re: [PATCH net-next] hyperv: Add handler for
>> RNDIS_STATUS_NETWORK_CHANGE event
>> 
>> On Mon, Jun 23, 2014 at 10:10 PM, David Miller 
>> wrote:
>> > From: Haiyang Zhang 
>> > Date: Mon, 23 Jun 2014 16:09:59 +
>> >
>> >> So, what's the equivalent or similar command to "network restart" on
>> SLES12? Could
>> >> you update the command line for the usermodehelper when porting this
>> patch to SLES
>> >> 12?
>> >
>> > No, you are not going to keep the usermodehelper invocation in your
>> driver
>> > please remove it.  It is absolutely inappropriate, and I strictly do
>> not want
>> > to keep it in there because other people will copy it and then we'll
>> have a
>> > real mess on our hands.
>> 
>> Sorry for digging up this old thread.
>> While talking with some guys about usermodehelper abuses I came across
>> this gem.
>> Mainline still contains that "/etc/init.d/network restart" code.
>> Haiyang, care to cleanup?
>
> Hi Richard and others,
>
> Thanks for the reminder. I will clean up the usermode helper.
>
> Do you have suggestions of trigger DHCP refresh from kernel mode? Any 
> sample code in the existing kernel code?
>

I think it's wrong to call dhcp refresh from kernel. What happens when
we reconnect normal hardware adapter to another network? Link goes down
and then up and userspace is supposed to react accordingly. I think we
should emulate something similar for RNDIS_STATUS_NETWORK_CHANGE.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] Drivers: hv: vss: run only on supported host versions

2015-10-19 Thread Vitaly Kuznetsov
Olaf Hering  writes:

> The Backup integration service on WS2012 has appearently trouble to
> negotiate with a guest which does not support the provided util version.
> Currently the VSS driver supports only version 5/0. A WS2012 offers only
> version 1/x and 3/x, and vmbus_prep_negotiate_resp correctly returns an
> empty icframe_vercnt/icmsg_vercnt. But the host ignores that and
> continues to send ICMSGTYPE_NEGOTIATE messages. The result are weird
> errors during boot and general misbehaviour.
>
> Check the Windows version to work around the host bug, skip hv_vss_init
> on WS2012 and older.
>
> Signed-off-by: Olaf Hering 
> ---
>  drivers/hv/hv_snapshot.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c
> index a548ae4..52f10ed 100644
> --- a/drivers/hv/hv_snapshot.c
> +++ b/drivers/hv/hv_snapshot.c
> @@ -331,6 +331,16 @@ static void vss_on_reset(void)
>  int
>  hv_vss_init(struct hv_util_service *srv)
>  {
> + switch (vmbus_proto_version) {
> + case VERSION_WS2008:
> + case VERSION_WIN7:
> + case VERSION_WIN8:


Wouldn't it be better to simplify it to
if (vmbus_proto_version < VERSION_WIN8_1) {
pr_warn();
return ...
}

E.g. there was a patch for storvsc some time ago:

commit 6ee5c61535a2df807069145970d3e7fa492a3fac
Author: Keith Mange 
Date:   Thu Aug 13 08:43:46 2015 -0700

storvsc: Rather than look for sets of specific protocol versions,
make decisions based on ranges.


> + pr_warn("Integration service 'Backup (volume snapshot)'"
> + " not supported on this host version.\n");
> + return -ENOTSUPP;
> + default:
> + break;
> + }
>   recv_buffer = srv->recv_buffer;
>
>   /*
> ___
> devel mailing list
> de...@linuxdriverproject.org
> http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 02/10] Drivers: hv: utils: run polling callback always in interrupt context

2015-10-09 Thread Vitaly Kuznetsov
Olaf Hering  writes:

> On Thu, Oct 08, KY Srinivasan wrote:
>
>> > yes, but after doing fcopy_respond_to_host(). I'd suggest we leave the
>> > check in place, better safe than sorry.
>> 
>> Agreed; Olaf, if it is ok with you, I can fix it up and send.
>
> I will retest with this part reverted. I think without two code paths
> entering hv_fcopy_callback it should be ok to leave this check in.

I think hv_fcopy_callback() is not involved here: we call fcopy_on_msg()
every time userspace daemon writes to the device and it is not anyhow
synchronized with host-guest communication. 

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 10/10] drivers:hv: Define the channel type for Hyper-V PCI Express pass-through

2015-10-08 Thread Vitaly Kuznetsov
"K. Y. Srinivasan"  writes:

> From: Jake Oshins 
>
> This defines the channel type for PCI front-ends in Hyper-V VMs.
>
> Signed-off-by: Jake Oshins 
> Signed-off-by: K. Y. Srinivasan 
> ---
>  drivers/hv/channel_mgmt.c |3 +++
>  include/linux/hyperv.h|   11 +++
>  2 files changed, 14 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 652afd1..1ae615b 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -358,6 +358,7 @@ enum {
>   SCSI,
>   NIC,
>   ND_NIC,
> + HV_PCIE,
>   MAX_PERF_CHN,
>  };

It seems all other members of the enum don't have HV_ prefix... should
we add it there or remove it from HV_PCIE?

>
> @@ -375,6 +376,8 @@ static const struct hv_vmbus_device_id hp_devs[] = {
>   { HV_NIC_GUID, },
>   /* NetworkDirect Guest RDMA */
>   { HV_ND_GUID, },
> + /* PCI Express Pass Through */
> + { HV_PCIE_GUID, },
>  };

And here everything is prefixed with HV_ ... so I'd say we add HV_ to
all members of the previously mentioned enum.

[...]

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 02/10] Drivers: hv: utils: run polling callback always in interrupt context

2015-10-08 Thread Vitaly Kuznetsov
"K. Y. Srinivasan"  writes:

> From: Olaf Hering 
>
> All channel interrupts are bound to specific VCPUs in the guest
> at the point channel is created. While currently, we invoke the
> polling function on the correct CPU (the CPU to which the channel
> is bound to) in some cases we may run the polling function in
> a non-interrupt context. This  potentially can cause an issue as the
> polling function can be interrupted by the channel callback function.
> Fix the issue by running the polling function on the appropriate CPU
> at interrupt level. Additional details of the issue being addressed by
> this patch are given below:
>
> Currently hv_fcopy_onchannelcallback is called from interrupts and also
> via the ->write function of hv_utils. Since the used global variables to
> maintain state are not thread safe the state can get out of sync.
> This affects the variable state as well as the channel inbound buffer.
>
> As suggested by KY adjust hv_poll_channel to always run the given
> callback on the cpu which the channel is bound to. This avoids the need
> for locking because all the util services are single threaded and only
> one transaction is active at any given point in time.
>
> Additionally, remove the context variable, they will always be the same as
> recv_channel.
>
> Signed-off-by: Olaf Hering 
> Signed-off-by: K. Y. Srinivasan 
> ---
>  drivers/hv/hv_fcopy.c |   37 +
>  drivers/hv/hv_kvp.c   |   28 ++--
>  drivers/hv/hv_snapshot.c  |   29 +++--
>  drivers/hv/hyperv_vmbus.h |6 +-
>  4 files changed, 35 insertions(+), 65 deletions(-)
>
> diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> index bbdec50..4eab465 100644
> --- a/drivers/hv/hv_fcopy.c
> +++ b/drivers/hv/hv_fcopy.c
> @@ -51,7 +51,6 @@ static struct {
>   struct hv_fcopy_hdr  *fcopy_msg; /* current message */
>   struct vmbus_channel *recv_channel; /* chn we got the request */
>   u64 recv_req_id; /* request ID. */
> - void *fcopy_context; /* for the channel callback */
>  } fcopy_transaction;
>
>  static void fcopy_respond_to_host(int error);
> @@ -67,6 +66,13 @@ static struct hvutil_transport *hvt;
>   */
>  static int dm_reg_value;
>
> +static void fcopy_poll_wrapper(void *channel)
> +{
> + /* Transaction is finished, reset the state here to avoid races. */
> + fcopy_transaction.state = HVUTIL_READY;
> + hv_fcopy_onchannelcallback(channel);
> +}
> +
>  static void fcopy_timeout_func(struct work_struct *dummy)
>  {
>   /*
> @@ -74,13 +80,7 @@ static void fcopy_timeout_func(struct work_struct *dummy)
>* process the pending transaction.
>*/
>   fcopy_respond_to_host(HV_E_FAIL);
> -
> - /* Transaction is finished, reset the state. */
> - if (fcopy_transaction.state > HVUTIL_READY)
> - fcopy_transaction.state = HVUTIL_READY;
> -
> - hv_poll_channel(fcopy_transaction.fcopy_context,
> - hv_fcopy_onchannelcallback);
> + hv_poll_channel(fcopy_transaction.recv_channel, fcopy_poll_wrapper);
>  }
>
>  static int fcopy_handle_handshake(u32 version)
> @@ -108,9 +108,9 @@ static int fcopy_handle_handshake(u32 version)
>   return -EINVAL;
>   }
>   pr_debug("FCP: userspace daemon ver. %d registered\n", version);
> + /* Forward state for hv_fcopy_onchannelcallback */
>   fcopy_transaction.state = HVUTIL_READY;
> - hv_poll_channel(fcopy_transaction.fcopy_context,
> - hv_fcopy_onchannelcallback);
> + hv_poll_channel(fcopy_transaction.recv_channel, fcopy_poll_wrapper);
>   return 0;
>  }
>
> @@ -227,15 +227,8 @@ void hv_fcopy_onchannelcallback(void *context)
>   int util_fw_version;
>   int fcopy_srv_version;
>
> - if (fcopy_transaction.state > HVUTIL_READY) {
> - /*
> -  * We will defer processing this callback once
> -  * the current transaction is complete.
> -  */
> - fcopy_transaction.fcopy_context = context;
> + if (fcopy_transaction.state > HVUTIL_READY)
>   return;
> - }
> - fcopy_transaction.fcopy_context = NULL;
>
>   vmbus_recvpacket(channel, recv_buffer, PAGE_SIZE * 2, ,
>);
> @@ -295,9 +288,6 @@ static int fcopy_on_msg(void *msg, int len)
>   if (fcopy_transaction.state == HVUTIL_DEVICE_INIT)
>   return fcopy_handle_handshake(*val);
>
> - if (fcopy_transaction.state != HVUTIL_USERSPACE_REQ)
> - return -EINVAL;
> -

This particular change seems unrelated and I'm unsure it's safe to
remove this check. It is meant to protect against daemon screwing the
protocol and writing to the device when it wasn't requested for an
action. It is correct to propagate -EINVAL in this case. Or am I missing
something and the check is redundant now?

Thanks,

[...]

-- 
  

Re: [PATCH 02/10] Drivers: hv: utils: run polling callback always in interrupt context

2015-10-08 Thread Vitaly Kuznetsov
Olaf Hering <o...@aepfle.de> writes:

> On Thu, Oct 08, Vitaly Kuznetsov wrote:
>
>> > @@ -295,9 +288,6 @@ static int fcopy_on_msg(void *msg, int len)
>> >if (fcopy_transaction.state == HVUTIL_DEVICE_INIT)
>> >return fcopy_handle_handshake(*val);
>> >
>> > -  if (fcopy_transaction.state != HVUTIL_USERSPACE_REQ)
>> > -  return -EINVAL;
>> > -
>> 
>> This particular change seems unrelated and I'm unsure it's safe to
>> remove this check. It is meant to protect against daemon screwing the
>> protocol and writing to the device when it wasn't requested for an
>> action. It is correct to propagate -EINVAL in this case. Or am I missing
>> something and the check is redundant now?
>
> What can happen if there is an odd write request?

I think we don't want to propagate misbehaving daemon's data to the
host -- let's cut it here. E.g. imagine there is no communication going
on and daemon starts writing something to the device. In case we remove
the check we'll be doing fcopy_respond_to_host() for each daemon's write
flooding the host.

> If there is a timeout
> scheduled some return value will be sent to the host. Then the state is
> set to RESET and eventually vmbus_recvpacket will receive something.
> That something will be processed and passed to the daemon.
>
> If there was no timeout scheduled the write will just return.

yes, but after doing fcopy_respond_to_host(). I'd suggest we leave the
check in place, better safe than sorry.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/1] drivers/hv: cleanup synic msrs if vmbus connect failed

2015-10-07 Thread Vitaly Kuznetsov
"Denis V. Lunev" <d...@openvz.org> writes:

> From: Andrey Smetanin <asmeta...@virtuozzo.com>
>
> Before vmbus_connect() synic is setup per vcpu - this means
> hypervisor receives writes at synic msr's and probably allocate
> hypervisor resources per synic setup.
>
> If vmbus_connect() failed for some reason it's neccessary to cleanup
> synic setup by call hv_synic_cleanup() at each vcpu to get a chance
> to free allocated resources by hypervisor per synic.

"to make sure it is safe to free previously allocated resources" (as we
can free them anyway)?

>
> This patch does appropriate cleanup in case of vmbus_connect() failure.
>
> Signed-off-by: Andrey Smetanin <asmeta...@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <d...@openvz.org>
> CC: "K. Y. Srinivasan" <k...@microsoft.com>
> CC: Haiyang Zhang <haiya...@microsoft.com>
> CC: Vitaly Kuznetsov <vkuzn...@redhat.com>

Reviewed-by: Vitaly Kuznetsov <vkuzn...@redhat.com>

> ---
>  drivers/hv/vmbus_drv.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index f19b6f7..3297731 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -867,7 +867,7 @@ static int vmbus_bus_init(int irq)
>   on_each_cpu(hv_synic_init, NULL, 1);
>   ret = vmbus_connect();
>   if (ret)
> - goto err_alloc;
> + goto err_connect;
>
>   if (vmbus_proto_version > VERSION_WIN7)
>   cpu_hotplug_disable();
> @@ -885,6 +885,8 @@ static int vmbus_bus_init(int irq)
>
>   return 0;
>
> +err_connect:
> + on_each_cpu(hv_synic_cleanup, NULL, 1);
>  err_alloc:
>   hv_synic_free();
>   hv_remove_vmbus_irq();

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] x86: guest: rely on leaf 0x40000001 to detect Hyper-V

2015-10-02 Thread Vitaly Kuznetsov
Paolo Bonzini <pbonz...@redhat.com> writes:

> The specification says that "Microsoft Hv" is actually a vendor ID field
> that is only used for reporting and diagnostic purposes.  The actual
> field that you need to check is the interface ID that you get in eax
> when querying the HYPERV_CPUID_INTERFACE.
>
> Change ms_hyperv_platform to actually do what the specification suggests.
> This roughy matches what Windows looks for, though Windows actually
> ignores HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS completely.
>
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>

Reviewed-by: Vitaly Kuznetsov <vkuzn...@redhat.com>

(and it seems K. Y. is missing on the CC: list, fixed).

> ---
>  arch/x86/kernel/cpu/mshyperv.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 381c8b9b3a33..7910e7fd705b 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -116,18 +116,16 @@ static void hv_machine_crash_shutdown(struct pt_regs 
> *regs)
>  static uint32_t  __init ms_hyperv_platform(void)
>  {
>   u32 eax;
> - u32 hyp_signature[3];
>
>   if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
>   return 0;
>
> - cpuid(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS,
> -   , _signature[0], _signature[1], _signature[2]);
> -
> - if (eax >= HYPERV_CPUID_MIN &&
> - eax <= HYPERV_CPUID_MAX &&
> - !memcmp("Microsoft Hv", hyp_signature, 12))
> - return HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS;
> + eax = cpuid_eax(HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS);
> + if (eax >= HYPERV_CPUID_MIN && eax <= HYPERV_CPUID_MAX) {
> + eax = cpuid_eax(HYPERV_CPUID_INTERFACE);
> + if (!memcmp(, "Hv#1", sizeof(eax)))
> + return HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS;
> + }
>
>   return 0;
>  }

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 0/3] scsi: Some miscellaneous fixes

2015-09-29 Thread Vitaly Kuznetsov
"K. Y. Srinivasan" <k...@microsoft.com> writes:

[...]

>
> Vitaly Kuznetsov (2):
>   scsi_scan: don't dump trace when scsi_prep_async_scan() is called
> twice
>   scsi: introduce short_inquiry flag for broken host adapters

James,

I'm sorry for the annoyance but when I asked about these patches last
time you said we don't have them reviewed. Is it OK now when we have
signed-off-by: from K. Y. or do we need to ask someone else?

Thanks,

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] hyperv: Fix build failure with HYPERVISOR_GUEST=y && KEXEC_CORE=n

2015-09-29 Thread Vitaly Kuznetsov
Ben Hutchings <b...@decadent.org.uk> writes:

> Currently with this configuration I get:
>
> arch/x86/built-in.o: In function `hv_machine_crash_shutdown':
> mshyperv.c:(.text+0x29656): undefined reference to
> `native_machine_crash_shutdown'

https://lkml.org/lkml/2015/9/23/183
...
https://lkml.org/lkml/2015/9/7/440
...

>
> Fixes: b4370df2b1f5 ("Drivers: hv: vmbus: add special crash handler")
> Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 381c8b9..0c94600 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -35,7 +35,7 @@ struct ms_hyperv_info ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
>
>  static void (*hv_kexec_handler)(void);
> -static void (*hv_crash_handler)(struct pt_regs *regs);
> +static void (*hv_crash_handler)(struct pt_regs *regs) __maybe_unused;
>
>  #if IS_ENABLED(CONFIG_HYPERV)
>  static void (*vmbus_handler)(void);
> @@ -105,12 +105,14 @@ static void hv_machine_shutdown(void)
>   native_machine_shutdown();
>  }
>
> +#ifdef CONFIG_KEXEC_CORE
>  static void hv_machine_crash_shutdown(struct pt_regs *regs)
>  {
>   if (hv_crash_handler)
>   hv_crash_handler(regs);
>   native_machine_crash_shutdown(regs);
>  }
> +#endif
>
>  static uint32_t  __init ms_hyperv_platform(void)
> @@ -187,7 +189,9 @@ static void __init ms_hyperv_init_platform(void)
>  #endif
>
>   machine_ops.shutdown = hv_machine_shutdown;
> +#ifdef CONFIG_KEXEC_CORE
>   machine_ops.crash_shutdown = hv_machine_crash_shutdown;
> +#endif
>   mark_tsc_unstable("running on Hyper-V");
>  }

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] hyperv: Fix build failure with HYPERVISOR_GUEST=y && KEXEC_CORE=n

2015-09-29 Thread Vitaly Kuznetsov
Ben Hutchings <b...@decadent.org.uk> writes:

> On Tue, 2015-09-29 at 10:47 +0200, Vitaly Kuznetsov wrote:
>> Ben Hutchings <b...@decadent.org.uk> writes:
>> 
>> > Currently with this configuration I get:
>> > 
>> > arch/x86/built-in.o: In function `hv_machine_crash_shutdown':
>> > mshyperv.c:(.text+0x29656): undefined reference to
>> > `native_machine_crash_shutdown'
>> 
>> https://lkml.org/lkml/2015/9/23/183
>> ...
>> https://lkml.org/lkml/2015/9/7/440
>> ...
>
> That's unhelpful, given the instability of lkml.org.
>

Ah, sorry, worked for me. Patches to fix the issue were posted several
times already, I *think* we have the fix in 'tip' kernel atm:
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=7d381b749a010ccd48d5649c843cac0d14d1c229

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v3] x86: hyperv: fix build in !CONFIG_KEXEC_CORE case

2015-09-23 Thread Vitaly Kuznetsov
Recent changes in Hyper-V driver ("Drivers: hv: vmbus: add special crash
handler") broke the build when CONFIG_KEXEC_CORE is not set:

arch/x86/built-in.o: In function `hv_machine_crash_shutdown':
arch/x86/kernel/cpu/mshyperv.c:112: undefined reference to 
`native_machine_crash_shutdown'

Decorate all kexec related code with #ifdef CONFIG_KEXEC_CORE.

Reported-by: Jim Davis <jim.ep...@gmail.com>
Reported-by: Stephen Hemminger <step...@networkplumber.org>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Changes since v2:
- Hide hv_kexec_handler/hv_crash_handler under IS_ENABLED(CONFIG_HYPERV)
  to avoid warings in !CONFIG_HYPERV && !CONFIG_KEXEC case [kbuild robot]

Changes since v1:
- Previously I tried solving the issue by defining
  native_machine_crash_shutdown in !CONFIG_KEXEC_CORE case:
  https://lkml.org/lkml/2015/8/11/417. This was to   avoid having #ifdefs
  in C code [Greg KH]. Such approach, however, raised a question on
  bloating kernel with unneeded stuff [Ingo Molnar].
---
 arch/x86/kernel/cpu/mshyperv.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 381c8b9..20e242e 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -34,11 +34,10 @@
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
 
-static void (*hv_kexec_handler)(void);
-static void (*hv_crash_handler)(struct pt_regs *regs);
-
 #if IS_ENABLED(CONFIG_HYPERV)
 static void (*vmbus_handler)(void);
+static void (*hv_kexec_handler)(void);
+static void (*hv_crash_handler)(struct pt_regs *regs);
 
 void hyperv_vector_handler(struct pt_regs *regs)
 {
@@ -96,8 +95,8 @@ void hv_remove_crash_handler(void)
hv_crash_handler = NULL;
 }
 EXPORT_SYMBOL_GPL(hv_remove_crash_handler);
-#endif
 
+#ifdef CONFIG_KEXEC_CORE
 static void hv_machine_shutdown(void)
 {
if (kexec_in_progress && hv_kexec_handler)
@@ -111,7 +110,8 @@ static void hv_machine_crash_shutdown(struct pt_regs *regs)
hv_crash_handler(regs);
native_machine_crash_shutdown(regs);
 }
-
+#endif /* CONFIG_KEXEC_CORE */
+#endif /* CONFIG_HYPERV */
 
 static uint32_t  __init ms_hyperv_platform(void)
 {
@@ -186,8 +186,10 @@ static void __init ms_hyperv_init_platform(void)
no_timer_check = 1;
 #endif
 
+#if IS_ENABLED(CONFIG_HYPERV) && defined(CONFIG_KEXEC_CORE)
machine_ops.shutdown = hv_machine_shutdown;
machine_ops.crash_shutdown = hv_machine_crash_shutdown;
+#endif
mark_tsc_unstable("running on Hyper-V");
 }
 
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] storvsc: get rid of homegrown copy_{to, from}_bounce_buffer()

2015-09-23 Thread Vitaly Kuznetsov
Christoph Hellwig <h...@infradead.org> writes:

> On Tue, Sep 22, 2015 at 06:27:50PM +0200, Vitaly Kuznetsov wrote:
>> Storvsc driver needs to ensure there are no 'holes' in the presented
>> sg list (all segments in the middle of the list need to be of PAGE_SIZE).
>
> I think it should instead set a virt_boundary.  That's what we added for
> the NVMe driver which has the same requirements, and Sagi recently also
> switched iSER to it after we ensured that flag is handled correctly by
> the SG_IO ioctl.

Wow,

I checked and blk_queue_virt_boundary(sdevice->request_queue, PAGE_SIZE
- 1) seems to be solving the issue completely, no bounce buffer
required. I'll test more and send v2 with removing the rest.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2] storvsc: get rid of bounce buffer

2015-09-23 Thread Vitaly Kuznetsov
Storvsc driver needs to ensure there are no 'holes' in the presented
sg list (all segments in the middle of the list need to be of PAGE_SIZE).
When a hole is detected storvsc driver creates a 'bounce sgl' without
holes and copies data over with copy_{to,from}_bounce_buffer() functions.
Setting virt_boundary_mask to PAGE_SIZE - 1 guarantees we'll never see
such holes so we can significantly simplify the driver. This is also
supposed to bring us some performance improvement for certain workloads
as we eliminate copying.

Reported-by: Radim Krčmář <rkrc...@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Changes since v1:
- This patch is a successor of 'storvsc: get rid of homegrown
 copy_{to,from}_bounce_buffer()'. Use virt_boundary instead to
 eliminate the need for bounce buffer completely [Christoph Hellwig].
---
 drivers/scsi/storvsc_drv.c | 286 +
 1 file changed, 5 insertions(+), 281 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 40c43ae..eeade40 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -393,9 +393,6 @@ static void storvsc_on_channel_callback(void *context);
 struct storvsc_cmd_request {
struct scsi_cmnd *cmd;
 
-   unsigned int bounce_sgl_count;
-   struct scatterlist *bounce_sgl;
-
struct hv_device *device;
 
/* Synchronize the request/response if needed */
@@ -586,241 +583,6 @@ get_in_err:
 
 }
 
-static void destroy_bounce_buffer(struct scatterlist *sgl,
- unsigned int sg_count)
-{
-   int i;
-   struct page *page_buf;
-
-   for (i = 0; i < sg_count; i++) {
-   page_buf = sg_page(([i]));
-   if (page_buf != NULL)
-   __free_page(page_buf);
-   }
-
-   kfree(sgl);
-}
-
-static int do_bounce_buffer(struct scatterlist *sgl, unsigned int sg_count)
-{
-   int i;
-
-   /* No need to check */
-   if (sg_count < 2)
-   return -1;
-
-   /* We have at least 2 sg entries */
-   for (i = 0; i < sg_count; i++) {
-   if (i == 0) {
-   /* make sure 1st one does not have hole */
-   if (sgl[i].offset + sgl[i].length != PAGE_SIZE)
-   return i;
-   } else if (i == sg_count - 1) {
-   /* make sure last one does not have hole */
-   if (sgl[i].offset != 0)
-   return i;
-   } else {
-   /* make sure no hole in the middle */
-   if (sgl[i].length != PAGE_SIZE || sgl[i].offset != 0)
-   return i;
-   }
-   }
-   return -1;
-}
-
-static struct scatterlist *create_bounce_buffer(struct scatterlist *sgl,
-   unsigned int sg_count,
-   unsigned int len,
-   int write)
-{
-   int i;
-   int num_pages;
-   struct scatterlist *bounce_sgl;
-   struct page *page_buf;
-   unsigned int buf_len = ((write == WRITE_TYPE) ? 0 : PAGE_SIZE);
-
-   num_pages = ALIGN(len, PAGE_SIZE) >> PAGE_SHIFT;
-
-   bounce_sgl = kcalloc(num_pages, sizeof(struct scatterlist), GFP_ATOMIC);
-   if (!bounce_sgl)
-   return NULL;
-
-   sg_init_table(bounce_sgl, num_pages);
-   for (i = 0; i < num_pages; i++) {
-   page_buf = alloc_page(GFP_ATOMIC);
-   if (!page_buf)
-   goto cleanup;
-   sg_set_page(_sgl[i], page_buf, buf_len, 0);
-   }
-
-   return bounce_sgl;
-
-cleanup:
-   destroy_bounce_buffer(bounce_sgl, num_pages);
-   return NULL;
-}
-
-/* Assume the original sgl has enough room */
-static unsigned int copy_from_bounce_buffer(struct scatterlist *orig_sgl,
-   struct scatterlist *bounce_sgl,
-   unsigned int orig_sgl_count,
-   unsigned int bounce_sgl_count)
-{
-   int i;
-   int j = 0;
-   unsigned long src, dest;
-   unsigned int srclen, destlen, copylen;
-   unsigned int total_copied = 0;
-   unsigned long bounce_addr = 0;
-   unsigned long dest_addr = 0;
-   unsigned long flags;
-   struct scatterlist *cur_dest_sgl;
-   struct scatterlist *cur_src_sgl;
-
-   local_irq_save(flags);
-   cur_dest_sgl = orig_sgl;
-   cur_src_sgl = bounce_sgl;
-   for (i = 0; i < orig_sgl_count; i++) {
-   dest_addr = (unsigned long)
-   kmap_atomic(sg_page(cur_dest_sgl)) +
-   cur_dest_sgl->offset;
-   dest = dest_addr;
-   destlen = cur_dest_sgl->length;

[PATCH] storvsc: get rid of homegrown copy_{to,from}_bounce_buffer()

2015-09-22 Thread Vitaly Kuznetsov
Storvsc driver needs to ensure there are no 'holes' in the presented
sg list (all segments in the middle of the list need to be of PAGE_SIZE).
When a hole is detected storvsc driver creates a 'bounce sgl' without
holes and copies data over with its own copy_{to,from}_bounce_buffer()
functions. Scsi layer already has scsi_sg_copy_{to,from}_buffer()
functions to linearize a list, the only difference is that already
existent functions create a flat buffer instead of a new list but we can
cope.

Reported-by: Radim Krčmář <rkrc...@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/scsi/storvsc_drv.c | 281 ++---
 1 file changed, 36 insertions(+), 245 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 40c43ae..31e8c67 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -393,8 +393,8 @@ static void storvsc_on_channel_callback(void *context);
 struct storvsc_cmd_request {
struct scsi_cmnd *cmd;
 
-   unsigned int bounce_sgl_count;
-   struct scatterlist *bounce_sgl;
+   unsigned int bounce_len;
+   void *bounce_buf;
 
struct hv_device *device;
 
@@ -586,21 +586,6 @@ get_in_err:
 
 }
 
-static void destroy_bounce_buffer(struct scatterlist *sgl,
- unsigned int sg_count)
-{
-   int i;
-   struct page *page_buf;
-
-   for (i = 0; i < sg_count; i++) {
-   page_buf = sg_page(([i]));
-   if (page_buf != NULL)
-   __free_page(page_buf);
-   }
-
-   kfree(sgl);
-}
-
 static int do_bounce_buffer(struct scatterlist *sgl, unsigned int sg_count)
 {
int i;
@@ -628,199 +613,6 @@ static int do_bounce_buffer(struct scatterlist *sgl, 
unsigned int sg_count)
return -1;
 }
 
-static struct scatterlist *create_bounce_buffer(struct scatterlist *sgl,
-   unsigned int sg_count,
-   unsigned int len,
-   int write)
-{
-   int i;
-   int num_pages;
-   struct scatterlist *bounce_sgl;
-   struct page *page_buf;
-   unsigned int buf_len = ((write == WRITE_TYPE) ? 0 : PAGE_SIZE);
-
-   num_pages = ALIGN(len, PAGE_SIZE) >> PAGE_SHIFT;
-
-   bounce_sgl = kcalloc(num_pages, sizeof(struct scatterlist), GFP_ATOMIC);
-   if (!bounce_sgl)
-   return NULL;
-
-   sg_init_table(bounce_sgl, num_pages);
-   for (i = 0; i < num_pages; i++) {
-   page_buf = alloc_page(GFP_ATOMIC);
-   if (!page_buf)
-   goto cleanup;
-   sg_set_page(_sgl[i], page_buf, buf_len, 0);
-   }
-
-   return bounce_sgl;
-
-cleanup:
-   destroy_bounce_buffer(bounce_sgl, num_pages);
-   return NULL;
-}
-
-/* Assume the original sgl has enough room */
-static unsigned int copy_from_bounce_buffer(struct scatterlist *orig_sgl,
-   struct scatterlist *bounce_sgl,
-   unsigned int orig_sgl_count,
-   unsigned int bounce_sgl_count)
-{
-   int i;
-   int j = 0;
-   unsigned long src, dest;
-   unsigned int srclen, destlen, copylen;
-   unsigned int total_copied = 0;
-   unsigned long bounce_addr = 0;
-   unsigned long dest_addr = 0;
-   unsigned long flags;
-   struct scatterlist *cur_dest_sgl;
-   struct scatterlist *cur_src_sgl;
-
-   local_irq_save(flags);
-   cur_dest_sgl = orig_sgl;
-   cur_src_sgl = bounce_sgl;
-   for (i = 0; i < orig_sgl_count; i++) {
-   dest_addr = (unsigned long)
-   kmap_atomic(sg_page(cur_dest_sgl)) +
-   cur_dest_sgl->offset;
-   dest = dest_addr;
-   destlen = cur_dest_sgl->length;
-
-   if (bounce_addr == 0)
-   bounce_addr = (unsigned long)kmap_atomic(
-   sg_page(cur_src_sgl));
-
-   while (destlen) {
-   src = bounce_addr + cur_src_sgl->offset;
-   srclen = cur_src_sgl->length - cur_src_sgl->offset;
-
-   copylen = min(srclen, destlen);
-   memcpy((void *)dest, (void *)src, copylen);
-
-   total_copied += copylen;
-   cur_src_sgl->offset += copylen;
-   destlen -= copylen;
-   dest += copylen;
-
-   if (cur_src_sgl->offset == cur_src_sgl->length) {
-   /* full */
-   kunmap_atomic((void *)bounce_addr);
-   j++;
-
-   /*
- 

[PATCH v2] x86: hyperv: fix build in !CONFIG_KEXEC_CORE case

2015-09-22 Thread Vitaly Kuznetsov
Recent changes in Hyper-V driver ("Drivers: hv: vmbus: add special crash
handler") broke the build when CONFIG_KEXEC_CORE is not set:

arch/x86/built-in.o: In function `hv_machine_crash_shutdown':
arch/x86/kernel/cpu/mshyperv.c:112: undefined reference to 
`native_machine_crash_shutdown'

Decorate all kexec related code with #ifdef CONFIG_KEXEC_CORE.

Reported-by: Jim Davis <jim.ep...@gmail.com>
Reported-by: Stephen Hemminger <step...@networkplumber.org>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Previously I tried solving the issue by defining native_machine_crash_shutdown
in !CONFIG_KEXEC_CORE case: https://lkml.org/lkml/2015/8/11/417. This was to
avoid having #ifdefs in C code [Greg KH]. Such approach, however, raised a
question on bloating kernel with unneeded stuff [Ingo Molnar].
---
 arch/x86/kernel/cpu/mshyperv.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 381c8b9..6cec20a 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -98,6 +98,7 @@ void hv_remove_crash_handler(void)
 EXPORT_SYMBOL_GPL(hv_remove_crash_handler);
 #endif
 
+#ifdef CONFIG_KEXEC_CORE
 static void hv_machine_shutdown(void)
 {
if (kexec_in_progress && hv_kexec_handler)
@@ -111,7 +112,7 @@ static void hv_machine_crash_shutdown(struct pt_regs *regs)
hv_crash_handler(regs);
native_machine_crash_shutdown(regs);
 }
-
+#endif
 
 static uint32_t  __init ms_hyperv_platform(void)
 {
@@ -186,8 +187,10 @@ static void __init ms_hyperv_init_platform(void)
no_timer_check = 1;
 #endif
 
+#ifdef CONFIG_KEXEC_CORE
machine_ops.shutdown = hv_machine_shutdown;
machine_ops.crash_shutdown = hv_machine_crash_shutdown;
+#endif
mark_tsc_unstable("running on Hyper-V");
 }
 
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 2/5] hv: add helpers to handle hv_util device state

2015-09-21 Thread Vitaly Kuznetsov
Olaf Hering  writes:

> On Sun, Sep 20, Greg KH wrote:
>
>> Just use a lock, that's what it is there for.
>
> How would that help? It might help because it enforces ordering. But
> that requires that all three utils get refactored to deal with the
> introduced locking. I will let KY comment on this.
>
> The issue I see with fcopy is that after or while fcopy_respond_to_host
> runs an interrupt triggers which also calls into
> hv_fcopy_onchannelcallback.  It was most likely caused by a logic change
> in "recent" vmbus updates because this did not happen before. At least,
> the fcopy hang was not seen earler. Maybe the bug did just not trigger
> up to now for other reasons...

I'd like to see a trace from the hang, it is not obvious to me how it
happened and what caused it. (or if you have such hang scenario in your
head, can you please reveal it?) AFAICS barriers you introduced don't
give you guarantees in an SMP environment.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 2/5] hv: add helpers to handle hv_util device state

2015-09-21 Thread Vitaly Kuznetsov
Olaf Hering <o...@aepfle.de> writes:

> On Mon, Sep 21, Vitaly Kuznetsov wrote:
>
>> I'd like to see a trace from the hang, it is not obvious to me how it
>> happened and what caused it. (or if you have such hang scenario in your
>> head, can you please reveal it?)
>
> There is no trace. I think fcopy_respond_to_host notifies the host,
> which in turn triggers an interrupt right away which is processed while
> fcopy_on_msg is executing somewhere between the return from
> fcopy_respond_to_host and the call into hv_fcopy_onchannelcallback.
>

I think it is fcopy_transaction.fcopy_context which gets out of sync.

When we're done processing some request we have the following code:

fcopy_transaction.state = HVUTIL_USERSPACE_RECV;
fcopy_respond_to_host(*val);
fcopy_transaction.state = HVUTIL_READY;
hv_poll_channel(fcopy_transaction.fcopy_context,
hv_fcopy_onchannelcallback);

If interrupt happens after we did fcopy_respond_to_host()
fcopy_transaction.state will still be HVUTIL_USERSPACE_RECV or even its
previous HVUTIL_USERSPACE_REQ but it's OK as we have the following in
hv_fcopy_onchannelcallback()

if (fcopy_transaction.state > HVUTIL_READY) {
/*
 * We will defer processing this callback once
 * the current transaction is complete.
 */
fcopy_transaction.fcopy_context = context;
return;
}

And we're supposed to process the work with hv_poll_channel(). The
problem is (I guess) that fcopy_transaction.fcopy_context gets out of
sync and it still has its previous value (possibly NULL). We call
hv_poll_channel() with NULL and everything gets stuck as we'll never
process the request.

AFAICS proper locking is requred here (and probably in all three
drivers), we need to protect not only .state but the whole transaction.

[...]

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] hyper-v: mark TSC as unstable

2015-08-19 Thread Vitaly Kuznetsov
Hyper-V top-level functional specification states, that algorithms should
be resilient to sudden jumps forward or backward in the TSC value, this
means that we should consider TSC as unstable. In some cases tsc tests are
able to detect the instability, it was detected in 543 out of 646 boots in
my testing:

 Measured 6277 cycles TSC warp between CPUs, turning off TSC clock.
 tsc: Marking TSC unstable due to check_tsc_sync_source failed

This is, however, just a heuristic. On Hyper-V platform there are two good
clocksources: MSR-based hyperv_clocksource and recently introduced TSC
page.

Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com
---
 arch/x86/kernel/cpu/mshyperv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index aad4bd8..6fd023d 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -141,6 +141,7 @@ static void __init ms_hyperv_init_platform(void)
no_timer_check = 1;
 #endif
 
+   mark_tsc_unstable(running on Hyper-V);
 }
 
 const __refconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] scsi: storvsc: be more picky about scmnd-sc_data_direction

2015-08-13 Thread Vitaly Kuznetsov
KY Srinivasan k...@microsoft.com writes:

 -Original Message-
 From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
 Sent: Thursday, June 25, 2015 9:12 AM
 To: linux-s...@vger.kernel.org
 Cc: Long Li; KY Srinivasan; Haiyang Zhang; James E.J. Bottomley;
 de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; Radim Krčmář
 Subject: [PATCH] scsi: storvsc: be more picky about scmnd-
 sc_data_direction
 
 Under the 'default' case in scmnd-sc_data_direction we have 3 options:
 - DMA_NONE which we handle correctly.
 - DMA_BIDIRECTIONAL which is never supposed to be set by SCSI stack.
 - Garbage value.
 
 Do WARN() and return -EINVAL in the last two cases. virtio_scsi does
 BUG_ON() here but it looks like an overkill.
 
 Reported-by: Radim Krčmář rkrc...@redhat.com
 Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com
 Signed-off-by: K. Y. Srinivasan k...@microsoft.com

Sorry for the ping but it seems the fix never made it to scsi tree...

 ---
  drivers/scsi/storvsc_drv.c | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
 index 3c6584f..61f4855 100644
 --- a/drivers/scsi/storvsc_drv.c
 +++ b/drivers/scsi/storvsc_drv.c
 @@ -1598,10 +1598,18 @@ static int storvsc_queuecommand(struct
 Scsi_Host *host, struct scsi_cmnd *scmnd)
  vm_srb-data_in = READ_TYPE;
  vm_srb-win8_extension.srb_flags |=
 SRB_FLAGS_DATA_IN;
  break;
 -default:
 +case DMA_NONE:
  vm_srb-data_in = UNKNOWN_TYPE;
  vm_srb-win8_extension.srb_flags |=
 SRB_FLAGS_NO_DATA_TRANSFER;
  break;
 +default:
 +/*
 + * This is DMA_BIDIRECTIONAL or something else we are
 never
 + * supposed to see here.
 + */
 +WARN(1, Unexpected data direction: %d\n,
 + scmnd-sc_data_direction);
 +return -EINVAL;
  }
 
 
 --
 2.4.3

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] scsi: storvsc: use shost_for_each_device() instead of open coding

2015-08-13 Thread Vitaly Kuznetsov
Long Li lon...@microsoft.com writes:

 -Original Message-
 From: KY Srinivasan
 Sent: Friday, July 03, 2015 11:35 AM
 To: Vitaly Kuznetsov; linux-s...@vger.kernel.org
 Cc: Long Li; Haiyang Zhang; James E.J. Bottomley; 
 de...@linuxdriverproject.org;
 linux-ker...@vger.kernel.org
 Subject: RE: [PATCH] scsi: storvsc: use shost_for_each_device() instead of 
 open
 coding
 
 
 
  -Original Message-
  From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
  Sent: Wednesday, July 1, 2015 2:31 AM
  To: linux-s...@vger.kernel.org
  Cc: Long Li; KY Srinivasan; Haiyang Zhang; James E.J. Bottomley;
  de...@linuxdriverproject.org; linux-ker...@vger.kernel.org
  Subject: [PATCH] scsi: storvsc: use shost_for_each_device() instead of
  open coding
 
  Comment in struct Scsi_Host says that drivers are not supposed to
  access __devices directly. storvsc_host_scan() doesn't happen in irq
  context so we can just use shost_for_each_device().
 
  Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com
 
 Signed-off-by: K. Y. Srinivasan k...@microsoft.com
 Reviewed-by: Long Li lon...@microsoft.com

Sorry for the ping but it seems the fix never made it to scsi tree...

  ---
   drivers/scsi/storvsc_drv.c | 9 +
   1 file changed, 1 insertion(+), 8 deletions(-)
 
  diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
  index 3c6584f..9ea912b 100644
  --- a/drivers/scsi/storvsc_drv.c
  +++ b/drivers/scsi/storvsc_drv.c
  @@ -426,7 +426,6 @@ static void storvsc_host_scan(struct work_struct
  *work)
 struct storvsc_scan_work *wrk;
 struct Scsi_Host *host;
 struct scsi_device *sdev;
  -  unsigned long flags;
 
 wrk = container_of(work, struct storvsc_scan_work, work);
 host = wrk-host;
  @@ -443,14 +442,8 @@ static void storvsc_host_scan(struct work_struct
  *work)
  * may have been removed this way.
  */
 mutex_lock(host-scan_mutex);
  -  spin_lock_irqsave(host-host_lock, flags);
  -  list_for_each_entry(sdev, host-__devices, siblings) {
  -  spin_unlock_irqrestore(host-host_lock, flags);
  +  shost_for_each_device(sdev, host)
 scsi_test_unit_ready(sdev, 1, 1, NULL);
  -  spin_lock_irqsave(host-host_lock, flags);
  -  continue;
  -  }
  -  spin_unlock_irqrestore(host-host_lock, flags);
 mutex_unlock(host-scan_mutex);
 /*
  * Now scan the host to discover LUNs that may have been added.
  --
  2.4.3

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 1/1] X86: mshyperv.c: Fix a compilation issue.

2015-08-10 Thread Vitaly Kuznetsov
KY Srinivasan k...@microsoft.com writes:

 -Original Message-
 From: Greg KH [mailto:gre...@linuxfoundation.org]
 Sent: Thursday, August 6, 2015 2:11 PM
 To: KY Srinivasan k...@microsoft.com
 Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org;
 vkuzn...@redhat.com; s...@canb.auug.org.au; t...@linutronix.de;
 mi...@redhat.com; h...@zytor.com; x...@kernel.org; Haiyang Zhang
 haiya...@microsoft.com
 Subject: Re: [PATCH 1/1] X86: mshyperv.c: Fix a compilation issue.
 
 On Thu, Aug 06, 2015 at 02:41:24PM -0700, K. Y. Srinivasan wrote:
  Building with a random configuration file, this build failure
  was reported:
 
  arch/x86/built-in.o: In function `hv_machine_crash_shutdown':
  arch/x86/kernel/cpu/mshyperv.c:112: undefined
  reference to `native_machine_crash_shutdown'
 
  This patch fixes the problem
 
  Reported-by: Jim Davis jim.ep...@gmail.com
  Signed-off-by: K. Y. Srinivasan k...@microsoft.com
  ---
   arch/x86/kernel/cpu/mshyperv.c |2 ++
   1 files changed, 2 insertions(+), 0 deletions(-)
 
  diff --git a/arch/x86/kernel/cpu/mshyperv.c 
  b/arch/x86/kernel/cpu/mshyperv.c
  index f794bfa..0faed5e0 100644
  --- a/arch/x86/kernel/cpu/mshyperv.c
  +++ b/arch/x86/kernel/cpu/mshyperv.c
  @@ -109,7 +109,9 @@ static void hv_machine_crash_shutdown(struct pt_regs 
  *regs)
   {
 if (hv_crash_handler)
 hv_crash_handler(regs);
  +#ifdef CONFIG_KEXEC_CORE
 native_machine_crash_shutdown(regs);
  +#endif
 
 Why is kexec the factor here?  And if it really does, can't it just be
 CONFIG_KEXEC, or, can kexec provide a dummy inline function so that
 you don't have to have a #ifdef in a .c file?

 Greg,

 I am on vacation till next Tuesday. I will address your concern after I get 
 back if Vitaly
 has not addressed it by then.

I think the issue is not Hyper-V-related: we have
native_machine_crash_shutdown() defined in asm/reboot.h but its
implementation in kernel/crash.c is only being compiled with
CONFIG_KEXEC. I suggest we fix it with something like the following:

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index a82c4f1..6763e3e 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -16,8 +16,15 @@ struct machine_ops {

extern struct machine_ops machine_ops;

-void native_machine_crash_shutdown(struct pt_regs *regs);
void native_machine_shutdown(void);
+#ifdef CONFIG_KEXEC
+void native_machine_crash_shutdown(struct pt_regs *regs);
+#else
+static inline void native_machine_crash_shutdown(struct pt_regs *regs)
+{
+   native_machine_shutdown();
+}
+#endif
void __noreturn machine_real_restart(unsigned int type);
/* These must match dispatch in arch/x86/realmore/rm/reboot.S */
#define MRR_BIOS   0

I'll post a patch if there are no objections.

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v7 3/3] Drivers: hv: vmbus: use cpu_hotplug_enable/disable

2015-07-23 Thread Vitaly Kuznetsov
Commit e513229b4c38 (Drivers: hv: vmbus: prevent cpu offlining on newer
hypervisors) was altering smp_ops.cpu_disable to prevent CPU offlining.
We can bo better by using cpu_hotplug_enable/disable functions instead of
such hard-coding.

Reported-by: Radim Krčmář rkrc...@redhat.com
Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com
Signed-off-by: K. Y. Srinivasan k...@microsoft.com
---
 drivers/hv/vmbus_drv.c | 38 --
 1 file changed, 4 insertions(+), 34 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index cf20400..6de65fb 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -763,38 +763,6 @@ static void vmbus_isr(void)
}
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-static int hyperv_cpu_disable(void)
-{
-   return -ENOSYS;
-}
-
-static void hv_cpu_hotplug_quirk(bool vmbus_loaded)
-{
-   static void *previous_cpu_disable;
-
-   /*
-* Offlining a CPU when running on newer hypervisors (WS2012R2, Win8,
-* ...) is not supported at this moment as channel interrupts are
-* distributed across all of them.
-*/
-
-   if ((vmbus_proto_version == VERSION_WS2008) ||
-   (vmbus_proto_version == VERSION_WIN7))
-   return;
-
-   if (vmbus_loaded) {
-   previous_cpu_disable = smp_ops.cpu_disable;
-   smp_ops.cpu_disable = hyperv_cpu_disable;
-   pr_notice(CPU offlining is not supported by hypervisor\n);
-   } else if (previous_cpu_disable)
-   smp_ops.cpu_disable = previous_cpu_disable;
-}
-#else
-static void hv_cpu_hotplug_quirk(bool vmbus_loaded)
-{
-}
-#endif
 
 /*
  * vmbus_bus_init -Main vmbus driver initialization routine.
@@ -836,7 +804,8 @@ static int vmbus_bus_init(int irq)
if (ret)
goto err_alloc;
 
-   hv_cpu_hotplug_quirk(true);
+   if (vmbus_proto_version  VERSION_WIN7)
+   cpu_hotplug_disable();
 
/*
 * Only register if the crash MSRs are available
@@ -1121,7 +1090,8 @@ static void __exit vmbus_exit(void)
smp_call_function_single(cpu, hv_synic_cleanup, NULL, 1);
}
acpi_bus_unregister_driver(vmbus_acpi_driver);
-   hv_cpu_hotplug_quirk(false);
+   if (vmbus_proto_version  VERSION_WIN7)
+   cpu_hotplug_enable();
 }
 
 
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v7 1/3] cpu-hotplug: convert cpu_hotplug_disabled to a counter

2015-07-23 Thread Vitaly Kuznetsov
As a prerequisite to exporting cpu_hotplug_enable/cpu_hotplug_disable
functions to modules we need to convert cpu_hotplug_disabled to a counter
to properly support disable - disable - enable call sequences. E.g.
after Hyper-V vmbus module (which is supposed to be the first user of
exported cpu_hotplug_enable/cpu_hotplug_disable) did cpu_hotplug_disable()
hibernate path calls disable_nonboot_cpus() and if we hit an error in
_cpu_down() enable_nonboot_cpus() will be called on the failure path (thus
making cpu_hotplug_disabled = 0 and leaving cpu hotplug in 'enabled'
state). Same problem is possible if more than 1 module use
cpu_hotplug_disable/cpu_hotplug_enable on their load/unload paths. When
one of these modules is been unloaded it is logical to leave cpu hotplug
in 'disabled' state.

To support the change we need to increse cpu_hotplug_disabled counter
in disable_nonboot_cpus() unconditionally as all users of
disable_nonboot_cpus() are supposed to do enable_nonboot_cpus() in case
an error was returned.

Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com
Reviewed-by: Thomas Gleixner t...@linutronix.de
---
Changes since v6:
- make this patch first in the series [Thomas Gleixner], minor description
  change.
---
 Documentation/power/suspend-and-cpuhotplug.txt |  6 +++---
 kernel/cpu.c   | 21 +
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/Documentation/power/suspend-and-cpuhotplug.txt 
b/Documentation/power/suspend-and-cpuhotplug.txt
index 2850df3..2fc9095 100644
--- a/Documentation/power/suspend-and-cpuhotplug.txt
+++ b/Documentation/power/suspend-and-cpuhotplug.txt
@@ -72,7 +72,7 @@ More details follow:
 |
 v
Disable regular cpu hotplug
-by setting cpu_hotplug_disabled=1
+by increasing cpu_hotplug_disabled
 |
 v
 Release cpu_add_remove_lock
@@ -89,7 +89,7 @@ Resuming back is likewise, with the counterparts being (in 
the order of
 execution during resume):
 * enable_nonboot_cpus() which involves:
|  Acquire cpu_add_remove_lock
-   |  Reset cpu_hotplug_disabled to 0, thereby enabling regular cpu hotplug
+   |  Decrease cpu_hotplug_disabled, thereby enabling regular cpu hotplug
|  Call _cpu_up() [for all those cpus in the frozen_cpus mask, in a loop]
|  Release cpu_add_remove_lock
v
@@ -120,7 +120,7 @@ after the entire cycle is complete (i.e., suspend + resume).
Acquire cpu_add_remove_lock
 |
 v
-  If cpu_hotplug_disabled is 1
+  If cpu_hotplug_disabled  0
 return gracefully
 |
 |
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 5644ec5..0fca2ba 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -191,14 +191,14 @@ void cpu_hotplug_done(void)
 void cpu_hotplug_disable(void)
 {
cpu_maps_update_begin();
-   cpu_hotplug_disabled = 1;
+   cpu_hotplug_disabled++;
cpu_maps_update_done();
 }
 
 void cpu_hotplug_enable(void)
 {
cpu_maps_update_begin();
-   cpu_hotplug_disabled = 0;
+   WARN_ON(--cpu_hotplug_disabled  0);
cpu_maps_update_done();
 }
 
@@ -608,13 +608,18 @@ int disable_nonboot_cpus(void)
}
}
 
-   if (!error) {
+   if (!error)
BUG_ON(num_online_cpus()  1);
-   /* Make sure the CPUs won't be enabled by someone else */
-   cpu_hotplug_disabled = 1;
-   } else {
+   else
pr_err(Non-boot CPUs are not disabled\n);
-   }
+
+   /*
+* Make sure the CPUs won't be enabled by someone else. We need to do
+* this even in case of failure as all disable_nonboot_cpus() users are
+* supposed to do enable_nonboot_cpus() on the failure path.
+*/
+   cpu_hotplug_disabled++;
+
cpu_maps_update_done();
return error;
 }
@@ -633,7 +638,7 @@ void __ref enable_nonboot_cpus(void)
 
/* Allow everyone to use the CPU hotplug again */
cpu_maps_update_begin();
-   cpu_hotplug_disabled = 0;
+   WARN_ON(--cpu_hotplug_disabled  0);
if (cpumask_empty(frozen_cpus))
goto out;
 
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v7 2/3] cpu-hotplug: export cpu_hotplug_enable/cpu_hotplug_disable

2015-07-23 Thread Vitaly Kuznetsov
Hyper-V module needs to disable cpu hotplug (offlining) as there is no
support from hypervisor side to reassign already opened event channels
to a different CPU. Currently it is been done by altering
smp_ops.cpu_disable but it is hackish.

Signed-off-by: Vitaly Kuznetsov vkuzn...@redhat.com
Reviewed-by: Thomas Gleixner t...@linutronix.de
---
Changes since v6:
- fix a typo in description [Thomas Gleixner]
---
 kernel/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 0fca2ba..718ea76 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -194,6 +194,7 @@ void cpu_hotplug_disable(void)
cpu_hotplug_disabled++;
cpu_maps_update_done();
 }
+EXPORT_SYMBOL_GPL(cpu_hotplug_disable);
 
 void cpu_hotplug_enable(void)
 {
@@ -201,7 +202,7 @@ void cpu_hotplug_enable(void)
WARN_ON(--cpu_hotplug_disabled  0);
cpu_maps_update_done();
 }
-
+EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
 #endif /* CONFIG_HOTPLUG_CPU */
 
 /* Need to know about CPUs going up/down? */
-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v7 0/3] Drivers: hv: vmbus: use cpu_hotplug_enable/disable for CPU offlining prevention

2015-07-23 Thread Vitaly Kuznetsov
Changes since v6:
- Rearrange patches. [Thomas Gleixner]
- Fix a typo in PATCH 2 description [Thomas Gleixner].
- Add Reviewed-by: [Thomas Gleixner].

Changes since v5:
- Split PATCH 1 into two (PATCH 1/3 and 3/3), rewrite changelogs. [Thomas
  Gleixner]

Changes since v4:
- In disable_nonboot_cpus() do cpu_hotplug_disabled++ unconditionally as
  its users are doing enable_nonboot_cpus() on their failure paths.

Changes since v3:
- add WARN_ON when decreasing cpu_hotplug_disabled [Peter Zijlstra]

Changes since v2:
- Rebase on top of current Greg's char-misc-next tree [K. Y. Srinivasan]

Changes since v1:
- Make cpu_hotplug_disabled a counter [Radim Krčmář]

Export cpu_hotplug_enable/cpu_hotplug_disable functions from cpu.c and use
them instead of altering smp_ops.cpu_disable in Hyper-V vmbus module.

Vitaly Kuznetsov (3):
  cpu-hotplug: convert cpu_hotplug_disabled to a counter
  cpu-hotplug: export cpu_hotplug_enable/cpu_hotplug_disable
  Drivers: hv: vmbus: use cpu_hotplug_enable/disable

 Documentation/power/suspend-and-cpuhotplug.txt |  6 ++--
 drivers/hv/vmbus_drv.c | 38 +++---
 kernel/cpu.c   | 24 ++--
 3 files changed, 22 insertions(+), 46 deletions(-)

-- 
2.4.3

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH V3 3/7] Drivers: hv: vmbus: add APIs to send/recv hvsock packet and get the r/w-ability

2015-07-21 Thread Vitaly Kuznetsov
Dexuan Cui de...@microsoft.com writes:

 This will be used by the coming net/hvsock driver.

 Signed-off-by: Dexuan Cui de...@microsoft.com
 ---
  drivers/hv/channel.c  | 133 
 ++
  drivers/hv/hyperv_vmbus.h |   4 ++
  drivers/hv/ring_buffer.c  |  14 +
  include/linux/hyperv.h|  32 +++
  4 files changed, 183 insertions(+)

 diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
 index b09d1b7..ffdef03 100644
 --- a/drivers/hv/channel.c
 +++ b/drivers/hv/channel.c
 @@ -758,6 +758,53 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel 
 *channel,
  EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer_ctl);

  /*
 + * vmbus_sendpacket_hvsock - Send the hvsock payload 'buf' into the vmbus
 + * ringbuffer
 + */
 +int vmbus_sendpacket_hvsock(struct vmbus_channel *channel, void *buf, u32 
 len)
 +{
 + struct vmpipe_proto_header pipe_hdr;
 + struct vmpacket_descriptor desc;
 + struct kvec bufferlist[4];
 + u32 packetlen_aligned;
 + u32 packetlen;
 + u64 aligned_data = 0;
 + bool signal = false;
 + int ret;
 +
 + packetlen = HVSOCK_HEADER_LEN + len;
 + packetlen_aligned = ALIGN(packetlen, sizeof(u64));
 +
 + /* Setup the descriptor */
 + desc.type = VM_PKT_DATA_INBAND;
 + /* in 8-bytes granularity */
 + desc.offset8 = sizeof(struct vmpacket_descriptor)  3;
 + desc.len8 = (u16)(packetlen_aligned  3);
 + desc.flags = 0;
 + desc.trans_id = 0;
 +
 + pipe_hdr.pkt_type = 1;
 + pipe_hdr.data_size = len;
 +
 + bufferlist[0].iov_base = desc;
 + bufferlist[0].iov_len  = sizeof(struct vmpacket_descriptor);
 + bufferlist[1].iov_base = pipe_hdr;
 + bufferlist[1].iov_len  = sizeof(struct vmpipe_proto_header);
 + bufferlist[2].iov_base = buf;
 + bufferlist[2].iov_len  = len;
 + bufferlist[3].iov_base = aligned_data;
 + bufferlist[3].iov_len  = packetlen_aligned - packetlen;
 +
 + ret = hv_ringbuffer_write(channel-outbound, bufferlist, 4, signal);
 +
 + if (ret == 0  signal)
 + vmbus_setevent(channel);
 +
 + return ret;
 +}
 +EXPORT_SYMBOL_GPL(vmbus_sendpacket_hvsock);
 +
 +/*
   * vmbus_sendpacket_pagebuffer - Send a range of single-page buffer
   * packets using a GPADL Direct packet type.
   */
 @@ -978,3 +1025,89 @@ int vmbus_recvpacket_raw(struct vmbus_channel *channel, 
 void *buffer,
   return ret;
  }
  EXPORT_SYMBOL_GPL(vmbus_recvpacket_raw);
 +
 +/*
 + * vmbus_recvpacket_hvsock - Receive the hvsock payload from the vmbus
 + * ringbuffer into the 'buffer'.
 + */
 +int vmbus_recvpacket_hvsock(struct vmbus_channel *channel, void *buffer,
 + u32 bufferlen, u32 *buffer_actual_len)
 +{
 + struct vmpipe_proto_header *pipe_hdr;
 + struct vmpacket_descriptor *desc;
 + u32 packet_len, payload_len;
 + bool signal = false;
 + int ret;
 +
 + *buffer_actual_len = 0;
 +
 + if (bufferlen  HVSOCK_HEADER_LEN)
 + return -ENOBUFS;
 +
 + ret = hv_ringbuffer_peek(channel-inbound, buffer,
 +  HVSOCK_HEADER_LEN);
 + if (ret != 0)
 + return 0;

I'd suggest you do something like

if (ret == -EAGIAIN)
return 0;
else if (ret)
return ret;

to make it future-proof (e.g. when a new error is returned by
hv_ringbuffer_peek). And a comment would also be useful as it is unclear
why we silence errors here.

 +
 + desc = (struct vmpacket_descriptor *)buffer;
 + packet_len = desc-len8  3;
 + if (desc-type != VM_PKT_DATA_INBAND ||
 + desc-offset8 != (sizeof(*desc) / 8) ||
 + packet_len  HVSOCK_HEADER_LEN)
 + return -EIO;
 +
 + pipe_hdr = (struct vmpipe_proto_header *)(desc + 1);
 + payload_len = pipe_hdr-data_size;
 +
 + if (pipe_hdr-pkt_type != 1 || payload_len == 0)
 + return -EIO;
 +
 + if (HVSOCK_PKT_LEN(payload_len) != packet_len + PREV_INDICES_LEN)
 + return -EIO;
 +
 + if (bufferlen  packet_len - HVSOCK_HEADER_LEN)
 + return -ENOBUFS;
 +
 + /* Copy over the hvsock payload to the user buffer */
 + ret = hv_ringbuffer_read(channel-inbound, buffer,
 +  packet_len - HVSOCK_HEADER_LEN,
 +  HVSOCK_HEADER_LEN, signal);
 + if (ret != 0)
 + return ret;
 +
 + *buffer_actual_len = payload_len;
 +
 + if (signal)
 + vmbus_setevent(channel);
 +
 + return 0;
 +}
 +EXPORT_SYMBOL_GPL(vmbus_recvpacket_hvsock);
 +
 +/*
 + * vmbus_get_hvsock_rw_status - can the ringbuffer be read/written?
 + */
 +void vmbus_get_hvsock_rw_status(struct vmbus_channel *channel,
 + bool *can_read, bool *can_write)
 +{
 + u32 avl_read_bytes, avl_write_bytes, dummy;
 +
 + if (can_read != NULL) {
 + hv_get_ringbuffer_available_space(channel-inbound,
 +   avl_read_bytes,
 +

Re: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe

2015-07-17 Thread Vitaly Kuznetsov
K. Y. Srinivasan k...@microsoft.com writes:

 The current code returns from probe without waiting for the proper handling
 of subchannels that may be requested. If the netvsc driver were to be rapidly
 loaded/unloaded, we can  trigger a panic as the unload will be tearing
 down state that may not have been fully setup yet. We fix this issue by making
 sure that we return from the probe call only after ensuring that the
 sub-channel offers in flight are properly handled.

 Signed-off-by: K. Y. Srinivasan k...@microsoft.com
 Reviewed-and-tested-by: Haiyang Zhang haiya...@microsoft.com
 ---
  drivers/net/hyperv/hyperv_net.h   |2 ++
  drivers/net/hyperv/rndis_filter.c |   25 +
  2 files changed, 27 insertions(+), 0 deletions(-)

 diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
 index 26cd14c..925b75d 100644
 --- a/drivers/net/hyperv/hyperv_net.h
 +++ b/drivers/net/hyperv/hyperv_net.h
 @@ -671,6 +671,8 @@ struct netvsc_device {
   u32 send_table[VRSS_SEND_TAB_SIZE];
   u32 max_chn;
   u32 num_chn;
 + spinlock_t sc_lock; /* Protects num_sc_offered variable */
 + u32 num_sc_offered;
   atomic_t queue_sends[NR_CPUS];

   /* Holds rndis device info */
 diff --git a/drivers/net/hyperv/rndis_filter.c 
 b/drivers/net/hyperv/rndis_filter.c
 index 2e40417..2e09f3f 100644
 --- a/drivers/net/hyperv/rndis_filter.c
 +++ b/drivers/net/hyperv/rndis_filter.c
 @@ -984,9 +984,16 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc)
   struct netvsc_device *nvscdev;
   u16 chn_index = new_sc-offermsg.offer.sub_channel_index;
   int ret;
 + unsigned long flags;

   nvscdev = hv_get_drvdata(new_sc-primary_channel-device_obj);

 + spin_lock_irqsave(nvscdev-sc_lock, flags);
 + nvscdev-num_sc_offered--;
 + spin_unlock_irqrestore(nvscdev-sc_lock, flags);
 + if (nvscdev-num_sc_offered == 0)
 + complete(nvscdev-channel_init_wait);
 +
   if (chn_index = nvscdev-num_chn)
   return;

 @@ -1015,8 +1022,10 @@ int rndis_filter_device_add(struct hv_device *dev,
   u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
   u32 mtu, size;
   u32 num_rss_qs;
 + u32 sc_delta;
   const struct cpumask *node_cpu_mask;
   u32 num_possible_rss_qs;
 + unsigned long flags;

   rndis_device = get_rndis_device();
   if (!rndis_device)
 @@ -1039,6 +1048,8 @@ int rndis_filter_device_add(struct hv_device *dev,
   net_device-max_chn = 1;
   net_device-num_chn = 1;

 + spin_lock_init(net_device-sc_lock);
 +
   net_device-extension = rndis_device;
   rndis_device-net_dev = net_device;

 @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device *dev,
   num_possible_rss_qs = cpumask_weight(node_cpu_mask);
   net_device-num_chn = min(num_possible_rss_qs, num_rss_qs);

 + num_rss_qs = net_device-num_chn - 1;
 + net_device-num_sc_offered = num_rss_qs;
 +
   if (net_device-num_chn == 1)
   goto out;

 @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device *dev,

   ret = rndis_filter_set_rss_param(rndis_device, net_device-num_chn);

 + /*
 +  * Wait for the host to send us the sub-channel offers.
 +  */
 + spin_lock_irqsave(net_device-sc_lock, flags);
 + sc_delta = net_device-num_chn - 1 - num_rss_qs;
 + net_device-num_sc_offered -= sc_delta;
 + spin_unlock_irqrestore(net_device-sc_lock, flags);
 +
 + if (net_device-num_sc_offered != 0)
 + wait_for_completion(net_device-channel_init_wait);

I'd suggest we add an essentian timeout (big, let's say 30 sec.)
here. In case something goes wrong we don't really want to hang the
whole kernel for forever. Such bugs are hard to debug as if a 'kernel
hangs' is reported we can't be sure which wait caused it. We can even
have something like:

 t = wait_for_completion_timeout(net_device-channel_init_wait, 30*HZ);
 BUG_ON(t == 0);

This is much better as we'll be sure what went wrong. (I know other
pieces of hyper-v code use wait_for_completion() without a timeout, this
is rather a general suggestion for all of them).

  out:
   if (ret) {
   net_device-max_chn = 1;
   net_device-num_chn = 1;
   }
 +
   return 0; /* return 0 because primary channel can be used alone */

  err_dev_remv:

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be processed during probe

2015-07-17 Thread Vitaly Kuznetsov
KY Srinivasan k...@microsoft.com writes:

 -Original Message-
 From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
 Sent: Friday, July 17, 2015 7:13 AM
 To: KY Srinivasan
 Cc: da...@davemloft.net; net...@vger.kernel.org; linux-
 ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
 a...@canonical.com; jasow...@redhat.com; Dexuan Cui
 Subject: Re: [PATCH net-next 1/1] hv_netvsc: Wait for sub-channels to be
 processed during probe
 
 K. Y. Srinivasan k...@microsoft.com writes:
 
  The current code returns from probe without waiting for the proper
 handling
  of subchannels that may be requested. If the netvsc driver were to be
 rapidly
  loaded/unloaded, we can  trigger a panic as the unload will be tearing
  down state that may not have been fully setup yet. We fix this issue by
 making
  sure that we return from the probe call only after ensuring that the
  sub-channel offers in flight are properly handled.
 
  Signed-off-by: K. Y. Srinivasan k...@microsoft.com
  Reviewed-and-tested-by: Haiyang Zhang haiya...@microsoft.com
  ---
   drivers/net/hyperv/hyperv_net.h   |2 ++
   drivers/net/hyperv/rndis_filter.c |   25 +
   2 files changed, 27 insertions(+), 0 deletions(-)
 
  diff --git a/drivers/net/hyperv/hyperv_net.h
 b/drivers/net/hyperv/hyperv_net.h
  index 26cd14c..925b75d 100644
  --- a/drivers/net/hyperv/hyperv_net.h
  +++ b/drivers/net/hyperv/hyperv_net.h
  @@ -671,6 +671,8 @@ struct netvsc_device {
 u32 send_table[VRSS_SEND_TAB_SIZE];
 u32 max_chn;
 u32 num_chn;
  +  spinlock_t sc_lock; /* Protects num_sc_offered variable */
  +  u32 num_sc_offered;
 atomic_t queue_sends[NR_CPUS];
 
 /* Holds rndis device info */
  diff --git a/drivers/net/hyperv/rndis_filter.c
 b/drivers/net/hyperv/rndis_filter.c
  index 2e40417..2e09f3f 100644
  --- a/drivers/net/hyperv/rndis_filter.c
  +++ b/drivers/net/hyperv/rndis_filter.c
  @@ -984,9 +984,16 @@ static void netvsc_sc_open(struct vmbus_channel
 *new_sc)
 struct netvsc_device *nvscdev;
 u16 chn_index = new_sc-offermsg.offer.sub_channel_index;
 int ret;
  +  unsigned long flags;
 
 nvscdev = hv_get_drvdata(new_sc-primary_channel-device_obj);
 
  +  spin_lock_irqsave(nvscdev-sc_lock, flags);
  +  nvscdev-num_sc_offered--;
  +  spin_unlock_irqrestore(nvscdev-sc_lock, flags);
  +  if (nvscdev-num_sc_offered == 0)
  +  complete(nvscdev-channel_init_wait);
  +
 if (chn_index = nvscdev-num_chn)
 return;
 
  @@ -1015,8 +1022,10 @@ int rndis_filter_device_add(struct hv_device
 *dev,
 u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
 u32 mtu, size;
 u32 num_rss_qs;
  +  u32 sc_delta;
 const struct cpumask *node_cpu_mask;
 u32 num_possible_rss_qs;
  +  unsigned long flags;
 
 rndis_device = get_rndis_device();
 if (!rndis_device)
  @@ -1039,6 +1048,8 @@ int rndis_filter_device_add(struct hv_device
 *dev,
 net_device-max_chn = 1;
 net_device-num_chn = 1;
 
  +  spin_lock_init(net_device-sc_lock);
  +
 net_device-extension = rndis_device;
 rndis_device-net_dev = net_device;
 
  @@ -1116,6 +1127,9 @@ int rndis_filter_device_add(struct hv_device
 *dev,
 num_possible_rss_qs = cpumask_weight(node_cpu_mask);
 net_device-num_chn = min(num_possible_rss_qs, num_rss_qs);
 
  +  num_rss_qs = net_device-num_chn - 1;
  +  net_device-num_sc_offered = num_rss_qs;
  +
 if (net_device-num_chn == 1)
 goto out;
 
  @@ -1157,11 +1171,22 @@ int rndis_filter_device_add(struct hv_device
 *dev,
 
 ret = rndis_filter_set_rss_param(rndis_device, net_device-
 num_chn);
 
  +  /*
  +   * Wait for the host to send us the sub-channel offers.
  +   */
  +  spin_lock_irqsave(net_device-sc_lock, flags);
  +  sc_delta = net_device-num_chn - 1 - num_rss_qs;
  +  net_device-num_sc_offered -= sc_delta;
  +  spin_unlock_irqrestore(net_device-sc_lock, flags);
  +
  +  if (net_device-num_sc_offered != 0)
  +  wait_for_completion(net_device-channel_init_wait);
 
 I'd suggest we add an essentian timeout (big, let's say 30 sec.)
 here. In case something goes wrong we don't really want to hang the
 whole kernel for forever. Such bugs are hard to debug as if a 'kernel
 hangs' is reported we can't be sure which wait caused it. We can even
 have something like:
 
  t = wait_for_completion_timeout(net_device-channel_init_wait, 30*HZ);
  BUG_ON(t == 0);
 
 This is much better as we'll be sure what went wrong. (I know other
 pieces of hyper-v code use wait_for_completion() without a timeout, this
 is rather a general suggestion for all of them).

 There is some history here. Initially, I had timeout for calls where we could 
 reasonably
 rollback state if we timed out. Some calls were subsequently changed to 
 unconditional
 wait because under some load conditions, these timeouts would trigger 
 (granted I did not
 have 30 second timeout; it was a 5 second timeout).

 Greg was opposed to calls to BUG_ON() in general for drivers

Re: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device shutdown

2015-07-14 Thread Vitaly Kuznetsov
Dexuan Cui de...@microsoft.com writes:

 -Original Message-
 From: Vitaly Kuznetsov
 Sent: Monday, July 13, 2015 20:19
 Subject: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on 
 device
 shutdown
 
 When a new subchannel offer from host comes during device shutdown (e.g.
 when a netvsc/storvsc module is unloadedshortly after it was loaded) a
 crash can happen as vmbus_process_offer() is not anyhow serialized with
 vmbus_remove().

 How about vmbus_onoffer_rescind()? 
 It's not serialized with vmbus_remove() either, so I think there is an issue 
 too?

 I remember when 'rmmod hv_netvsc', we get a rescind-offer message for
 each subchannel.


True, I think we have a race with rescind messages as well, we just
never saw crashes for some reason. I'll think how we can make the fix
more general.

 As an example we can try calling subchannel create
 callback when the module is already unloaded.
 The following crash was observed while keeping loading/unloading hv_netvsc
 module on 64-CPU guest:
 
   hv_netvsc vmbus_14: net device safe to remove
   BUG: unable to handle kernel paging request at a118
   IP: [a01c1b21] netvsc_sc_open+0x31/0xb0 [hv_netvsc]
   PGD 1f3946a067 PUD 1f38a5f067 PMD 0
   Oops:  [#1] SMP
   ...
   Call Trace:
   [a0055cd7] vmbus_onoffer+0x477/0x530 [hv_vmbus]
   [81092e1f] ? move_linked_works+0x5f/0x80
   [a0055fd3] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
   [a0052f81] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
   [81095cde] process_one_work+0x18e/0x4e0
   ...
 
 The issue cannot be solved by just resetting sc_creation_callback on
 driver removal as while we search for the parent channel with channel_lock
 held we release it after the channel was found and it can disapper beneath
 our feet while we're still in vmbus_process_offer();
 
 Introduce new sc_create_lock mutex and take it in vmbus_remove() to ensure
 no new subchannels are created after we started the removal procedure.
 Check its state with mutex_trylock in vmbus_process_offer().

 In my 8-CPU VM, I can very easily reproduce the panic by
 1. running
   while ((1)); do modprobe -r  hv_netvsc; modprobe hv_netvsc; sleep 10; done.

 and
 2. in vmbus_onoffer_rescind(), we sleep 3s after a subchannel is added into
 the primary channel's sc_list (and before the sc_creation_callback is 
 invoked):
 (I added line 275)

 262 if (!fnew) {
 263 /*
 264  * Check to see if this is a sub-channel.
 265  */
 266 if (newchannel-offermsg.offer.sub_channel_index != 0) {
 267 /*
 268  * Process the sub-channel.
 269  */
 270 newchannel-primary_channel = channel;
 271 spin_lock_irqsave(channel-lock, flags);
 272 list_add_tail(newchannel-sc_list, 
 channel-sc_list);
 273 channel-num_sc++;
 274 spin_unlock_irqrestore(channel-lock, flags);
 275 ssleep(3);
 276 } else
 277 goto err_free_chan;
 278 }

It is possible to see crashes even without such intrumentation, move
CPUs and slower host will do the job.

Thanks,

-- 
  Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


<    2   3   4   5   6   7   8   9   10   >