[PATCH] x86/hyperv: suppress "PCI: Fatal: No config space access function found"

2018-09-18 Thread Dexuan Cui


A Generatin-2 Linux VM on Hyper-V doesn't have the legacy PCI bus, and
users always see the scary warning, which is actually harmless. The patch
is made to suppress the warning.

Signed-off-by: Dexuan Cui 
Cc: K. Y. Srinivasan 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
---
 arch/x86/hyperv/hv_init.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 20c876c..7abb09e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -17,6 +17,7 @@
  *
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -253,6 +254,22 @@ static int hv_cpu_die(unsigned int cpu)
return 0;
 }
 
+static int __init hv_pci_init(void)
+{
+   int gen2vm = efi_enabled(EFI_BOOT);
+
+   /*
+* For Generation-2 VM, we exit from pci_arch_init() by returning 0.
+* The purpose is to suppress the harmless warning:
+* "PCI: Fatal: No config space access function found"
+*/
+   if (gen2vm)
+   return 0;
+
+   /* For Generation-1 VM, we'll proceed in pci_arch_init().  */
+   return 1;
+}
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -329,6 +346,8 @@ void __init hyperv_init(void)
 
hv_apic_init();
 
+   x86_init.pci.arch_init = hv_pci_init;
+
/*
 * Register Hyper-V specific clocksource.
 */
-- 
2.7.4

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH 3/5] Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up

2018-10-17 Thread Dexuan Cui
> From: devel  On Behalf Of
> KY Srinivasan
> Sent: Tuesday, October 16, 2018 23:02
> > > --- a/drivers/hv/hv_kvp.c
> > > +++ b/drivers/hv/hv_kvp.c
> > > @@ -353,6 +353,9 @@ static void process_ib_ipinfo(void *in_msg, void
> > *out_msg, int op)
> > >
> > >   out->body.kvp_ip_val.dhcp_enabled = in-
> > >kvp_ip_val.dhcp_enabled;
> > >
> > > + __attribute__ ((fallthrough));
> >
> > The comment should be sufficient for this, right?  I haven't seen many
> > uses of this attribute before, how common is it?
> 
> Yes; a common should be sufficient.

You all are right. 
Right now, I realized the gcc warning can also be suppressed by a simple line
of comment:

/* fallthrough */

It looks gcc treats this comment specially. 

If I add something more in the comment, like
/* add fallthrough */
, the warning can not be suppressed. :-)

I'll do a new version for KY.

Thank you all!

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH 3/5] Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up

2018-10-17 Thread Dexuan Cui
> From: devel  On Behalf Of
> Greg KH
> Sent: Tuesday, October 16, 2018 22:07
> > ...
> > +   case KVP_OP_GET:
> > +   message->body.kvp_get.data.key_size =
> > +   utf16s_to_utf8s(
> > +   (wchar_t *)in_msg->body.kvp_get.data.key,
> > +   in_msg->body.kvp_get.data.key_size,
> > +   UTF16_LITTLE_ENDIAN,
> > +   message->body.kvp_get.data.key,
> > +   HV_KVP_EXCHANGE_MAX_KEY_SIZE - 1) + 1;
> 
> Worst indentation ever :(
> 
> Yeah, I know it follows the others above it, but you should reconsider
> it in the future...
> greg k-h

Agreed. We should consider improving this in future.

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH 5/5] Tools: hv: kvp: Fix a warning of buffer overflow with gcc 8.0.1

2018-10-17 Thread Dexuan Cui
> From: devel  On Behalf Of
> Greg KH
> Sent: Tuesday, October 16, 2018 22:07
> 
> On Wed, Oct 17, 2018 at 03:14:06AM +, k...@linuxonhyperv.com wrote:
> > From: Dexuan Cui 
> >
> > The patch fixes:
> >
> > hv_kvp_daemon.c: In function 'kvp_set_ip_info':
> > hv_kvp_daemon.c:1305:2: note: 'snprintf' output between 41 and 4136 bytes
> > into a destination of size 4096
> > ...
> > --- a/tools/hv/hv_kvp_daemon.c
> > +++ b/tools/hv/hv_kvp_daemon.c
> > @@ -1176,7 +1176,7 @@ static int kvp_set_ip_info(char *if_name, struct
> hv_kvp_ipaddr_value *new_val)
> > int error = 0;
> > char if_file[PATH_MAX];
> > FILE *file;
> > -   char cmd[PATH_MAX];
> > +   char cmd[PATH_MAX * 2];
> 
> Overkill?

Well, I found the -Wformat-truncation warning can be suppressed by checking
the return value of snprintf, so I'm going to do a new patch for KY to resend 
like this:

@@ -1178,6 +1178,7 @@ static int kvp_set_ip_info(char *if_name, struct 
hv_kvp_ipaddr_value *new_val)
FILE *file;
char cmd[PATH_MAX];
char *mac_addr;
+   int str_len;

/*
 * Set the configuration for the specified interface with
@@ -1301,8 +1302,19 @@ static int kvp_set_ip_info(char *if_name, struct 
hv_kvp_ipaddr_value *new_val)
 * invoke the external script to do its magic.
 */

-   snprintf(cmd, sizeof(cmd), KVP_SCRIPTS_PATH "%s %s",
-"hv_set_ifconfig", if_file);
+   str_len = snprintf(cmd, sizeof(cmd), KVP_SCRIPTS_PATH "%s %s",
+"hv_set_ifconfig", if_file);
+
+   /*
+* This is a little overcautious, but it's necessary to suppress some
+* false warnings from gcc 8.0.1.
+*/
+   if (str_len <= 0 || (unsigned int)str_len >= sizeof(cmd)) {
+   syslog(LOG_ERR, "Cmd '%s' (len=%d) may be too long",
+  cmd, str_len);
+   return HV_E_FAIL;
+   }


Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH V2 3/5] Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up

2018-10-31 Thread Dexuan Cui
> From: Michael Kelley 
> Sent: Wednesday, October 24, 2018 08:38
> From: k...@linuxonhyperv.com   Sent: Wednesday,
> October 17, 2018 10:10 PM
> > From: Dexuan Cui 
> >
> > In kvp_send_key(), we do need call process_ib_ipinfo() if
> > message->kvp_hdr.operation is KVP_OP_GET_IP_INFO, because it turns out
> > the userland hv_kvp_daemon needs the info of operation, adapter_id and
> > addr_family. With the incorrect fc62c3b1977d, the host can't get the
> > VM's IP via KVP.
> >
> > And, fc62c3b1977d added a "break;", but actually forgot to initialize
> > the key_size/value in the case of KVP_OP_SET, so the default key_size of
> > 0 is passed to the kvp daemon, and the pool files
> > /var/lib/hyperv/.kvp_pool_* can't be updated.
> >
> > This patch effectively rolls back the previous fc62c3b1977d, and
> > correctly fixes the "this statement may fall through" warnings.
> >
> > This patch is tested on WS 2012 R2 and 2016.
> >
> > Fixes: fc62c3b1977d ("Drivers: hv: kvp: Fix two "this statement may fall
> through" warnings")
> > Signed-off-by: Dexuan Cui 
> > Cc: K. Y. Srinivasan 
> > Cc: Haiyang Zhang 
> > Cc: Stephen Hemminger 
> > Cc: 
> > Signed-off-by: K. Y. Srinivasan 
> > ---
> >  drivers/hv/hv_kvp.c | 26 ++
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> >
> Reviewed-by: Michael Kelley 

Hi Greg,
Can you please take a look at this patch?

We need it to fix a regression introduced by me in:
Fixes: fc62c3b1977d ("Drivers: hv: kvp: Fix two "this statement may fall 
through" warnings")

The faulty patch is being meged into the old stable kernels... So we need to
take this patch ASAP. Thanks!

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH V2 3/5] Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up

2018-11-01 Thread Dexuan Cui
> From: gre...@linuxfoundation.org 
> Sent: Thursday, November 1, 2018 11:57
> To: Dexuan Cui 
> 
> On Wed, Oct 31, 2018 at 11:23:54PM +0000, Dexuan Cui wrote:
> > > From: Michael Kelley 
> > > Sent: Wednesday, October 24, 2018 08:38
> > > From: k...@linuxonhyperv.com   Sent:
> Wednesday,
> > > October 17, 2018 10:10 PM
> > > > From: Dexuan Cui 
> > > >
> > > > In kvp_send_key(), we do need call process_ib_ipinfo() if
> > > > message->kvp_hdr.operation is KVP_OP_GET_IP_INFO, because it turns
> out
> > > > the userland hv_kvp_daemon needs the info of operation, adapter_id
> and
> > > > addr_family. With the incorrect fc62c3b1977d, the host can't get the
> > > > VM's IP via KVP.
> > > >
> > > > And, fc62c3b1977d added a "break;", but actually forgot to initialize
> > > > the key_size/value in the case of KVP_OP_SET, so the default key_size of
> > > > 0 is passed to the kvp daemon, and the pool files
> > > > /var/lib/hyperv/.kvp_pool_* can't be updated.
> > > >
> > > > This patch effectively rolls back the previous fc62c3b1977d, and
> > > > correctly fixes the "this statement may fall through" warnings.
> > > >
> > > > This patch is tested on WS 2012 R2 and 2016.
> > > >
> > > > Fixes: fc62c3b1977d ("Drivers: hv: kvp: Fix two "this statement may fall
> > > through" warnings")
> > > > Signed-off-by: Dexuan Cui 
> > > > Cc: K. Y. Srinivasan 
> > > > Cc: Haiyang Zhang 
> > > > Cc: Stephen Hemminger 
> > > > Cc: 
> > > > Signed-off-by: K. Y. Srinivasan 
> > > > ---
> > > >  drivers/hv/hv_kvp.c | 26 ++
> > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > >
> > > Reviewed-by: Michael Kelley 
> >
> > Hi Greg,
> > Can you please take a look at this patch?
> 
> Nope, I'm not the hv maintainer, they need to look at this and ack it,
> not me :)
> 
> greg k-h

Hi Greg,
KY has added his Signed-off-by in the mail.

I'll ask the other HV maintainers to take a look as well.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH V2 3/5] Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up

2018-11-10 Thread Dexuan Cui
> From: gre...@linuxfoundation.org 
> Sent: Thursday, November 1, 2018 21:54
> To: Dexuan Cui 
> Cc: Michael Kelley ; KY Srinivasan
> ; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; Stephen Hemminger ;
> vkuznets ; Sasha Levin
> ; Haiyang Zhang ;
> sta...@vger.kernel.org
> Subject: Re: [PATCH V2 3/5] Drivers: hv: kvp: Fix the recent regression caused
> by incorrect clean-up
> 
> On Thu, Nov 01, 2018 at 07:22:28PM +, Dexuan Cui wrote:
> > > From: gre...@linuxfoundation.org 
> > > Sent: Thursday, November 1, 2018 11:57
> > > To: Dexuan Cui 
> > >
> > > On Wed, Oct 31, 2018 at 11:23:54PM +, Dexuan Cui wrote:
> > > > > From: Michael Kelley 
> > > > > Sent: Wednesday, October 24, 2018 08:38
> > > > > From: k...@linuxonhyperv.com   Sent:
> > > Wednesday,
> > > > > October 17, 2018 10:10 PM
> > > > > > From: Dexuan Cui 
> > > > > >
> > > > > > In kvp_send_key(), we do need call process_ib_ipinfo() if
> > > > > > message->kvp_hdr.operation is KVP_OP_GET_IP_INFO, because it
> turns
> > > out
> > > > > > the userland hv_kvp_daemon needs the info of operation, adapter_id
> > > and
> > > > > > addr_family. With the incorrect fc62c3b1977d, the host can't get the
> > > > > > VM's IP via KVP.
> > > > > >
> > > > > > And, fc62c3b1977d added a "break;", but actually forgot to 
> > > > > > initialize
> > > > > > the key_size/value in the case of KVP_OP_SET, so the default 
> > > > > > key_size
> of
> > > > > > 0 is passed to the kvp daemon, and the pool files
> > > > > > /var/lib/hyperv/.kvp_pool_* can't be updated.
> > > > > >
> > > > > > This patch effectively rolls back the previous fc62c3b1977d, and
> > > > > > correctly fixes the "this statement may fall through" warnings.
> > > > > >
> > > > > > This patch is tested on WS 2012 R2 and 2016.
> > > > > >
> > > > > > Fixes: fc62c3b1977d ("Drivers: hv: kvp: Fix two "this statement may
> fall
> > > > > through" warnings")
> > > > > > Signed-off-by: Dexuan Cui 
> > > > > > Cc: K. Y. Srinivasan 
> > > > > > Cc: Haiyang Zhang 
> > > > > > Cc: Stephen Hemminger 
> > > > > > Cc: 
> > > > > > Signed-off-by: K. Y. Srinivasan 
> > > > > > ---
> > > > > >  drivers/hv/hv_kvp.c | 26 ++
> > > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > > >
> > > > > Reviewed-by: Michael Kelley 
> > > >
> > > > Hi Greg,
> > > > Can you please take a look at this patch?
> > >
> > > Nope, I'm not the hv maintainer, they need to look at this and ack it,
> > > not me :)
> > >
> > > greg k-h
> >
> > Hi Greg,
> > KY has added his Signed-off-by in the mail.
> >
> > I'll ask the other HV maintainers to take a look as well.
> 
> Ok, then I'll look at it after 4.20-rc1 is out, nothing I can do until
> then anyway...
> 
> thanks,
> 
> greg k-h

Hi Greg,
Can you please take a look at the patch now?

The patch has received

Reviewed-by: Michael Kelley 
Signed-off-by: Haiyang Zhang 
Signed-off-by: K. Y. Srinivasan 

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH 2/2] Drivers: hv: vmbus: offload the handling of channels to two workqueues

2018-11-26 Thread Dexuan Cui
> From: devel  On Behalf Of
> Greg KH
> Sent: Monday, November 26, 2018 11:35 AM
> As Sasha pointed out, this patch does not even apply :(

Sorry, I'll rebase to char-misc's char-misc-testing branch, which has had one 
of the patches, i.e.

4d3c5c69191f ("Drivers: hv: vmbus: Remove the useless API 
vmbus_get_outgoing_channel()")

Thanks,
-- Dexuan
 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] [repost] Drivers: hv: vmbus: Offload the handling of channels to two workqueues

2018-11-28 Thread Dexuan Cui


vmbus_process_offer() mustn't call channel->sc_creation_callback()
directly for sub-channels, because sc_creation_callback() ->
vmbus_open() may never get the host's response to the
OPEN_CHANNEL message (the host may rescind a channel at any time,
e.g. in the case of hot removing a NIC), and vmbus_onoffer_rescind()
may not wake up the vmbus_open() as it's blocked due to a non-zero
vmbus_connection.offer_in_progress, and finally we have a deadlock.

The above is also true for primary channels, if the related device
drivers use sync probing mode by default.

And, usually the handling of primary channels and sub-channels can
depend on each other, so we should offload them to different
workqueues to avoid possible deadlock, e.g. in sync-probing mode,
NIC1's netvsc_subchan_work() can race with NIC2's netvsc_probe() ->
rtnl_lock(), and causes deadlock: the former gets the rtnl_lock
and waits for all the sub-channels to appear, but the latter
can't get the rtnl_lock and this blocks the handling of sub-channels.

The patch can fix the multiple-NIC deadlock described above for
v3.x kernels (e.g. RHEL 7.x) which don't support async-probing
of devices, and v4.4, v4.9, v4.14 and v4.18 which support async-probing
but don't enable async-probing for Hyper-V drivers (yet).

The patch can also fix the hang issue in sub-channel's handling described
above for all versions of kernels, including v4.19 and v4.20-rc4.

So, actually the patch should be applied to all the existing kernels,
not only the kernels that have 8195b1396ec8.

Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Cc: sta...@vger.kernel.org
Cc: Stephen Hemminger 
Cc: K. Y. Srinivasan 
Cc: Haiyang Zhang 
Signed-off-by: Dexuan Cui 
Signed-off-by: K. Y. Srinivasan 
---

There is no change in this repost. I just rebased this patch to today's
char-misc's char-misc-next branch. Previously KY posted the patch with his
Signed-off-by (which is kept in this repost), but there was a conflict issue.

Note: the patch can't be cleanly applied to char-misc's char-misc-linus branch 
--
to do that, we need to cherry-pick the supporting patch first:
4d3c5c69191f ("Drivers: hv: vmbus: Remove the useless API 
vmbus_get_outgoing_channel()")

 drivers/hv/channel_mgmt.c | 188 +++---
 drivers/hv/connection.c   |  24 +-
 drivers/hv/hyperv_vmbus.h |   7 ++
 include/linux/hyperv.h|   7 ++
 4 files changed, 161 insertions(+), 65 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 82e6736..d016890 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -434,60 +434,16 @@ void vmbus_free_channels(void)
}
 }
 
-/*
- * vmbus_process_offer - Process the offer by creating a channel/device
- * associated with this offer
- */
-static void vmbus_process_offer(struct vmbus_channel *newchannel)
+/* Note: the function can run concurrently for primary/sub channels. */
+static void vmbus_add_channel_work(struct work_struct *work)
 {
-   struct vmbus_channel *channel;
-   bool fnew = true;
+   struct vmbus_channel *newchannel =
+   container_of(work, struct vmbus_channel, add_channel_work);
+   struct vmbus_channel *primary_channel = newchannel->primary_channel;
unsigned long flags;
u16 dev_type;
int ret;
 
-   /* Make sure this is a new offer */
-   mutex_lock(&vmbus_connection.channel_mutex);
-
-   /*
-* Now that we have acquired the channel_mutex,
-* we can release the potentially racing rescind thread.
-*/
-   atomic_dec(&vmbus_connection.offer_in_progress);
-
-   list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
-   if (!uuid_le_cmp(channel->offermsg.offer.if_type,
-   newchannel->offermsg.offer.if_type) &&
-   !uuid_le_cmp(channel->offermsg.offer.if_instance,
-   newchannel->offermsg.offer.if_instance)) {
-   fnew = false;
-   break;
-   }
-   }
-
-   if (fnew)
-   list_add_tail(&newchannel->listentry,
- &vmbus_connection.chn_list);
-
-   mutex_unlock(&vmbus_connection.channel_mutex);
-
-   if (!fnew) {
-   /*
-* Check to see if this is a sub-channel.
-*/
-   if (newchannel->offermsg.offer.sub_channel_index != 0) {
-   /*
-* Process the sub-channel.
-*/
-   newchannel->primary_channel = channel;
-   spin_lock_irqsave(&channel->lock, flags);
-   list_add_tail(&newchannel->sc_list, &channel->sc_list);
-   spin_unlo

RE: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of channels to two workqueues

2018-11-29 Thread Dexuan Cui
> From: gre...@linuxfoundation.org 
> Sent: Wednesday, November 28, 2018 11:45 PM
> >
> > There is no change in this repost. I just rebased this patch to today's
> > char-misc's char-misc-next branch. Previously KY posted the patch with his
> > Signed-off-by (which is kept in this repost), but there was a conflict 
> > issue.
> >
> > Note: the patch can't be cleanly applied to char-misc's char-misc-linus 
> > branch
> --
> > to do that, we need to cherry-pick the supporting patch first:
> > 4d3c5c69191f ("Drivers: hv: vmbus: Remove the useless API
> vmbus_get_outgoing_channel()")
> 
> That is not going to work for the obvious reason that this dependant
> patch is not going to be merged into 4.20-final.

It looks the dependent patch (4d3c5c69191f) is going to miss the v4.20 release. 
This is not a big issue, as the dependent patch isn't really important.
 
> So, what do you expect us to do here?  The only way this can be accepted
> is to have it go into my -next branch, which means it will show up in
> 4.21-rc1, is that ok?

Is there any chance for this patch ("Drivers: hv: vmbus: Offload the handling 
...") to
go into v4.20? 

If yes, I can quickly do a rebase to char-misc's char-misc-linus branch,
because actually the conflict can be very easily fixed. And I can help to fix 
any 
conflict when the dependent patch is backported to v4.20.1.

If no, I think this patch and the dependent patch can both go into v4.21, and 
they can be both backported to v4.20.1 in future.

> But then, if that happens, it will fail to apply to any stable tree for
> 4.20 and older, like you are asking it to be done for.
> 
> So what do you expect me to do here with this?
> 
> totally confused,
> 
> greg k-h

I hope my above reply made me clear. Sorry, I'm not really know how exactly
the releasing procedure works...

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of channels to two workqueues

2018-11-30 Thread Dexuan Cui
> From: KY Srinivasan 
> Sent: Friday, November 30, 2018 9:31 AM
> > From: Dexuan Cui 
> > Sent: Thursday, November 29, 2018 12:17 AM
> > To: gre...@linuxfoundation.org
> > Cc: KY Srinivasan ; Haiyang Zhang
> > ; Stephen Hemminger
> > ; linux-ker...@vger.kernel.org;
> > de...@linuxdriverproject.org; a...@canonical.com; vkuznets
> > ; o...@aepfle.de; jasow...@redhat.com; Michael
> > Kelley 
> > Subject: RE: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of
> > channels to two workqueues
> >
> > > From: gre...@linuxfoundation.org 
> > > Sent: Wednesday, November 28, 2018 11:45 PM
> > > >
> > > > There is no change in this repost. I just rebased this patch to today's
> > > > char-misc's char-misc-next branch. Previously KY posted the patch with
> > his
> > > > Signed-off-by (which is kept in this repost), but there was a conflict 
> > > > issue.
> > > >
> > > > Note: the patch can't be cleanly applied to char-misc's char-misc-linus
> > branch
> > > --
> > > > to do that, we need to cherry-pick the supporting patch first:
> > > > 4d3c5c69191f ("Drivers: hv: vmbus: Remove the useless API
> > > vmbus_get_outgoing_channel()")
> > >
> > > That is not going to work for the obvious reason that this dependant
> > > patch is not going to be merged into 4.20-final.
> >
> > It looks the dependent patch (4d3c5c69191f) is going to miss the v4.20
> > release.
> > This is not a big issue, as the dependent patch isn't really important.
> >
> > > So, what do you expect us to do here?  The only way this can be accepted
> > > is to have it go into my -next branch, which means it will show up in
> > > 4.21-rc1, is that ok?
> >
> > Is there any chance for this patch ("Drivers: hv: vmbus: Offload the 
> > handling
> > ...") to
> > go into v4.20?
> >
> > If yes, I can quickly do a rebase to char-misc's char-misc-linus branch,
> > because actually the conflict can be very easily fixed. And I can help to 
> > fix any
> > conflict when the dependent patch is backported to v4.20.1.
> 
> This patch fixes an important bug while the patch this depends on is not
> critical.
> I suggest we revert the patch that this patch depends on
> and we can submit a new version of this patch that can go in now - into 4.20
> release.
> 
> K. Y

I agree.

Hi Greg,
Please let us know what we can do to try to push this important fix into v4.20.

Actually it's straightforward, though it looks big. And, we ave done a full 
testing
with the patch.

Thanks,
--Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of channels to two workqueues

2018-12-02 Thread Dexuan Cui
> From: gre...@linuxfoundation.org 
> Sent: Saturday, December 1, 2018 11:34 PM
> To: Dexuan Cui 
> Cc: KY Srinivasan ; Haiyang Zhang
> ; Stephen Hemminger
> ; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; a...@canonical.com; vkuznets
> ; o...@aepfle.de; jasow...@redhat.com; Michael
> Kelley 
> Subject: Re: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of
> channels to two workqueues
> 
> On Fri, Nov 30, 2018 at 06:09:54PM +, Dexuan Cui wrote:
> > > From: KY Srinivasan 
> > > Sent: Friday, November 30, 2018 9:31 AM
> > > > From: Dexuan Cui 
> > > > Sent: Thursday, November 29, 2018 12:17 AM
> > > > To: gre...@linuxfoundation.org
> > > > Cc: KY Srinivasan ; Haiyang Zhang
> > > > ; Stephen Hemminger
> > > > ; linux-ker...@vger.kernel.org;
> > > > de...@linuxdriverproject.org; a...@canonical.com; vkuznets
> > > > ; o...@aepfle.de; jasow...@redhat.com;
> Michael
> > > > Kelley 
> > > > Subject: RE: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling 
> > > > of
> > > > channels to two workqueues
> > > >
> > > > > From: gre...@linuxfoundation.org 
> > > > > Sent: Wednesday, November 28, 2018 11:45 PM
> > > > > >
> > > > > > There is no change in this repost. I just rebased this patch to 
> > > > > > today's
> > > > > > char-misc's char-misc-next branch. Previously KY posted the patch
> with
> > > > his
> > > > > > Signed-off-by (which is kept in this repost), but there was a 
> > > > > > conflict
> issue.
> > > > > >
> > > > > > Note: the patch can't be cleanly applied to char-misc's
> char-misc-linus
> > > > branch
> > > > > --
> > > > > > to do that, we need to cherry-pick the supporting patch first:
> > > > > > 4d3c5c69191f ("Drivers: hv: vmbus: Remove the useless API
> > > > > vmbus_get_outgoing_channel()")
> > > > >
> > > > > That is not going to work for the obvious reason that this dependant
> > > > > patch is not going to be merged into 4.20-final.
> > > >
> > > > It looks the dependent patch (4d3c5c69191f) is going to miss the v4.20
> > > > release.
> > > > This is not a big issue, as the dependent patch isn't really important.
> > > >
> > > > > So, what do you expect us to do here?  The only way this can be
> accepted
> > > > > is to have it go into my -next branch, which means it will show up in
> > > > > 4.21-rc1, is that ok?
> > > >
> > > > Is there any chance for this patch ("Drivers: hv: vmbus: Offload the
> handling
> > > > ...") to
> > > > go into v4.20?
> > > >
> > > > If yes, I can quickly do a rebase to char-misc's char-misc-linus branch,
> > > > because actually the conflict can be very easily fixed. And I can help 
> > > > to
> fix any
> > > > conflict when the dependent patch is backported to v4.20.1.
> > >
> > > This patch fixes an important bug while the patch this depends on is not
> > > critical.
> > > I suggest we revert the patch that this patch depends on
> > > and we can submit a new version of this patch that can go in now - into
> 4.20
> > > release.
> > >
> > > K. Y
> >
> > I agree.
> >
> > Hi Greg,
> > Please let us know what we can do to try to push this important fix into
> v4.20.
> >
> > Actually it's straightforward, though it looks big. And, we ave done a full
> testing
> > with the patch.
> 
> Ok, you all need to figure this out on your own, sorry.  Please give me
> a patch that I can actually apply to the tree if you need it merged into
> 4.20-final.  This shouldn't be tough, you all have been doing this long
> enough by now...
> 
> I have no bandwidth to do this myself for you,
> 
> greg k-h

Hi Greg,
Please use the attached patch: I rebased the patch to today's char-misc's
char-misc-linus branch. It can also cleanly apply to Linus's master branch
today.

Please let me know in case you need me to re-post the patch in a new mail
(rather than the attachment here) or you need me to rebase the patch to
a different tree/branch.

Thanks,
Dexuan



0001-Drivers-hv-vmbus-Offload-the-handling-of-channels-to.patch
Description: 0001-Drivers-hv-vmbus-Offload-the-handling-of-channels-to.patch
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of channels to two workqueues

2018-12-02 Thread Dexuan Cui
> From: gre...@linuxfoundation.org 
> Sent: Sunday, December 2, 2018 7:33 AM
> To: Dexuan Cui 
> Cc: KY Srinivasan ; Haiyang Zhang
> ; Stephen Hemminger
> ; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; a...@canonical.com; vkuznets
> ; o...@aepfle.de; jasow...@redhat.com; Michael
> Kelley 
> Subject: Re: [PATCH] [repost] Drivers: hv: vmbus: Offload the handling of
> channels to two workqueues
> 
> On Sun, Dec 02, 2018 at 08:47:03AM +, Dexuan Cui wrote:
> > Hi Greg,
> > Please use the attached patch: I rebased the patch to today's char-misc's
> > char-misc-linus branch. It can also cleanly apply to Linus's master branch
> > today.
> 
> I can't use an attached patch, you know better.  Please fix up and
> resend properly.
> 
> greg k-h

Ok, let me re-send it right now.

BTW, Linus's tree was updated to v4.20-rc5 98 minutes ago. This patch can
still cleanly apply there, and on your char-misc-linus branch.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] [REPOST for the char-misc-linus branch] Drivers: hv: vmbus: Offload the handling of channels to two workqueues

2018-12-02 Thread Dexuan Cui


vmbus_process_offer() mustn't call channel->sc_creation_callback()
directly for sub-channels, because sc_creation_callback() ->
vmbus_open() may never get the host's response to the
OPEN_CHANNEL message (the host may rescind a channel at any time,
e.g. in the case of hot removing a NIC), and vmbus_onoffer_rescind()
may not wake up the vmbus_open() as it's blocked due to a non-zero
vmbus_connection.offer_in_progress, and finally we have a deadlock.

The above is also true for primary channels, if the related device
drivers use sync probing mode by default.

And, usually the handling of primary channels and sub-channels can
depend on each other, so we should offload them to different
workqueues to avoid possible deadlock, e.g. in sync-probing mode,
NIC1's netvsc_subchan_work() can race with NIC2's netvsc_probe() ->
rtnl_lock(), and causes deadlock: the former gets the rtnl_lock
and waits for all the sub-channels to appear, but the latter
can't get the rtnl_lock and this blocks the handling of sub-channels.

The patch can fix the multiple-NIC deadlock described above for
v3.x kernels (e.g. RHEL 7.x) which don't support async-probing
of devices, and v4.4, v4.9, v4.14 and v4.18 which support async-probing
but don't enable async-probing for Hyper-V drivers (yet).

The patch can also fix the hang issue in sub-channel's handling described
above for all versions of kernels, including v4.19 and v4.20-rc4.

So actually the patch should be applied to all the existing kernels,
not only the kernels that have 8195b1396ec8.

Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Cc: sta...@vger.kernel.org
Cc: Stephen Hemminger 
Cc: K. Y. Srinivasan 
Cc: Haiyang Zhang 
Signed-off-by: Dexuan Cui 
Signed-off-by: K. Y. Srinivasan 
---

[Dec. 2, 2018] Dexuan: I rebased this patch to today's char-misc's
char-misc-linus branch, whose top commit is:
d8f190ee836a ("Merge branch 'akpm' (patches from Andrew)")

Previously KY posted the patch with his Signed-off-by (which is kept in this
repost), but there was a conflict issue.

We have done a full testing with the patch.
We hope the important patch can be in the final v4.20. Thanks!

 drivers/hv/channel_mgmt.c | 189 ++
 drivers/hv/connection.c   |  24 +-
 drivers/hv/hyperv_vmbus.h |   7 ++
 include/linux/hyperv.h|   7 ++
 4 files changed, 161 insertions(+), 66 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 6277597d..edd34c1 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -435,61 +435,16 @@ void vmbus_free_channels(void)
}
 }
 
-/*
- * vmbus_process_offer - Process the offer by creating a channel/device
- * associated with this offer
- */
-static void vmbus_process_offer(struct vmbus_channel *newchannel)
+/* Note: the function can run concurrently for primary/sub channels. */
+static void vmbus_add_channel_work(struct work_struct *work)
 {
-   struct vmbus_channel *channel;
-   bool fnew = true;
+   struct vmbus_channel *newchannel =
+   container_of(work, struct vmbus_channel, add_channel_work);
+   struct vmbus_channel *primary_channel = newchannel->primary_channel;
unsigned long flags;
u16 dev_type;
int ret;
 
-   /* Make sure this is a new offer */
-   mutex_lock(&vmbus_connection.channel_mutex);
-
-   /*
-* Now that we have acquired the channel_mutex,
-* we can release the potentially racing rescind thread.
-*/
-   atomic_dec(&vmbus_connection.offer_in_progress);
-
-   list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
-   if (!uuid_le_cmp(channel->offermsg.offer.if_type,
-   newchannel->offermsg.offer.if_type) &&
-   !uuid_le_cmp(channel->offermsg.offer.if_instance,
-   newchannel->offermsg.offer.if_instance)) {
-   fnew = false;
-   break;
-   }
-   }
-
-   if (fnew)
-   list_add_tail(&newchannel->listentry,
- &vmbus_connection.chn_list);
-
-   mutex_unlock(&vmbus_connection.channel_mutex);
-
-   if (!fnew) {
-   /*
-* Check to see if this is a sub-channel.
-*/
-   if (newchannel->offermsg.offer.sub_channel_index != 0) {
-   /*
-* Process the sub-channel.
-*/
-   newchannel->primary_channel = channel;
-   spin_lock_irqsave(&channel->lock, flags);
-   list_add_tail(&newchannel->sc_list, &channel->sc_list);
-   channel->num_sc++;
-   spin_unlock_irqre

RE: [PATCH] vmbus: fix subchannel removal

2018-12-09 Thread Dexuan Cui
> From: Stephen Hemminger 
> Sent: Friday, December 7, 2018 10:59 AM
> To: KY Srinivasan ; Haiyang Zhang
> ; Dexuan Cui ;
> mga...@redhat.com
> Cc: de...@linuxdriverproject.org; Stephen Hemminger
> 
> Subject: [PATCH] vmbus: fix subchannel removal
> 
> The changes to split ring allocation from open/close, broke
> the cleanup of subchannels. This resulted in problems using
> uio on network devices because the subchannel was left behind
> when the network device was unbound.
> 
> The cause was in the disconnect logic which used list splice
> to move the subchannel list into a local variable. This won't
> work because the subchannel list is needed later during the
> process of the rescind messages (relid2channel).
> 
> The fix is to just leave the subchannel list in place
> which is what the original code did. The list is cleaned
> up later when the host rescind is processed.
> 
> Fixes: ae6935ed7d42 ("vmbus: split ring buffer allocation from open")
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/hv/channel.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index fe00b12e4417..bea4c9850247 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -701,20 +701,12 @@ static int vmbus_close_internal(struct
> vmbus_channel *channel)
>  int vmbus_disconnect_ring(struct vmbus_channel *channel)
>  {
>   struct vmbus_channel *cur_channel, *tmp;
> - unsigned long flags;
> - LIST_HEAD(list);
>   int ret;
> 
>   if (channel->primary_channel != NULL)
>   return -EINVAL;
> 
> - /* Snapshot the list of subchannels */
> - spin_lock_irqsave(&channel->lock, flags);
> - list_splice_init(&channel->sc_list, &list);
> - channel->num_sc = 0;
> - spin_unlock_irqrestore(&channel->lock, flags);
> -
> - list_for_each_entry_safe(cur_channel, tmp, &list, sc_list) {
> +     list_for_each_entry_safe(cur_channel, tmp, &channel->sc_list, sc_list) {
>   if (cur_channel->rescind)
>   wait_for_completion(&cur_channel->rescind_event);
> 

Reviewed-by: Dexuan Cui 

4.20-rc6 is out now. I hope this fix still has a chance to go in v4.20-final.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels

2018-12-13 Thread Dexuan Cui


Before 98f4c651762c, we returned zeros for unopened channels.
With 98f4c651762c, we started to return random on-stack values.

We'd better return -EINVAL instead.

Fixes: 98f4c651762c ("hv: move ringbuffer bus attributes to dev_groups")
Cc: sta...@vger.kernel.org
Cc: K. Y. Srinivasan 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Signed-off-by: Dexuan Cui 
---
 drivers/hv/vmbus_drv.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 283d184..d0ff656 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -316,6 +316,8 @@ static ssize_t out_intr_mask_show(struct device *dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound, &outbound);
return sprintf(buf, "%d\n", outbound.current_interrupt_mask);
 }
@@ -329,6 +331,8 @@ static ssize_t out_read_index_show(struct device *dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound, &outbound);
return sprintf(buf, "%d\n", outbound.current_read_index);
 }
@@ -343,6 +347,8 @@ static ssize_t out_write_index_show(struct device *dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound, &outbound);
return sprintf(buf, "%d\n", outbound.current_write_index);
 }
@@ -357,6 +363,8 @@ static ssize_t out_read_bytes_avail_show(struct device *dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound, &outbound);
return sprintf(buf, "%d\n", outbound.bytes_avail_toread);
 }
@@ -371,6 +379,8 @@ static ssize_t out_write_bytes_avail_show(struct device 
*dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound, &outbound);
return sprintf(buf, "%d\n", outbound.bytes_avail_towrite);
 }
@@ -384,6 +394,8 @@ static ssize_t in_intr_mask_show(struct device *dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
return sprintf(buf, "%d\n", inbound.current_interrupt_mask);
 }
@@ -397,6 +409,8 @@ static ssize_t in_read_index_show(struct device *dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
return sprintf(buf, "%d\n", inbound.current_read_index);
 }
@@ -410,6 +424,8 @@ static ssize_t in_write_index_show(struct device *dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
return sprintf(buf, "%d\n", inbound.current_write_index);
 }
@@ -424,6 +440,8 @@ static ssize_t in_read_bytes_avail_show(struct device *dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
return sprintf(buf, "%d\n", inbound.bytes_avail_toread);
 }
@@ -438,6 +456,8 @@ static ssize_t in_write_bytes_avail_show(struct device *dev,
 
if (!hv_dev->channel)
return -ENODEV;
+   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
+   return -EINVAL;
hv_ringbuffer_get_debuginfo(&hv_dev->channel->inbound, &inbound);
return sprintf(buf, "%d\n", inbound.bytes_avail_towrite);
 }
-- 
2.7.4

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels

2018-12-17 Thread Dexuan Cui
> From: Stephen Hemminger 
> On Thu, 13 Dec 2018 16:35:43 +0000
> Dexuan Cui  wrote:
> 
> > Before 98f4c651762c, we returned zeros for unopened channels.
> > With 98f4c651762c, we started to return random on-stack values.
> >
> > We'd better return -EINVAL instead.
> 
> The concept looks fine, but maybe it would be simpler to move it into
> hv_ringbuffer_get_debuginfo and have it return an error code.
> 
> Since so much of the code is repeated, I would probably make a
> macro which generates the code as well.
> 
> Something like this:

Thanks, Stephen! Now the patch has been in char-misc's char-misc-linus
branch, so IMO we may as well leave it as is (considering the code here is
unlikely to be frqeuencly changed), and we have a smaller patch this way. :-)

But, yes, I agree with you that generally we should make a common
function to avoid duplicate code.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels

2018-12-17 Thread Dexuan Cui
> From: Stephen Hemminger 
> Sent: Monday, December 17, 2018 10:17 AM
> To: Dexuan Cui 
> 
> On Mon, 17 Dec 2018 18:00:29 +0000
> Dexuan Cui  wrote:
> 
> > > From: Stephen Hemminger 
> > > On Thu, 13 Dec 2018 16:35:43 +
> > > Dexuan Cui  wrote:
> > >
> > > > Before 98f4c651762c, we returned zeros for unopened channels.
> > > > With 98f4c651762c, we started to return random on-stack values.
> > > >
> > > > We'd better return -EINVAL instead.
> > >
> > > The concept looks fine, but maybe it would be simpler to move it into
> > > hv_ringbuffer_get_debuginfo and have it return an error code.
> > >
> > > Since so much of the code is repeated, I would probably make a
> > > macro which generates the code as well.
> > >
> > > Something like this:
> >
> > Thanks, Stephen! Now the patch has been in char-misc's char-misc-linus
> > branch, so IMO we may as well leave it as is (considering the code here is
> > unlikely to be frqeuencly changed), and we have a smaller patch this way. 
> > :-)
> >
> > But, yes, I agree with you that generally we should make a common
> > function to avoid duplicate code.
> >
> > Thanks,
> > -- Dexuan
> 
> The old code was risky because it would silently return stack garbage.
> Having an error check in get_debuginfo would eliminate that.

OK, then let me make another patch based on the latest char-misc-linus.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels

2018-12-17 Thread Dexuan Cui
> From: devel  On Behalf Of
> Dexuan Cui
> Sent: Monday, December 17, 2018 10:31 AM
> > From: Stephen Hemminger 
> >
> > The old code was risky because it would silently return stack garbage.
> > Having an error check in get_debuginfo would eliminate that.
> 
> OK, then let me make another patch based on the latest char-misc-linus.
> 
> -- Dexuan

Hi Stephen, your patch can apply cleanly. Let me rebase your patch to
char-misc-linus, do a test, and then post it with your Signed-off-by and mine: 
I assume you're Ok with this. Please let me know in case it's not. :-)

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] Drivers: hv: vmbus: Check for ring when getting debug info

2018-12-17 Thread Dexuan Cui


fc96df16a1ce is good and can already fix the "return stack garbage" issue,
but let's also improve hv_ringbuffer_get_debuginfo(), which would silently
return stack garbage, if people forget to check channel->state or
ring_info->ring_buffer, when using the function in the future.

Having an error check in the function would eliminate the potential risk.

Add a Fixes tag to indicate the patch depdendency.

Fixes: fc96df16a1ce ("Drivers: hv: vmbus: Return -EINVAL for the sys files for 
unopened channels")
Cc: sta...@vger.kernel.org
Cc: K. Y. Srinivasan 
Cc: Haiyang Zhang 
Signed-off-by: Stephen Hemminger 
Signed-off-by: Dexuan Cui 
---

*NOTE*: the patch is based on char-misc's char-misc-linus branch.

 drivers/hv/ring_buffer.c | 31 -
 drivers/hv/vmbus_drv.c   | 91 
 include/linux/hyperv.h   |  5 +--
 3 files changed, 79 insertions(+), 48 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 64d0c85..1f1a55e 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -164,26 +164,25 @@ hv_get_ringbuffer_availbytes(const struct 
hv_ring_buffer_info *rbi,
 }
 
 /* Get various debug metrics for the specified ring buffer. */
-void hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
-struct hv_ring_buffer_debug_info *debug_info)
+int hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
+   struct hv_ring_buffer_debug_info *debug_info)
 {
u32 bytes_avail_towrite;
u32 bytes_avail_toread;
 
-   if (ring_info->ring_buffer) {
-   hv_get_ringbuffer_availbytes(ring_info,
-   &bytes_avail_toread,
-   &bytes_avail_towrite);
-
-   debug_info->bytes_avail_toread = bytes_avail_toread;
-   debug_info->bytes_avail_towrite = bytes_avail_towrite;
-   debug_info->current_read_index =
-   ring_info->ring_buffer->read_index;
-   debug_info->current_write_index =
-   ring_info->ring_buffer->write_index;
-   debug_info->current_interrupt_mask =
-   ring_info->ring_buffer->interrupt_mask;
-   }
+   if (!ring_info->ring_buffer)
+   return -EINVAL;
+
+   hv_get_ringbuffer_availbytes(ring_info,
+&bytes_avail_toread,
+&bytes_avail_towrite);
+   debug_info->bytes_avail_toread = bytes_avail_toread;
+   debug_info->bytes_avail_towrite = bytes_avail_towrite;
+   debug_info->current_read_index = ring_info->ring_buffer->read_index;
+   debug_info->current_write_index = ring_info->ring_buffer->write_index;
+   debug_info->current_interrupt_mask
+   = ring_info->ring_buffer->interrupt_mask;
+   return 0;
 }
 EXPORT_SYMBOL_GPL(hv_ringbuffer_get_debuginfo);
 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index d0ff656..403fee0 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -313,12 +313,16 @@ static ssize_t out_intr_mask_show(struct device *dev,
 {
struct hv_device *hv_dev = device_to_hv_device(dev);
struct hv_ring_buffer_debug_info outbound;
+   int ret;
 
if (!hv_dev->channel)
return -ENODEV;
-   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
-   return -EINVAL;
-   hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound, &outbound);
+
+   ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
+ &outbound);
+   if (ret < 0)
+   return ret;
+
return sprintf(buf, "%d\n", outbound.current_interrupt_mask);
 }
 static DEVICE_ATTR_RO(out_intr_mask);
@@ -328,12 +332,15 @@ static ssize_t out_read_index_show(struct device *dev,
 {
struct hv_device *hv_dev = device_to_hv_device(dev);
struct hv_ring_buffer_debug_info outbound;
+   int ret;
 
if (!hv_dev->channel)
return -ENODEV;
-   if (hv_dev->channel->state != CHANNEL_OPENED_STATE)
-   return -EINVAL;
-   hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound, &outbound);
+
+   ret = hv_ringbuffer_get_debuginfo(&hv_dev->channel->outbound,
+ &outbound);
+   if (ret < 0)
+   return ret;
return sprintf(buf, "%d\n", outbound.current_read_index);
 }
 static DEVICE_ATTR_RO(out_read_index);
@@ -344,12 +351,15 @@ static ssize_t out_write_index_show(struct device *dev,
 {
struct hv_device *hv_dev = device_to_hv_device(dev)

[PATCH][re-post] vmbus: fix subchannel removal

2019-01-09 Thread Dexuan Cui


The changes to split ring allocation from open/close, broke
the cleanup of subchannels. This resulted in problems using
uio on network devices because the subchannel was left behind
when the network device was unbound.

The cause was in the disconnect logic which used list splice
to move the subchannel list into a local variable. This won't
work because the subchannel list is needed later during the
process of the rescind messages (relid2channel).

The fix is to just leave the subchannel list in place
which is what the original code did. The list is cleaned
up later when the host rescind is processed.

Without the fix, we have a lot of "hang" issues in netvsc when we
try to change the NIC's MTU, set the number of channels, etc.

Fixes: ae6935ed7d42 ("vmbus: split ring buffer allocation from open")
Cc: sta...@vger.kernel.org
Signed-off-by: Stephen Hemminger 
Signed-off-by: Dexuan Cui 

---

The patch was firstly posted on Dec-7 2018:
https://www.spinics.net/lists/linux-driver-devel/msg120802.html
but it looks it's neglected.

Now let me rebase it to v5.0-rc1: the line "channel->num_sc = 0;"
in the original patch must be removed due to
4d3c5c69191f ("Drivers: hv: vmbus: Remove the useless API 
vmbus_get_outgoing_channel()")

When the patch is backported to v4.20.1, there will be a conflict because
v4.20.1 doesn't have 4d3c5c69191f. I suggest we cherry-pick 4d3c5c69191f
into v4.20.1, before we backport this patch.

 drivers/hv/channel.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index ce0ba20..bea4c98 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -701,19 +701,12 @@ static int vmbus_close_internal(struct vmbus_channel 
*channel)
 int vmbus_disconnect_ring(struct vmbus_channel *channel)
 {
struct vmbus_channel *cur_channel, *tmp;
-   unsigned long flags;
-   LIST_HEAD(list);
int ret;
 
if (channel->primary_channel != NULL)
return -EINVAL;
 
-   /* Snapshot the list of subchannels */
-   spin_lock_irqsave(&channel->lock, flags);
-   list_splice_init(&channel->sc_list, &list);
-   spin_unlock_irqrestore(&channel->lock, flags);
-
-   list_for_each_entry_safe(cur_channel, tmp, &list, sc_list) {
+   list_for_each_entry_safe(cur_channel, tmp, &channel->sc_list, sc_list) {
if (cur_channel->rescind)
wait_for_completion(&cur_channel->rescind_event);
 
-- 
2.7.4

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2] Drivers: hv: vmbus: Expose counters for interrupts and full conditions

2019-01-09 Thread Dexuan Cui
> From: Kimberly Brown 
> Sent: Friday, January 4, 2019 8:35 PM

...
> +What:
> /sys/bus/vmbus/devices//channels//intr_in_full
> +Date:   January 2019
> +KernelVersion:  4.21

There is no 4.21 version: see https://lkml.org/lkml/2019/1/6/178  :-)

Thanks!
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2] Drivers: hv: vmbus: Expose counters for interrupts and full conditions

2019-01-09 Thread Dexuan Cui
> From: Michael Kelley 
> Sent: Saturday, January 5, 2019 1:01 PM
> > From: Kimberly Brown   Sent: Friday, January 4,
>  > 2019 8:35 PM
> >
> >  static inline void set_channel_pending_send_size(struct vmbus_channel *c,
> >  u32 size)
> >  {
> > +   if (size) {
> > +   ++c->out_full_total;
> > +
> > +   if (!c->out_full_flag) {
> > +   ++c->out_full_first;
> > +   c->out_full_flag = true;
> > +   }
> > +   } else {
> > +   c->out_full_flag = false;
> > +   }
> > +
> > c->outbound.ring_buffer->pending_send_sz = size;
> >  }
> >
> 
> I think there may be an atomicity problem with the above code.  I looked
> in the hv_sock code, and didn't see any locks being held when
> set_channel_pending_send_size() is called.   The original code doesn't need
> a lock because it is just storing a single value into pending_send_sz.
>
> In the similar code in hv_ringbuffer_write(), the ring buffer spin lock is 
> held
> while the counts are incremented and the out_full_flag is maintained, so all 
> is
> good there.  But some locking may be needed here.  Dexuan knows the
> hv_sock
> code best and can comment on whether there is any higher level
> synchronization
> that prevents multiple threads from running the above code on the same
> channel.
> Even if there is such higher level synchronization, this code probably 
> shouldn't
> depend on it for correctness.
> 
> Michael

Yes, there is indeed a higher level per-"sock" lock, e.g. in the code path
vsock_stream_sendmsg() -> vsock_stream_has_space() -> 
transport->stream_has_space() -> hvs_stream_has_space() -> 
hvs_set_channel_pending_send_size(), we acquire the lock by
lock_sock(sk) at the beginning of vsock_stream_sendmsg(), and call
release_sock(sk) at the end of the function.

So we don't have an atomicity issue here for hv_sock, which is the only user
of set_channel_pending_send_size(), so far.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v3] Drivers: hv: vmbus: Expose counters for interrupts and full conditions

2019-01-16 Thread Dexuan Cui
> From: Kimberly Brown 
> Sent: Wednesday, January 16, 2019 8:38 PM
> To: Michael Kelley ; Long Li
> ; Sasha Levin ;
> Dexuan Cui 
> Cc: KY Srinivasan ; Haiyang Zhang
> ; Stephen Hemminger
> ; de...@linuxdriverproject.org;
> linux-ker...@vger.kernel.org
> Subject: [PATCH v3] Drivers: hv: vmbus: Expose counters for interrupts and 
> full
> conditions
> 
> Counter values for per-channel interrupts and ring buffer full
> conditions are useful for investigating performance.
> 
> Expose counters in sysfs for 2 types of guest to host interrupts:
> 1) Interrupts caused by the channel's outbound ring buffer transitioning
> from empty to not empty
> 2) Interrupts caused by the channel's inbound ring buffer transitioning
> from full to not full while a packet is waiting for enough buffer space to
> become available
> 
> Expose 2 counters in sysfs for the number of times that write operations
> encountered a full outbound ring buffer:
> 1) The total number of write operations that encountered a full
> condition
> 2) The number of write operations that were the first to encounter a
> full condition
> 
> I tested this patch by confirming that the sysfs files were created and
> observing the counter values. The values seemed to increase by a
> reasonable amount when the Hyper-v related drivers were in use.
> 
> Signed-off-by: Kimberly Brown 
> ---
> Changes in v3:
>  - Used the outbound ring buffer spinlock to protect the the full
>condition counters in set_channel_pending_send_size()
>  - Corrected the KernelVersion values for the new entries in
>Documentation/ABI/stable/sysfs-bus-vmbus
> 
> Changes in v2:
>  - Added mailing lists to the cc list
>  - Removed the host to guest interrupt counters proposed in v1 because
>they were not accurate
>  - Added full condition counters for the channel's outbound ring buffer
> 
>  Documentation/ABI/stable/sysfs-bus-vmbus | 33 
>  drivers/hv/ring_buffer.c | 14 -
>  drivers/hv/vmbus_drv.c   | 32 
>  include/linux/hyperv.h   | 38
> 
>  4 files changed, 116 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus
> b/Documentation/ABI/stable/sysfs-bus-vmbus
> index 3fed8fdb873d..a0304c563467 100644
> --- a/Documentation/ABI/stable/sysfs-bus-vmbus
> +++ b/Documentation/ABI/stable/sysfs-bus-vmbus
> @@ -146,3 +146,36 @@ KernelVersion:   4.16
>  Contact: Stephen Hemminger 
>  Description: Binary file created by uio_hv_generic for ring buffer
>  Users:   Userspace drivers
> +
> +What:
> /sys/bus/vmbus/devices//channels//intr_in_full
> +Date:   January 2019
> +KernelVersion:  5.0
> +Contact:Michael Kelley 
> +Description:Number of guest to host interrupts caused by the inbound
> ring
> + buffer transitioning from full to not full while a packet is
> + waiting for buffer space to become available
> +Users:  Debugging tools
> +
> +What:
> /sys/bus/vmbus/devices//channels//intr_out_empty
> +Date:   January 2019
> +KernelVersion:  5.0
> +Contact:Michael Kelley 
> +Description:Number of guest to host interrupts caused by the outbound
> ring
> + buffer transitioning from empty to not empty
> +Users:  Debugging tools
> +
> +What:
> /sys/bus/vmbus/devices//channels//out_full_first
> +Date:   January 2019
> +KernelVersion:  5.0
> +Contact:Michael Kelley 
> +Description:Number of write operations that were the first to encounter
> an
> + outbound ring buffer full condition
> +Users:  Debugging tools
> +
> +What:
> /sys/bus/vmbus/devices//channels//out_full_total
> +Date:   January 2019
> +KernelVersion:  5.0
> +Contact:Michael Kelley 
> +Description:Total number of write operations that encountered an
> outbound
> + ring buffer full condition
> +Users:  Debugging tools
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 1f1a55e07733..9e8b31ccc142 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -74,8 +74,10 @@ static void hv_signal_on_write(u32 old_write, struct
> vmbus_channel *channel)
>* This is the only case we need to signal when the
>* ring transitions from being empty to non-empty.
>*/
> - if (old_write == READ_ONCE(rbi->ring_buffer->read_index))
> + if (old_write == READ_ONCE(rbi->ring_buffer->read_index)) {
> + ++channel->intr_out_empty;
>   vmbus_set

RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-21 Thread Dexuan Cui
> From: Kimberly Brown 
> Sent: Monday, January 21, 2019 6:08 PM
> Subject: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions
> 
> The channel level "_show" functions are vulnerable to race conditions.
> Add a mutex lock and unlock around the call to the channel level "_show"
> functions in vmbus_chan_attr_show().
> 
> This problem was discussed here:
> https://lkml.org/lkml/2018/10/18/830
> 
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1414,6 +1414,7 @@ static ssize_t vmbus_chan_attr_show(struct kobject
> *kobj,
>   = container_of(attr, struct vmbus_chan_attribute, attr);
>   const struct vmbus_channel *chan
>   = container_of(kobj, struct vmbus_channel, kobj);
> + ssize_t ret;
> 
>   if (!attribute->show)
>   return -EIO;
> @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> kobject *kobj,
>   if (chan->state != CHANNEL_OPENED_STATE)
>   return -EINVAL;
> 
> - return attribute->show(chan, buf);
> + mutex_lock(&vmbus_connection.channel_mutex);
> + ret = attribute->show(chan, buf);
> + mutex_unlock(&vmbus_connection.channel_mutex);
> + return ret;
>  }

It looks this patch is already able to fix the original issue:
6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
as chan->state can't be CHANNEL_OPENED_STATE when
channel->ringbuffer_page is not set up yet, or has been freed.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-22 Thread Dexuan Cui
> From: Kimberly Brown 
> Sent: Monday, January 21, 2019 10:43 PM
> > > @@ -1421,7 +1422,10 @@ static ssize_t vmbus_chan_attr_show(struct
> > > kobject *kobj,
> > >   if (chan->state != CHANNEL_OPENED_STATE)
> > >   return -EINVAL;
> > >
> > > - return attribute->show(chan, buf);
> > > + mutex_lock(&vmbus_connection.channel_mutex);
> > > + ret = attribute->show(chan, buf);
> > > + mutex_unlock(&vmbus_connection.channel_mutex);
> > > + return ret;
> > >  }
> >
> > It looks this patch is already able to fix the original issue:
> > 6712cc9c2211 ("vmbus: don't return values for uninitalized channels"),
> > as chan->state can't be CHANNEL_OPENED_STATE when
> > channel->ringbuffer_page is not set up yet, or has been freed.
> > -- Dexuan
> 
> I think that patch 6712cc9c2211 fixes the problem when the channel is
> not set up yet, but it doesn't fix the problem when the channel is being
> closed
Yeah, now I see your point.

> The channel could be freed after the check that "chan->state" is
> CHANNEL_OPENED_STATE, while the "attribute->show()" function is running.

IMO the channel struct itself can't be freed while the "attribute->show()"
function is running, because I suppose the sysfs interface should have an extra
reference to channel->kobj (see vmbus_add_channel_kobj()), so before the sysfs
files are removed, the channel struct itself can't disappear.
(I didn't check the related code very carefully, so I could be wrong. :-) )

But as you pointed, at least for sub-channels, channel->ringbuffer_page
can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
the "attribute->show()" could crash when the race happens.
Adding channel_mutex here seems to be able to fix the race for
sub-channels, as the same channel_mutex is used in vmbus_disconnect_ring().

For a primary channel, vmbus_close() -> vmbus_free_ring() can still
free channel->ringbuffer_page without notifying the "attribute->show()".
We may also need to acquire/release the channel_mutex in vmbus_close()?
 
> Actually, there should be checks that "chan" is not null and that
> "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> need to fix that.
I suppose "chan" can not be NULL here (see the above).

Checking "chan->state" may not help to completely fix the race, because
there is no locking/synchronization code in
vmbus_close_internal() when we test and change "channel->state".

I guess we may need to check if channel->ringbuffer_page is NULL in
the "attribute->show()". 

PS, to prove that a race condition does exist and can really cause a panic or
something, I usually add some msleep() delays in different paths so that I
can reproduce the crash every time I take a special action, e.g. when I read
the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can 
prove
a patch can indeed help, at least it can fix the crash which would happen
without the patch. :-)

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] nfit: add Hyper-V NVDIMM DSM command set to white list

2019-01-23 Thread Dexuan Cui


Add the Hyper-V _DSM command set to the white list of NVDIMM command
sets.

This command set is documented at http://www.uefi.org/RFIC_LIST
(see the link to "Virtual NVDIMM 0x1901" on the page).

Signed-off-by: Dexuan Cui 
---

I'm going to change the user-space utility "ndctl" to support Hyper-V Virtual 
NVDIMM.
This kernel patch is required first.

 drivers/acpi/nfit/core.c   | 5 -
 drivers/acpi/nfit/nfit.h   | 6 +-
 include/uapi/linux/ndctl.h | 1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 011d3db19c80..fb48cb17a519 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1840,7 +1840,7 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc 
*acpi_desc,
dev_set_drvdata(&adev_dimm->dev, nfit_mem);
 
/*
-* Until standardization materializes we need to consider 4
+* Until standardization materializes we need to consider 5
 * different command sets.  Note, that checking for function0 (bit0)
 * tells us if any commands are reachable through this GUID.
 */
@@ -1865,6 +1865,8 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc 
*acpi_desc,
dsm_mask &= ~(1 << 8);
} else if (nfit_mem->family == NVDIMM_FAMILY_MSFT) {
dsm_mask = 0x;
+   } else if (nfit_mem->family == NVDIMM_FAMILY_HYPERV) {
+   dsm_mask = 0x1f;
} else {
dev_dbg(dev, "unknown dimm command family\n");
nfit_mem->family = -1;
@@ -3707,6 +3709,7 @@ static __init int nfit_init(void)
guid_parse(UUID_NFIT_DIMM_N_HPE1, &nfit_uuid[NFIT_DEV_DIMM_N_HPE1]);
guid_parse(UUID_NFIT_DIMM_N_HPE2, &nfit_uuid[NFIT_DEV_DIMM_N_HPE2]);
guid_parse(UUID_NFIT_DIMM_N_MSFT, &nfit_uuid[NFIT_DEV_DIMM_N_MSFT]);
+   guid_parse(UUID_NFIT_DIMM_N_HYPERV, &nfit_uuid[NFIT_DEV_DIMM_N_HYPERV]);
 
nfit_wq = create_singlethread_workqueue("nfit");
if (!nfit_wq)
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index 33691aecfcee..9194022f29f3 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -34,11 +34,14 @@
 /* https://msdn.microsoft.com/library/windows/hardware/mt604741 */
 #define UUID_NFIT_DIMM_N_MSFT "1ee68b36-d4bd-4a1a-9a16-4f8e53d46e05"
 
+/* http://www.uefi.org/RFIC_LIST */
+#define UUID_NFIT_DIMM_N_HYPERV "5746c5f2-a9a2-4264-ad0e-e4ddc9e09e80"
+
 #define ACPI_NFIT_MEM_FAILED_MASK (ACPI_NFIT_MEM_SAVE_FAILED \
| ACPI_NFIT_MEM_RESTORE_FAILED | ACPI_NFIT_MEM_FLUSH_FAILED \
| ACPI_NFIT_MEM_NOT_ARMED | ACPI_NFIT_MEM_MAP_FAILED)
 
-#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_MSFT
+#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_HYPERV
 
 #define NVDIMM_STANDARD_CMDMASK \
 (1 << ND_CMD_SMART | 1 << ND_CMD_SMART_THRESHOLD | 1 << ND_CMD_DIMM_FLAGS \
@@ -94,6 +97,7 @@ enum nfit_uuids {
NFIT_DEV_DIMM_N_HPE1 = NVDIMM_FAMILY_HPE1,
NFIT_DEV_DIMM_N_HPE2 = NVDIMM_FAMILY_HPE2,
NFIT_DEV_DIMM_N_MSFT = NVDIMM_FAMILY_MSFT,
+   NFIT_DEV_DIMM_N_HYPERV = NVDIMM_FAMILY_HYPERV,
NFIT_SPA_VOLATILE,
NFIT_SPA_PM,
NFIT_SPA_DCR,
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index f57c9e434d2d..de5d90212409 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -243,6 +243,7 @@ struct nd_cmd_pkg {
 #define NVDIMM_FAMILY_HPE1 1
 #define NVDIMM_FAMILY_HPE2 2
 #define NVDIMM_FAMILY_MSFT 3
+#define NVDIMM_FAMILY_HYPERV 4
 
 #define ND_IOCTL_CALL  _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] nfit: add Hyper-V NVDIMM DSM command set to white list

2019-01-28 Thread Dexuan Cui
> From: Dan Williams 
> Sent: Monday, January 28, 2019 1:01 PM
> 
> Hi Dexuan,
> Looks good. Just one update request and a note below...
> 
> On Wed, Jan 23, 2019 at 12:51 PM Dexuan Cui  wrote:
> > ...
> > --- a/drivers/acpi/nfit/core.c
> > +++ b/drivers/acpi/nfit/core.c
> > @@ -1840,7 +1840,7 @@ static int acpi_nfit_add_dimm(struct
> acpi_nfit_desc *acpi_desc,
> > dev_set_drvdata(&adev_dimm->dev, nfit_mem);
> >
> > /*
> > -* Until standardization materializes we need to consider 4
> > +* Until standardization materializes we need to consider 5
> >  * different command sets.  Note, that checking for function0
> (bit0)
> >  * tells us if any commands are reachable through this GUID.
> >  */
> 
> This comment is stale. This "HYPERV" family is the first example of
> the "right" way to define a new NVDIMM command set. Lets update it to
> mention the process and considerations for submitting new command sets
> to UEFI (http://www.uefi.org/RFIC_LIST). The fact that there's now a
> defined process is a step forward from when this comment was initially
> written. Also, the fact that the process encourages "adopt" vs
> "supersede" addresses the main concern about vendor-specific
> command-set proliferation.

I made the below simple change. Is this enough? Please suggest the proper
wording/sentence, as I'm relatively new to NVDIMM, and I don't really know the
history of the standardization process.

--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1732,8 +1732,10 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc 
*acpi_desc,
dev_set_drvdata(&adev_dimm->dev, nfit_mem);

/*
-* Until standardization materializes we need to consider 4
-* different command sets.  Note, that checking for function0 (bit0)
+* New command sets should be submitted to UEFI
+* http://www.uefi.org/RFIC_LIST.
+*
+* Note, that checking for function0 (bit0)
 * tells us if any commands are reachable through this GUID.
 */
for (i = 0; i <= NVDIMM_FAMILY_MAX; i++)

 
> > @@ -1865,6 +1865,8 @@ static int acpi_nfit_add_dimm(struct
> acpi_nfit_desc *acpi_desc,
> > dsm_mask &= ~(1 << 8);
> > } else if (nfit_mem->family == NVDIMM_FAMILY_MSFT) {
> > dsm_mask = 0x;
> > +   } else if (nfit_mem->family == NVDIMM_FAMILY_HYPERV) {
> > +   dsm_mask = 0x1f;
> 
> Just a note that starting with commit 5e9e38d0db1d "acpi/nfit: Block
> function zero DSMs", bit0 in this mask will be cleared to ensure that
> only the kernel has the ability to probe for supported DSM functions.

Thanks for the note!

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] nfit: add Hyper-V NVDIMM DSM command set to white list

2019-01-28 Thread Dexuan Cui
> From: Dan Williams 
> Sent: Monday, January 28, 2019 1:55 PM
> 
> On Mon, Jan 28, 2019 at 1:40 PM Dexuan Cui  wrote:
> 
> > I made the below simple change. Is this enough? Please suggest the proper
> > wording/sentence, as I'm relatively new to NVDIMM, and I don't really know
> the history of the standardization process.
> 
> How about something a bit more relevant for the code in question:
> 
 
> There are 4 "legacy" NVDIMM command sets
> (NVDIMM_FAMILY_{INTEL,MSFT,HPE1,HPE2}) that were created before an EFI
> working group was established to constrain this proliferation. The
> nfit driver probes for the supported command set by GUID. Note, If
> you're a platform developer looking to add a new command set to this
> probe consider using an existing set, or otherwise seek approval to
> publish the command set at
> http://www.uefi.org/RFIC_LIST

Looks perfect! Let me use this, rebase the patch to 
https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git/tag/?h=libnvdimm-fixes-5.0-rc4
and post a v2 later today.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2] nfit: add Hyper-V NVDIMM DSM command set to white list

2019-01-28 Thread Dexuan Cui


Add the Hyper-V _DSM command set to the white list of NVDIMM command
sets.

This command set is documented at http://www.uefi.org/RFIC_LIST
(see "Virtual NVDIMM 0x1901").

Thanks Dan Williams  for writing the
comment change.

Signed-off-by: Dexuan Cui 
Reviewed-by: Michael Kelley 
---

Changes in v2:
Updated the comment and changelog (Thanks, Dan!)
Rebased to the tag libnvdimm-fixes-5.0-rc4 of the nvdimm tree.

 drivers/acpi/nfit/core.c   | 17 ++---
 drivers/acpi/nfit/nfit.h   |  6 +-
 include/uapi/linux/ndctl.h |  1 +
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index e18ade5d74e9..a9270c99be72 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -1861,9 +1861,17 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc 
*acpi_desc,
dev_set_drvdata(&adev_dimm->dev, nfit_mem);
 
/*
-* Until standardization materializes we need to consider 4
-* different command sets.  Note, that checking for function0 (bit0)
-* tells us if any commands are reachable through this GUID.
+* There are 4 "legacy" NVDIMM command sets
+* (NVDIMM_FAMILY_{INTEL,MSFT,HPE1,HPE2}) that were created before
+* an EFI working group was established to constrain this
+* proliferation. The nfit driver probes for the supported command
+* set by GUID. Note, if you're a platform developer looking to add
+* a new command set to this probe, consider using an existing set,
+* or otherwise seek approval to publish the command set at
+* http://www.uefi.org/RFIC_LIST.
+*
+* Note, that checking for function0 (bit0) tells us if any commands
+* are reachable through this GUID.
 */
for (i = 0; i <= NVDIMM_FAMILY_MAX; i++)
if (acpi_check_dsm(adev_dimm->handle, to_nfit_uuid(i), 1, 1))
@@ -1886,6 +1894,8 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc 
*acpi_desc,
dsm_mask &= ~(1 << 8);
} else if (nfit_mem->family == NVDIMM_FAMILY_MSFT) {
dsm_mask = 0x;
+   } else if (nfit_mem->family == NVDIMM_FAMILY_HYPERV) {
+   dsm_mask = 0x1f;
} else {
dev_dbg(dev, "unknown dimm command family\n");
nfit_mem->family = -1;
@@ -3729,6 +3739,7 @@ static __init int nfit_init(void)
guid_parse(UUID_NFIT_DIMM_N_HPE1, &nfit_uuid[NFIT_DEV_DIMM_N_HPE1]);
guid_parse(UUID_NFIT_DIMM_N_HPE2, &nfit_uuid[NFIT_DEV_DIMM_N_HPE2]);
guid_parse(UUID_NFIT_DIMM_N_MSFT, &nfit_uuid[NFIT_DEV_DIMM_N_MSFT]);
+   guid_parse(UUID_NFIT_DIMM_N_HYPERV, &nfit_uuid[NFIT_DEV_DIMM_N_HYPERV]);
 
nfit_wq = create_singlethread_workqueue("nfit");
if (!nfit_wq)
diff --git a/drivers/acpi/nfit/nfit.h b/drivers/acpi/nfit/nfit.h
index 33691aecfcee..4de167b4f76f 100644
--- a/drivers/acpi/nfit/nfit.h
+++ b/drivers/acpi/nfit/nfit.h
@@ -34,11 +34,14 @@
 /* https://msdn.microsoft.com/library/windows/hardware/mt604741 */
 #define UUID_NFIT_DIMM_N_MSFT "1ee68b36-d4bd-4a1a-9a16-4f8e53d46e05"
 
+/* http://www.uefi.org/RFIC_LIST (see "Virtual NVDIMM 0x1901") */
+#define UUID_NFIT_DIMM_N_HYPERV "5746c5f2-a9a2-4264-ad0e-e4ddc9e09e80"
+
 #define ACPI_NFIT_MEM_FAILED_MASK (ACPI_NFIT_MEM_SAVE_FAILED \
| ACPI_NFIT_MEM_RESTORE_FAILED | ACPI_NFIT_MEM_FLUSH_FAILED \
| ACPI_NFIT_MEM_NOT_ARMED | ACPI_NFIT_MEM_MAP_FAILED)
 
-#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_MSFT
+#define NVDIMM_FAMILY_MAX NVDIMM_FAMILY_HYPERV
 
 #define NVDIMM_STANDARD_CMDMASK \
 (1 << ND_CMD_SMART | 1 << ND_CMD_SMART_THRESHOLD | 1 << ND_CMD_DIMM_FLAGS \
@@ -94,6 +97,7 @@ enum nfit_uuids {
NFIT_DEV_DIMM_N_HPE1 = NVDIMM_FAMILY_HPE1,
NFIT_DEV_DIMM_N_HPE2 = NVDIMM_FAMILY_HPE2,
NFIT_DEV_DIMM_N_MSFT = NVDIMM_FAMILY_MSFT,
+   NFIT_DEV_DIMM_N_HYPERV = NVDIMM_FAMILY_HYPERV,
NFIT_SPA_VOLATILE,
NFIT_SPA_PM,
NFIT_SPA_DCR,
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index f57c9e434d2d..de5d90212409 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -243,6 +243,7 @@ struct nd_cmd_pkg {
 #define NVDIMM_FAMILY_HPE1 1
 #define NVDIMM_FAMILY_HPE2 2
 #define NVDIMM_FAMILY_MSFT 3
+#define NVDIMM_FAMILY_HYPERV 4
 
 #define ND_IOCTL_CALL  _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-01-29 Thread Dexuan Cui
> From: Kimberly Brown 
> > ... 
> > But as you pointed, at least for sub-channels, channel->ringbuffer_page
> > can indeed disappear in vmbus_close() -> ... -> vmbus_free_ring(), and
> > the "attribute->show()" could crash when the race happens.
> > Adding channel_mutex here seems to be able to fix the race for
> > sub-channels, as the same channel_mutex is used in
> vmbus_disconnect_ring().
> >
> > For a primary channel, vmbus_close() -> vmbus_free_ring() can still
> > free channel->ringbuffer_page without notifying the "attribute->show()".
> > We may also need to acquire/release the channel_mutex in vmbus_close()?
> >
> > > Actually, there should be checks that "chan" is not null and that
> > > "chan->state" is CHANNEL_OPENED_STATE within the locked section. I'll
> > > need to fix that.
> > I suppose "chan" can not be NULL here (see the above).
> >
> > Checking "chan->state" may not help to completely fix the race, because
> > there is no locking/synchronization code in
> > vmbus_close_internal() when we test and change "channel->state".
> >
> 
> The calls to vmbus_close_internal() for the subchannels and the primary
> channel are protected with channel_mutex in vmbus_disconnect_ring().
> This prevents "channel->state" from changing while "attribute->show()" is
> running.
Ah, I think you're right. 
 
> > I guess we may need to check if channel->ringbuffer_page is NULL in
> > the "attribute->show()".
> >
> 
> For the primary channel, vmbus_free_ring() is called after the
> return from vmbus_disconnect_ring(). Therefore, the primary channel's
> state is changed before "channel->ringbuffer_page" is set to NULL.
> Checking the channel state should be sufficient to prevent the ring
> buffers from being freed while "attribute->show()" is running. The
> ring buffers can't be freed until the channel's state is changed, and
> the channel state change is protected by the mutex.
I think you're right (I noticed in a previous mail you mentioned you would
improve your patch to check "chan->state" with the mutax held).

> I think checking that "channel->ringbuffer_page" is not NULL would
> also work, but, as you stated, we would need to aquire/release
> channel_mutex in vmbus_close().
Then it looks unnecessary to check "channel->ringbuffer_page".
 
> > PS, to prove that a race condition does exist and can really cause a panic 
> > or
> > something, I usually add some msleep() delays in different paths so that I
> > can reproduce the crash every time I take a special action, e.g. when I read
> > the sysfs files of a NIC, I try to remove hv_netvsc driver. This way, I can 
> > prove
> > a patch can indeed help, at least it can fix the crash which would happen
> > without the patch. :-)
> >
> 
> Thanks! I was able to free the ring buffers while "attribute->show()"
> was running, which caused a null pointer dereference bug. As expected,
> the mutex lock fixed it.
Awesome!

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] nfit: acpi_nfit_ctl(): check out_obj->type in the right place

2019-01-29 Thread Dexuan Cui


In the case of ND_CMD_CALL, we should also check out_obj->type.

The patch uses out_obj->type, which is a short alias to
out_obj->package.type.

Fixes: 31eca76ba2fc ("nfit, libnvdimm: limited/whitelisted dimm command 
marshaling mechanism")
Cc: 
Signed-off-by: Dexuan Cui 
---
 drivers/acpi/nfit/core.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 0a49c57334cc..1fa378435dd2 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -554,6 +554,13 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, 
struct nvdimm *nvdimm,
return -EINVAL;
}
 
+   if (out_obj->type != ACPI_TYPE_BUFFER) {
+   dev_dbg(dev, "%s unexpected output object type cmd: %s type: 
%d\n",
+   dimm_name, cmd_name, out_obj->type);
+   rc = -EINVAL;
+   goto out;
+   }
+
if (call_pkg) {
call_pkg->nd_fw_size = out_obj->buffer.length;
memcpy(call_pkg->nd_payload + call_pkg->nd_size_in,
@@ -572,13 +579,6 @@ int acpi_nfit_ctl(struct nvdimm_bus_descriptor *nd_desc, 
struct nvdimm *nvdimm,
return 0;
}
 
-   if (out_obj->package.type != ACPI_TYPE_BUFFER) {
-   dev_dbg(dev, "%s unexpected output object type cmd: %s type: 
%d\n",
-   dimm_name, cmd_name, out_obj->type);
-   rc = -EINVAL;
-   goto out;
-   }
-
dev_dbg(dev, "%s cmd: %s output length: %d\n", dimm_name,
cmd_name, out_obj->buffer.length);
print_hex_dump_debug(cmd_name, DUMP_PREFIX_OFFSET, 4, 4,
-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] nfit: Collect shutdown status for NVDIMM_FAMILY_HYPERV

2019-01-30 Thread Dexuan Cui


See http://www.uefi.org/RFIC_LIST ("Virtual NVDIMM 0x1901"):
"Get Unsafe Shutdown Count (Function Index 2)".

Let's expose the info to the userspace (e.g. ntctl) via sysfs.

Signed-off-by: Dexuan Cui 
---
 drivers/acpi/nfit/core.c   | 51 ++
 drivers/acpi/nfit/hyperv.h | 26 +++
 2 files changed, 77 insertions(+)
 create mode 100644 drivers/acpi/nfit/hyperv.h

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index d5a164b6833a..d504da34ce34 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include "intel.h"
+#include "hyperv.h"
 #include "nfit.h"
 
 /*
@@ -1802,6 +1803,54 @@ __weak void nfit_intel_shutdown_status(struct nfit_mem 
*nfit_mem)
}
 }
 
+__weak void nfit_hyperv_shutdown_status(struct nfit_mem *nfit_mem)
+{
+   struct device *dev = &nfit_mem->adev->dev;
+   struct nd_hyperv_shutdown_status status;
+   union acpi_object in_buf = {
+   .buffer.type = ACPI_TYPE_BUFFER,
+   .buffer.length = 0,
+   };
+   union acpi_object in_obj = {
+   .package.type = ACPI_TYPE_PACKAGE,
+   .package.count = 1,
+   .package.elements = &in_buf,
+   };
+   const u8 func = ND_HYPERV_GET_UNSAFE_SHUTDOWN_COUNT;
+   const guid_t *guid = to_nfit_uuid(nfit_mem->family);
+   u8 revid = nfit_dsm_revid(nfit_mem->family, func);
+   struct acpi_device *adev = nfit_mem->adev;
+   acpi_handle handle = adev->handle;
+   union acpi_object *out_obj;
+
+   if ((nfit_mem->dsm_mask & BIT(func)) == 0)
+   return;
+
+   out_obj = acpi_evaluate_dsm(handle, guid, revid, func, &in_obj);
+   if (!out_obj || out_obj->type != ACPI_TYPE_BUFFER
+|| out_obj->buffer.length < sizeof(status)) {
+   dev_dbg(dev->parent,
+   "%s: failed to get Unsafe Shutdown Count\n",
+   dev_name(dev));
+   ACPI_FREE(out_obj);
+   return;
+   }
+
+   memcpy(&status, out_obj->buffer.pointer, sizeof(status));
+   ACPI_FREE(out_obj);
+
+   if (status.general_err != 0) {
+   dev_dbg(dev->parent,
+   "%s: failed to get Unsafe Shutdown Count: err=0x%x\n",
+   dev_name(dev), status.status);
+   return;
+   }
+
+   set_bit(NFIT_MEM_DIRTY_COUNT, &nfit_mem->flags);
+   nfit_mem->dirty_shutdown = status.shutdown_count;
+}
+
+
 static void populate_shutdown_status(struct nfit_mem *nfit_mem)
 {
/*
@@ -1811,6 +1860,8 @@ static void populate_shutdown_status(struct nfit_mem 
*nfit_mem)
 */
if (nfit_mem->family == NVDIMM_FAMILY_INTEL)
nfit_intel_shutdown_status(nfit_mem);
+   else if (nfit_mem->family == NVDIMM_FAMILY_HYPERV)
+   nfit_hyperv_shutdown_status(nfit_mem);
 }
 
 static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc,
diff --git a/drivers/acpi/nfit/hyperv.h b/drivers/acpi/nfit/hyperv.h
new file mode 100644
index ..c4a25b8624cc
--- /dev/null
+++ b/drivers/acpi/nfit/hyperv.h
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(c) 2019 Microsoft Corporation. All rights reserved.
+ * Hyper-V specific definitions for _DSM of Hyper-V Virtual NVDIMM
+ *
+ * See http://www.uefi.org/RFIC_LIST (Virtual NVDIMM 0x1901)
+ */
+#ifndef _NFIT_HYPERV_H_
+#define _NFIT_HYPERV_H_
+
+#define ND_HYPERV_GET_UNSAFE_SHUTDOWN_COUNT 2
+
+struct nd_hyperv_shutdown_status {
+   union {
+   u32 status;
+   struct {
+   u16 general_err;
+   u8  func_err;
+   u8  vendor_err;
+   };
+   };
+   u32 shutdown_count;
+} __packed;
+
+
+#endif
-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] nfit: Collect shutdown status for NVDIMM_FAMILY_HYPERV

2019-01-30 Thread Dexuan Cui
> From: Greg KH 
> Sent: Wednesday, January 30, 2019 11:38 AM
> 
> On Wed, Jan 30, 2019 at 06:48:40PM +0000, Dexuan Cui wrote:
> >
> > Let's expose the info to the userspace (e.g. ntctl) via sysfs.
> 
> If you add a new sysfs file, you need to add a new Documentation/ABI/
> update as well :(

It's an existing sysfs node, which was originally added by Dan in Sep 2018: 
/sys/bus/nd/devices/nmem0/nfit/dirty_shutdown. 

Before this patch, the node doesn't appear when Linux runs on Hyper-V,
and with this patch, the node can appear now.

However, indeed, the node and the other related nodes have not been
documented in Documentation/ABI/testing/sysfs-bus-nfit yet.

I suppose Dan would add that?

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] nfit: Collect shutdown status for NVDIMM_FAMILY_HYPERV

2019-01-30 Thread Dexuan Cui
> From: Linux-nvdimm  On Behalf Of
> Dexuan Cui
> Sent: Wednesday, January 30, 2019 12:03 PM
> To: Greg KH 
> Cc: Josh Poulson ; linux-nvd...@lists.01.org;
> Haiyang Zhang ;
> driverdev-devel@linuxdriverproject.org; Rafael J. Wysocki
> ; linux-ker...@vger.kernel.org; Michael Kelley
> ; Sasha Levin ;
> linux-a...@vger.kernel.org; Ross Zwisler ; Stephen
> Hemminger ; Len Brown 
> Subject: RE: [PATCH] nfit: Collect shutdown status for
> NVDIMM_FAMILY_HYPERV
> 
> > From: Greg KH 
> > Sent: Wednesday, January 30, 2019 11:38 AM
> >
> > On Wed, Jan 30, 2019 at 06:48:40PM +, Dexuan Cui wrote:
> > >
> > > Let's expose the info to the userspace (e.g. ntctl) via sysfs.
> >
> > If you add a new sysfs file, you need to add a new Documentation/ABI/
> > update as well :(
> 
> It's an existing sysfs node, which was originally added by Dan in Sep 2018:
> /sys/bus/nd/devices/nmem0/nfit/dirty_shutdown.
> 
> Before this patch, the node doesn't appear when Linux runs on Hyper-V,
> and with this patch, the node can appear now.
> 
> However, indeed, the node and the other related nodes have not been
> documented in Documentation/ABI/testing/sysfs-bus-nfit yet.
> I suppose Dan would add that?
Actually the other nodes have been documented, and only the
"dirty_shutdown" is missed. 

It should be straightfoward. Let me make a patch for this.

Thanks for the reminder, Greg!
 
Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] nfit: Document sysfs interface dirty_shutdown

2019-01-30 Thread Dexuan Cui


The new sysfs node was added in Sep 2018 in:
commit 0ead11181fe0 ("acpi, nfit: Collect shutdown status")

Now let's document it.

Signed-off-by: Dexuan Cui 
---
 Documentation/ABI/testing/sysfs-bus-nfit | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-nfit 
b/Documentation/ABI/testing/sysfs-bus-nfit
index a1cb44dcb908..bd6cde8c0ea9 100644
--- a/Documentation/ABI/testing/sysfs-bus-nfit
+++ b/Documentation/ABI/testing/sysfs-bus-nfit
@@ -153,6 +153,14 @@ Description:
subsystem controller vendor.
 
 
+What:  /sys/bus/nd/devices/nmemX/nfit/dirty_shutdown
+Date:  Sep, 2018
+KernelVersion: v4.20
+Contact:   linux-nvd...@lists.01.org
+Description:
+   (RO) Unsafe Shutdown Count for the NVDIMM device.
+
+
 What:  /sys/bus/nd/devices/ndbusX/nfit/revision
 Date:  Jun, 2015
 KernelVersion: v4.2
-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: Add mutex lock to channel show functions

2019-02-01 Thread Dexuan Cui
> From: Kimberly Brown 
> Sent: Thursday, January 31, 2019 9:47 AM
> ...
> 2) Prevent a deadlock that can occur between the proposed mutex_lock()
> call in the vmbus_chan_attr_show() function and the sysfs/kernfs functions.
Hi Kim,
Can you please share more details about the deadlock? 
It's unclear to me how the deadlock happens.

> I've identified two possible solutions to the deadlock:
> 
> 1) Add a new mutex to the vmbus_channel struct which protects changes to
> "channel->state". Use this new mutex in vmbus_chan_attr_show() instead of
> "vmbus_connection.channel_mutex".
> 
> 2) Use "vmbus_connection.channel_mutex" in vmbus_chan_attr_show() as
> originally proposed, and acquire it with mutex_trylock(). If the mutex
> cannot be acquired, return -EINVAL.
It looks more like a workaround. IMO we should try to find a real fix. :-)
 
> I'm leaning towards (2), using mutex_trylock().
> "vmbus_connection.channel_mutex"
> appears to be used only when a channel is being opened or closed, so
> vmbus_chan_attr_show() should be able to acquire the mutex when the
> channel is in use.
> 
> If anyone has suggestions, please let me know.
> 
> Thanks,
> Kim

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2] nfit: add Hyper-V NVDIMM DSM command set to white list

2019-02-01 Thread Dexuan Cui
> From: Dan Williams 
> Sent: Friday, February 1, 2019 9:29 AM
> > Hi Dan,
> > Unluckily it looks this commit causes a regression ...
> > With the patch, "ndctl list" shows nothing, and /dev/pmem0 can't appear.
> > If I revert the patch, it will be back to normal.
> >
> > I attached the config/logs. In the bad case, "dmesg" shows a line
> > [5.259017] nd_pmem namespace0.0: 0x, too small
> must be at least 0x1000
> > Any idea why this happens? I'm digging into the details and I appreciate 
> > your
> insights.
> 
> Looks like it is working as expected. 

I was working on linux-next tree's next-20190107 and this patch did "work fine"
there. The "regression" happens on djbw/nvdimm.git tree's libnvdimm-pending
branch because we have this recent commit (Jan 19):

11189c1089da ("acpi/nfit: Fix command-supported detection"), which makes such
a change in acpi_nfit_ctl():

-   if (!test_bit(cmd, &cmd_mask) || !test_bit(func, &dsm_mask))
+   if (cmd == ND_CMD_CALL && !test_bit(func, &dsm_mask))
+   return -ENOTTY;
+   else if (!test_bit(cmd, &cmd_mask))
return -ENOTTY;

So previously ND_CMD_GET_CONFIG_DATA fails with -ENOTTY and we're good.

Now the command succeeds, but it looks the returned data is inavlid, and I see
the "regression".

> The regression you are seeing is the fact that the patch enables the kernel to
> enable nvdimm-namespace-label reads. 
Yes.

> Those reads find a namespace index block
> and a label. Unfortunately the label has the LOCAL flag set and Linux
> explicitly ignores pmem namespace labels with that bit set. The reason
Can you please point out the function that ignores the flag?

I checked where NSLABEL_FLAG_LOCAL is used, but it looks I can't find a
related function.

> for that is due to the fact that the original definition of the LOCAL
> bit from v1.1 of the namespace label implementation [1] explicitly
> limited the LOCAL flag to "block aperture" regions. If you clear that
> LOCAL flag I expect it will work. To my knowledge Windows pretends
> that the v1.1 definition never existed.
I'm trying to find out where the flag is used and how to clear it.

> The UEFI 2.7 specification for v1.2 labels states that setting the
> LOCAL flag is optional when "nlabel", number of labels in the set, is
> 1. Linux makes that mandatory as LOCAL is redundant when nlabel is 1.
> 
> That said, the Robustness Principle makes a case that Linux should
> tolerate the bit being set. However, it's just a non-trivial amount of
> work to unwind the ingrained block-aperture assumptions of that bit.
Can you please explain this a bit more? Sorry, I'm new to this area...

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2] nfit: add Hyper-V NVDIMM DSM command set to white list

2019-02-01 Thread Dexuan Cui
> From: Dan Williams 
> Sent: Friday, February 1, 2019 3:47 PM
> To: Dexuan Cui 
> 
> I believe it's the same reason. Without 11189c1089da the _LSR method
> will fail, and otherwise it works and finds the label that it doesn't
> like.
Exactly.
 
> I'm not seeing "invalid" data in your failure log. Could you double
> check that it's just not the success of _LSR that causes the issue?

acpi_label_read() never fails for me.

By "invalid", I only mean the messages in the dmesg.bad.txt I previously
attached (I'm just reading the specs to learn the details about NVDIMM
namespace's labels, so my description might be inaccurate) :

[4.832367] nvdimm nmem1: nsindex0 labelsize 1 invalid
[4.832369] nvdimm nmem1: nsindex1 labelsize 1 invalid
...
[5.259017] nd_pmem namespace0.0: 0x, too small must be at 
least 0x1000

> > > The regression you are seeing is the fact that the patch enables the 
> > > kernel
> to
> > > enable nvdimm-namespace-label reads.
> > Yes.
> >
> > > Those reads find a namespace index block
> > > and a label. Unfortunately the label has the LOCAL flag set and Linux
> > > explicitly ignores pmem namespace labels with that bit set. The reason
> > Can you please point out the function that ignores the flag?
> >
> > I checked where NSLABEL_FLAG_LOCAL is used, but it looks I can't find a
> > related function.
> 
> scan_labels() is where the namespace label is validated relative to
> the region type:
> 
> if (is_nd_blk(&nd_region->dev)
> == !!(flags & NSLABEL_FLAG_LOCAL))
> /* pass, region matches label type */;
> else
> continue;
> 
> It also has meaning for the namespace capacity allocation
> implementation that needed that flag to distinguish aliased capacity
> between Block Aperture Mode and PMEM Mode access.
Thanks for the pointer! I'm looking at this function.

> > > for that is due to the fact that the original definition of the LOCAL
> > > bit from v1.1 of the namespace label implementation [1] explicitly
> > > limited the LOCAL flag to "block aperture" regions. If you clear that
> > > LOCAL flag I expect it will work. To my knowledge Windows pretends
> > > that the v1.1 definition never existed.
> > I'm trying to find out where the flag is used and how to clear it.
> 
> Assuming Hyper-V implements _LSW, you can recreate / reinitialize the
> label area:

I think Hyper-V only implements _LSR:
[4.720623] nfit ACPI0012:00: device:00: has _LSR
[4.723683] nfit ACPI0012:00: device:01: has _LSR
 
> > > The UEFI 2.7 specification for v1.2 labels states that setting the
> > > LOCAL flag is optional when "nlabel", number of labels in the set, is
> > > 1. Linux makes that mandatory as LOCAL is redundant when nlabel is 1.
> > >
> > > That said, the Robustness Principle makes a case that Linux should
> > > tolerate the bit being set. However, it's just a non-trivial amount of
> > > work to unwind the ingrained block-aperture assumptions of that bit.
> > Can you please explain this a bit more? Sorry, I'm new to this area...
> 
> The short story is that Linux enforces that LOCAL == Block Mode
> Namespaces. See section 2.2 Namespace Label Layout in the original
> spec [1]. The EFI 2.7 definition tried to allow for LOCAL to be set
> when an interleave-set was comprised of a single NVDIMM, but then also
> states its optional when Nlabel is 1. It has zero functional use for
> interleave-set based namespaces even when the interleave-set-width is
> 1. So Linux takes the option to never set it, and goes further to
> reject it if it's set and the region-type does not match, because that
> follows the v1.1 meaning of the flag.
> 
> [1]:
Thanks for the link! I'll read it.
BTW, it looks Hyper-V only supports PMEM namespace, at least so far.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2] nfit: add Hyper-V NVDIMM DSM command set to white list

2019-02-01 Thread Dexuan Cui
> From: Linux-nvdimm  On Behalf Of
> Dexuan Cui
> Sent: Friday, February 1, 2019 4:34 PM
> > > > ...
> > > > Those reads find a namespace index block
> > > > and a label. Unfortunately the label has the LOCAL flag set and Linux
> > > > explicitly ignores pmem namespace labels with that bit set. The reason
> > > > for that is due to the fact that the original definition of the LOCAL
> > > > bit from v1.1 of the namespace label implementation [1] explicitly
> > > > limited the LOCAL flag to "block aperture" regions. If you clear that
> > > > LOCAL flag I expect it will work. To my knowledge Windows pretends
> > > > that the v1.1 definition never existed.

On the libnvdimm-pending branch, I get this:

root@decui-gen2-u1904:~/nvdimm# ndctl list
root@decui-gen2-u1904:~/nvdimm# ndctl list --idle
[
  {
"dev":"namespace1.0",
"mode":"raw",
"size":0,
"uuid":"----",
"state":"disabled"
  },
  {
"dev":"namespace0.0",
"mode":"raw",
"size":0,
"uuid":"----",
"state":"disabled"
  }
]

With the patch that clears the LOCAL label (BTW, the initial value of flags is 
0x3,
meaning a read-only local label) :
@@ -2496,6 +2500,7 @@ static int init_active_labels(struct nd_region *nd_region)
if (!label_ent)
break;
label = nd_label_active(ndd, j);
+   label->flags &= ~NSLABEL_FLAG_LOCAL;
label_ent->label = label;

I get this:

root@decui-gen2-u1904:~/nvdimm# ndctl list
root@decui-gen2-u1904:~/nvdimm# ndctl list --idle
[
  {
"dev":"namespace1.0",
"mode":"raw",
"size":0,
"uuid":"c258aaab-f72b-e546-bfa5-be5e07761dbc",
"state":"disabled",
"name":"Microsoft Hyper-V NVDIMM 1 Label"
  },
  {
"dev":"namespace0.0",
"mode":"raw",
"size":0,
"uuid":"9f0497a7-4453-7c40-ad35-21a791e00345",
"state":"disabled",
"name":"Microsoft Hyper-V NVDIMM 0 Label"
  }
]

The "size" and "mode" still don't look right, but the improvement is that
now I can see a good descriptive "name", which I suppose is retrieved
from Hyper-V.

With Ubuntu 19.04 (4.18.0-11-generic), I get this:
(Note: the "mode" and "size" are correct. The "uuid" is different from
the above "9f0497a7-4453-7c40-ad35-21a791e00345" -- this is weird.) 

root@decui-gen2-u1904:~# ndctl list
[
  {
"dev":"namespace1.0",
"mode":"raw",
"size":137438953472,
"blockdev":"pmem1"
  },
  {
"dev":"namespace0.0",
"mode":"fsdax",
"map":"dev",
"size":33820770304,
"uuid":"35018886-397e-4fe7-a348-0a4d16eec44d",
"blockdev":"pmem0"
  }
]
 
I'm trying to find out the correct solution. I apprecite your insights!

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2] nfit: add Hyper-V NVDIMM DSM command set to white list

2019-02-01 Thread Dexuan Cui
> From: Dan Williams 
> Sent: Friday, February 1, 2019 5:29 PM
> > ...
> > The "size" and "mode" still don't look right, but the improvement is that
> > now I can see a good descriptive "name", which I suppose is retrieved
> > from Hyper-V.
> 
> Mode is right, there is no way for Hyper-V to create Linux fsdax mode
> namespaces it requires some setup using variables only Linux knows.
> Can you send the output of:
> 
> ndctl read-labels -j all

The output is from a kernel built with the libnvdimm-pending branch plus
the one-line patch (label->flags &= ~NSLABEL_FLAG_LOCAL) in 
init_active_labels():

root@decui-gen2-u1904:~# ndctl read-labels -j all
[
  {
"dev":"nmem1",
"index":[
  {
"signature":"NAMESPACE_INDEX",
"major":1,
"minor":2,
"labelsize":256,
"seq":1,
"nslot":2
  },
  {
"signature":"NAMESPACE_INDEX",
"major":1,
"minor":2,
"labelsize":256,
"seq":2,
"nslot":2
  }
],
"label":[
  {
"uuid":"c258aaab-f72b-e546-bfa5-be5e07761dbc",
"name":"Microsoft Hyper-V NVDIMM 1 Label",
"slot":0,
"position":0,
"nlabel":1,
"isetcookie":708891662257476870,
"lbasize":0,
"dpa":0,
"rawsize":137438953472,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"----"
  }
]
  },
  {
"dev":"nmem0",
"index":[
  {
"signature":"NAMESPACE_INDEX",
"major":1,
"minor":2,
"labelsize":256,
"seq":1,
"nslot":2
  },
  {
"signature":"NAMESPACE_INDEX",
"major":1,
"minor":2,
"labelsize":256,
"seq":2,
"nslot":2
  }
],
"label":[
  {
"uuid":"9f0497a7-4453-7c40-ad35-21a791e00345",
"name":"Microsoft Hyper-V NVDIMM 0 Label",
"slot":0,
"position":0,
"nlabel":1,
"isetcookie":708891619307803909,
"lbasize":0,
"dpa":0,
"rawsize":34359738368,
"type_guid":"79d3f066-f3b4-7440-ac43-0d3318b78cdb",
"abstraction_guid":"----"
  }
]
  }
]
read 2 nmems

> > With Ubuntu 19.04 (4.18.0-11-generic), I get this:
> > (Note: the "mode" and "size" are correct. The "uuid" is different from
> > the above "9f0497a7-4453-7c40-ad35-21a791e00345" -- this is weird.)
> >
> > root@decui-gen2-u1904:~# ndctl list
> > [
> >   {
> > "dev":"namespace1.0",
> > "mode":"raw",
> > "size":137438953472,
> > "blockdev":"pmem1"
> >   },
> >   {
> > "dev":"namespace0.0",
> > "mode":"fsdax",
> > "map":"dev",
> > "size":33820770304,
> > "uuid":"35018886-397e-4fe7-a348-0a4d16eec44d",
> > "blockdev":"pmem0"
> >   }
> > ]
> 
> This is because the Ubuntu kernel has the bug that causes _LSR to fail
> so Linux falls back to a namespace defined by the region boundary. On
> that namespace there is an "fsdax" info block located at the region
> base +4K. That info block is tagged with the uuid of
> "35018886-397e-4fe7-a348-0a4d16eec44d".
Thanks for the explanation!
 
> > I'm trying to find out the correct solution. I apprecite your insights!
> 
> It's a mess. First we need to figure out whether the label is actually
> specifying a size of zero, or there is some other bug.
I agree.
 
> However, the next problem is going to be adding "fsdax" mode support
> on top of the read-only defined namespaces. The ndctl reconfiguration
> flow:
> 
>ndctl create-namespace -e namespace0.0 -m fsdax -f

> 
> ...will likely fail because deleting the previous namespace in the
> labels is part of that flow. It's always that labels are writable.
> 
> Honestly, the quickest path to something functional for Linux is to
> simply delete the _LSR support and use raw mode defined namespaces.
> Why have labels if they are read-only and the region is sufficient for
> defining boundaries?

Just now Hyper-V team confirmed _LSW is not supported.

But with Ubuntu 19.04 kernel (4.18.0-11-generic), I'm able to run the commands
without any issue:

root@decui-gen2-u1904:~# ndctl list
[
  {
"dev":"namespace1.0",
"mode":"raw",
"size":137438953472,
"blockdev":"pmem1"
  },
  {
"dev":"namespace0.0",
"mode":"fsdax",
"map":"dev",
"size":33820770304,
"uuid":"35018886-397e-4fe7-a348-0a4d16eec44d",
"blockdev":"pmem0"
  }
]

root@decui-gen2-u1904:~# ndctl destroy-namespace "namespace0.0"
  Error: namespace0.0 is active, specify --force for re-configuration

error destroying namespaces: Device or resource busy
destroyed 0 namespaces

root@decui-gen2-u1904:~# ndctl destroy-namespace "namespace0.0" --force
destroyed 1 namespace

root@decui-gen2-u1904:~# ndctl list
[
  {
"dev":"namespace1.0",
"mode":"raw",
"size":137438953472,
"blockdev":"pmem1"
  }
]

root@decui-gen2-u1904:~# ndctl create-namespace -e namespace0.0 -m fsdax -f
{
  "dev

RE: [PATCH v2] nfit: add Hyper-V NVDIMM DSM command set to white list

2019-02-01 Thread Dexuan Cui
> From: Dan Williams 
> Sent: Friday, February 1, 2019 5:29 PM
> ... 
> Honestly, the quickest path to something functional for Linux is to
> simply delete the _LSR support and use raw mode defined namespaces.
> Why have labels if they are read-only and the region is sufficient for
> defining boundaries?
Hyper-V Virtual NVDIMM can already work with Ubuntu 19.04 virtual machine
running on Hyper-V, i.e. I can create a raw or fsdax namesapce, and create a
xfs or ext4 file sysetm in /dev/pmem0p1, and mount the file system with and
without "-o dax". The basic functionality is good.

My recent work is mainly for ndctl support, i.e. get the health info by ndctl,
and use ndctl to monitor the error events (if applicable).

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] PCI: hv: Add hv_pci_remove_slots() when we unload the driver

2019-02-07 Thread Dexuan Cui


When we unload pci-hyperv, the host doesn't send us a PCI_EJECT message.
In this case we also need to make sure the sysfs pci slot directory
is removed, otherwise "cat /sys/bus/pci/slots/2/address" will trigger
"BUG: unable to handle kernel paging request". And, if we unload/reload
the driver several times, we'll have multiple pci slot directories in
/sys/bus/pci/slots/ like this:

root@localhost:~# ls -rtl  /sys/bus/pci/slots/
total 0
drwxr-xr-x 2 root root 0 Feb  7 10:49 2
drwxr-xr-x 2 root root 0 Feb  7 10:49 2-1
drwxr-xr-x 2 root root 0 Feb  7 10:51 2-2

The patch adds the missing code, and in hv_eject_device_work() it also
moves pci_destroy_slot() to an earlier place where we hold the pci lock.

Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot 
information")
Signed-off-by: Dexuan Cui 
Cc: sta...@vger.kernel.org
Cc: Stephen Hemminger 
---
 drivers/pci/controller/pci-hyperv.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index 9ba4d12c179c..6b4773727525 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1491,6 +1491,21 @@ static void hv_pci_assign_slots(struct hv_pcibus_device 
*hbus)
}
 }
 
+/*
+ * Remove entries in sysfs pci slot directory.
+ */
+static void hv_pci_remove_slots(struct hv_pcibus_device *hbus)
+{
+   struct hv_pci_dev *hpdev;
+
+   list_for_each_entry(hpdev, &hbus->children, list_entry) {
+   if (!hpdev->pci_slot)
+   continue;
+   pci_destroy_slot(hpdev->pci_slot);
+   hpdev->pci_slot = NULL;
+   }
+}
+
 /**
  * create_root_hv_pci_bus() - Expose a new root PCI bus
  * @hbus:  Root PCI bus, as understood by this driver
@@ -1887,6 +1902,10 @@ static void hv_eject_device_work(struct work_struct 
*work)
pci_lock_rescan_remove();
pci_stop_and_remove_bus_device(pdev);
pci_dev_put(pdev);
+   if (hpdev->pci_slot) {
+   pci_destroy_slot(hpdev->pci_slot);
+   hpdev->pci_slot = NULL;
+   }
pci_unlock_rescan_remove();
}
 
@@ -1894,9 +1913,6 @@ static void hv_eject_device_work(struct work_struct *work)
list_del(&hpdev->list_entry);
spin_unlock_irqrestore(&hpdev->hbus->device_list_lock, flags);
 
-   if (hpdev->pci_slot)
-   pci_destroy_slot(hpdev->pci_slot);
-
memset(&ctxt, 0, sizeof(ctxt));
ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
@@ -2682,6 +2698,7 @@ static int hv_pci_remove(struct hv_device *hdev)
pci_lock_rescan_remove();
pci_stop_root_bus(hbus->pci_bus);
pci_remove_root_bus(hbus->pci_bus);
+   hv_pci_remove_slots(hbus);
pci_unlock_rescan_remove();
hbus->state = hv_pcibus_removed;
}
-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] PCI: hv: Add hv_pci_remove_slots() when we unload the driver

2019-02-12 Thread Dexuan Cui
> From: Lorenzo Pieralisi 
> Sent: Tuesday, February 12, 2019 4:13 AM
>  ...
> This patch fixes three bugs:
> 
> 1) set hpdev->pci_slot to NULL
> 2) move code destroying the slot inside a locked region in
>hv_eject_device_work()
> 3) Add missing slots removal code in hv_pci_remove()
> 
> We need three patches, not one.
> 
> (1) and (2), I am not entirely sure we want them in stable kernels,
> since they are potential bugs, waiting for your input.
> 
> Lorenzo

(1) is actually unnecessary, as I suppose hpdev should be freed at a later
place in the same function hv_eject_device_work -> put_pcichild() -> 
kfree(hpdev).
But today I think I found a refcount bug in the hot-remove case and the 
"kfree(hpdev)"
is never called in the hot-remove case. I'll further dig into this and make 
some extra
patches.

About (2), it's a race condition that can happen when the device is being 
hot-removed
and we're unloading the pci-hyperv driver at the same time. This is not a 
normal usage,
so I agree it doesn't really need to go into the stables.

(3) should go into the stables.

I'll make 3 separate patches, and extra patches for the refcount issue, and 
possible other
minor issues.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 3/3] PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary

2019-03-04 Thread Dexuan Cui
When we hot-remove a device, usually the host sends us a PCI_EJECT message,
and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0. But when
we do the quick hot-add/hot-remove test, the host may not send us the
PCI_EJECT message, if the guest has not fully finished the initialization
by sending the PCI_RESOURCES_ASSIGNED* message to the host, so it's
potentially unsafe to only depend on the pci_destroy_slot() in
hv_eject_device_work(), though create_root_hv_pci_bus() ->
hv_pci_assign_slots() is not called in this case. Note: in this case, the
host still sends the guest a PCI_BUS_RELATIONS message with
bus_rel->device_count == 0.

And, in the quick hot-add/hot-remove test, we can have such a race: before
pci_devices_present_work() -> new_pcichild_device() adds the new device
into hbus->children, we may have already received the PCI_EJECT message,
and hence the taklet handler hv_pci_onchannelcallback() may fail to find
the "hpdev" by get_pcichild_wslot(hbus, dev_message->wslot.slot), so
hv_pci_eject_device() is NOT called; later create_root_hv_pci_bus() ->
hv_pci_assign_slots() creates the slot, and the PCI_BUS_RELATIONS message
with bus_rel->device_count == 0 removes the device from hbus->children, and
we end up being unable to remove the slot in hv_pci_remove() ->
hv_pci_remove_slots().

The patch removes the slot in pci_devices_present_work() when the device
is removed. This can address the above race. Note 1:
pci_devices_present_work() and hv_eject_device_work() run in the
singled-threaded hbus->wq, so there is not a double-remove issue for the
slot. Note 2: we can't offload hv_pci_eject_device() from
hv_pci_onchannelcallback() to the workqueue, because we need
hv_pci_onchannelcallback() synchronously call hv_pci_eject_device() to
poll the channel's ringbuffer to work around the
"hangs in hv_compose_msi_msg()" issue: see
commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")

Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot 
information")
Signed-off-by: Dexuan Cui 
Cc: sta...@vger.kernel.org
---
 drivers/pci/controller/pci-hyperv.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index b489412e3502..82acd6155adf 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1776,6 +1776,10 @@ static void pci_devices_present_work(struct work_struct 
*work)
hpdev = list_first_entry(&removed, struct hv_pci_dev,
 list_entry);
list_del(&hpdev->list_entry);
+
+   if (hpdev->pci_slot)
+   pci_destroy_slot(hpdev->pci_slot);
+
put_pcichild(hpdev);
}
 
-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 2/3] PCI: hv: Add hv_pci_remove_slots() when we unload the driver

2019-03-04 Thread Dexuan Cui
When we unload pci-hyperv, the host doesn't send us a PCI_EJECT message.
In this case we also need to make sure the sysfs pci slot directory
is removed, otherwise "cat /sys/bus/pci/slots/2/address" will trigger
"BUG: unable to handle kernel paging request" (I noticed the issue when
systemd-dev crashed for me when I unloaded the driver). And, if we
unload/reload the driver several times, we'll have multiple pci slot
directories in /sys/bus/pci/slots/ like this:

root@localhost:~# ls -rtl  /sys/bus/pci/slots/
total 0
drwxr-xr-x 2 root root 0 Feb  7 10:49 2
drwxr-xr-x 2 root root 0 Feb  7 10:49 2-1
drwxr-xr-x 2 root root 0 Feb  7 10:51 2-2

The patch adds the missing code.

Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot 
information")
Signed-off-by: Dexuan Cui 
Acked-by: Stephen Hemminger 
Cc: sta...@vger.kernel.org
---
 drivers/pci/controller/pci-hyperv.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index 30f16b882746..b489412e3502 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1486,6 +1486,21 @@ static void hv_pci_assign_slots(struct hv_pcibus_device 
*hbus)
}
 }
 
+/*
+ * Remove entries in sysfs pci slot directory.
+ */
+static void hv_pci_remove_slots(struct hv_pcibus_device *hbus)
+{
+   struct hv_pci_dev *hpdev;
+
+   list_for_each_entry(hpdev, &hbus->children, list_entry) {
+   if (!hpdev->pci_slot)
+   continue;
+   pci_destroy_slot(hpdev->pci_slot);
+   hpdev->pci_slot = NULL;
+   }
+}
+
 /**
  * create_root_hv_pci_bus() - Expose a new root PCI bus
  * @hbus:  Root PCI bus, as understood by this driver
@@ -2680,6 +2695,7 @@ static int hv_pci_remove(struct hv_device *hdev)
pci_lock_rescan_remove();
pci_stop_root_bus(hbus->pci_bus);
pci_remove_root_bus(hbus->pci_bus);
+   hv_pci_remove_slots(hbus);
pci_unlock_rescan_remove();
hbus->state = hv_pcibus_removed;
}
-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 0/3] pci-hyperv: fix memory leak and add pci_destroy_slot()

2019-03-04 Thread Dexuan Cui
Patch #1 fixes a memory leak caused by incorrectly-maintained hpdev->refs.

Patch #2 and #3 make sure the "slot" is removed in all the scenarios.
Without them, in the quick hot-add/hot-remove test, systemd-dev may easily
crash when trying to access a dangling sys file in /sys/bus/pci/slots/:
"BUG: unable to handle kernel paging request".

BTW, Patch #2 was posted on Feb 7, 2019, and this is the v2: the change
to hv_eject_device_work() in v1 is removed, as the change is only needed
when we hot-remove the device and remove the pci-hyperv driver at the 
same time. It looks more work is required to make this scenaro work
correctly, and since removing the driver is not really a "usual" usage,
we can address this scenario in the future.

Please review the patchset.

Dexuan Cui (3):
  PCI: hv: Fix a memory leak in hv_eject_device_work()
  PCI: hv: Add hv_pci_remove_slots() when we unload the driver
  PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if
necessary

 drivers/pci/controller/pci-hyperv.c | 23 +++
 1 file changed, 23 insertions(+)

-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH 1/3] PCI: hv: Fix a memory leak in hv_eject_device_work()

2019-03-04 Thread Dexuan Cui
After a device is just created in new_pcichild_device(), hpdev->refs is set
to 2 (i.e. the initial value of 1 plus the get_pcichild()).

When we hot remove the device from the host, in Linux VM we first call
hv_pci_eject_device(), which increases hpdev->refs by get_pcichild() and
then schedules a work of hv_eject_device_work(), so hpdev->refs becomes 3
(let's ignore the paired get/put_pcichild() in other places). But in
hv_eject_device_work(), currently we only call put_pcichild() twice,
meaning the 'hpdev' struct can't be freed in put_pcichild(). This patch
adds one put_pcichild() to fix the memory leak.

BTW, the device can also be removed when we run "rmmod pci-hyperv". On this
path (hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_devices_present()),
hpdev->refs is 2, and we do correctly call put_pcichild() twice in
pci_devices_present_work().

Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft 
Hyper-V VMs")
Signed-off-by: Dexuan Cui 
Cc: 
---
 drivers/pci/controller/pci-hyperv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index 95441a35eceb..30f16b882746 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1900,6 +1900,9 @@ static void hv_eject_device_work(struct work_struct *work)
 sizeof(*ejct_pkt), (unsigned long)&ctxt.pkt,
 VM_PKT_DATA_INBAND, 0);
 
+   /* For the get_pcichild() in hv_pci_eject_device() */
+   put_pcichild(hpdev);
+   /* For the two refs got in new_pcichild_device() */
put_pcichild(hpdev);
put_pcichild(hpdev);
put_hvpcibus(hpdev->hbus);
-- 
2.19.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH 1/3] PCI: hv: Fix a memory leak in hv_eject_device_work()

2019-03-20 Thread Dexuan Cui
> From: Michael Kelley 
> Sent: Wednesday, March 20, 2019 2:38 PM
> 
> From: Dexuan Cui 
> >
> > After a device is just created in new_pcichild_device(), hpdev->refs is set
> > to 2 (i.e. the initial value of 1 plus the get_pcichild()).
> >
> > When we hot remove the device from the host, in Linux VM we first call
> > hv_pci_eject_device(), which increases hpdev->refs by get_pcichild() and
> > then schedules a work of hv_eject_device_work(), so hpdev->refs becomes 3
> > (let's ignore the paired get/put_pcichild() in other places). But in
> > hv_eject_device_work(), currently we only call put_pcichild() twice,
> > meaning the 'hpdev' struct can't be freed in put_pcichild(). This patch
> > adds one put_pcichild() to fix the memory leak.
> >
> > BTW, the device can also be removed when we run "rmmod pci-hyperv". On
> this
> > path (hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_devices_present()),
> > hpdev->refs is 2, and we do correctly call put_pcichild() twice in
> > pci_devices_present_work().
> 
> Exiting new_pcichild_device() with hpdev->refs set to 2 seems OK to me.
> There is the reference in the hbus->children list, and there is the reference 
> that
> is returned to the caller.  
So IMO the "normal" reference count should be 2. :-) IMO only when a hv_pci_dev
device is about to be destroyed, its reference count can drop to less than 2, 
i.e. first temporarily drop to 1 (meaning the hv_pci_dev device is removed from
hbus->children), and then drop to zero (meaning kfree(hpdev) is called).

> But what is strange is that pci_devices_present_work()
> overwrites the reference returned in local variable hpdev without doing a
> put_pcichild().
I suppose you mean:

/* First, mark all existing children as reported missing. */
spin_lock_irqsave(&hbus->device_list_lock, flags);
list_for_each_entry(hpdev, &hbus->children, list_entry) {
hpdev->reported_missing = true;
}
spin_unlock_irqrestore(&hbus->device_list_lock, flags)

This is not strange to me, because, in pci_devices_present_work(), at first we
don't know which devices are about to disappear, so we pre-mark all devices to
be potentially missing like that; if a device is still on the bus, we'll mark 
its
hpdev->reported_missing to false later; only after we know exactly which
devices are missing, we should call put_pcichild() against them. All these
seem natural to me.

> It seems like the "normal" reference count should be 1 when the
> child device is not being manipulated, not 2.
What does "not being manipulated" mean?

> The fix would be to add a call to
> put_pcichild() when the return value from new_pcichild_device() is
> overwritten.
In pci_devices_present_work(), we NEVER "overwrite" the "hpdev" returned
from new_pcichild_device(): the "reported_missing" field of the new hpdev
is implicitly initialized to false in new_pcichild_device().

> Then remove the call to put_pcichild() in pci_device_present_work() when
> missing
> children are moved to the local list. The children have been moved from one
> list
> to another, so there's no need to decrement the reference count.  Then when
> everything in the local list is deleted, the reference is correctly 
> decremented,
> presumably freeing the memory.
> 
> With this approach, the code in hv_eject_device_work() is correct.  There's
> one call to put_pcichild() to reflect removing the child device from the 
> hbus->
> children list, and one call to put_pcichild() to pair with the get_pcichild() 
> in
> hv_pci_eject_device().
Please refer to my replies above. IMO we should fix
hv_eject_device_work() rather than pci_devices_present_work().

Thanks
-- Dexuan
 
> Your patch works, but to me it leaves the ref count in an unnatural state
> most of the time.
> 
> Michael

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH 3/3] PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary

2019-03-20 Thread Dexuan Cui
> From: Michael Kelley 
> Sent: Wednesday, March 20, 2019 2:44 PM
> To: Dexuan Cui ; lorenzo.pieral...@arm.com;
> bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan
> > ...
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > @@ -1776,6 +1776,10 @@ static void pci_devices_present_work(struct
> work_struct *work)
> > hpdev = list_first_entry(&removed, struct hv_pci_dev,
> >  list_entry);
> > list_del(&hpdev->list_entry);
> > +
> > +   if (hpdev->pci_slot)
> > +   pci_destroy_slot(hpdev->pci_slot);
> 
> The code is inconsistent in whether hpdev->pci_slot is set to NULL after 
> calling
> pci_destory_slot(). 
Here, in pci_devices_present_work(), it's unnecessary to set it to NULL,
Because:
1) the "hpdev" is removed from hbus->children and it can not be seen
elsewhere;
2) the "hpdev" struct is freed in the below put_pcichild():

 while (!list_empty(&removed)) {
hpdev = list_first_entry(&removed, struct hv_pci_dev,
 list_entry);
list_del(&hpdev->list_entry);

if (hpdev->pci_slot)
pci_destroy_slot(hpdev->pci_slot);

put_pcichild(hpdev);
}

> Patch 2 in this series does set it to NULL, but this code does not.
In Patch2, i.e. in the code path hv_pci_remove() -> hv_pci_remove_slots(),
we must set hpdev->pci_slot to NULL, otherwise, later, due to
hv_pci_remove() -> hv_pci_bus_exit() ->
hv_pci_devices_present() with the zero "relations", we'll double-free the
"hpdev" struct in pci_devices_present_work() -- see the above.

> And the code in hv_eject_device_work() does not set it to NULL.
It's unnecessary to set hpdev->pci_slot to NULL in hv_eject_device_work(),
Because in hv_eject_device_work(): 
1) the "hpdev" is removed from hbus->children and it can not be seen
elsewhere;
2) the "hpdev" struct is freed at the end of hv_eject_device_work() with my
first patch: [PATCH 1/3] PCI: hv: Fix a memory leak in hv_eject_device_work().
 
> It looks like all the places that test the value of hpdev->pci_slot or call
> pci_destroy_slot() are serialized, so it looks like it really doesn't matter. 
>  But
> when
> the code is inconsistent about setting to NULL, it always makes me wonder if
> there
> is a reason.
> 
> Michael

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH 3/3] PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary

2019-03-20 Thread Dexuan Cui
> From: linux-hyperv-ow...@vger.kernel.org
>  On Behalf Of Dexuan Cui
> > ...
> > Patch 2 in this series does set it to NULL, but this code does not.
> In Patch2, i.e. in the code path hv_pci_remove() -> hv_pci_remove_slots(),
> we must set hpdev->pci_slot to NULL, otherwise, later, due to
> hv_pci_remove() -> hv_pci_bus_exit() ->
> hv_pci_devices_present() with the zero "relations", we'll double-free the
> "hpdev" struct in pci_devices_present_work() -- see the above.
A minor correction: I meant we'll call pci_destroy_slot(hpdev->pci_slot)
twice, not "double-free hpdev".

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH 1/3] PCI: hv: Fix a memory leak in hv_eject_device_work()

2019-03-26 Thread Dexuan Cui
> From: Michael Kelley 
> Sent: Tuesday, March 26, 2019 10:47 AM
> To: Lorenzo Pieralisi ; Dexuan Cui
> 
> Cc: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan
> ; Stephen Hemminger ;
> Sasha Levin ; linux-hyp...@vger.kernel.org;
> linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Haiyang
> Zhang ; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; vkuznets ;
> marcelo.ce...@canonical.com; ja...@mellanox.com; sta...@vger.kernel.org
> Subject: RE: [PATCH 1/3] PCI: hv: Fix a memory leak in hv_eject_device_work()
> 
> From: Lorenzo Pieralisi   Sent: Tuesday, March 26,
> 2019 10:09 AM
> > On Thu, Mar 21, 2019 at 12:12:03AM +, Dexuan Cui wrote:
> > > > From: Michael Kelley 
> > > > Sent: Wednesday, March 20, 2019 2:38 PM
> > > >
> > > > From: Dexuan Cui 
> > > > >
> > > > > After a device is just created in new_pcichild_device(), hpdev->refs 
> > > > > is
> set
> > > > > to 2 (i.e. the initial value of 1 plus the get_pcichild()).
> > > > >
> > > > > When we hot remove the device from the host, in Linux VM we first call
> > > > > hv_pci_eject_device(), which increases hpdev->refs by get_pcichild()
> and
> > > > > then schedules a work of hv_eject_device_work(), so hpdev->refs
> becomes 3
> > > > > (let's ignore the paired get/put_pcichild() in other places). But in
> > > > > hv_eject_device_work(), currently we only call put_pcichild() twice,
> > > > > meaning the 'hpdev' struct can't be freed in put_pcichild(). This 
> > > > > patch
> > > > > adds one put_pcichild() to fix the memory leak.
> > > > >
> > > > > BTW, the device can also be removed when we run "rmmod pci-hyperv".
> On
> > > > this
> > > > > path (hv_pci_remove() -> hv_pci_bus_exit() ->
> hv_pci_devices_present()),
> > > > > hpdev->refs is 2, and we do correctly call put_pcichild() twice in
> > > > > pci_devices_present_work().
> > > >
> > > > Exiting new_pcichild_device() with hpdev->refs set to 2 seems OK to me.
> > > > There is the reference in the hbus->children list, and there is the
> reference that
> > > > is returned to the caller.
> > > So IMO the "normal" reference count should be 2. :-) IMO only when a
> hv_pci_dev
> > > device is about to be destroyed, its reference count can drop to less 
> > > than 2,
> > > i.e. first temporarily drop to 1 (meaning the hv_pci_dev device is removed
> from
> > > hbus->children), and then drop to zero (meaning kfree(hpdev) is called).
> > >
> > > > But what is strange is that pci_devices_present_work()
> > > > overwrites the reference returned in local variable hpdev without doing 
> > > > a
> > > > put_pcichild().
> > > I suppose you mean:
> > >
> > > /* First, mark all existing children as reported missing. */
> > > spin_lock_irqsave(&hbus->device_list_lock, flags);
> > > list_for_each_entry(hpdev, &hbus->children, list_entry) {
> > > hpdev->reported_missing = true;
> > > }
> > > spin_unlock_irqrestore(&hbus->device_list_lock, flags)
> > >
> > > This is not strange to me, because, in pci_devices_present_work(), at 
> > > first
> we
> > > don't know which devices are about to disappear, so we pre-mark all
> devices to
> > > be potentially missing like that; if a device is still on the bus, we'll 
> > > mark its
> > > hpdev->reported_missing to false later; only after we know exactly which
> > > devices are missing, we should call put_pcichild() against them. All these
> > > seem natural to me.
> > >
> > > > It seems like the "normal" reference count should be 1 when the
> > > > child device is not being manipulated, not 2.
> > > What does "not being manipulated" mean?
> > >
> > > > The fix would be to add a call to
> > > > put_pcichild() when the return value from new_pcichild_device() is
> > > > overwritten.
> > > In pci_devices_present_work(), we NEVER "overwrite" the "hpdev"
> returned
> > > from new_pcichild_device(): the "reported_missing" field of the new hpdev
> > > is implicitly initialized to false in new_pcichild_device().
>

RE: [PATCH 3/3] PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary

2019-03-26 Thread Dexuan Cui
> From: Lorenzo Pieralisi 
> Sent: Tuesday, March 26, 2019 12:55 PM
> On Mon, Mar 04, 2019 at 09:34:49PM +0000, Dexuan Cui wrote:
> > When we hot-remove a device, usually the host sends us a PCI_EJECT
> message,
> > and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0. But
> when
> > we do the quick hot-add/hot-remove test, the host may not send us the
> > PCI_EJECT message, if the guest has not fully finished the initialization
> > by sending the PCI_RESOURCES_ASSIGNED* message to the host, so it's
> > potentially unsafe to only depend on the pci_destroy_slot() in
> > hv_eject_device_work(), though create_root_hv_pci_bus() ->
> > hv_pci_assign_slots() is not called in this case. Note: in this case, the
> > host still sends the guest a PCI_BUS_RELATIONS message with
> > bus_rel->device_count == 0.
> >
> > And, in the quick hot-add/hot-remove test, we can have such a race: before
> > pci_devices_present_work() -> new_pcichild_device() adds the new device
> > into hbus->children, we may have already received the PCI_EJECT message,
> > and hence the taklet handler hv_pci_onchannelcallback() may fail to find
> > the "hpdev" by get_pcichild_wslot(hbus, dev_message->wslot.slot), so
> > hv_pci_eject_device() is NOT called; later create_root_hv_pci_bus() ->
> > hv_pci_assign_slots() creates the slot, and the PCI_BUS_RELATIONS message
> > with bus_rel->device_count == 0 removes the device from hbus->children,
> and
> > we end up being unable to remove the slot in hv_pci_remove() ->
> > hv_pci_remove_slots().
> >
> > The patch removes the slot in pci_devices_present_work() when the device
> > is removed. This can address the above race. Note 1:
> > pci_devices_present_work() and hv_eject_device_work() run in the
> > singled-threaded hbus->wq, so there is not a double-remove issue for the
> > slot. Note 2: we can't offload hv_pci_eject_device() from
> > hv_pci_onchannelcallback() to the workqueue, because we need
> > hv_pci_onchannelcallback() synchronously call hv_pci_eject_device() to
> > poll the channel's ringbuffer to work around the
> > "hangs in hv_compose_msi_msg()" issue: see
> > commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in
> hv_compose_msi_msg()")
> 
> This commit log is unreadable, sorry. Indentation, punctuation and
> formatting are just a mess, try to read it, you will notice by
> yourself.
> 
> I basically reformatted it completely and pushed the series to
> pci/controller-fixes but that's the last time I do it since I am not an
> editor, next time I won't merge it.

Hi Lorenzo,
Thank you for helping improve my changelogs! I did learn a lot after
carefully comparing the improved version with my original version. :-)

I'll try my best to write a good changelog for my future patches.

> More importantly, these patches are marked for stable, given the series
> of fixes that triggered this series please ensure it was tested
> thoroughly because it is honestly complicate to understand and I do not
> want to backport further fixes to stable kernels on top of this.

I did the hot-add/hot-remove test in a loop for several thousand times,
and the patchset worked as expected and didn't show any issue.

> Please have a look and report back.
> 
> Thanks,
> Lorenzo

Thanks again!

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier

2019-08-01 Thread Dexuan Cui


When a slot is removed, the pci_dev must still exist.

pci_remove_root_bus() removes and free all the pci_devs, so
hv_pci_remove_slots() must be called before pci_remove_root_bus(),
otherwise a general protection fault can happen, if the kernel is built
with the memory debugging options.

Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the 
driver")
Signed-off-by: Dexuan Cui 
Cc: sta...@vger.kernel.org

---

When pci-hyperv is unloaded, this panic can happen:

 general protection fault:
 CPU: 2 PID: 1091 Comm: rmmod Not tainted 5.2.0+
 RIP: 0010:pci_slot_release+0x30/0xd0
 Call Trace:
  kobject_release+0x65/0x190
  pci_destroy_slot+0x25/0x60
  hv_pci_remove+0xec/0x110 [pci_hyperv]
  vmbus_remove+0x20/0x30 [hv_vmbus]
  device_release_driver_internal+0xd5/0x1b0
  driver_detach+0x44/0x7c
  bus_remove_driver+0x75/0xc7
  vmbus_driver_unregister+0x50/0xbd [hv_vmbus]
  __x64_sys_delete_module+0x136/0x200
  do_syscall_64+0x5e/0x220

 drivers/pci/controller/pci-hyperv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index 6b9cc6e60a..68c611d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
/* Remove the bus from PCI's point of view. */
pci_lock_rescan_remove();
pci_stop_root_bus(hbus->pci_bus);
-   pci_remove_root_bus(hbus->pci_bus);
hv_pci_remove_slots(hbus);
+   pci_remove_root_bus(hbus->pci_bus);
pci_unlock_rescan_remove();
hbus->state = hv_pcibus_removed;
}
-- 
1.8.3.1



RE: [PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier

2019-08-02 Thread Dexuan Cui
> From: Bjorn Helgaas 
> Sent: Friday, August 2, 2019 12:41 PM
> The subject line only describes the mechanical code change, which is
> obvious from the patch.  It would be better if we could say something
> about *why* we need this.

Hi Bjorn,
Sorry. I'll try to write a better changelog in v2. :-)
 
> On Fri, Aug 02, 2019 at 01:32:28AM +, Dexuan Cui wrote:
> >
> > When a slot is removed, the pci_dev must still exist.
> >
> > pci_remove_root_bus() removes and free all the pci_devs, so
> > hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> > otherwise a general protection fault can happen, if the kernel is built
> 
> "general protection fault" is an x86 term that doesn't really say what
> the issue is.  I suspect this would be a "use-after-free" problem.

Yes, it's use-after-free. I'll fix the the wording.
 
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
> > /* Remove the bus from PCI's point of view. */
> > pci_lock_rescan_remove();
> > pci_stop_root_bus(hbus->pci_bus);
> > -   pci_remove_root_bus(hbus->pci_bus);
> > hv_pci_remove_slots(hbus);
> > +   pci_remove_root_bus(hbus->pci_bus);
> 
> I'm curious about why we need hv_pci_remove_slots() at all.  None of
> the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
> anything similar to hv_pci_remove_slots().
> 
> Surely some of those callers also support slots, so there must be some
> other path that calls pci_destroy_slot() in those cases.  Can we use a
> similar strategy here?

Originally Stephen Heminger added the slot code for pci-hyperv.c:
a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
So he may know this better. My understanding is: we can not use the similar
stragegy used in the 2 other users of pci_create_slot():

drivers/pci/hotplug/pci_hotplug_core.c calls pci_create_slot().
It looks drivers/pci/hotplug/ is quite different from pci-hyperv.c because
pci-hyper-v uses a simple *private* hot-plug protocol, making it impossible
to use the API pci_hp_register() and pci_hp_destroy() -> pci_destroy_slot().

drivers/acpi/pci_slot.c calls pci_create_slot(), and saves the created slots in
the static "slot_list" list in the same file. Again, since pci-hyper-v uses a 
private
PCI-device-discovery protocol (which is based on VMBus rather the emulated
ACPI and PCI), acpi_pci_slot_enumerate() can not find the PCI devices that are
discovered by pci-hyperv, so we can not use the standard register_slot() ->
pci_create_slot() to create the slots and hence acpi_pci_slot_remove() -> 
pci_destroy_slot() can not work for pci-hyperv.

I think I can use this as the v2 changelog:

The slot must be removed before the pci_dev is removed, otherwise a panic
can happen due to use-after-free.

Thanks,
Dexuan


[PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier

2019-08-02 Thread Dexuan Cui


The slot must be removed before the pci_dev is removed, otherwise a panic
can happen due to use-after-free.

Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the 
driver")
Signed-off-by: Dexuan Cui 
Cc: sta...@vger.kernel.org
---

Changes in v2:
  Improved the changelog accordign to the discussion with Bjorn Helgaas:
  https://lkml.org/lkml/2019/8/1/1173
  https://lkml.org/lkml/2019/8/2/1559

 drivers/pci/controller/pci-hyperv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index 6b9cc6e60a..68c611d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
/* Remove the bus from PCI's point of view. */
pci_lock_rescan_remove();
pci_stop_root_bus(hbus->pci_bus);
-   pci_remove_root_bus(hbus->pci_bus);
hv_pci_remove_slots(hbus);
+   pci_remove_root_bus(hbus->pci_bus);
pci_unlock_rescan_remove();
hbus->state = hv_pcibus_removed;
}
-- 
1.8.3.1



RE: [PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier

2019-08-06 Thread Dexuan Cui
> From: linux-hyperv-ow...@vger.kernel.org
>  On Behalf Of Bjorn Helgaas
> Sent: Tuesday, August 6, 2019 1:16 PM
> To: Dexuan Cui 
> 
> Thanks for updating this.  But you didn't update the subject line,
> which is really still a little too low-level.  Maybe Lorenzo will fix
> this.  Something like this, maybe?
> 
>   PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it

This is better. Thanks!

I hope Lorenzo can help to fix this so I could avoid a v3. :-)

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] PCI: PM: Also move to D0 before calling pci_legacy_resume_early()

2019-08-08 Thread Dexuan Cui


In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
but the current code misses the pci_legacy_resume_early() path, so the
state remains in PCI_UNKNOWN in that path, and during hiberantion this
causes an error for the Mellanox VF driver, which fails to enable
MSI-X: pci_msi_supported() is false due to dev->current_state != PCI_D0:

mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
mlx4_core a6d1:00:02.0: Sending reset
mlx4_core a6d1:00:02.0: Sending vhcr0
mlx4_core a6d1:00:02.0: HCA minimum page size:512
mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
PM: Device a6d1:00:02.0 failed to thaw: error -95

Signed-off-by: Dexuan Cui 
---
 drivers/pci/pci-driver.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 36dbe960306b..27dfc68db9e7 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
return error;
}
 
-   if (pci_has_legacy_pm_support(pci_dev))
-   return pci_legacy_resume_early(dev);
-
/*
 * pci_restore_state() requires the device to be in D0 (because of MSI
 * restoration among other things), so force it into D0 in case the
 * driver's "freeze" callbacks put it into a low-power state directly.
 */
pci_set_power_state(pci_dev, PCI_D0);
+
+   if (pci_has_legacy_pm_support(pci_dev))
+   return pci_legacy_resume_early(dev);
+
pci_restore_state(pci_dev);
 
if (drv && drv->pm && drv->pm->thaw_noirq)
-- 
2.19.1



RE: [PATCH] PCI: PM: Also move to D0 before calling pci_legacy_resume_early()

2019-08-08 Thread Dexuan Cui
> From: Bjorn Helgaas 
> Sent: Thursday, August 8, 2019 12:19 PM
> To: Dexuan Cui 
> 
> On Thu, Aug 08, 2019 at 06:46:51PM +0000, Dexuan Cui wrote:
> >
> > In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
> > In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> > but the current code misses the pci_legacy_resume_early() path, so the
> > state remains in PCI_UNKNOWN in that path, and during hiberantion this
> > causes an error for the Mellanox VF driver, which fails to enable
> > MSI-X: pci_msi_supported() is false due to dev->current_state != PCI_D0:
> 
> s/hiberantion/hibernation/

Thanks for spoting this typo! :-)
 
> Actually, it sounds more like "during *resume*, this causes an error",
> so maybe you want s/hiberantion/resume/ instead?

Yes, it's during "resume", and to be more accurate, it happens during the
"resume" of the "thaw" phase: when we run "echo disk > /sys/power/state",
first the kernel "freezes" all the devices and create a hibernation image, then
the kernel "thaws" the devices including the disk/NIC, and writes the memory
to the disk and powers down. This patch fixes the error message for the
Mellanox VF driver in this phase. 

When the system starts again, a fresh kernel starts to run, and when the kernel
detects that a hibernation image was saved, the kernel "quiesces" the devices,
and "restores" the devices from the saved image. Here device_resume_noirq()
-> ... -> pci_pm_restore_noirq() -> pci_pm_default_resume_early() ->
pci_power_up() moves the device states back to PCI_D0. This path is not broken
and doesn't need my patch. 

Thanks,
-- Dexuan


[PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()

2019-08-13 Thread Dexuan Cui


In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.

In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
but the current code misses the pci_legacy_resume_early() path, so the
state remains in PCI_UNKNOWN in that path. As a result, in the resume
phase of hibernation, this causes an error for the Mellanox VF driver,
which fails to enable MSI-X because pci_msi_supported() is false due
to dev->current_state != PCI_D0:

mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
mlx4_core a6d1:00:02.0: Sending reset
mlx4_core a6d1:00:02.0: Sending vhcr0
mlx4_core a6d1:00:02.0: HCA minimum page size:512
mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
PM: Device a6d1:00:02.0 failed to thaw: error -95

To be more accurate, the "resume" phase means the "thaw" callbacks which
run before the system enters hibernation: when the user runs the command
"echo disk > /sys/power/state" for hibernation, first the kernel "freezes"
all the devices and creates a hibernation image, then the kernel "thaws"
the devices including the disk/NIC, writes the memory to the disk, and
powers down. This patch fixes the error message for the Mellanox VF driver
in this phase.

When the system starts again, a fresh kernel starts to run, and when the
kernel detects that a hibernation image was saved, the kernel "quiesces"
the devices, and then "restores" the devices from the saved image. In this
path:
device_resume_noirq() -> ... ->
  pci_pm_restore_noirq() ->
pci_pm_default_resume_early() ->
  pci_power_up() moves the device states back to PCI_D0. This path is
not broken and doesn't need my patch.

Signed-off-by: Dexuan Cui 
---

changes in v2:
Updated the changelog with more details.

 drivers/pci/pci-driver.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 36dbe960306b..27dfc68db9e7 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
return error;
}
 
-   if (pci_has_legacy_pm_support(pci_dev))
-   return pci_legacy_resume_early(dev);
-
/*
 * pci_restore_state() requires the device to be in D0 (because of MSI
 * restoration among other things), so force it into D0 in case the
 * driver's "freeze" callbacks put it into a low-power state directly.
 */
pci_set_power_state(pci_dev, PCI_D0);
+
+   if (pci_has_legacy_pm_support(pci_dev))
+   return pci_legacy_resume_early(dev);
+
pci_restore_state(pci_dev);
 
if (drv && drv->pm && drv->pm->thaw_noirq)
-- 
2.19.1



RE: [PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()

2019-09-02 Thread Dexuan Cui
> From: Dexuan Cui 
> Sent: Tuesday, August 13, 2019 6:07 PM
> To: lorenzo.pieral...@arm.com; bhelg...@google.com;
> linux-...@vger.kernel.org
> Cc: Michael Kelley ; linux-hyp...@vger.kernel.org;
> linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Sasha
> Levin ; Haiyang Zhang
> ; KY Srinivasan ;
> o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; vkuznets
> ; marcelo.ce...@canonical.com; Stephen Hemminger
> ; ja...@mellanox.com; Dexuan Cui
> 
> Subject: [PATCH v2] PCI: PM: Move to D0 before calling
> pci_legacy_resume_early()
> 
> 
> In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
> 
> In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> but the current code misses the pci_legacy_resume_early() path, so the
> state remains in PCI_UNKNOWN in that path. As a result, in the resume
> phase of hibernation, this causes an error for the Mellanox VF driver,
> which fails to enable MSI-X because pci_msi_supported() is false due
> to dev->current_state != PCI_D0:
> 
> mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
> mlx4_core a6d1:00:02.0: Sending reset
> mlx4_core a6d1:00:02.0: Sending vhcr0
> mlx4_core a6d1:00:02.0: HCA minimum page size:512
> mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
> mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode,
> aborting
> PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
> PM: Device a6d1:00:02.0 failed to thaw: error -95
> 
> To be more accurate, the "resume" phase means the "thaw" callbacks which
> run before the system enters hibernation: when the user runs the command
> "echo disk > /sys/power/state" for hibernation, first the kernel "freezes"
> all the devices and creates a hibernation image, then the kernel "thaws"
> the devices including the disk/NIC, writes the memory to the disk, and
> powers down. This patch fixes the error message for the Mellanox VF driver
> in this phase.
> 
> When the system starts again, a fresh kernel starts to run, and when the
> kernel detects that a hibernation image was saved, the kernel "quiesces"
> the devices, and then "restores" the devices from the saved image. In this
> path:
> device_resume_noirq() -> ... ->
>   pci_pm_restore_noirq() ->
> pci_pm_default_resume_early() ->
>   pci_power_up() moves the device states back to PCI_D0. This path is
> not broken and doesn't need my patch.
> 
> Signed-off-by: Dexuan Cui 
> ---
> 
> changes in v2:
>   Updated the changelog with more details.
> 
>  drivers/pci/pci-driver.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 36dbe960306b..27dfc68db9e7 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
>   return error;
>   }
> 
> - if (pci_has_legacy_pm_support(pci_dev))
> - return pci_legacy_resume_early(dev);
> -
>   /*
>* pci_restore_state() requires the device to be in D0 (because of MSI
>* restoration among other things), so force it into D0 in case the
>* driver's "freeze" callbacks put it into a low-power state directly.
>*/
>   pci_set_power_state(pci_dev, PCI_D0);
> +
> + if (pci_has_legacy_pm_support(pci_dev))
> + return pci_legacy_resume_early(dev);
> +
>   pci_restore_state(pci_dev);
> 
>   if (drv && drv->pm && drv->pm->thaw_noirq)
> --

Hi, Lorenzo, Bjorn,

Can you please take a look at the v2 ?

Thanks,
-- Dexuan


RE: [PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()

2019-09-25 Thread Dexuan Cui
> From: devel  On Behalf Of
> Dexuan Cui
> Sent: Monday, September 2, 2019 5:35 PM
> To: lorenzo.pieral...@arm.com; bhelg...@google.com;
> linux-...@vger.kernel.org
> [..snipped...]
> > ---
> >
> > changes in v2:
> > Updated the changelog with more details.
> >
> >  drivers/pci/pci-driver.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 36dbe960306b..27dfc68db9e7 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device
> *dev)
> > return error;
> > }
> >
> > -   if (pci_has_legacy_pm_support(pci_dev))
> > -   return pci_legacy_resume_early(dev);
> > -
> > /*
> >  * pci_restore_state() requires the device to be in D0 (because of MSI
> >  * restoration among other things), so force it into D0 in case the
> >  * driver's "freeze" callbacks put it into a low-power state directly.
> >  */
> > pci_set_power_state(pci_dev, PCI_D0);
> > +
> > +   if (pci_has_legacy_pm_support(pci_dev))
> > +   return pci_legacy_resume_early(dev);
> > +
> > pci_restore_state(pci_dev);
> >
> > if (drv && drv->pm && drv->pm->thaw_noirq)
> > --
> 
> Hi, Lorenzo, Bjorn,
> 
> Can you please take a look at the v2 ?
> 
> -- Dexuan

Hi Lorenzo, Bjorn, and all,
It looks this patch has not been acked by anyone.

Should I resend it? There is no change since v2.

Thanks,
-- Dexuan


RE: [PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()

2019-10-07 Thread Dexuan Cui
> -Original Message-
> From: Bjorn Helgaas 
> Sent: Monday, October 7, 2019 6:24 AM
> To: Dexuan Cui 
> Cc: lorenzo.pieral...@arm.com; linux-...@vger.kernel.org; Michael Kelley
> ; linux-hyp...@vger.kernel.org;
> linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Sasha
> Levin ; Haiyang Zhang
> ; KY Srinivasan ;
> o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; vkuznets
> ; marcelo.ce...@canonical.com; Stephen Hemminger
> ; ja...@mellanox.com
> Subject: Re: [PATCH v2] PCI: PM: Move to D0 before calling
> pci_legacy_resume_early()
> 
> On Wed, Aug 14, 2019 at 01:06:55AM +, Dexuan Cui wrote:
> >
> > In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
> >
> > In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> > but the current code misses the pci_legacy_resume_early() path, so the
> > state remains in PCI_UNKNOWN in that path. As a result, in the resume
> > phase of hibernation, this causes an error for the Mellanox VF driver,
> > which fails to enable MSI-X because pci_msi_supported() is false due
> > to dev->current_state != PCI_D0:
> >
> > mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
> > mlx4_core a6d1:00:02.0: Sending reset
> > mlx4_core a6d1:00:02.0: Sending vhcr0
> > mlx4_core a6d1:00:02.0: HCA minimum page size:512
> > mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
> > mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode,
> aborting
> > PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
> > PM: Device a6d1:00:02.0 failed to thaw: error -95
> >
> > To be more accurate, the "resume" phase means the "thaw" callbacks which
> > run before the system enters hibernation: when the user runs the command
> > "echo disk > /sys/power/state" for hibernation, first the kernel "freezes"
> > all the devices and creates a hibernation image, then the kernel "thaws"
> > the devices including the disk/NIC, writes the memory to the disk, and
> > powers down. This patch fixes the error message for the Mellanox VF driver
> > in this phase.
> >
> > When the system starts again, a fresh kernel starts to run, and when the
> > kernel detects that a hibernation image was saved, the kernel "quiesces"
> > the devices, and then "restores" the devices from the saved image. In this
> > path:
> > device_resume_noirq() -> ... ->
> >   pci_pm_restore_noirq() ->
> > pci_pm_default_resume_early() ->
> >   pci_power_up() moves the device states back to PCI_D0. This path is
> > not broken and doesn't need my patch.
> >
> > Signed-off-by: Dexuan Cui 
> 
> This looks like a bugfix for 5839ee7389e8 ("PCI / PM: Force devices to
> D0 in pci_pm_thaw_noirq()") so maybe it should be marked for stable as
> 5839ee7389e8 was?
> 
> Rafael, could you confirm?
> 
> > ---
> >
> > changes in v2:
> > Updated the changelog with more details.
> >
> >  drivers/pci/pci-driver.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 36dbe960306b..27dfc68db9e7 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device
> *dev)
> > return error;
> > }
> >
> > -   if (pci_has_legacy_pm_support(pci_dev))
> > -   return pci_legacy_resume_early(dev);
> > -
> > /*
> >  * pci_restore_state() requires the device to be in D0 (because of MSI
> >  * restoration among other things), so force it into D0 in case the
> >  * driver's "freeze" callbacks put it into a low-power state directly.
> >  */
> > pci_set_power_state(pci_dev, PCI_D0);
> > +
> > +   if (pci_has_legacy_pm_support(pci_dev))
> > +   return pci_legacy_resume_early(dev);
> > +
> > pci_restore_state(pci_dev);
> >
> > if (drv && drv->pm && drv->pm->thaw_noirq)
> > --
> > 2.19.1
> >

Added Rafael to "To".

Thanks,
-- Dexuan



RE: [PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()

2019-10-08 Thread Dexuan Cui
> From: Bjorn Helgaas 
> Sent: Tuesday, October 8, 2019 12:56 PM
> ...
> Wordsmithing nit: what the patch does is not "fix the error message";
> what it does is fix the *problem*, i.e., the fact that we can't
> operate the device because we can't enable MSI-X.  The message is only
> a symptom.

I totally agree. :-)

> IIUC the relevant part of the system hibernation sequence is:
> 
>   pci_pm_freeze_noirq
>   pci_pm_thaw_noirq
>   pci_pm_thaw
> 
> And the execution flow is:
> 
>   pci_pm_freeze_noirq
> if (pci_has_legacy_pm_support(pci_dev)) # true for mlx4
>   pci_legacy_suspend_late(dev, PMSG_FREEZE)
>   pci_pm_set_unknown_state
> dev->current_state = PCI_UNKNOWN  # <---
>   pci_pm_thaw_noirq
> if (pci_has_legacy_pm_support(pci_dev)) # true
>   pci_legacy_resume_early(dev)  # noop; mlx4 doesn't
> implement
>   pci_pm_thaw   # returns -95
> EOPNOTSUPP
> if (pci_has_legacy_pm_support(pci_dev)) # true
>   pci_legacy_resume
>   drv->resume
> mlx4_resume   # mlx4_driver.resume (legacy)
>   mlx4_load_one
> mlx4_enable_msi_x
>   pci_enable_msix_range
> __pci_enable_msix_range
>   __pci_enable_msix
> if (!pci_msi_supported())
>   if (dev->current_state != PCI_D0)  # <---
> return 0
>   return -EINVAL
>   err = -EOPNOTSUPP
>   "INTx is not supported ..."
> 
> (These are just my notes; you don't need to put them all into the
> commit message.  I'm just sharing them in case I'm not understanding
> correctly.)

Yes, these notes are accurate.

> > > > > When the system starts again, a fresh kernel starts to run, and when 
> > > > > the
> > > > > kernel detects that a hibernation image was saved, the kernel
> "quiesces"
> > > > > the devices, and then "restores" the devices from the saved image. In
> this
> > > > > path:
> > > > > device_resume_noirq() -> ... ->
> > > > >pci_pm_restore_noirq() ->
> > > > >  pci_pm_default_resume_early() ->
> > > > >pci_power_up() moves the device states back to PCI_D0. This
> path is
> > > > > not broken and doesn't need my patch.
> > > > >
> 
> The cc list suggests that this might be a fix for a user-reported
> problem.  Is there a launchpad or similar link you could include here?

I guess I'm the first one to notice the issue and there is not any bug link 
AFAIK.

The hibernation process usually saves the states into a local disk (before the
system is powered off), and the Mellanox NIC is not needed during the process,
so it's not a real issue that the NIC can not work between pci_pm_thaw() and 
power_down(). This may explain why nobody else noticed the issue. I happened
to see the error message, and hence investigated the issue.

> Should this be marked for stable?

I think we should do it.
 
> > > > > --- a/drivers/pci/pci-driver.c
> > > > > +++ b/drivers/pci/pci-driver.c
> > > > > @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device
> > > > *dev)
> > > > >   return error;
> > > > >   }
> > > > >
> > > > > - if (pci_has_legacy_pm_support(pci_dev))
> > > > > - return pci_legacy_resume_early(dev);
> > > > > -
> > > > >   /*
> > > > >* pci_restore_state() requires the device to be in D0 (because
> of MSI
> > > > >* restoration among other things), so force it into D0 in case
> the
> > > > >* driver's "freeze" callbacks put it into a low-power state
> directly.
> > > > >*/
> > > > >   pci_set_power_state(pci_dev, PCI_D0);
> > > > > +
> > > > > + if (pci_has_legacy_pm_support(pci_dev))
> > > > > + return pci_legacy_resume_early(dev);
> > > > > +
> > > > >   pci_restore_state(pci_dev);
> > > > >
> > > > >   if (drv && drv->pm && drv->pm->thaw_noirq)
> > > > > --
> > > > > 2.19.1
> > > > >
> > The patch looks reasonable to me, but the comment above the
> > pci_set_power_state() call needs to be updated too IMO.
> 
> Hmm.
> 
> 1) pci_restore_state() mainly writes config space, which doesn't
> require the device to be in D0.  The only thing I see that would
> require D0 is the MSI-X MMIO space, so to be more specific, the
> comment could say "restoring the MSI-X *MMIO* state requires the
> device to be in D0".
> 
> But I think you meant some other comment change.  Did you mean
> something along the lines of "a legacy drv->resume_early() callback
> and pci_restore_state() both require the device to be in D0"?
> 
> If something else, maybe you could propose some text?
> 
> 2) I assume pci_pm_thaw_noirq() should leave the device in a
> functionally equivalent state, whether it uses legacy PM or not.  Do
> we want something like the patch below instead?  If we *do* want to
> skip pci_restore_state() for legacy PM, maybe we should add a comment.
> 
> 3) Documentation/power/pci.rst says:
> 
> 

RE: [PATCH v3 0/7] PCI: PM: Move to D0 before calling pci_legacy_resume_early()

2019-10-15 Thread Dexuan Cui
> From: Bjorn Helgaas 
> Sent: Monday, October 14, 2019 4:00 PM
>  ...
> 
> Dexuan, the important thing here is the first patch, which is your [1],
> which I modified by doing pci_restore_state() as well as setting to D0:
> 
>   pci_set_power_state(pci_dev, PCI_D0);
>   pci_restore_state(pci_dev);
> 
> I'm proposing some more patches on top.  None are relevant to the problem
> you're solving; they're just minor doc and other updates in the same area.
> 
> Rafael, if you have a chance to look at these, I'd appreciate it.  I tried
> to make the doc match the code, but I'm no PM expert.
 
Thank you very much, Bjorn! The patchset looks good to me.

Thanks,
-- Dexuan


RE: [PATCH 0/3] Tools: hv: vssdaemon: freeze/thaw logic improvement for the failure case

2014-11-07 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> Sent: Saturday, November 8, 2014 1:09 AM
> To: KY Srinivasan; Haiyang Zhang; Greg Kroah-Hartman
> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; Dexuan Cui
> Subject: [PATCH 0/3] Tools: hv: vssdaemon: freeze/thaw logic improvement
> for the failure case
> 
> This patch series addresses the following issues:
> - Wrong error reporting for multiple filesystems case.
> - Skip all readonly-mounted filesystems instead of skipping iso9660.
> - Thaw all filesystems after an unsuccessful freeze attempt.
> 
> Vitaly Kuznetsov (3):
>   Tools: hv: vssdaemon: consult with errno in case of failure only
>   Tools: hv: vssdaemon: skip all filesystems mounted readonly
>   Tools: hv: vssdaemon: thaw everything in case of freeze failure
> 
>  tools/hv/hv_vss_daemon.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)

Hi Vitaly,
Thanks for your patchset!

FYI: Greg checked in a patch of mine several hours ago -- my patch
implemented "thaw all filesytems on a failure of freeze" too. :-)

Please see my patch in Greg's char-misc-next tree:
https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/commit/?h=char-misc-next&id=4f689190bb55d171d2f6614f8a6cbd4b868e48bd

Can you please rebase  your patch(es) on Greg's tree?

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2 1/2] Tools: hv: vssdaemon: report freeze errors

2014-11-10 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov
> Sent: Tuesday, November 11, 2014 0:37 AM
> To: KY Srinivasan; Haiyang Zhang; Greg Kroah-Hartman
> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; Dexuan Cui
> Subject: [PATCH v2 1/2] Tools: hv: vssdaemon: report freeze errors
> 
> When ioctl(fd, FIFREEZE, 0) results in an error we cannot report it
> to syslog instantly since that can cause write to a frozen disk.
> However, the name of the filesystem which caused the error and errno
> are valuable and we would like to get a nice human-readable message
> in the log. Save errno before calling vss_operate(VSS_OP_THAW) and
> report the error right after.
> 
> Unfortunately, FITHAW errors cannot be reported the same way as we
> need to finish thawing all filesystems before calling syslog().
> 
> We should also avoid calling endmntent() for the second time in
> case we encountered an error during freezing of '/' as it usually
> results in SEGSEGV.
> 
> Signed-off-by: Vitaly Kuznetsov 
> ---
>  tools/hv/hv_vss_daemon.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)

Hi Vitaly,
Thanks for the patch -- it does improve the error handling!
 
Acked-by: Dexuan Cui 

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2 2/2] Tools: hv: vssdaemon: skip all filesystems mounted readonly

2014-11-10 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov
> Sent: Tuesday, November 11, 2014 0:37 AM
> To: KY Srinivasan; Haiyang Zhang; Greg Kroah-Hartman
> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; Dexuan Cui
> Subject: [PATCH v2 2/2] Tools: hv: vssdaemon: skip all filesystems mounted
> readonly
> 
> Instead of making a list of exceptions for readonly filesystems
> in addition to iso9660 we already have it is better to skip freeze
> operation for all readonly-mounted filesystems.
> 
> Signed-off-by: Vitaly Kuznetsov 
> ---
>  tools/hv/hv_vss_daemon.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/hv/hv_vss_daemon.c b/tools/hv/hv_vss_daemon.c
> index ee44f0d..5e63f70 100644
> --- a/tools/hv/hv_vss_daemon.c
> +++ b/tools/hv/hv_vss_daemon.c
> @@ -102,7 +102,7 @@ static int vss_operate(int operation)
>   while ((ent = getmntent(mounts))) {
>   if (strncmp(ent->mnt_fsname, match, strlen(match)))
>   continue;
> - if (strcmp(ent->mnt_type, "iso9660") == 0)
> + if (hasmntopt(ent, MNTOPT_RO) != NULL)
>   continue;
>           if (strcmp(ent->mnt_type, "vfat") == 0)
>   continue;
> --

Acked-by: Dexuan Cui 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-11 Thread Dexuan Cui
In the case the user-space daemon crashes, hangs or is killed, we
need to down the semaphore, otherwise, after the daemon starts next
time, the obsolete data in fcopy_transaction.message or
fcopy_transaction.fcopy_msg will be used immediately.

Cc: K. Y. Srinivasan 
Signed-off-by: Dexuan Cui 
---
 drivers/hv/hv_fcopy.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index 23b2ce2..177122a 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct *dummy)
 * process the pending transaction.
 */
fcopy_respond_to_host(HV_E_FAIL);
+
+   /* In the case the user-space daemon crashes, hangs or is killed, we
+* need to down the semaphore, otherwise, after the daemon starts next
+* time, the obsolete data in fcopy_transaction.message or
+* fcopy_transaction.fcopy_msg will be used immediately.
+*/
+   if (down_trylock(&fcopy_transaction.read_sema))
+   pr_debug("FCP: failed to acquire the semaphore\n");
+
 }
 
 static int fcopy_handle_handshake(u32 version)
-- 
1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] tools: hv: ignore ENOBUFS in the KVP daemon

2014-11-19 Thread Dexuan Cui
Under high memory pressure and very high KVP R/W test pressure, the netlink
recvfrom() may transiently return ENOBUFS to the daemon -- we found this
during a 2-week stress test.

We'd better not terminate the daemon on this failure, because a typical KVP
user can re-try the R/W and hopefully it will succeed next time.

Cc: K. Y. Srinivasan 
Signed-off-by: Dexuan Cui 
---
 tools/hv/hv_kvp_daemon.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 22b0764..9f4b303 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -1559,8 +1559,15 @@ int main(int argc, char *argv[])
addr_p, &addr_l);
 
if (len < 0) {
+   int saved_errno = errno;
syslog(LOG_ERR, "recvfrom failed; pid:%u error:%d %s",
addr.nl_pid, errno, strerror(errno));
+
+   if (saved_errno == ENOBUFS) {
+   syslog(LOG_ERR, "error = ENOBUFS: ignored");
+   continue;
+   }
+
close(fd);
return -1;
}
-- 
1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] tools: hv: ignore ENOBUFS in the KVP daemon

2014-11-19 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov
> Sent: Wednesday, November 19, 2014 18:50 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; Haiyang Zhang
> Subject: Re: [PATCH] tools: hv: ignore ENOBUFS in the KVP daemon
> 
> Dexuan Cui  writes:
> 
> > Under high memory pressure and very high KVP R/W test pressure, the
> netlink
> > recvfrom() may transiently return ENOBUFS to the daemon -- we found
> this
> > during a 2-week stress test.
> >
> > We'd better not terminate the daemon on this failure, because a typical
> KVP
> > user can re-try the R/W and hopefully it will succeed next time.
> >
> > diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
> > index 22b0764..9f4b303 100644
> > --- a/tools/hv/hv_kvp_daemon.c
> > +++ b/tools/hv/hv_kvp_daemon.c
> > @@ -1559,8 +1559,15 @@ int main(int argc, char *argv[])
> > addr_p, &addr_l);
> >
> > if (len < 0) {
> > +   int saved_errno = errno;
> > syslog(LOG_ERR, "recvfrom failed; pid:%u
> error:%d %s",
> > addr.nl_pid, errno, strerror(errno));
> > +
> > +   if (saved_errno == ENOBUFS) {
> 
> is it possible to meet EAGAIN (or EWOULDBLOCK) here as well? I'd suggest
> we ignore these as well in such case. Ignoring ENOMEM here is doubtful,
> I think. But possible.
> 
>   Vitaly

I don't think EAGAIN is possible  because "man recvfrom" says
   "If  no messages are available at the socket, the receive calls wait for a
 message to arrive, unless the socket is nonblocking (see fcntl(2)), in 
which
 case the value -1 is returned and  the  external variable  errno is set to
EAGAIN or EWOULDBLOCK".

The same man page mention ENOMEM for recvmsg(), but not recvfrom().

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] tools: hv: ignore ENOBUFS in the KVP daemon

2014-11-19 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> Sent: Wednesday, November 19, 2014 20:41 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; Haiyang Zhang
> Subject: Re: [PATCH] tools: hv: ignore ENOBUFS in the KVP daemon
> 
> Dexuan Cui  writes:
> 
> >> -Original Message-
> >> From: Vitaly Kuznetsov
> >> Sent: Wednesday, November 19, 2014 18:50 PM
> >> To: Dexuan Cui
> >> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> driverdev-
> >> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> >> jasow...@redhat.com; Haiyang Zhang
> >> Subject: Re: [PATCH] tools: hv: ignore ENOBUFS in the KVP daemon
> >>
> >> Dexuan Cui  writes:
> >>
> >> > Under high memory pressure and very high KVP R/W test pressure,
> the netlink
> >> > recvfrom() may transiently return ENOBUFS to the daemon -- we found
> this
> >> > during a 2-week stress test.
> >> >
> >> > We'd better not terminate the daemon on this failure, because a
> typical KVP
> >> > user can re-try the R/W and hopefully it will succeed next time.
> >> >
> >> > diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
> >> > index 22b0764..9f4b303 100644
> >> > --- a/tools/hv/hv_kvp_daemon.c
> >> > +++ b/tools/hv/hv_kvp_daemon.c
> >> > @@ -1559,8 +1559,15 @@ int main(int argc, char *argv[])
> >> >  addr_p, &addr_l);
> >> >
> >> >  if (len < 0) {
> >> > +int saved_errno = errno;
> >> >  syslog(LOG_ERR, "recvfrom failed; pid:%u
> error:%d %s",
> >> >  addr.nl_pid, errno, 
> >> > strerror(errno));
> >> > +
> >> > +if (saved_errno == ENOBUFS) {
> >>
> >> is it possible to meet EAGAIN (or EWOULDBLOCK) here as well? I'd
> suggest
> >> we ignore these as well in such case. Ignoring ENOMEM here is doubtful,
> >> I think. But possible.
> >>
> >>   Vitaly
> >
> > I don't think EAGAIN is possible  because "man recvfrom" says
> >"If  no messages are available at the socket, the receive calls wait for 
> > a
> >  message to arrive, unless the socket is nonblocking (see fcntl(2)), in
> which
> >  case the value -1 is returned and  the  external variable  errno is 
> > set to
> > EAGAIN or EWOULDBLOCK".
> >
> > The same man page mention ENOMEM for recvmsg(), but not recvfrom().
> 
> Ah, sorry, I though your patch patches the other place: call to
> netlink_send() which does sendmsg() (and my
> EAGAIN/EWOULDBLOCK/ENOMEM
> comment was about it). It could also make sense to patch them both as I
> think it is possible to hit these as well.
> 
> > -- Dexuan
> --
>   Vitaly

OK, I can add this new check:
(I'll send out the v2 tomorrow in case  people have new comments)

--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -1770,8 +1770,15 @@ kvp_done:

len = netlink_send(fd, incoming_cn_msg);
if (len < 0) {
+   int saved_errno = errno;
syslog(LOG_ERR, "net_link send failed; error: %d %s", 
errno,
strerror(errno));
+
+   if (saved_errno == ENOMEM || saved_errno == EAGAIN) {
+   syslog(LOG_ERR, "send error: ignored");
+   continue;
+   }
+
exit(EXIT_FAILURE);
}
}

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] tools: hv: ignore ENOBUFS in the KVP daemon

2014-11-19 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov 
> >> --
> >>   Vitaly
> >
> > OK, I can add this new check:
> > (I'll send out the v2 tomorrow in case  people have new comments)
> >
> 
> Thanks!
> 
> > --- a/tools/hv/hv_kvp_daemon.c
> > +++ b/tools/hv/hv_kvp_daemon.c
> > @@ -1770,8 +1770,15 @@ kvp_done:
> >
> > len = netlink_send(fd, incoming_cn_msg);
> > if (len < 0) {
> > +   int saved_errno = errno;
> > syslog(LOG_ERR, "net_link send failed; error: %d 
> > %s", errno,
> > strerror(errno));
> > +
> > +   if (saved_errno == ENOMEM || saved_errno ==  
> > EAGAIN) {
> 
> Sorry for being pushy, but it seems ENOBUFS is also possible here (at
> least man sendmsg mentions it).
OK, I'll add this too. :-)

BTW, I realized sendmsg() can't return EAGAIN here as that's for non-blocking
socket.

Here I simply ignore the error, hoping the other end will re-try.

> 
> > +   syslog(LOG_ERR, "send error: ignored");
> > +   continue;
> > +   }
> > +
> > exit(EXIT_FAILURE);
> > }
> > }
> >
> > Thanks,
> > -- Dexuan
> 
>   Vitaly

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2] tools: hv: ignore ENOBUFS and ENOMEM in the KVP daemon

2014-11-19 Thread Dexuan Cui
Under high memory pressure and very high KVP R/W test pressure, the netlink
recvfrom() may transiently return ENOBUFS to the daemon -- we found this
during a 2-week stress test.

We'd better not terminate the daemon on the failure, because a typical KVP
user will re-try the R/W and hopefully it will succeed next time.

We can also ignore the errors on sending.

Cc: Vitaly Kuznetsov 
Cc: K. Y. Srinivasan 
Signed-off-by: Dexuan Cui 
---

v2: I also ignore the errors on sending, as Vitaly suggested.

 tools/hv/hv_kvp_daemon.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 22b0764..6a6432a 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -1559,8 +1559,15 @@ int main(int argc, char *argv[])
addr_p, &addr_l);
 
if (len < 0) {
+   int saved_errno = errno;
syslog(LOG_ERR, "recvfrom failed; pid:%u error:%d %s",
addr.nl_pid, errno, strerror(errno));
+
+   if (saved_errno == ENOBUFS) {
+   syslog(LOG_ERR, "receive error: ignored");
+   continue;
+   }
+
close(fd);
return -1;
}
@@ -1763,8 +1770,15 @@ kvp_done:
 
len = netlink_send(fd, incoming_cn_msg);
if (len < 0) {
+   int saved_errno = errno;
syslog(LOG_ERR, "net_link send failed; error: %d %s", 
errno,
strerror(errno));
+
+   if (saved_errno == ENOMEM || saved_errno == ENOBUFS) {
+   syslog(LOG_ERR, "send error: ignored");
+   continue;
+   }
+
exit(EXIT_FAILURE);
}
}
-- 
1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-19 Thread Dexuan Cui
> -Original Message-
> From: KY Srinivasan
> Sent: Thursday, November 20, 2014 6:59 AM
> > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c index
> > 23b2ce2..177122a 100644
> > --- a/drivers/hv/hv_fcopy.c
> > +++ b/drivers/hv/hv_fcopy.c
> > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct
> > *dummy)
> >   * process the pending transaction.
> >   */
> >  fcopy_respond_to_host(HV_E_FAIL);
> > +
> > +/* In the case the user-space daemon crashes, hangs or is killed, we
> > + * need to down the semaphore, otherwise, after the daemon starts
> > next
> > + * time, the obsolete data in fcopy_transaction.message or
> > + * fcopy_transaction.fcopy_msg will be used immediately.
> > + */
> > +if (down_trylock(&fcopy_transaction.read_sema))
> > +pr_debug("FCP: failed to acquire the semaphore\n");
> > +
> >  }
> 
> When the daemon is killed, we currently reset the state in the release
> function. Why can't we cleanup the semaphore state (initialize) here as well.
> 
> K. Y

Hi KY,
1) The down_trylock() here is necessary: the daemon can fail to respond in 5
seconds due to many reasons, e.g., the VM's CPU and I/O are too busy. In this
case, the daemon may become running later(NOTE: in this example, the daemon
is not killed), but from the host user's point of view, the PowerShell
copy-vmfile command has failed, so here we have to 'down' the semaphore
anyway, otherwise, the daemon can get obsolete data.

2) If we add a line
sema_init(&fcopy_transaction.read_sema, 0);
in fcopy_release(), it seems OK at a glance, but we have to handle the race
condition: the above down_trylock() and the sema_init() can, in theory, run
simultaneously on different virtual CPUs.  It's tricky to address this.

3) So I think we can reuse the same semaphore without an actually unnecessary
re-initialization. :-)

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-20 Thread Dexuan Cui
> -Original Message-
> From: KY Srinivasan
> Sent: Friday, November 21, 2014 1:58 AM
> To: Dexuan Cui; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> driverdev-devel@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com
> Cc: Haiyang Zhang
> Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> > -Original Message-
> > From: Dexuan Cui
> > Sent: Wednesday, November 19, 2014 11:48 PM
> > To: KY Srinivasan; gre...@linuxfoundation.org; linux-
> > ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org;
> > o...@aepfle.de; a...@canonical.com; jasow...@redhat.com
> > Cc: Haiyang Zhang
> > Subject: RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer
> > failure
> >
> > > -Original Message-
> > > From: KY Srinivasan
> > > Sent: Thursday, November 20, 2014 6:59 AM
> > > > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c index
> > > > 23b2ce2..177122a 100644
> > > > --- a/drivers/hv/hv_fcopy.c
> > > > +++ b/drivers/hv/hv_fcopy.c
> > > > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct
> work_struct
> > > > *dummy)
> > > >   * process the pending transaction.
> > > >   */
> > > >  fcopy_respond_to_host(HV_E_FAIL);
> > > > +
> > > > +/* In the case the user-space daemon crashes, hangs or is killed,
> > > > +we
> > > > + * need to down the semaphore, otherwise, after the daemon starts
> > > > next
> > > > + * time, the obsolete data in fcopy_transaction.message or
> > > > + * fcopy_transaction.fcopy_msg will be used immediately.
> > > > + */
> > > > +if (down_trylock(&fcopy_transaction.read_sema))
> > > > +pr_debug("FCP: failed to acquire the semaphore\n");
> > > > +
> > > >  }
> > >
> > > When the daemon is killed, we currently reset the state in the release
> > > function. Why can't we cleanup the semaphore state (initialize) here as
> > well.
> > >
> > > K. Y
> >
> > Hi KY,
> > 1) The down_trylock() here is necessary: the daemon can fail to respond
> in 5
> > seconds due to many reasons, e.g., the VM's CPU and I/O are too busy. In
> > this case, the daemon may become running later(NOTE: in this example,
> the
> > daemon is not killed), but from the host user's point of view, the
> PowerShell
> > copy-vmfile command has failed, so here we have to 'down' the
> semaphore
> > anyway, otherwise, the daemon can get obsolete data.
> >
> > 2) If we add a line
> > sema_init(&fcopy_transaction.read_sema, 0); in fcopy_release(), it seems
> > OK at a glance, but we have to handle the race
> > condition: the above down_trylock() and the sema_init() can, in theory,
> run
> > simultaneously on different virtual CPUs.  It's tricky to address this.
> >
> > 3) So I think we can reuse the same semaphore without an actually
> > unnecessary re-initialization. :-)
> 
> Agreed; you may want to get rid of the pr_debug() call though.
> 
> Thanks,
> 
> K. Y

The pr_debug() is added intentionally according to suggestion of 
Redhat's Vitaly Kuznetsov in the bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1162100#c5

The function is declared with__must_check in include/linux/semaphore.h:
extern int __must_check down_trylock(struct semaphore *sem);

Without checking the return value, we'll get these warning if the
"Kernel hacking" options are enabled:

drivers/hv/hv_fcopy.c: In function 'fcopy_work_func':
drivers/hv/hv_fcopy.c:95:2: warning: ignoring return value of 'down_trylock', 
declared with attribute warn_unused_result [-Wunused-result]
  (void)down_trylock(&fcopy_transaction.read_sema);
  ^

In practice, the message I add should be very rare since it's very unlikely to 
fail
to get the semaphore in this timeout case -- and in case this happens, it's 
actually
OK, because the driver has told the host user the PowerShell command should 
fail.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] hv: hv_balloon: avoid memory leak on alloc_error of 2MB memory block

2014-11-23 Thread Dexuan Cui
If num_ballooned is not 0, we shouldn't neglect the already-allocated 2MB
memory block(s).

Cc: K. Y. Srinivasan 
Cc: 
Signed-off-by: Dexuan Cui 
---
 drivers/hv/hv_balloon.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5e90c5d..cba2d3b 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1091,6 +1091,8 @@ static void balloon_up(struct work_struct *dummy)
bool done = false;
int i;
 
+   /* The host does balloon_up in 2MB. */
+   WARN_ON(num_pages % PAGES_IN_2M != 0);
 
/*
 * We will attempt 2M allocations. However, if we fail to
@@ -,7 +1113,7 @@ static void balloon_up(struct work_struct *dummy)
bl_resp, alloc_unit,
 &alloc_error);
 
-   if ((alloc_error) && (alloc_unit != 1)) {
+   if (alloc_error && (alloc_unit != 1) && num_ballooned == 0) {
alloc_unit = 1;
continue;
}
-- 
1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] hv: hv_balloon: avoid memory leak on alloc_error of 2MB memory block

2014-11-23 Thread Dexuan Cui
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Monday, November 24, 2014 13:18 PM
> To: Dexuan Cui; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> driverdev-devel@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; KY Srinivasan
> Cc: Haiyang Zhang
> Subject: Re: [PATCH] hv: hv_balloon: avoid memory leak on alloc_error of
> 2MB memory block
> 
> On 11/24/2014 01:56 PM, Dexuan Cui wrote:
> > If num_ballooned is not 0, we shouldn't neglect the already-allocated
> 2MB
> > memory block(s).
> >
> > Cc: K. Y. Srinivasan 
> > Cc: 
> > Signed-off-by: Dexuan Cui 
> > ---
> >  drivers/hv/hv_balloon.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> > index 5e90c5d..cba2d3b 100644
> > --- a/drivers/hv/hv_balloon.c
> > +++ b/drivers/hv/hv_balloon.c
> > @@ -1091,6 +1091,8 @@ static void balloon_up(struct work_struct
> *dummy)
> > bool done = false;
> > int i;
> >
> > +   /* The host does balloon_up in 2MB. */
> > +   WARN_ON(num_pages % PAGES_IN_2M != 0);
> >
> > /*
> >  * We will attempt 2M allocations. However, if we fail to
> > @@ -,7 +1113,7 @@ static void balloon_up(struct work_struct
> *dummy)
> > bl_resp, alloc_unit,
> >  &alloc_error);
> >
> > -   if ((alloc_error) && (alloc_unit != 1)) {
> > +   if (alloc_error && (alloc_unit != 1) && num_ballooned == 0)
> {
> > alloc_unit = 1;
> > continue;
> > }
> 
> Before the change, we may retry the 4K allocation when part or all 2M
> allocations were failed. This makes sense when memory is fragmented. But
Yes, but all the partially-allocated 2MB memory blocks are lost(mem leak).

> after the change, if part of 2M allocation were failed, we won't retry
> 4K allocation. Is this expected?
Hi Jason,
The patch doesn't break the "try 2MB first; then try 4K" logic:

With the change, we'll retry the 2MB allocation in the next iteration of the
same while (!done) loop -- we expect this retry will cause
"alloc_error && (alloc_unit != 1) && num_ballooned == 0" to be true,
so we'll later try 4K allocation, as we did before.


> Btw, can host just require 1M? If yes, should alloc_balloon_pages() set
Hi KY,
Can you please clarify this?
You know the host much more than me. :-)

> alloc_error if num_pages < alloc_unit for caller to catch this and retry
> 4K allocation?
 
-- Dexuan

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] hv: hv_balloon: avoid memory leak on alloc_error of 2MB memory block

2014-11-23 Thread Dexuan Cui
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Monday, November 24, 2014 15:28 PM
> To: Dexuan Cui; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> driverdev-devel@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; KY Srinivasan
> Cc: Haiyang Zhang
> Subject: Re: [PATCH] hv: hv_balloon: avoid memory leak on alloc_error of
> 2MB memory block
> 
> On 11/24/2014 02:08 PM, Dexuan Cui wrote:
> >> -Original Message-
> >> > From: Jason Wang [mailto:jasow...@redhat.com]
> >> > Sent: Monday, November 24, 2014 13:18 PM
> >> > To: Dexuan Cui; gre...@linuxfoundation.org; linux-
> ker...@vger.kernel.org;
> >> > driverdev-devel@linuxdriverproject.org; o...@aepfle.de;
> >> > a...@canonical.com; KY Srinivasan
> >> > Cc: Haiyang Zhang
> >> > Subject: Re: [PATCH] hv: hv_balloon: avoid memory leak on
> alloc_error of
> >> > 2MB memory block
> >> >
> >> > On 11/24/2014 01:56 PM, Dexuan Cui wrote:
> >>> > > If num_ballooned is not 0, we shouldn't neglect the already-
> allocated
> >> > 2MB
> >>> > > memory block(s).
> >>> > >
> >>> > > Cc: K. Y. Srinivasan 
> >>> > > Cc: 
> >>> > > Signed-off-by: Dexuan Cui 
> >>> > > ---
> >>> > >  drivers/hv/hv_balloon.c | 4 +++-
> >>> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> >>> > >
> >>> > > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> >>> > > index 5e90c5d..cba2d3b 100644
> >>> > > --- a/drivers/hv/hv_balloon.c
> >>> > > +++ b/drivers/hv/hv_balloon.c
> >>> > > @@ -1091,6 +1091,8 @@ static void balloon_up(struct
> work_struct
> >> > *dummy)
> >>> > >   bool done = false;
> >>> > >   int i;
> >>> > >
> >>> > > + /* The host does balloon_up in 2MB. */
> >>> > > + WARN_ON(num_pages % PAGES_IN_2M != 0);
> >>> > >
> >>> > >   /*
> >>> > >* We will attempt 2M allocations. However, if we fail to
> >>> > > @@ -,7 +1113,7 @@ static void balloon_up(struct
> work_struct
> >> > *dummy)
> >>> > >   bl_resp, alloc_unit,
> >>> > >&alloc_error);
> >>> > >
> >>> > > - if ((alloc_error) && (alloc_unit != 1)) {
> >>> > > + if (alloc_error && (alloc_unit != 1) &&
> num_ballooned == 0)
> >> > {
> >>> > >   alloc_unit = 1;
> >>> > >   continue;
> >>> > >   }
> >> >
> >> > Before the change, we may retry the 4K allocation when part or all 2M
> >> > allocations were failed. This makes sense when memory is fragmented.
> But
> > Yes, but all the partially-allocated 2MB memory blocks are lost(mem leak).
> >
> >> > after the change, if part of 2M allocation were failed, we won't retry
> >> > 4K allocation. Is this expected?
> > Hi Jason,
> > The patch doesn't break the "try 2MB first; then try 4K" logic:
> >
> > With the change, we'll retry the 2MB allocation in the next iteration of the
> > same while (!done) loop -- we expect this retry will cause
> > "alloc_error && (alloc_unit != 1) && num_ballooned == 0" to be true,
> > so we'll later try 4K allocation, as we did before.
> 
> If I read the code correctly, if part of 2M allocation fails.
> alloc_balloon_pages() will have a non zero return value with alloc_error
> set. Then it will match the following check:
> 
> if ((alloc_error) || (num_ballooned == num_pages))
> {
> 
> which will set done to true. So there's no second iteration of while
> (!done) loop?
Oh... I see the issue in my patch.
Thanks for pointing this out, Jason!

> Probably you need something like:
> 
> if ((alloc_unit != 1) && (num_ballooned == 0)) {
> alloc_unit = 1;
> continue;
> }
> 
> if ((alloc_unit == 1) || (num_ballooned == num_pages)) {
> ...
> }
Your code is good, except for one thing:
alloc_balloon_pages() can return due to:

if (bl_resp->hdr.size + sizeof(union dm_

RE: [PATCH] hv: hv_balloon: avoid memory leak on alloc_error of 2MB memory block

2014-11-24 Thread Dexuan Cui
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Monday, November 24, 2014 16:48 PM
> To: Dexuan Cui; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> driverdev-devel@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; KY Srinivasan
> Cc: Haiyang Zhang
> Subject: Re: [PATCH] hv: hv_balloon: avoid memory leak on alloc_error of
> 2MB memory block
> 
> On 11/24/2014 03:54 PM, Dexuan Cui wrote:
> >> -Original Message-
> >> From: Jason Wang [mailto:jasow...@redhat.com]
> >> Sent: Monday, November 24, 2014 15:28 PM
> >> To: Dexuan Cui; gre...@linuxfoundation.org; linux-
> ker...@vger.kernel.org;
> >> driverdev-devel@linuxdriverproject.org; o...@aepfle.de;
> >> a...@canonical.com; KY Srinivasan
> >> Cc: Haiyang Zhang
> >> Subject: Re: [PATCH] hv: hv_balloon: avoid memory leak on alloc_error
> of
> >> 2MB memory block
> >>
> >> On 11/24/2014 02:08 PM, Dexuan Cui wrote:
> >>>> -Original Message-
> >>>>> From: Jason Wang [mailto:jasow...@redhat.com]
> >>>>> Sent: Monday, November 24, 2014 13:18 PM
> >>>>> To: Dexuan Cui; gre...@linuxfoundation.org; linux-
> >> ker...@vger.kernel.org;
> >>>>> driverdev-devel@linuxdriverproject.org; o...@aepfle.de;
> >>>>> a...@canonical.com; KY Srinivasan
> >>>>> Cc: Haiyang Zhang
> >>>>> Subject: Re: [PATCH] hv: hv_balloon: avoid memory leak on
> >> alloc_error of
> >>>>> 2MB memory block
> >>>>>
> >>>>> On 11/24/2014 01:56 PM, Dexuan Cui wrote:
> >>>>>>> If num_ballooned is not 0, we shouldn't neglect the already-
> >> allocated
> >>>>> 2MB
> >>>>>>> memory block(s).
> >>>>>>>
> >>>>>>> Cc: K. Y. Srinivasan 
> >>>>>>> Cc: 
> >>>>>>> Signed-off-by: Dexuan Cui 
> >>>>>>> ---
> >>>>>>>  drivers/hv/hv_balloon.c | 4 +++-
> >>>>>>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> >>>>>>> index 5e90c5d..cba2d3b 100644
> >>>>>>> --- a/drivers/hv/hv_balloon.c
> >>>>>>> +++ b/drivers/hv/hv_balloon.c
> >>>>>>> @@ -1091,6 +1091,8 @@ static void balloon_up(struct
> >> work_struct
> >>>>> *dummy)
> >>>>>>>   bool done = false;
> >>>>>>>   int i;
> >>>>>>>
> >>>>>>> + /* The host does balloon_up in 2MB. */
> >>>>>>> + WARN_ON(num_pages % PAGES_IN_2M != 0);
> >>>>>>>
> >>>>>>>   /*
> >>>>>>>* We will attempt 2M allocations. However, if we fail to
> >>>>>>> @@ -,7 +1113,7 @@ static void balloon_up(struct
> >> work_struct
> >>>>> *dummy)
> >>>>>>>   bl_resp, alloc_unit,
> >>>>>>>&alloc_error);
> >>>>>>>
> >>>>>>> - if ((alloc_error) && (alloc_unit != 1)) {
> >>>>>>> + if (alloc_error && (alloc_unit != 1) &&
> >> num_ballooned == 0)
> >>>>> {
> >>>>>>>   alloc_unit = 1;
> >>>>>>>   continue;
> >>>>>>>   }
> >>>>> Before the change, we may retry the 4K allocation when part or all
> 2M
> >>>>> allocations were failed. This makes sense when memory is
> fragmented.
> >> But
> >>> Yes, but all the partially-allocated 2MB memory blocks are lost(mem
> leak).
> >>>
> >>>>> after the change, if part of 2M allocation were failed, we won't retry
> >>>>> 4K allocation. Is this expected?
> >>> Hi Jason,
> >>> The patch doesn't break the "try 2MB first; then try 4K" logic:
> >>>
> >>> With the change, we'll retry the 2MB allocation in the next iteration of
> the
> >>> same while (!done) loop -- we expect this retry will cause

[PATCH v2] hv: hv_balloon: avoid memory leak on alloc_error of 2MB memory block

2014-11-24 Thread Dexuan Cui
If num_ballooned is not 0, we shouldn't neglect the
already-partially-allocated 2MB memory block(s).

Cc: Jason Wang 
Cc: K. Y. Srinivasan 
Signed-off-by: Dexuan Cui 
---

v2: I fixed the logic error in v1, pointed by Jason Wang:
In v1: in the case of partially-allocated 2MB, alloc_error is true,
so we'll run "done = true" and hence we won't proceed with
the next iteration of trying 4K allocation.

I also changed the WARN_ON to WARN_ON_ONCE in case the host behavior
changes in the future.

 drivers/hv/hv_balloon.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5e90c5d..b958ded 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1087,10 +1087,12 @@ static void balloon_up(struct work_struct *dummy)
struct dm_balloon_response *bl_resp;
int alloc_unit;
int ret;
-   bool alloc_error = false;
+   bool alloc_error;
bool done = false;
int i;
 
+   /* The host balloons pages in 2M granularity. */
+   WARN_ON_ONCE(num_pages % PAGES_IN_2M != 0);
 
/*
 * We will attempt 2M allocations. However, if we fail to
@@ -1107,16 +1109,18 @@ static void balloon_up(struct work_struct *dummy)
 
 
num_pages -= num_ballooned;
+   alloc_error = false;
num_ballooned = alloc_balloon_pages(&dm_device, num_pages,
bl_resp, alloc_unit,
 &alloc_error);
 
-   if ((alloc_error) && (alloc_unit != 1)) {
+   if (alloc_unit != 1 && num_ballooned == 0) {
alloc_unit = 1;
continue;
}
 
-   if ((alloc_error) || (num_ballooned == num_pages)) {
+   if ((alloc_unit == 1 && alloc_error) ||
+   (num_ballooned == num_pages)) {
bl_resp->more_pages = 0;
done = true;
dm_device.state = DM_INITIALIZED;
-- 
1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] Drivers: hv: vmbus: prevent cpu offlining on newer hypervisors

2014-11-26 Thread Dexuan Cui
> -Original Message-
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On
> Behalf Of Greg Kroah-Hartman
> Sent: Thursday, November 27, 2014 11:03 AM
> To: Vitaly Kuznetsov
> Cc: de...@linuxdriverproject.org; Haiyang Zhang; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH] Drivers: hv: vmbus: prevent cpu offlining on newer
> hypervisors
> 
> On Wed, Nov 26, 2014 at 02:52:22PM +0100, Vitaly Kuznetsov wrote:
> > When an SMP Hyper-V guest is running on top of 2012R2 Server and
> secondary
> > cpus are sent offline (with echo 0 >
> /sys/devices/system/cpu/cpu$cpu/online)
> > the system freeze is observed. This happens due to the fact that on newer
> > hypervisors (Win8, WS2012R2, ...) vmbus channel handlers are
> distributed
> > across all cpus (see init_vp_index() function in
> drivers/hv/channel_mgmt.c)
> > and on cpu offlining nobody reassigns them to CPU0. Prevent cpu
> offlining
> > when vmbus is loaded until the issue is fixed host-side.
> >
> > This patch also disables hibernation but it is OK as it is also broken (MCE
> > error is hit on resume). Suspend still works.
> >
> > Tested with WS2008R2 and WS2012R2.
> >
> > Signed-off-by: Vitaly Kuznetsov 
> > ---
> >  drivers/hv/vmbus_drv.c | 19 +++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> > index 4d6b269..9a82249 100644
> > --- a/drivers/hv/vmbus_drv.c
> > +++ b/drivers/hv/vmbus_drv.c
> > @@ -32,6 +32,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -671,6 +672,13 @@ static void vmbus_isr(void)
> > tasklet_schedule(&msg_dpc);
> >  }
> >
> > +#ifdef CONFIG_HOTPLUG_CPU
> > +static int hyperv_cpu_disable(void)
> > +{
> > +   return -1;
> > +}
> > +#endif
> > +
> >  /*
> >   * vmbus_bus_init -Main vmbus driver initialization routine.
> >   *
> > @@ -711,6 +719,12 @@ static int vmbus_bus_init(int irq)
> > if (ret)
> > goto err_alloc;
> >
> > +#ifdef CONFIG_HOTPLUG_CPU
> > +   if ((vmbus_proto_version != VERSION_WS2008) &&
> > +   (vmbus_proto_version != VERSION_WIN7))
> > +   smp_ops.cpu_disable = hyperv_cpu_disable;
> > +#endif
> > +
> > vmbus_request_offers();
> >
> > return 0;
> > @@ -964,6 +978,11 @@ static void __exit vmbus_exit(void)
> > bus_unregister(&hv_bus);
> > hv_cleanup();
> > acpi_bus_unregister_driver(&vmbus_acpi_driver);
> > +#ifdef CONFIG_HOTPLUG_CPU
> > +   if ((vmbus_proto_version != VERSION_WS2008) &&
> > +   (vmbus_proto_version != VERSION_WIN7))
> > +   smp_ops.cpu_disable = native_cpu_disable;
> > +#endif
> >  }
> 
> #ifdef in a .c file is not a good idea to do if at all possible, please
> only put this in one place, using a function call to "hide" the mess.
> 
> greg k-h

Hi Vitaly,
The idea of the patch is good to me.

I agree with Greg.
BTW, maybe hv_cpu_hotplug_quirk() is a better name?

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-26 Thread Dexuan Cui
> -Original Message-
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Thursday, November 27, 2014 7:54 AM
> To: Dexuan Cui
> Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org;
> o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; KY Srinivasan;
> Haiyang Zhang; vkuzn...@redhat.com
> Subject: Re: [PATCH] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> 
> On Tue, Nov 11, 2014 at 09:03:26PM -0800, Dexuan Cui wrote:
> > In the case the user-space daemon crashes, hangs or is killed, we
> > need to down the semaphore, otherwise, after the daemon starts next
> > time, the obsolete data in fcopy_transaction.message or
> > fcopy_transaction.fcopy_msg will be used immediately.
> >
> > Cc: K. Y. Srinivasan 
> > Signed-off-by: Dexuan Cui 
> > ---
> >  drivers/hv/hv_fcopy.c | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> > index 23b2ce2..177122a 100644
> > --- a/drivers/hv/hv_fcopy.c
> > +++ b/drivers/hv/hv_fcopy.c
> > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct
> *dummy)
> >  * process the pending transaction.
> >  */
> > fcopy_respond_to_host(HV_E_FAIL);
> > +
> > +   /* In the case the user-space daemon crashes, hangs or is killed, we
> > +* need to down the semaphore, otherwise, after the daemon starts
> next
> > +* time, the obsolete data in fcopy_transaction.message or
> > +* fcopy_transaction.fcopy_msg will be used immediately.
> > +*/
> > +   if (down_trylock(&fcopy_transaction.read_sema))
> > +   pr_debug("FCP: failed to acquire the semaphore\n");
> 
> Why is "FCP:" needed?  pr_debug() should never need any type of prefix.
> 
> Please fix.
> 
> thanks,
> 
> greg k-h

Ok, I'll send a v2 soon.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-26 Thread Dexuan Cui
In the case the user-space daemon crashes, hangs or is killed, we
need to down the semaphore, otherwise, after the daemon starts next
time, the obsolete data in fcopy_transaction.message or
fcopy_transaction.fcopy_msg will be used immediately.

Reviewed-by: Vitaly Kuznetsov 
Cc: K. Y. Srinivasan 
Signed-off-by: Dexuan Cui 
---

v2: I removed the "FCP" prefix as Greg asked.

I also updated the output message a little:
"FCP: failed to acquire the semaphore" --> 
"can not acquire the semaphore: it is benign"

 drivers/hv/hv_fcopy.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index 23b2ce2..c518ad9 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct *dummy)
 * process the pending transaction.
 */
fcopy_respond_to_host(HV_E_FAIL);
+
+   /* In the case the user-space daemon crashes, hangs or is killed, we
+* need to down the semaphore, otherwise, after the daemon starts next
+* time, the obsolete data in fcopy_transaction.message or
+* fcopy_transaction.fcopy_msg will be used immediately.
+*/
+   if (down_trylock(&fcopy_transaction.read_sema))
+   pr_debug("can not acquire the semaphore: it is benign\n");
+
 }
 
 static int fcopy_handle_handshake(u32 version)
-- 
1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-27 Thread Dexuan Cui
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Thursday, November 27, 2014 15:15 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> Subject: Re: [PATCH v2] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> - Original Message -
> > In the case the user-space daemon crashes, hangs or is killed, we
> > need to down the semaphore, otherwise, after the daemon starts next
> > time, the obsolete data in fcopy_transaction.message or
> > fcopy_transaction.fcopy_msg will be used immediately.
> >
> > Reviewed-by: Vitaly Kuznetsov 
> > Cc: K. Y. Srinivasan 
> > Signed-off-by: Dexuan Cui 
> > ---
> >
> > v2: I removed the "FCP" prefix as Greg asked.
> >
> > I also updated the output message a little:
> > "FCP: failed to acquire the semaphore" -->
> > "can not acquire the semaphore: it is benign"
> >
> >  drivers/hv/hv_fcopy.c | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> > index 23b2ce2..c518ad9 100644
> > --- a/drivers/hv/hv_fcopy.c
> > +++ b/drivers/hv/hv_fcopy.c
> > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct
> *dummy)
> >  * process the pending transaction.
> >  */
> > fcopy_respond_to_host(HV_E_FAIL);
> > +
> > +   /* In the case the user-space daemon crashes, hangs or is killed, we
> > +* need to down the semaphore, otherwise, after the daemon starts
> next
> > +* time, the obsolete data in fcopy_transaction.message or
> > +* fcopy_transaction.fcopy_msg will be used immediately.
> > +*/
> 
> Looks still racy, what happens if the daemon start before down_trylock()
> but after fcopy_respont_to_host() here?
Jason,
Thanks for pointing this out!
IMO we can resolve this by adding down_trylock() in fcopy_release().
What's your opinion?

> 
> > +   if (down_trylock(&fcopy_transaction.read_sema))
> > +   pr_debug("can not acquire the semaphore: it is benign\n");
> 
> typo
> > +
> >  }
Sorry -- what typo do you mean?

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-27 Thread Dexuan Cui
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Thursday, November 27, 2014 17:01 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> Subject: RE: [PATCH v2] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> On Thu, Nov 27, 2014 at 4:50 PM, Dexuan Cui  wrote:
> >>  -Original Message-
> >>  From: Jason Wang [mailto:jasow...@redhat.com]
> >>  Sent: Thursday, November 27, 2014 15:15 PM
> >>  To: Dexuan Cui
> >>  Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> >> driverdev-
> >>  de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> >>  Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> >>  Subject: Re: [PATCH v2] hv: hv_fcopy: drop the obsolete message on
> >> transfer
> >>  failure
> >>  - Original Message -
> >>  > In the case the user-space daemon crashes, hangs or is killed, we
> >>  > need to down the semaphore, otherwise, after the daemon starts
> >> next
> >>  > time, the obsolete data in fcopy_transaction.message or
> >>  > fcopy_transaction.fcopy_msg will be used immediately.
> >>  >
> >>  > Reviewed-by: Vitaly Kuznetsov 
> >>  > Cc: K. Y. Srinivasan 
> >>  > Signed-off-by: Dexuan Cui 
> >>  > ---
> >>  >
> >>  > v2: I removed the "FCP" prefix as Greg asked.
> >>  >
> >>  > I also updated the output message a little:
> >>  > "FCP: failed to acquire the semaphore" -->
> >>  > "can not acquire the semaphore: it is benign"
> >>  >
> >>  >  drivers/hv/hv_fcopy.c | 9 +
> >>  >  1 file changed, 9 insertions(+)
> >>  >
> >>  > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> >>  > index 23b2ce2..c518ad9 100644
> >>  > --- a/drivers/hv/hv_fcopy.c
> >>  > +++ b/drivers/hv/hv_fcopy.c
> >>  > @@ -86,6 +86,15 @@ static void fcopy_work_func(struct work_struct
> >>  *dummy)
> >>  >  * process the pending transaction.
> >>  >  */
> >>  > fcopy_respond_to_host(HV_E_FAIL);
> >>  > +
> >>  > +   /* In the case the user-space daemon crashes, hangs or is
> >> killed, we
> >>  > +* need to down the semaphore, otherwise, after the daemon
> >> starts
> >>  next
> >>  > +* time, the obsolete data in fcopy_transaction.message or
> >>  > +* fcopy_transaction.fcopy_msg will be used immediately.
> >>  > +*/
> >>
> >>  Looks still racy, what happens if the daemon start before
> >> down_trylock()
> >>  but after fcopy_respont_to_host() here?
> > Jason,
> > Thanks for pointing this out!
> > IMO we can resolve this by adding down_trylock() in fcopy_release().
> > What's your opinion?
> 
> 
> Looks better and need to cancel the timeout also here?
OK, let me post a v3.

> >
> >
> >>
> >>  > +   if (down_trylock(&fcopy_transaction.read_sema))
> >>  > +   pr_debug("can not acquire the semaphore: it is 
> >> benign\n");
> >>
> >>  typo
> >>  > +
> >>  >  }
> > Sorry -- what typo do you mean?
> 
> s/benign/begin/ ?
I meant the issue(can't get the semaphore) is benign.

I think we can just remove the message, as KY suggested.
Instead, I'll add a comment for it.

Thanks,
-- Dexuan

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-27 Thread Dexuan Cui
In the case the user-space daemon crashes, hangs or is killed, we
need to down the semaphore, otherwise, after the daemon starts next
time, the obsolete data in fcopy_transaction.message or
fcopy_transaction.fcopy_msg will be used immediately.

Cc: Jason Wang 
Cc: Vitaly Kuznetsov 
Cc: K. Y. Srinivasan 
Signed-off-by: Dexuan Cui 
---

v2: I removed the "FCP" prefix as Greg asked.

I also updated the output message a little:
"FCP: failed to acquire the semaphore" --> 
"can not acquire the semaphore: it is benign"

v3: I added the code in fcopy_release() as Jason Wang suggested.
I removed the pr_debug (it isn't so meaningful)and added a comment instead.

 drivers/hv/hv_fcopy.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
index 23b2ce2..faa6ba6 100644
--- a/drivers/hv/hv_fcopy.c
+++ b/drivers/hv/hv_fcopy.c
@@ -86,6 +86,18 @@ static void fcopy_work_func(struct work_struct *dummy)
 * process the pending transaction.
 */
fcopy_respond_to_host(HV_E_FAIL);
+
+   /* In the case the user-space daemon crashes, hangs or is killed, we
+* need to down the semaphore, otherwise, after the daemon starts next
+* time, the obsolete data in fcopy_transaction.message or
+* fcopy_transaction.fcopy_msg will be used immediately.
+*
+* NOTE: fcopy_read() happens to get the semaphore (very rare)? We're
+* still OK, because we've reported the failure to the host.
+*/
+   if (down_trylock(&fcopy_transaction.read_sema))
+   ;
+
 }
 
 static int fcopy_handle_handshake(u32 version)
@@ -351,6 +363,13 @@ static int fcopy_release(struct inode *inode, struct file 
*f)
 */
in_hand_shake = true;
opened = false;
+
+   if (cancel_delayed_work_sync(&fcopy_work)) {
+   /* We haven't up()-ed the semaphore(very rare)? */
+   if (down_trylock(&fcopy_transaction.read_sema))
+   ;
+   fcopy_respond_to_host(HV_E_FAIL);
+   }
return 0;
 }
 
-- 
1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-28 Thread Dexuan Cui
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Friday, November 28, 2014 14:47 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> Subject: Re: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> On Thu, Nov 27, 2014 at 9:09 PM, Dexuan Cui  wrote:
> > In the case the user-space daemon crashes, hangs or is killed, we
> > need to down the semaphore, otherwise, after the daemon starts next
> > time, the obsolete data in fcopy_transaction.message or
> > fcopy_transaction.fcopy_msg will be used immediately.
> >
> > Cc: Jason Wang 
> > Cc: Vitaly Kuznetsov 
> > Cc: K. Y. Srinivasan 
> > Signed-off-by: Dexuan Cui 
> > ---
> >
> > v2: I removed the "FCP" prefix as Greg asked.
> >
> > I also updated the output message a little:
> > "FCP: failed to acquire the semaphore" -->
> > "can not acquire the semaphore: it is benign"
> >
> > v3: I added the code in fcopy_release() as Jason Wang suggested.
> > I removed the pr_debug (it isn't so meaningful)and added a
> > comment instead.
> >
> >  drivers/hv/hv_fcopy.c | 19 +++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> > index 23b2ce2..faa6ba6 100644
> > --- a/drivers/hv/hv_fcopy.c
> > +++ b/drivers/hv/hv_fcopy.c
> > @@ -86,6 +86,18 @@ static void fcopy_work_func(struct work_struct
> > *dummy)
> >  * process the pending transaction.
> >  */
> > fcopy_respond_to_host(HV_E_FAIL);
> > +
> > +   /* In the case the user-space daemon crashes, hangs or is killed, we
> > +* need to down the semaphore, otherwise, after the daemon starts
> > next
> > +* time, the obsolete data in fcopy_transaction.message or
> > +* fcopy_transaction.fcopy_msg will be used immediately.
> > +*
> > +* NOTE: fcopy_read() happens to get the semaphore (very rare)?
> > We're
> > +* still OK, because we've reported the failure to the host.
> > +*/
> > +   if (down_trylock(&fcopy_transaction.read_sema))
> > +   ;
> 
> Sorry, I'm not quite understand how if () ; can help here.
> 
> Btw, a question not relate to this patch.
> 
> What happens if a daemon is resume from SIGSTOP and expires the check
> here?
Hi Jason,
My idea is: here we need down_trylock(), but in case we can't get the
semaphore, it's OK anyway:

Scenario 1):
1.1: when the daemon is blocked on the pread(), the daemon receives SIGSTOP;
1.2: the host user runs the PowerShell Copy-VMFile command;
1.3.1: the driver reports the failure to the host user in 5s and
1.3.2: the driver down()-es the semaphore;
1.4: the daemon receives SIGCONT and it will be still blocked on the pread().
Without the down_trylock(), in 1.4, the daemon can receive an obsolete message.
NOTE: in this scenario, the daemon is not killed.

Scenario 2):
In senario 1), if the daemon receives SIGCONT between 1.3.1 and 1.3.2 and
do down() in fcopy_read(), it will receive the message but: the driver has
reported the failure to the host user and the driver's 1.3.2 can't get the
semaphore -- IMO this is acceptably OK, though in the VM, an incomplete
file will be left there.
BTW, I think in the daemon's hv_start_fcopy() we should add a
close(target_fd) before open()-ing a new one.

> >
> > +
> >  }
> >
> >  static int fcopy_handle_handshake(u32 version)
> > @@ -351,6 +363,13 @@ static int fcopy_release(struct inode *inode,
> > struct file *f)
> >  */
> > in_hand_shake = true;
> > opened = false;
> > +
> > +   if (cancel_delayed_work_sync(&fcopy_work)) {
> > +   /* We haven't up()-ed the semaphore(very rare)? */
> > +   if (down_trylock(&fcopy_transaction.read_sema))
> > +   ;
> 
> And this.

Scenario 3):
When the daemon exits(e.g., SIGKILL received), if there is a fcopy_work
pending (scheduled but not start to run yet), we should cancel the
work (as you suggested) and down() the semaphore, otherwise, the
obsolete message will be received by the next instance of the daemon.

Scenario 4):  in the driver's hv_fcopy_onchannelcallback():
schedule_delayed_work(&fcopy_work, 5*HZ);
> if fcopy_release() is running on another vcpu, just before the 
next line?
fcopy_send_data();

In this case, fcopy_release() can cancel fcopy_w

RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-11-28 Thread Dexuan Cui
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Friday, November 28, 2014 18:13 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> On Fri, Nov 28, 2014 at 4:36 PM, Dexuan Cui  wrote:
> >>  -Original Message-
> >>  From: Jason Wang [mailto:jasow...@redhat.com]
> >>  Sent: Friday, November 28, 2014 14:47 PM
> >>  To: Dexuan Cui
> >>  Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> >> driverdev-
> >>  de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> >>  Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> >>  Subject: Re: [PATCH v3] hv: hv_fcopy: drop the obsolete message on
> >> transfer
> >>  failure
> >>  On Thu, Nov 27, 2014 at 9:09 PM, Dexuan Cui 
> >> wrote:
> >>  > In the case the user-space daemon crashes, hangs or is killed, we
> >>  > need to down the semaphore, otherwise, after the daemon starts
> >> next
> >>  > time, the obsolete data in fcopy_transaction.message or
> >>  > fcopy_transaction.fcopy_msg will be used immediately.
> >>  >
> >>  > Cc: Jason Wang 
> >>  > Cc: Vitaly Kuznetsov 
> >>  > Cc: K. Y. Srinivasan 
> >>  > Signed-off-by: Dexuan Cui 
> >>  > ---
> >>  >
> >>  > v2: I removed the "FCP" prefix as Greg asked.
> >>  >
> >>  > I also updated the output message a little:
> >>  > "FCP: failed to acquire the semaphore" -->
> >>  > "can not acquire the semaphore: it is benign"
> >>  >
> >>  > v3: I added the code in fcopy_release() as Jason Wang suggested.
> >>  > I removed the pr_debug (it isn't so meaningful)and added a
> >>  > comment instead.
> >>  >
> >>  >  drivers/hv/hv_fcopy.c | 19 +++
> >>  >  1 file changed, 19 insertions(+)
> >>  >
> >>  > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> >>  > index 23b2ce2..faa6ba6 100644
> >>  > --- a/drivers/hv/hv_fcopy.c
> >>  > +++ b/drivers/hv/hv_fcopy.c
> >>  > @@ -86,6 +86,18 @@ static void fcopy_work_func(struct work_struct
> >>  > *dummy)
> >>  >  * process the pending transaction.
> >>  >  */
> >>  > fcopy_respond_to_host(HV_E_FAIL);
> >>  > +
> >>  > +   /* In the case the user-space daemon crashes, hangs or is
> >> killed, we
> >>  > +* need to down the semaphore, otherwise, after the daemon
> >> starts
> >>  > next
> >>  > +* time, the obsolete data in fcopy_transaction.message or
> >>  > +* fcopy_transaction.fcopy_msg will be used immediately.
> >>  > +*
> >>  > +* NOTE: fcopy_read() happens to get the semaphore (very rare)?
> >>  > We're
> >>  > +* still OK, because we've reported the failure to the host.
> >>  > +*/
> >>  > +   if (down_trylock(&fcopy_transaction.read_sema))
> >>  > +   ;
> >>
> >>  Sorry, I'm not quite understand how if () ; can help here.
> >>
> >>  Btw, a question not relate to this patch.
> >>
> >>  What happens if a daemon is resume from SIGSTOP and expires the
> >> check
> >>  here?
> > Hi Jason,
> > My idea is: here we need down_trylock(), but in case we can't get the
> > semaphore, it's OK anyway:
> >
> > Scenario 1):
> > 1.1: when the daemon is blocked on the pread(), the daemon receives
> > SIGSTOP;
> > 1.2: the host user runs the PowerShell Copy-VMFile command;
> > 1.3.1: the driver reports the failure to the host user in 5s and
> > 1.3.2: the driver down()-es the semaphore;
> > 1.4: the daemon receives SIGCONT and it will be still blocked on the
> > pread().
> > Without the down_trylock(), in 1.4, the daemon can receive an
> > obsolete message.
> > NOTE: in this scenario, the daemon is not killed.
> >
> > Scenario 2):
> > In senario 1), if the daemon receives SIGCONT between 1.3.1 and 1.3.2
> > and
> > do down() in fcopy_read(), it will receive the message but: 

RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-12-01 Thread Dexuan Cui
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Monday, December 1, 2014 16:23 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> On Fri, Nov 28, 2014 at 7:54 PM, Dexuan Cui  wrote:
> >>  -Original Message-
> >>  From: Jason Wang [mailto:jasow...@redhat.com]
> >>  Sent: Friday, November 28, 2014 18:13 PM
> >>  To: Dexuan Cui
> >>  Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> >> driverdev-
> >>  de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> >>  Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> >>  Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on
> >> transfer
> >>  failure
> >>  On Fri, Nov 28, 2014 at 4:36 PM, Dexuan Cui 
> >> wrote:
> >>  >>  -Original Message-
> >>  >>  From: Jason Wang [mailto:jasow...@redhat.com]
> >>  >>  Sent: Friday, November 28, 2014 14:47 PM
> >>  >>  To: Dexuan Cui
> >>  >>  Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> >>  >> driverdev-
> >>  >>  de...@linuxdriverproject.org; o...@aepfle.de;
> >> a...@canonical.com; KY
> >>  >>  Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> >>  >>  Subject: Re: [PATCH v3] hv: hv_fcopy: drop the obsolete message
> >> on
> >>  >> transfer
> >>  >>  failure
> >>  >>  On Thu, Nov 27, 2014 at 9:09 PM, Dexuan Cui
> >> 
> >>  >> wrote:
> >>  >>  > In the case the user-space daemon crashes, hangs or is
> >> killed, we
> >>  >>  > need to down the semaphore, otherwise, after the daemon starts
> >>  >> next
> >>  >>  > time, the obsolete data in fcopy_transaction.message or
> >>  >>  > fcopy_transaction.fcopy_msg will be used immediately.
> >>  >>  >
> >>  >>  > Cc: Jason Wang 
> >>  >>  > Cc: Vitaly Kuznetsov 
> >>  >>  > Cc: K. Y. Srinivasan 
> >>  >>  > Signed-off-by: Dexuan Cui 
> >>  >>  > ---
> >>  >>  >
> >>  >>  > v2: I removed the "FCP" prefix as Greg asked.
> >>  >>  >
> >>  >>  > I also updated the output message a little:
> >>  >>  > "FCP: failed to acquire the semaphore" -->
> >>  >>  > "can not acquire the semaphore: it is benign"
> >>  >>  >
> >>  >>  > v3: I added the code in fcopy_release() as Jason Wang
> >> suggested.
> >>  >>  > I removed the pr_debug (it isn't so meaningful)and added a
> >>  >>  > comment instead.
> >>  >>  >
> >>  >>  >  drivers/hv/hv_fcopy.c | 19 +++
> >>  >>  >  1 file changed, 19 insertions(+)
> >>  >>  >
> >>  >>  > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> >>  >>  > index 23b2ce2..faa6ba6 100644
> >>  >>  > --- a/drivers/hv/hv_fcopy.c
> >>  >>  > +++ b/drivers/hv/hv_fcopy.c
> >>  >>  > @@ -86,6 +86,18 @@ static void fcopy_work_func(struct
> >> work_struct
> >>  >>  > *dummy)
> >>  >>  >  * process the pending transaction.
> >>  >>  >  */
> >>  >>  > fcopy_respond_to_host(HV_E_FAIL);
> >>  >>  > +
> >>  >>  > +   /* In the case the user-space daemon crashes, hangs or is
> >>  >> killed, we
> >>  >>  > +* need to down the semaphore, otherwise, after the daemon
> >>  >> starts
> >>  >>  > next
> >>  >>  > +* time, the obsolete data in fcopy_transaction.message or
> >>  >>  > +* fcopy_transaction.fcopy_msg will be used immediately.
> >>  >>  > +*
> >>  >>  > +* NOTE: fcopy_read() happens to get the semaphore (very
> >> rare)?
> >>  >>  > We're
> >>  >>  > +* still OK, because we've reported the failure to the host.
> >>  >>  > +*/
> >>  >>  > +   if (down_t

RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-12-01 Thread Dexuan Cui
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Monday, December 1, 2014 18:18 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> 
> 
> 
> On Mon, Dec 1, 2014 at 5:47 PM, Dexuan Cui  wrote:
> >>  -Original Message-
> >>  From: Jason Wang [mailto:jasow...@redhat.com]
> >>  Sent: Monday, December 1, 2014 16:23 PM
> >>  To: Dexuan Cui
> >>  Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> >> driverdev-
> >>  de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> >>  Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> >>  Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on
> >> transfer
> >>  failure
> >>  On Fri, Nov 28, 2014 at 7:54 PM, Dexuan Cui 
> >> wrote:
> >>  >>  -Original Message-
> >>  >>  From: Jason Wang [mailto:jasow...@redhat.com]
> >>  >>  Sent: Friday, November 28, 2014 18:13 PM
> >>  >>  To: Dexuan Cui
> >>  >>  Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> >>  >> driverdev-
> >>  >>  de...@linuxdriverproject.org; o...@aepfle.de;
> >> a...@canonical.com; KY
> >>  >>  Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> >>  >>  Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message
> >> on
> >>  >> transfer
> >>  >>  failure
> >>  >>  On Fri, Nov 28, 2014 at 4:36 PM, Dexuan Cui
> >> 
> >>  >> wrote:
> >>  >>  >>  -Original Message-
> >>  >>  >>  From: Jason Wang [mailto:jasow...@redhat.com]
> >>  >>  >>  Sent: Friday, November 28, 2014 14:47 PM
> >>  >>  >>  To: Dexuan Cui
> >>  >>  >>  Cc: gre...@linuxfoundation.org;
> >> linux-ker...@vger.kernel.org;
> >>  >>  >> driverdev-
> >>  >>  >>  de...@linuxdriverproject.org; o...@aepfle.de;
> >>  >> a...@canonical.com; KY
> >>  >>  >>  Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> >>  >>  >>  Subject: Re: [PATCH v3] hv: hv_fcopy: drop the obsolete
> >> message
> >>  >> on
> >>  >>  >> transfer
> >>  >>  >>  failure
> >>  >>  >>  On Thu, Nov 27, 2014 at 9:09 PM, Dexuan Cui
> >>  >> 
> >>  >>  >> wrote:
> >>  >>  >>  > In the case the user-space daemon crashes, hangs or is
> >>  >> killed, we
> >>  >>  >>  > need to down the semaphore, otherwise, after the daemon
> >> starts
> >>  >>  >> next
> >>  >>  >>  > time, the obsolete data in fcopy_transaction.message or
> >>  >>  >>  > fcopy_transaction.fcopy_msg will be used immediately.
> >>  >>  >>  >
> >>  >>  >>  > Cc: Jason Wang 
> >>  >>  >>  > Cc: Vitaly Kuznetsov 
> >>  >>  >>  > Cc: K. Y. Srinivasan 
> >>  >>  >>  > Signed-off-by: Dexuan Cui 
> >>  >>  >>  > ---
> >>  >>  >>  >
> >>  >>  >>  > v2: I removed the "FCP" prefix as Greg asked.
> >>  >>  >>  >
> >>  >>  >>  > I also updated the output message a little:
> >>  >>  >>  > "FCP: failed to acquire the semaphore" -->
> >>  >>  >>  > "can not acquire the semaphore: it is benign"
> >>  >>  >>  >
> >>  >>  >>  > v3: I added the code in fcopy_release() as Jason Wang
> >>  >> suggested.
> >>  >>  >>  > I removed the pr_debug (it isn't so meaningful)and
> >> added a
> >>  >>  >>  > comment instead.
> >>  >>  >>  >
> >>  >>  >>  >  drivers/hv/hv_fcopy.c | 19 +++
> >>  >>  >>  >  1 file changed, 19 insertions(+)
> >>  >>  >>  >
> >>  >>  >>  > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
> >>  >>  >>  > 

RE: [PATCH v2] Drivers: hv: vmbus: prevent cpu offlining on newer hypervisors

2014-12-01 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> Sent: Monday, December 1, 2014 18:53 PM
> To: KY Srinivasan; Haiyang Zhang
> Cc: de...@linuxdriverproject.org; Greg Kroah-Hartman; linux-
> ker...@vger.kernel.org; Dexuan Cui
> Subject: [PATCH v2] Drivers: hv: vmbus: prevent cpu offlining on newer
> hypervisors
> 
> When an SMP Hyper-V guest is running on top of 2012R2 Server and
> secondary
> cpus are sent offline (with echo 0 > /sys/devices/system/cpu/cpu$cpu/online)
> the system freeze is observed. This happens due to the fact that on newer
> hypervisors (Win8, WS2012R2, ...) vmbus channel handlers are distributed
> across all cpus (see init_vp_index() function in drivers/hv/channel_mgmt.c)
> and on cpu offlining nobody reassigns them to CPU0. Prevent cpu offlining
> when vmbus is loaded until the issue is fixed host-side.
> 
> This patch also disables hibernation but it is OK as it is also broken (MCE
> error is hit on resume). Suspend still works.
> 
> Tested with WS2008R2 and WS2012R2.
> 
> Signed-off-by: Vitaly Kuznetsov 
> 
> ---
> Changes since v1:
> - introduce hv_cpu_hotplug_quirk() function to not spread #ifdefs [Greg KH]
> - add pr_notice() message "hv_vmbus: CPU offlining is not supported by
> hypervisor"
> ---
>  drivers/hv/vmbus_drv.c | 33 +
>  1 file changed, 33 insertions(+)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 4d6b269..2e6b38e 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -32,6 +32,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -671,6 +672,36 @@ static void vmbus_isr(void)
>   tasklet_schedule(&msg_dpc);
>  }
> 
> +#ifdef CONFIG_HOTPLUG_CPU
> +static int hyperv_cpu_disable(void)
> +{
> + return -1;
> +}
> +
> +static void hv_cpu_hotplug_quirk(bool vmbus_loaded)
> +{
> + /*
> +  * Offlining a CPU when running on newer hypervisors (WS2012R2,
> Win8,
> +  * ...) is not supported at this moment as channel interrupts are
> +  * distributed across all of them.
> +  */
> +
> + if ((vmbus_proto_version == VERSION_WS2008) ||
> + (vmbus_proto_version == VERSION_WIN7))
> + return;
> +
> + if (vmbus_loaded) {
> + smp_ops.cpu_disable = hyperv_cpu_disable;
> + pr_notice("CPU offlining is not supported by hypervisor");
> + } else
> + smp_ops.cpu_disable = native_cpu_disable;
> +}
> +#else
> +static void hv_cpu_hotplug_quirk(bool vmbus_loaded)
> +{
> +}
> +#endif
> +
>  /*
>   * vmbus_bus_init -Main vmbus driver initialization routine.
>   *
> @@ -711,6 +742,7 @@ static int vmbus_bus_init(int irq)
>   if (ret)
>   goto err_alloc;
> 
> + hv_cpu_hotplug_quirk(true);
>   vmbus_request_offers();
> 
>   return 0;
> @@ -964,6 +996,7 @@ static void __exit vmbus_exit(void)
>   bus_unregister(&hv_bus);
>   hv_cleanup();
>   acpi_bus_unregister_driver(&vmbus_acpi_driver);
> + hv_cpu_hotplug_quirk(false);
>  }

Acked-by: Dexuan Cui 

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer failure

2014-12-01 Thread Dexuan Cui
> -Original Message-
> From: KY Srinivasan
> Sent: Monday, December 1, 2014 23:55 PM
> To: Dexuan Cui; Jason Wang
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> vkuzn...@redhat.com; Haiyang Zhang
> Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer
> failure
> 
> 
> 
> > -----Original Message-
> > From: Dexuan Cui
> > Sent: Monday, December 1, 2014 3:01 AM
> > To: Jason Wang
> > Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; KY
> > Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> > Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on transfer
> > failure
> >
> > > -Original Message-
> > > From: Jason Wang [mailto:jasow...@redhat.com]
> > > Sent: Monday, December 1, 2014 18:18 PM
> > > To: Dexuan Cui
> > > Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> > > driverdev- de...@linuxdriverproject.org; o...@aepfle.de;
> > > a...@canonical.com; KY Srinivasan; vkuzn...@redhat.com; Haiyang
> Zhang
> > > Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on
> > > transfer failure
> > >
> > >
> > >
> > > On Mon, Dec 1, 2014 at 5:47 PM, Dexuan Cui 
> > wrote:
> > > >>  -Original Message-
> > > >>  From: Jason Wang [mailto:jasow...@redhat.com]
> > > >>  Sent: Monday, December 1, 2014 16:23 PM
> > > >>  To: Dexuan Cui
> > > >>  Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> > > >> driverdev-
> > > >>  de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> > > >> KY  Srinivasan; vkuzn...@redhat.com; Haiyang Zhang
> > > >>  Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on
> > > >> transfer  failure  On Fri, Nov 28, 2014 at 7:54 PM, Dexuan Cui
> > > >> 
> > > >> wrote:
> > > >>  >>  -Original Message-
> > > >>  >>  From: Jason Wang [mailto:jasow...@redhat.com]  >>  Sent:
> > > >> Friday, November 28, 2014 18:13 PM  >>  To: Dexuan Cui  >>  Cc:
> > > >> gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;  >>
> > > >> driverdev-  >>  de...@linuxdriverproject.org; o...@aepfle.de;
> > > >> a...@canonical.com; KY  >>  Srinivasan; vkuzn...@redhat.com;
> Haiyang
> > > >> Zhang  >>  Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete
> > > >> message on  >> transfer  >>  failure  >>  On Fri, Nov 28, 2014 at
> > > >> 4:36 PM, Dexuan Cui   >> wrote:
> > > >>  >>  >>  -Original Message-  >>  >>  From: Jason Wang
> > > >> [mailto:jasow...@redhat.com]  >>  >>  Sent: Friday, November 28,
> > > >> 2014 14:47 PM  >>  >>  To: Dexuan Cui  >>  >>  Cc:
> > > >> gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;  >>  >>
> > > >> driverdev-  >>  >>  de...@linuxdriverproject.org; o...@aepfle.de;
> > > >> >> a...@canonical.com; KY  >>  >>  Srinivasan; vkuzn...@redhat.com;
> > > >> Haiyang Zhang  >>  >>  Subject: Re: [PATCH v3] hv: hv_fcopy: drop
> > > >> the obsolete message  >> on  >>  >> transfer  >>  >>  failure  >>
> > > >> >>  On Thu, Nov 27, 2014 at 9:09 PM, Dexuan Cui  >>
> > > >>   >>  >> wrote:
> > > >>  >>  >>  > In the case the user-space daemon crashes, hangs or is
> > > >> >> killed, we  >>  >>  > need to down the semaphore, otherwise,
> > > >> after the daemon starts  >>  >> next  >>  >>  > time, the obsolete
> > > >> data in fcopy_transaction.message or  >>  >>  >
> > > >> fcopy_transaction.fcopy_msg will be used immediately.
> > > >>  >>  >>  >
> > > >>  >>  >>  > Cc: Jason Wang   >>  >>  > Cc:
> > > >> Vitaly Kuznetsov   >>  >>  > Cc: K. Y.
> > > >> Srinivasan   >>  >>  > Signed-off-by: Dexuan Cui
> > > >>   >>  >>  > ---  >

[PATCH] tools: hv: kvp_daemon: make IPv6-only-injection work

2014-12-09 Thread Dexuan Cui
Currently IPv6-only-injection doesn't work because the daemon doesn't parse
any IPv6 information at all once it finds the dhcp_enabled flag is true.

But according to the Hyper-v host team, the flag is only for IPv4.
In the case the host only injects 1 IPv6 address, the dhcp flag is true, but
we shouldn't ignore the IPv6 address and we should pass BOOTPROTO=none
to the distro-specific script hv_set_ipconfig.

Tested in Ubuntu 14.10 and RHEL7.

Cc: K. Y. Srinivasan 
Signed-off-by: Dexuan Cui 
---
 tools/hv/hv_kvp_daemon.c | 47 +++
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 6a6432a..6ef6c04 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -1145,6 +1145,9 @@ static int kvp_write_file(FILE *f, char *s1, char *s2, 
char *s3)
 }
 
 
+/* How many ipv6 addresses the host is trying to inject? */
+static int num_ipv6_injected;
+
 static int process_ip_string(FILE *f, char *ip_string, int type)
 {
int error = 0;
@@ -1190,6 +1193,7 @@ static int process_ip_string(FILE *f, char *ip_string, 
int type)
switch (type) {
case IPADDR:
snprintf(str, sizeof(str), "%s", "IPV6ADDR");
+   num_ipv6_injected++;
break;
case NETMASK:
snprintf(str, sizeof(str), "%s", "IPV6NETMASK");
@@ -1308,27 +1312,12 @@ static int kvp_set_ip_info(char *if_name, struct 
hv_kvp_ipaddr_value *new_val)
if (error)
goto setval_error;
 
-   if (new_val->dhcp_enabled) {
-   error = kvp_write_file(file, "BOOTPROTO", "", "dhcp");
-   if (error)
-   goto setval_error;
-
-   /*
-* We are done!.
-*/
-   goto setval_done;
-
-   } else {
-   error = kvp_write_file(file, "BOOTPROTO", "", "none");
-   if (error)
-   goto setval_error;
-   }
-
/*
 * Write the configuration for ipaddress, netmask, gateway and
 * name servers.
 */
 
+   num_ipv6_injected = 0;
error = process_ip_string(file, (char *)new_val->ip_addr, IPADDR);
if (error)
goto setval_error;
@@ -1345,6 +1334,32 @@ static int kvp_set_ip_info(char *if_name, struct 
hv_kvp_ipaddr_value *new_val)
if (error)
goto setval_error;
 
+   /*
+* Here "dhcp_enabled" is only for IPv4 according to Hyper-V host team.
+*
+* In the case the host only injects 1 IPv6 address:
+* new_val->dhcp_enabled is true, but we can't pass BOOTPROTO=dhcp to
+* the script hv_set_ifconfig, because in some distros (like RHEL7)
+* BOOTPROTO=dhcp has a special meaning in the config file (e.g.,
+* /etc/sysconfig/network-scripts/ifcfg-eth0): the network init program
+* ignores any static IP addr information once there is
+* BOOTPROTO=dhcp; as a result, IPv6-only injection can't work.
+*
+* In the case of IPv6-only injection, BOOTPROTO=dhcp doesn't affect
+* Ubuntu because it's ignored by the Ubuntu version of
+* hv_set_ifconfig and it doesn't seem to have special meaning in
+* Ubuntu.
+*/
+   if (new_val->dhcp_enabled && num_ipv6_injected == 0) {
+   error = kvp_write_file(file, "BOOTPROTO", "", "dhcp");
+   if (error)
+   goto setval_error;
+   } else {
+   error = kvp_write_file(file, "BOOTPROTO", "", "none");
+   if (error)
+   goto setval_error;
+   }
+
 setval_done:
fclose(file);
 
-- 
1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH 0/5] Tools: hv: fix compiler warnings and do minor cleanup

2014-12-09 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> Sent: Tuesday, December 9, 2014 23:48 PM
> To: KY Srinivasan
> Cc: Haiyang Zhang; de...@linuxdriverproject.org; linux-
> ker...@vger.kernel.org; Dexuan Cui
> Subject: [PATCH 0/5] Tools: hv: fix compiler warnings and do minor cleanup
> 
> When someone does 'make' in tools/hv/ issues appear:
> - hv_fcopy_daemon is not being built;
> - lots of compiler warnings.
> 
> This is just a cleanup. Compile-tested by myself on top of linux-next/master.
> 
> Piggyback this series and send "[PATCH 5/5] Tools: hv: do not add redundant
> '/'
>  in hv_start_fcopy()"
> 
> Vitaly Kuznetsov (5):
>   Tools: hv: add mising fcopyd to the Makefile
>   Tools: hv: remove unused bytes_written from kvp_update_file()
>   Tools: hv: address compiler warnings for hv_kvp_daemon.c
>   Tools: hv: address compiler warnings for hv_fcopy_daemon.c
>   Tools: hv: do not add redundant '/' in hv_start_fcopy()
> 
>  tools/hv/Makefile  |  4 ++--
>  tools/hv/hv_fcopy_daemon.c | 10 ++
>  tools/hv/hv_kvp_daemon.c   | 29 +----
>  3 files changed, 17 insertions(+), 26 deletions(-)
> 
> --
> 1.9.3

Hi Vitaly,
Thanks for the patchset!

Acked-by: Dexuan Cui 

PS, I added Greg into the TO list.
The hv code in drivers/hv/ and tools/hv/ usually has to go into
Greg's tree first.

-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] tools: hv: kvp_daemon: make IPv6-only-injection work

2014-12-09 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> Sent: Tuesday, December 9, 2014 21:06 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; KY Srinivasan; Haiyang Zhang
> Subject: Re: [PATCH] tools: hv: kvp_daemon: make IPv6-only-injection work
>  ..
> > +* Here "dhcp_enabled" is only for IPv4 according to Hyper-V host
> team.
> > +*
> > +* In the case the host only injects 1 IPv6 address:
> > +* new_val->dhcp_enabled is true, but we can't pass
> BOOTPROTO=dhcp to
> > +* the script hv_set_ifconfig, because in some distros (like RHEL7)
> > +* BOOTPROTO=dhcp has a special meaning in the config file (e.g.,
> > +* /etc/sysconfig/network-scripts/ifcfg-eth0): the network init
> program
> > +* ignores any static IP addr information once there is
> > +* BOOTPROTO=dhcp; as a result, IPv6-only injection can't work.
> > +*
> > +* In the case of IPv6-only injection, BOOTPROTO=dhcp doesn't affect
> > +* Ubuntu because it's ignored by the Ubuntu version of
> > +* hv_set_ifconfig and it doesn't seem to have special meaning in
> > +* Ubuntu.
> > +*/
> 
> I just checked and adding "IPV6ADDR=something" when "BOOTPROTO=dhcp"
> works for me with both RHEL7 and Fedora21.
It doesn't work in my side. :-(
Running 'ifup eth0' shows some errors(I use "set -x")
...
+ /sbin/dhclient -H localhost -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease 
-pf /var/run/dhclient-eth0.pid eth0
grep: /etc/sysconfig/network-scripts/ifcfg-eth0: Permission dinied.
grep: /etc/sysconfig/network-scripts/ifcfg-eth0: Permission dinied.
grep: /etc/sysconfig/network-scripts/ifcfg-eth0: Permission dinied.
grep: /etc/sysconfig/network-scripts/ifcfg-eth0: Permission dinied.
grep: /etc/sysconfig/network-scripts/ifcfg-eth0: Permission dinied.
grep: /etc/sysconfig/network-scripts/ifcfg-eth0: Permission dinied.
done.

I'm trying to find out the cause.

> Other than that I think bringing distribution specifics into kernel.git
> is not a good idea. /etc/sysconfig/network-scripts/ifcfg-* format is
> distro-specific and not all Linux distros support it. Moreover,
I agree.

> different distros can treat setting differently. I think it was wrong to
> stick to this format in kvp daemon from very beginning.
We can also think the current format used in kvp daemon is already
distro-agnostic -- it just happens to look like the style of network config file
used in RHEL :-)

> 
> As a solution I would suggest doing the following: kvp daemon writes all
> received request details in distro-agnostic format in some temporary
> place and then calls distro-specific script to set things up. Actually,
> we already have such script: tools/hv/hv_set_ifconfig.sh
Yeah, this is exactly what we already have today.

> As for this bug I propose the following: remove skipping all
> IPADDR/MASK/... settings in case of "BOOTPROTO=dhcp" and let
> distro-specific script deal with the rest.
> --
>   Vitaly
OK, so the patch would be 1-line only:

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 22b0764..53fdaad 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -1314,10 +1314,8 @@ static int kvp_set_ip_info(char *if_name, struct 
hv_kvp_ipaddr_value *new_val)
goto setval_error;

/*
-* We are done!.
+* We are not done... TODO: add comment here.
 */
-   goto setval_done;
-
} else {
error = kvp_write_file(file, "BOOTPROTO", "", "none");
if (error)

I'll send out a v2 after I resolve the "grep ... Permission dinied" issue.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] tools: hv: kvp_daemon: make IPv6-only-injection work

2014-12-09 Thread Dexuan Cui


Thanks,
-- Dexuan

> -Original Message-
> From: Dexuan Cui
> Sent: Wednesday, December 10, 2014 15:34 PM
> To: 'Vitaly Kuznetsov'
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; KY Srinivasan; Haiyang Zhang
> Subject: RE: [PATCH] tools: hv: kvp_daemon: make IPv6-only-injection work
> 
> > -Original Message-
> > From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> > Sent: Tuesday, December 9, 2014 21:06 PM
> > To: Dexuan Cui
> > Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> > jasow...@redhat.com; KY Srinivasan; Haiyang Zhang
> > Subject: Re: [PATCH] tools: hv: kvp_daemon: make IPv6-only-injection work
> >  ..
> > > +  * Here "dhcp_enabled" is only for IPv4 according to Hyper-V host
> > team.
> > > +  *
> > > +  * In the case the host only injects 1 IPv6 address:
> > > +  * new_val->dhcp_enabled is true, but we can't pass
> > BOOTPROTO=dhcp to
> > > +  * the script hv_set_ifconfig, because in some distros (like RHEL7)
> > > +  * BOOTPROTO=dhcp has a special meaning in the config file (e.g.,
> > > +  * /etc/sysconfig/network-scripts/ifcfg-eth0): the network init
> > program
> > > +  * ignores any static IP addr information once there is
> > > +  * BOOTPROTO=dhcp; as a result, IPv6-only injection can't work.
> > > +  *
> > > +  * In the case of IPv6-only injection, BOOTPROTO=dhcp doesn't affect
> > > +  * Ubuntu because it's ignored by the Ubuntu version of
> > > +  * hv_set_ifconfig and it doesn't seem to have special meaning in
> > > +  * Ubuntu.
> > > +  */
> >
> > I just checked and adding "IPV6ADDR=something" when
> "BOOTPROTO=dhcp"
> > works for me with both RHEL7 and Fedora21.
> It doesn't work in my side. :-(
> Running 'ifup eth0' shows some errors(I use "set -x")
> ...
> + /sbin/dhclient -H localhost -1 -q -lf 
> /var/lib/dhclient/dhclient--eth0.lease -pf
> /var/run/dhclient-eth0.pid eth0
> grep: /etc/sysconfig/network-scripts/ifcfg-eth0: Permission dinied.
BTW, I run with root, and
'chown 777 /etc/sysconfig/network-scripts/ifcfg-eth0" doesn't help.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] tools: hv: kvp_daemon: make IPv6-only-injection work

2014-12-10 Thread Dexuan Cui
> -Original Message-
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> Sent: Wednesday, December 10, 2014 17:08 PM
> To: Dexuan Cui
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; KY Srinivasan; Haiyang Zhang
> Subject: Re: [PATCH] tools: hv: kvp_daemon: make IPv6-only-injection work
> 
> Dexuan Cui  writes:
> 
> > Thanks,
> > -- Dexuan
> >
> >> -Original Message-
> >> From: Dexuan Cui
> >> Sent: Wednesday, December 10, 2014 15:34 PM
> >> To: 'Vitaly Kuznetsov'
> >> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> >> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> >> jasow...@redhat.com; KY Srinivasan; Haiyang Zhang
> >> Subject: RE: [PATCH] tools: hv: kvp_daemon: make IPv6-only-injection
> work
> >>
> >> > -Original Message-
> >> > From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> >> > Sent: Tuesday, December 9, 2014 21:06 PM
> >> > To: Dexuan Cui
> >> > Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> driverdev-
> >> > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> >> > jasow...@redhat.com; KY Srinivasan; Haiyang Zhang
> >> > Subject: Re: [PATCH] tools: hv: kvp_daemon: make IPv6-only-injection
> work
> >> >  ..
> >> > > +   * Here "dhcp_enabled" is only for IPv4 according to Hyper-V 
> >> > > host
> >> > team.
> >> > > +   *
> >> > > +   * In the case the host only injects 1 IPv6 address:
> >> > > +   * new_val->dhcp_enabled is true, but we can't pass
> >> > BOOTPROTO=dhcp to
> >> > > +   * the script hv_set_ifconfig, because in some distros (like 
> >> > > RHEL7)
> >> > > +   * BOOTPROTO=dhcp has a special meaning in the config file 
> >> > > (e.g.,
> >> > > +   * /etc/sysconfig/network-scripts/ifcfg-eth0): the network init
> >> > program
> >> > > +   * ignores any static IP addr information once there is
> >> > > +   * BOOTPROTO=dhcp; as a result, IPv6-only injection can't work.
> >> > > +   *
> >> > > +   * In the case of IPv6-only injection, BOOTPROTO=dhcp doesn't 
> >> > > affect
> >> > > +   * Ubuntu because it's ignored by the Ubuntu version of
> >> > > +   * hv_set_ifconfig and it doesn't seem to have special meaning 
> >> > > in
> >> > > +   * Ubuntu.
> >> > > +   */
> >> >
> >> > I just checked and adding "IPV6ADDR=something" when
> >> "BOOTPROTO=dhcp"
> >> > works for me with both RHEL7 and Fedora21.
> >> It doesn't work in my side. :-(
> >> Running 'ifup eth0' shows some errors(I use "set -x")
> >> ...
> >> + /sbin/dhclient -H localhost -1 -q -lf 
> >> /var/lib/dhclient/dhclient--eth0.lease
> -pf
> >> /var/run/dhclient-eth0.pid eth0
> >> grep: /etc/sysconfig/network-scripts/ifcfg-eth0: Permission dinied.
> > BTW, I run with root, and
> > 'chown 777 /etc/sysconfig/network-scripts/ifcfg-eth0" doesn't help.
> >
> 
> s,chown,chmod, :-) But it won't help in case of SELinux mislabeling.
> 
> > Thanks,
> > -- Dexuan
> 
> --
>   Vitaly

Hi Vitally,
I think you're correct:
BOOTPROTO=dhcp + "no IPADDR" +  IPV6ADDR in RHEL7's 
/etc/sysconfig/network-scripts/ifcfg-eth0 works for me too.

My previous "grep: ... ifcfg-eth0: Permission denied" issue can't repro now.
Maybe it's because I messed up the config file somehow(?).

I'll send out a v2 patch according to your suggestion.

Thanks,
-- Dexuan 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH v2] tools: hv: kvp_daemon: make IPv6-only-injection work

2014-12-10 Thread Dexuan Cui
In the case the host only injects an IPv6 address, the dhcp_enabled flag is
true (it's only for IPv4 according to Hyper-V host team), but we still need to
proceed to parse the IPv6 information.

Cc: Vitaly Kuznetsov 
Cc: K. Y. Srinivasan 
Signed-off-by: Dexuan Cui 
---

v2: removed the distro-specific logic as Vitaly suggested.

 tools/hv/hv_kvp_daemon.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
index 6a6432a..4b3ee35 100644
--- a/tools/hv/hv_kvp_daemon.c
+++ b/tools/hv/hv_kvp_daemon.c
@@ -1308,16 +1308,17 @@ static int kvp_set_ip_info(char *if_name, struct 
hv_kvp_ipaddr_value *new_val)
if (error)
goto setval_error;
 
+   /*
+* The dhcp_enabled flag is only for IPv4. In the case the host only
+* injects an IPv6 address, the flag is true, but we still need to
+* proceed to parse and pass the IPv6 information to the
+* disto-specific script hv_set_ifconfig.
+*/
if (new_val->dhcp_enabled) {
error = kvp_write_file(file, "BOOTPROTO", "", "dhcp");
if (error)
goto setval_error;
 
-   /*
-* We are done!.
-*/
-   goto setval_done;
-
} else {
error = kvp_write_file(file, "BOOTPROTO", "", "none");
if (error)
@@ -1345,7 +1346,6 @@ static int kvp_set_ip_info(char *if_name, struct 
hv_kvp_ipaddr_value *new_val)
if (error)
goto setval_error;
 
-setval_done:
fclose(file);
 
/*
-- 
1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH v2] tools: hv: kvp_daemon: make IPv6-only-injection work

2014-12-16 Thread Dexuan Cui
> -Original Message-
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On
> Behalf Of Dexuan Cui
> Sent: Wednesday, December 10, 2014 19:33 PM
> To: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; driverdev-
> de...@linuxdriverproject.org; vkuzn...@redhat.com; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; KY Srinivasan
> Cc: Haiyang Zhang
> Subject: [PATCH v2] tools: hv: kvp_daemon: make IPv6-only-injection work
> 
> In the case the host only injects an IPv6 address, the dhcp_enabled flag is
> true (it's only for IPv4 according to Hyper-V host team), but we still need to
> proceed to parse the IPv6 information.
> 
> Cc: Vitaly Kuznetsov 
> Cc: K. Y. Srinivasan 
> Signed-off-by: Dexuan Cui 
> ---
> 
> v2: removed the distro-specific logic as Vitaly suggested.
> 
>  tools/hv/hv_kvp_daemon.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c
> index 6a6432a..4b3ee35 100644
> --- a/tools/hv/hv_kvp_daemon.c
> +++ b/tools/hv/hv_kvp_daemon.c
> @@ -1308,16 +1308,17 @@ static int kvp_set_ip_info(char *if_name, struct
> hv_kvp_ipaddr_value *new_val)
>   if (error)
>   goto setval_error;
> 
> + /*
> +  * The dhcp_enabled flag is only for IPv4. In the case the host only
> +  * injects an IPv6 address, the flag is true, but we still need to
> +  * proceed to parse and pass the IPv6 information to the
> +  * disto-specific script hv_set_ifconfig.
> +  */
>   if (new_val->dhcp_enabled) {
>   error = kvp_write_file(file, "BOOTPROTO", "", "dhcp");
>   if (error)
>   goto setval_error;
> 
> - /*
> -  * We are done!.
> -  */
> - goto setval_done;
> -
>   } else {
>   error = kvp_write_file(file, "BOOTPROTO", "", "none");
>   if (error)
> @@ -1345,7 +1346,6 @@ static int kvp_set_ip_info(char *if_name, struct
> hv_kvp_ipaddr_value *new_val)
>   if (error)
>   goto setval_error;
> 
> -setval_done:
>   fclose(file);
> 
>   /*
> --
> 1.9.1

Hi Vitaly,
Can you please ACK the v2 patch?
Or, please let me know if you have new comments.

Thanks,
-- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


  1   2   3   4   5   6   >