Re: [PATCH] mtd: spi-nor: add support for Microchip 25LC256

2018-05-18 Thread Marek Vasut
On 05/18/2018 03:00 PM, Radu Pirea wrote:
> 
> 
> On 05/18/2018 01:03 PM, Marek Vasut wrote:
>> On 05/18/2018 11:50 AM, Radu Pirea wrote:
>>>
>>>
>>> On 05/16/2018 04:47 PM, Marek Vasut wrote:
 On 05/16/2018 12:05 PM, Radu Pirea wrote:
> On Wed, 2018-05-16 at 00:17 +0200, Marek Vasut wrote:
>> On 05/15/2018 06:22 PM, Radu Pirea wrote:
>>> On Fri, 2018-05-04 at 20:40 +0200, Boris Brezillon wrote:
 On Fri, 4 May 2018 18:54:04 +0300
 Radu Pirea  wrote:

> Added geometry description for Microchip 25LC256 memory.

 Same as for the dataflash stuff you posted a few weeks ago: I
 don't
 think this device belongs in the SPI NOR framework.
>>>
>>> Hi Boris,
>>>
>>> 25lc256 memory is similar with mr25h256, the only difference is the
>>> page size(64 vs 256). Because mr25h256 is already in SPI NOR
>>> framework
>>> I added here 25lc256.
>>
>> I think I must be reading the wrong datasheet, but can you show me
>> how
>> does it support things like READID opcode ?
>>
> Hi Marek,
>
> I read the datasheet for 25lc256 and for mr25h256 and none of them
> supports READID. Is this required for a chip to be included in spi-nor
> framework? I just followed the mr25h256 as an example.

 So I thought until you pointed out the MR25 devices.

 Does the 25LC device need erase or not ? I think the MR25s didn't,
 but I
 might be wrong.
>>>
>>> You are right. MR25s does not need erase and the same thing is true for
>>> 25LC.
>>
>> Oh. And the command set is (except for readid) comparable to SPI NOR ?
>>
 Maybe the framework could support the 25LC afterall.

>>> Yes, this was my impression too.
>>
>> I am still thinking that the AT25 driver might be a better fit for such
>> devices. Can you take a look ?
> 
> I tested the memory with at25 driver and it works with no line changed
> in driver. :)

So AT25 it is then ?

-- 
Best regards,
Marek Vasut


Re: [PATCH] mtd: spi-nor: add support for Microchip 25LC256

2018-05-18 Thread Marek Vasut
On 05/18/2018 03:00 PM, Radu Pirea wrote:
> 
> 
> On 05/18/2018 01:03 PM, Marek Vasut wrote:
>> On 05/18/2018 11:50 AM, Radu Pirea wrote:
>>>
>>>
>>> On 05/16/2018 04:47 PM, Marek Vasut wrote:
 On 05/16/2018 12:05 PM, Radu Pirea wrote:
> On Wed, 2018-05-16 at 00:17 +0200, Marek Vasut wrote:
>> On 05/15/2018 06:22 PM, Radu Pirea wrote:
>>> On Fri, 2018-05-04 at 20:40 +0200, Boris Brezillon wrote:
 On Fri, 4 May 2018 18:54:04 +0300
 Radu Pirea  wrote:

> Added geometry description for Microchip 25LC256 memory.

 Same as for the dataflash stuff you posted a few weeks ago: I
 don't
 think this device belongs in the SPI NOR framework.
>>>
>>> Hi Boris,
>>>
>>> 25lc256 memory is similar with mr25h256, the only difference is the
>>> page size(64 vs 256). Because mr25h256 is already in SPI NOR
>>> framework
>>> I added here 25lc256.
>>
>> I think I must be reading the wrong datasheet, but can you show me
>> how
>> does it support things like READID opcode ?
>>
> Hi Marek,
>
> I read the datasheet for 25lc256 and for mr25h256 and none of them
> supports READID. Is this required for a chip to be included in spi-nor
> framework? I just followed the mr25h256 as an example.

 So I thought until you pointed out the MR25 devices.

 Does the 25LC device need erase or not ? I think the MR25s didn't,
 but I
 might be wrong.
>>>
>>> You are right. MR25s does not need erase and the same thing is true for
>>> 25LC.
>>
>> Oh. And the command set is (except for readid) comparable to SPI NOR ?
>>
 Maybe the framework could support the 25LC afterall.

>>> Yes, this was my impression too.
>>
>> I am still thinking that the AT25 driver might be a better fit for such
>> devices. Can you take a look ?
> 
> I tested the memory with at25 driver and it works with no line changed
> in driver. :)

So AT25 it is then ?

-- 
Best regards,
Marek Vasut


Re: [PATCH] audit: use existing session info function

2018-05-18 Thread Paul Moore
On Thu, May 17, 2018 at 10:01 PM, Richard Guy Briggs  wrote:
> Use the existing audit_log_session_info() function rather than
> hardcoding its functionality.
>
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/auditfilter.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Merged into audit/next, thanks.

> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index d7a807e..9e87377 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -1089,8 +1089,6 @@ static void audit_list_rules(int seq, struct 
> sk_buff_head *q)
>  static void audit_log_rule_change(char *action, struct audit_krule *rule, 
> int res)
>  {
> struct audit_buffer *ab;
> -   uid_t loginuid = from_kuid(_user_ns, 
> audit_get_loginuid(current));
> -   unsigned int sessionid = audit_get_sessionid(current);
>
> if (!audit_enabled)
> return;
> @@ -1098,7 +1096,7 @@ static void audit_log_rule_change(char *action, struct 
> audit_krule *rule, int re
> ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> if (!ab)
> return;
> -   audit_log_format(ab, "auid=%u ses=%u" ,loginuid, sessionid);
> +   audit_log_session_info(ab);
> audit_log_task_context(ab);
> audit_log_format(ab, " op=%s", action);
> audit_log_key(ab, rule->filterkey);
> --
> 1.8.3.1
>



-- 
paul moore
www.paul-moore.com


Re: [PATCH] audit: use existing session info function

2018-05-18 Thread Paul Moore
On Thu, May 17, 2018 at 10:01 PM, Richard Guy Briggs  wrote:
> Use the existing audit_log_session_info() function rather than
> hardcoding its functionality.
>
> Signed-off-by: Richard Guy Briggs 
> ---
>  kernel/auditfilter.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Merged into audit/next, thanks.

> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index d7a807e..9e87377 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -1089,8 +1089,6 @@ static void audit_list_rules(int seq, struct 
> sk_buff_head *q)
>  static void audit_log_rule_change(char *action, struct audit_krule *rule, 
> int res)
>  {
> struct audit_buffer *ab;
> -   uid_t loginuid = from_kuid(_user_ns, 
> audit_get_loginuid(current));
> -   unsigned int sessionid = audit_get_sessionid(current);
>
> if (!audit_enabled)
> return;
> @@ -1098,7 +1096,7 @@ static void audit_log_rule_change(char *action, struct 
> audit_krule *rule, int re
> ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_CONFIG_CHANGE);
> if (!ab)
> return;
> -   audit_log_format(ab, "auid=%u ses=%u" ,loginuid, sessionid);
> +   audit_log_session_info(ab);
> audit_log_task_context(ab);
> audit_log_format(ab, " op=%s", action);
> audit_log_key(ab, rule->filterkey);
> --
> 1.8.3.1
>



-- 
paul moore
www.paul-moore.com


Re: [PATCH INTERNAL 3/3] PCI: iproc: Disable MSI parsing in certain PAXC blocks

2018-05-18 Thread Ray Jui

Hi Lorenzo,

On 5/18/2018 6:56 AM, Lorenzo Pieralisi wrote:

On Fri, May 18, 2018 at 02:53:41PM +0530, p...@codeaurora.org wrote:

On 2018-05-17 22:51, Ray Jui wrote:

The internal MSI parsing logic in certain revisions of PAXC root
complexes does not work properly and can casue corruptions on the
writes. They need to be disabled

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
drivers/pci/host/pcie-iproc.c | 34 --
1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index 3c76c5f..b906d80 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
iproc_pcie *pcie, u64 msi_addr)
return ret;
}

-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
msi_addr)
+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
msi_addr,
+bool enable)
{
u32 val;

+   if (!enable) {
+   /*
+* Disable PAXC MSI steering. All write transfers will be
+* treated as non-MSI transfers
+*/
+   val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
+   val &= ~MSI_ENABLE_CFG;
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
+   return;
+   }
+
/*
 * Program bits [43:13] of address of GITS_TRANSLATER register into
 * bits [30:0] of the MSI base address register.  In fact, in all iProc
@@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie
*pcie,
return ret;
break;
case IPROC_PCIE_PAXC_V2:
-   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
+   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);


Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with 'false' ?
I see its getting called only from one place in current code
iproc_pcie_msi_steer().


It is called below with the false field to disable MSIs. That's probably
one reason more to create a function to enable/disable MSIs instead of
adding a parameter to iproc_pcie_paxc_v2_msi_steer().


Correct. Thanks for helping to explain.



Which brings me to the question: what happens to those MSIs writes
when MSIs are disabled according to this patch ? Are they terminated
at the root complex ?


Note the PAXC block parses MSI writes from our internally connected 
endpoints (i.e., an embedded network processor). The PAXC block examines 
these MSI writes and modifies the memory attributes (to DEVICE) of these 
data and then send them out to the AXI bus. These MSI writes will then 
be forwarded to the GIC (e.g., GICv2m, GICv3-ITS from ARM) for further 
processing. This is saying, PAXC itself does not process these MSI 
writes. They are processed by the GIC and associated interrupt will be 
generated form the GIC. PAXC's job is to parse and tag them properly so 
these MSI writes can reach the GIC, and at the same, reach the GIC at 
the right order.


On some of these problematic PAXCs, we are being forced to disable this 
PAXC internal parsing logic. In this case, we set up static mapping with 
the IOMMU to modify the memory attributes of these MSI writes. These MSI 
writes are always designated towards a specific memory address (e.g., on 
the GICv3 based system, it's the address of the translator register), 
and that's why static mapping table can be set up to work around this.




Lorenzo


break;
default:
return -EINVAL;
@@ -1427,6 +1439,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
}
EXPORT_SYMBOL(iproc_pcie_remove);

+/*
+ * The MSI parsing logic in certain revisions of Broadcom PAXC based root
+ * complex does not work and needs to be disabled
+ */
+static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
+{
+   struct iproc_pcie *pcie = iproc_data(pdev->bus);
+
+   if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
+   quirk_paxc_disable_msi_parsing);
+
MODULE_AUTHOR("Ray Jui ");
MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
MODULE_LICENSE("GPL v2");


Re: [PATCH INTERNAL 3/3] PCI: iproc: Disable MSI parsing in certain PAXC blocks

2018-05-18 Thread Ray Jui

Hi Lorenzo,

On 5/18/2018 6:56 AM, Lorenzo Pieralisi wrote:

On Fri, May 18, 2018 at 02:53:41PM +0530, p...@codeaurora.org wrote:

On 2018-05-17 22:51, Ray Jui wrote:

The internal MSI parsing logic in certain revisions of PAXC root
complexes does not work properly and can casue corruptions on the
writes. They need to be disabled

Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
---
drivers/pci/host/pcie-iproc.c | 34 --
1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index 3c76c5f..b906d80 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
iproc_pcie *pcie, u64 msi_addr)
return ret;
}

-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
msi_addr)
+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
msi_addr,
+bool enable)
{
u32 val;

+   if (!enable) {
+   /*
+* Disable PAXC MSI steering. All write transfers will be
+* treated as non-MSI transfers
+*/
+   val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
+   val &= ~MSI_ENABLE_CFG;
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
+   return;
+   }
+
/*
 * Program bits [43:13] of address of GITS_TRANSLATER register into
 * bits [30:0] of the MSI base address register.  In fact, in all iProc
@@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie
*pcie,
return ret;
break;
case IPROC_PCIE_PAXC_V2:
-   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
+   iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);


Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with 'false' ?
I see its getting called only from one place in current code
iproc_pcie_msi_steer().


It is called below with the false field to disable MSIs. That's probably
one reason more to create a function to enable/disable MSIs instead of
adding a parameter to iproc_pcie_paxc_v2_msi_steer().


Correct. Thanks for helping to explain.



Which brings me to the question: what happens to those MSIs writes
when MSIs are disabled according to this patch ? Are they terminated
at the root complex ?


Note the PAXC block parses MSI writes from our internally connected 
endpoints (i.e., an embedded network processor). The PAXC block examines 
these MSI writes and modifies the memory attributes (to DEVICE) of these 
data and then send them out to the AXI bus. These MSI writes will then 
be forwarded to the GIC (e.g., GICv2m, GICv3-ITS from ARM) for further 
processing. This is saying, PAXC itself does not process these MSI 
writes. They are processed by the GIC and associated interrupt will be 
generated form the GIC. PAXC's job is to parse and tag them properly so 
these MSI writes can reach the GIC, and at the same, reach the GIC at 
the right order.


On some of these problematic PAXCs, we are being forced to disable this 
PAXC internal parsing logic. In this case, we set up static mapping with 
the IOMMU to modify the memory attributes of these MSI writes. These MSI 
writes are always designated towards a specific memory address (e.g., on 
the GICv3 based system, it's the address of the translator register), 
and that's why static mapping table can be set up to work around this.




Lorenzo


break;
default:
return -EINVAL;
@@ -1427,6 +1439,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
}
EXPORT_SYMBOL(iproc_pcie_remove);

+/*
+ * The MSI parsing logic in certain revisions of Broadcom PAXC based root
+ * complex does not work and needs to be disabled
+ */
+static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
+{
+   struct iproc_pcie *pcie = iproc_data(pdev->bus);
+
+   if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+   iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
+   quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
+   quirk_paxc_disable_msi_parsing);
+
MODULE_AUTHOR("Ray Jui ");
MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
MODULE_LICENSE("GPL v2");


Re: [PATCH] isdn: eicon: fix a missing-check bug

2018-05-18 Thread Wenwen Wang
Thanks for your suggestion, David! I will revise the patch and resubmit it.

Wenwen

On Fri, May 11, 2018 at 2:50 PM, David Miller  wrote:
> From: Wenwen Wang 
> Date: Sat,  5 May 2018 14:32:46 -0500
>
>> To avoid such issues, this patch adds a check after the second copy in the
>> function diva_xdi_write(). If the adapter number is not equal to the one
>> obtained in the first copy, (-4) will be returned to divas_write(), which
>> will then return an error code -EINVAL.
>
> Better fix is to copy the msg header once into an on-stack buffer supplied
> by diva_write() to diva_xdi_open_adapter(), which is then passed on to
> diva_xdi_write() with an adjusted src pointer and length.


Re: [PATCH] isdn: eicon: fix a missing-check bug

2018-05-18 Thread Wenwen Wang
Thanks for your suggestion, David! I will revise the patch and resubmit it.

Wenwen

On Fri, May 11, 2018 at 2:50 PM, David Miller  wrote:
> From: Wenwen Wang 
> Date: Sat,  5 May 2018 14:32:46 -0500
>
>> To avoid such issues, this patch adds a check after the second copy in the
>> function diva_xdi_write(). If the adapter number is not equal to the one
>> obtained in the first copy, (-4) will be returned to divas_write(), which
>> will then return an error code -EINVAL.
>
> Better fix is to copy the msg header once into an on-stack buffer supplied
> by diva_write() to diva_xdi_open_adapter(), which is then passed on to
> diva_xdi_write() with an adjusted src pointer and length.


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread hpa
On May 18, 2018 12:36:20 PM PDT, Nadav Amit  wrote:
>h...@zytor.com wrote:
>
>> On May 18, 2018 12:21:00 PM PDT, Linus Torvalds
> wrote:
>>> On Fri, May 18, 2018 at 12:18 PM Nadav Amit 
>wrote:
>>> 
 Gnu ASM manual says: "Each time you run as it assembles exactly one
>>> source
 program. The source program is made up of one or more files.”
>>> 
>>> Ok, that counts as documentation, although it's confusing as hell.
>Is
>>> it
>>> one source program or multiple files again? I see what they are
>trying
>>> to
>>> say, but it really could be phrased more clearly ;)
>>> 
>>>Linus
>> 
>> At least we have a solution!
>
>Thanks for all your help. I will try to do it over over the weekend.
>
>Funny, I see that this “trick” (-Wa,[filename.s]) is already being used
>to
>include arch/x86/boot/code16gcc.h so I feel “safer” to use it.
>
>Regards,
>Nadav

Oh. I don't remember when we did that... might even be my doing and then I just 
forgot about it :)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread hpa
On May 18, 2018 12:36:20 PM PDT, Nadav Amit  wrote:
>h...@zytor.com wrote:
>
>> On May 18, 2018 12:21:00 PM PDT, Linus Torvalds
> wrote:
>>> On Fri, May 18, 2018 at 12:18 PM Nadav Amit 
>wrote:
>>> 
 Gnu ASM manual says: "Each time you run as it assembles exactly one
>>> source
 program. The source program is made up of one or more files.”
>>> 
>>> Ok, that counts as documentation, although it's confusing as hell.
>Is
>>> it
>>> one source program or multiple files again? I see what they are
>trying
>>> to
>>> say, but it really could be phrased more clearly ;)
>>> 
>>>Linus
>> 
>> At least we have a solution!
>
>Thanks for all your help. I will try to do it over over the weekend.
>
>Funny, I see that this “trick” (-Wa,[filename.s]) is already being used
>to
>include arch/x86/boot/code16gcc.h so I feel “safer” to use it.
>
>Regards,
>Nadav

Oh. I don't remember when we did that... might even be my doing and then I just 
forgot about it :)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v6 17/28] x86/asm: use SYM_INNER_LABEL instead of GLOBAL

2018-05-18 Thread Andy Lutomirski
On Fri, May 18, 2018 at 2:17 AM Jiri Slaby  wrote:

> GLOBAL had several meanings and is going away. In this patch, convert
> all the inner function labels marked with GLOBAL to use SYM_INNER_LABEL
> instead.

> Note that retint_user needs not be global, perhaps since commit
> 2ec67971facc ("x86/entry/64/compat: Remove most of the fast system call
> machinery"), where entry_64_compat's caller was removed. So mark the
> label as LOCAL.


> -GLOBAL(entry_SYSCALL_64_after_hwframe)
> +SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)

I've missed all the context here.   I agree that GLOBAL is misleading, and
"inner label" is nice.  But this is a rather wordy macro.  Would:

INNER_LABEL_GLOBAL(name)

be better?  (With just INNER_LABEL(name) for the local version?)


Re: [PATCH v6 17/28] x86/asm: use SYM_INNER_LABEL instead of GLOBAL

2018-05-18 Thread Andy Lutomirski
On Fri, May 18, 2018 at 2:17 AM Jiri Slaby  wrote:

> GLOBAL had several meanings and is going away. In this patch, convert
> all the inner function labels marked with GLOBAL to use SYM_INNER_LABEL
> instead.

> Note that retint_user needs not be global, perhaps since commit
> 2ec67971facc ("x86/entry/64/compat: Remove most of the fast system call
> machinery"), where entry_64_compat's caller was removed. So mark the
> label as LOCAL.


> -GLOBAL(entry_SYSCALL_64_after_hwframe)
> +SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)

I've missed all the context here.   I agree that GLOBAL is misleading, and
"inner label" is nice.  But this is a rather wordy macro.  Would:

INNER_LABEL_GLOBAL(name)

be better?  (With just INNER_LABEL(name) for the local version?)


Re: [PATCH v4 2/2] vfio/mdev: Re-order sysfs attribute creation

2018-05-18 Thread Kirti Wankhede


On 5/19/2018 12:40 AM, Alex Williamson wrote:
> There exists a gap at the end of mdev_device_create() where the device
> is visible to userspace, but we're not yet ready to handle removal, as
> triggered through the 'remove' attribute.  We handle this properly in
> mdev_device_remove() with an -EAGAIN return, but we can marginally
> reduce this gap by adding this attribute as a final step of our sysfs
> setup.
> 
> Signed-off-by: Alex Williamson 

Looks good.

Reviewed by: Kirti Wankhede 

> ---
>  drivers/vfio/mdev/mdev_sysfs.c |   14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
> index 802df210929b..249472f05509 100644
> --- a/drivers/vfio/mdev/mdev_sysfs.c
> +++ b/drivers/vfio/mdev/mdev_sysfs.c
> @@ -257,24 +257,24 @@ int  mdev_create_sysfs_files(struct device *dev, struct 
> mdev_type *type)
>  {
>   int ret;
>  
> - ret = sysfs_create_files(>kobj, mdev_device_attrs);
> - if (ret)
> - return ret;
> -
>   ret = sysfs_create_link(type->devices_kobj, >kobj, dev_name(dev));
>   if (ret)
> - goto device_link_failed;
> + return ret;
>  
>   ret = sysfs_create_link(>kobj, >kobj, "mdev_type");
>   if (ret)
>   goto type_link_failed;
>  
> + ret = sysfs_create_files(>kobj, mdev_device_attrs);
> + if (ret)
> + goto create_files_failed;
> +
>   return ret;
>  
> +create_files_failed:
> + sysfs_remove_link(>kobj, "mdev_type");
>  type_link_failed:
>   sysfs_remove_link(type->devices_kobj, dev_name(dev));
> -device_link_failed:
> - sysfs_remove_files(>kobj, mdev_device_attrs);
>   return ret;
>  }
>  
> 


Re: [PATCH v4 2/2] vfio/mdev: Re-order sysfs attribute creation

2018-05-18 Thread Kirti Wankhede


On 5/19/2018 12:40 AM, Alex Williamson wrote:
> There exists a gap at the end of mdev_device_create() where the device
> is visible to userspace, but we're not yet ready to handle removal, as
> triggered through the 'remove' attribute.  We handle this properly in
> mdev_device_remove() with an -EAGAIN return, but we can marginally
> reduce this gap by adding this attribute as a final step of our sysfs
> setup.
> 
> Signed-off-by: Alex Williamson 

Looks good.

Reviewed by: Kirti Wankhede 

> ---
>  drivers/vfio/mdev/mdev_sysfs.c |   14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
> index 802df210929b..249472f05509 100644
> --- a/drivers/vfio/mdev/mdev_sysfs.c
> +++ b/drivers/vfio/mdev/mdev_sysfs.c
> @@ -257,24 +257,24 @@ int  mdev_create_sysfs_files(struct device *dev, struct 
> mdev_type *type)
>  {
>   int ret;
>  
> - ret = sysfs_create_files(>kobj, mdev_device_attrs);
> - if (ret)
> - return ret;
> -
>   ret = sysfs_create_link(type->devices_kobj, >kobj, dev_name(dev));
>   if (ret)
> - goto device_link_failed;
> + return ret;
>  
>   ret = sysfs_create_link(>kobj, >kobj, "mdev_type");
>   if (ret)
>   goto type_link_failed;
>  
> + ret = sysfs_create_files(>kobj, mdev_device_attrs);
> + if (ret)
> + goto create_files_failed;
> +
>   return ret;
>  
> +create_files_failed:
> + sysfs_remove_link(>kobj, "mdev_type");
>  type_link_failed:
>   sysfs_remove_link(type->devices_kobj, dev_name(dev));
> -device_link_failed:
> - sysfs_remove_files(>kobj, mdev_device_attrs);
>   return ret;
>  }
>  
> 


Re: [PATCH v4 0/2] vfio/mdev: Device namespace protection

2018-05-18 Thread Kirti Wankhede


On 5/19/2018 12:40 AM, Alex Williamson wrote:
> v4: Fix the 'create' racing 'remove' gap noted by Kirti by moving
> removal from mdev_list to mdev_device_release().  Fix missing
> mdev_put_parent() cases in mdev_device_create(), also noted
> by Kirti.  Added documention update regarding serialization as
> noted by Cornelia.  Added additional commit log comment about
> -EAGAIN vs -ENODEV for 'remove' racing 'create'.  Added second
> patch to re-order sysfs attributes, with this my targeted
> scripts can no longer hit the gap where -EAGAIN is regurned.
> BTW, the gap where the current code returns -ENODEV in this
> race condition is about 50% easier to hit than it exists in
> this series with patch 1 alone.
> 

Thanks for fixing this. This patch set looks good to me.

Thanks,
Kirti

> Thanks,
> Alex
> 
> ---
> 
> Alex Williamson (2):
>   vfio/mdev: Check globally for duplicate devices
>   vfio/mdev: Re-order sysfs attribute creation
> 
> 
>  Documentation/vfio-mediated-device.txt |5 ++
>  drivers/vfio/mdev/mdev_core.c  |  102 
> +++-
>  drivers/vfio/mdev/mdev_private.h   |2 -
>  drivers/vfio/mdev/mdev_sysfs.c |   14 ++--
>  4 files changed, 49 insertions(+), 74 deletions(-)
> 


Re: [PATCH v4 0/2] vfio/mdev: Device namespace protection

2018-05-18 Thread Kirti Wankhede


On 5/19/2018 12:40 AM, Alex Williamson wrote:
> v4: Fix the 'create' racing 'remove' gap noted by Kirti by moving
> removal from mdev_list to mdev_device_release().  Fix missing
> mdev_put_parent() cases in mdev_device_create(), also noted
> by Kirti.  Added documention update regarding serialization as
> noted by Cornelia.  Added additional commit log comment about
> -EAGAIN vs -ENODEV for 'remove' racing 'create'.  Added second
> patch to re-order sysfs attributes, with this my targeted
> scripts can no longer hit the gap where -EAGAIN is regurned.
> BTW, the gap where the current code returns -ENODEV in this
> race condition is about 50% easier to hit than it exists in
> this series with patch 1 alone.
> 

Thanks for fixing this. This patch set looks good to me.

Thanks,
Kirti

> Thanks,
> Alex
> 
> ---
> 
> Alex Williamson (2):
>   vfio/mdev: Check globally for duplicate devices
>   vfio/mdev: Re-order sysfs attribute creation
> 
> 
>  Documentation/vfio-mediated-device.txt |5 ++
>  drivers/vfio/mdev/mdev_core.c  |  102 
> +++-
>  drivers/vfio/mdev/mdev_private.h   |2 -
>  drivers/vfio/mdev/mdev_sysfs.c |   14 ++--
>  4 files changed, 49 insertions(+), 74 deletions(-)
> 


Re: [PATCH v4 1/2] vfio/mdev: Check globally for duplicate devices

2018-05-18 Thread Kirti Wankhede


On 5/19/2018 12:40 AM, Alex Williamson wrote:
> When we create an mdev device, we check for duplicates against the
> parent device and return -EEXIST if found, but the mdev device
> namespace is global since we'll link all devices from the bus.  We do
> catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
> with it comes a kernel warning and stack trace for trying to create
> duplicate sysfs links, which makes it an undesirable response.
> 
> Therefore we should really be looking for duplicates across all mdev
> parent devices, or as implemented here, against our mdev device list.
> Using mdev_list to prevent duplicates means that we can remove
> mdev_parent.lock, but in order not to serialize mdev device creation
> and removal globally, we add mdev_device.active which allows UUIDs to
> be reserved such that we can drop the mdev_list_lock before the mdev
> device is fully in place.
> 
> Two behavioral notes; first, mdev_parent.lock had the side-effect of
> serializing mdev create and remove ops per parent device.  This was
> an implementation detail, not an intentional guarantee provided to
> the mdev vendor drivers.  Vendor drivers can trivially provide this
> serialization internally if necessary.  Second, review comments note
> the new -EAGAIN behavior when the device, and in particular the remove
> attribute, becomes visible in sysfs.  If a remove is triggered prior
> to completion of mdev_device_create() the user will see a -EAGAIN
> error.  While the errno is different, receiving an error during this
> period is not, the previous implementation returned -ENODEV for the
> same condition.  Furthermore, the consistency to the user is improved
> in the case where mdev_device_remove_ops() returns error.  Previously
> concurrent calls to mdev_device_remove() could see the device
> disappear with -ENODEV and return in the case of error.  Now a user
> would see -EAGAIN while the device is in this transitory state.
> 
> Signed-off-by: Alex Williamson 

Looks good to me.

Reviewed by: Kirti Wankhede 

> ---
>  Documentation/vfio-mediated-device.txt |5 ++
>  drivers/vfio/mdev/mdev_core.c  |  102 
> +++-
>  drivers/vfio/mdev/mdev_private.h   |2 -
>  3 files changed, 42 insertions(+), 67 deletions(-)
> 
> diff --git a/Documentation/vfio-mediated-device.txt 
> b/Documentation/vfio-mediated-device.txt
> index 1b3950346532..c3f69bcaf96e 100644
> --- a/Documentation/vfio-mediated-device.txt
> +++ b/Documentation/vfio-mediated-device.txt
> @@ -145,6 +145,11 @@ The functions in the mdev_parent_ops structure are as 
> follows:
>  * create: allocate basic resources in a driver for a mediated device
>  * remove: free resources in a driver when a mediated device is destroyed
>  
> +(Note that mdev-core provides no implicit serialization of create/remove
> +callbacks per mdev parent device, per mdev type, or any other categorization.
> +Vendor drivers are expected to be fully asynchronous in this respect or
> +provide their own internal resource protection.)
> +
>  The callbacks in the mdev_parent_ops structure are as follows:
>  
>  * open: open callback of mediated device
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 126991046eb7..0212f0ee8aea 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -66,34 +66,6 @@ uuid_le mdev_uuid(struct mdev_device *mdev)
>  }
>  EXPORT_SYMBOL(mdev_uuid);
>  
> -static int _find_mdev_device(struct device *dev, void *data)
> -{
> - struct mdev_device *mdev;
> -
> - if (!dev_is_mdev(dev))
> - return 0;
> -
> - mdev = to_mdev_device(dev);
> -
> - if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0)
> - return 1;
> -
> - return 0;
> -}
> -
> -static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid)
> -{
> - struct device *dev;
> -
> - dev = device_find_child(parent->dev, , _find_mdev_device);
> - if (dev) {
> - put_device(dev);
> - return true;
> - }
> -
> - return false;
> -}
> -
>  /* Should be called holding parent_list_lock */
>  static struct mdev_parent *__find_parent_device(struct device *dev)
>  {
> @@ -221,7 +193,6 @@ int mdev_register_device(struct device *dev, const struct 
> mdev_parent_ops *ops)
>   }
>  
>   kref_init(>ref);
> - mutex_init(>lock);
>  
>   parent->dev = dev;
>   parent->ops = ops;
> @@ -297,6 +268,10 @@ static void mdev_device_release(struct device *dev)
>  {
>   struct mdev_device *mdev = to_mdev_device(dev);
>  
> + mutex_lock(_list_lock);
> + list_del(>next);
> + mutex_unlock(_list_lock);
> +
>   dev_dbg(>dev, "MDEV: destroying\n");
>   kfree(mdev);
>  }
> @@ -304,7 +279,7 @@ static void mdev_device_release(struct device *dev)
>  int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le 
> uuid)
>  {
>   int 

Re: [PATCH v4 1/2] vfio/mdev: Check globally for duplicate devices

2018-05-18 Thread Kirti Wankhede


On 5/19/2018 12:40 AM, Alex Williamson wrote:
> When we create an mdev device, we check for duplicates against the
> parent device and return -EEXIST if found, but the mdev device
> namespace is global since we'll link all devices from the bus.  We do
> catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
> with it comes a kernel warning and stack trace for trying to create
> duplicate sysfs links, which makes it an undesirable response.
> 
> Therefore we should really be looking for duplicates across all mdev
> parent devices, or as implemented here, against our mdev device list.
> Using mdev_list to prevent duplicates means that we can remove
> mdev_parent.lock, but in order not to serialize mdev device creation
> and removal globally, we add mdev_device.active which allows UUIDs to
> be reserved such that we can drop the mdev_list_lock before the mdev
> device is fully in place.
> 
> Two behavioral notes; first, mdev_parent.lock had the side-effect of
> serializing mdev create and remove ops per parent device.  This was
> an implementation detail, not an intentional guarantee provided to
> the mdev vendor drivers.  Vendor drivers can trivially provide this
> serialization internally if necessary.  Second, review comments note
> the new -EAGAIN behavior when the device, and in particular the remove
> attribute, becomes visible in sysfs.  If a remove is triggered prior
> to completion of mdev_device_create() the user will see a -EAGAIN
> error.  While the errno is different, receiving an error during this
> period is not, the previous implementation returned -ENODEV for the
> same condition.  Furthermore, the consistency to the user is improved
> in the case where mdev_device_remove_ops() returns error.  Previously
> concurrent calls to mdev_device_remove() could see the device
> disappear with -ENODEV and return in the case of error.  Now a user
> would see -EAGAIN while the device is in this transitory state.
> 
> Signed-off-by: Alex Williamson 

Looks good to me.

Reviewed by: Kirti Wankhede 

> ---
>  Documentation/vfio-mediated-device.txt |5 ++
>  drivers/vfio/mdev/mdev_core.c  |  102 
> +++-
>  drivers/vfio/mdev/mdev_private.h   |2 -
>  3 files changed, 42 insertions(+), 67 deletions(-)
> 
> diff --git a/Documentation/vfio-mediated-device.txt 
> b/Documentation/vfio-mediated-device.txt
> index 1b3950346532..c3f69bcaf96e 100644
> --- a/Documentation/vfio-mediated-device.txt
> +++ b/Documentation/vfio-mediated-device.txt
> @@ -145,6 +145,11 @@ The functions in the mdev_parent_ops structure are as 
> follows:
>  * create: allocate basic resources in a driver for a mediated device
>  * remove: free resources in a driver when a mediated device is destroyed
>  
> +(Note that mdev-core provides no implicit serialization of create/remove
> +callbacks per mdev parent device, per mdev type, or any other categorization.
> +Vendor drivers are expected to be fully asynchronous in this respect or
> +provide their own internal resource protection.)
> +
>  The callbacks in the mdev_parent_ops structure are as follows:
>  
>  * open: open callback of mediated device
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 126991046eb7..0212f0ee8aea 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -66,34 +66,6 @@ uuid_le mdev_uuid(struct mdev_device *mdev)
>  }
>  EXPORT_SYMBOL(mdev_uuid);
>  
> -static int _find_mdev_device(struct device *dev, void *data)
> -{
> - struct mdev_device *mdev;
> -
> - if (!dev_is_mdev(dev))
> - return 0;
> -
> - mdev = to_mdev_device(dev);
> -
> - if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0)
> - return 1;
> -
> - return 0;
> -}
> -
> -static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid)
> -{
> - struct device *dev;
> -
> - dev = device_find_child(parent->dev, , _find_mdev_device);
> - if (dev) {
> - put_device(dev);
> - return true;
> - }
> -
> - return false;
> -}
> -
>  /* Should be called holding parent_list_lock */
>  static struct mdev_parent *__find_parent_device(struct device *dev)
>  {
> @@ -221,7 +193,6 @@ int mdev_register_device(struct device *dev, const struct 
> mdev_parent_ops *ops)
>   }
>  
>   kref_init(>ref);
> - mutex_init(>lock);
>  
>   parent->dev = dev;
>   parent->ops = ops;
> @@ -297,6 +268,10 @@ static void mdev_device_release(struct device *dev)
>  {
>   struct mdev_device *mdev = to_mdev_device(dev);
>  
> + mutex_lock(_list_lock);
> + list_del(>next);
> + mutex_unlock(_list_lock);
> +
>   dev_dbg(>dev, "MDEV: destroying\n");
>   kfree(mdev);
>  }
> @@ -304,7 +279,7 @@ static void mdev_device_release(struct device *dev)
>  int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le 
> uuid)
>  {
>   int ret;
> - struct mdev_device *mdev;
> + struct 

Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Nadav Amit
h...@zytor.com wrote:

> On May 18, 2018 12:21:00 PM PDT, Linus Torvalds 
>  wrote:
>> On Fri, May 18, 2018 at 12:18 PM Nadav Amit  wrote:
>> 
>>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>> source
>>> program. The source program is made up of one or more files.”
>> 
>> Ok, that counts as documentation, although it's confusing as hell. Is
>> it
>> one source program or multiple files again? I see what they are trying
>> to
>> say, but it really could be phrased more clearly ;)
>> 
>>Linus
> 
> At least we have a solution!

Thanks for all your help. I will try to do it over over the weekend.

Funny, I see that this “trick” (-Wa,[filename.s]) is already being used to
include arch/x86/boot/code16gcc.h so I feel “safer” to use it.

Regards,
Nadav

Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Nadav Amit
h...@zytor.com wrote:

> On May 18, 2018 12:21:00 PM PDT, Linus Torvalds 
>  wrote:
>> On Fri, May 18, 2018 at 12:18 PM Nadav Amit  wrote:
>> 
>>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>> source
>>> program. The source program is made up of one or more files.”
>> 
>> Ok, that counts as documentation, although it's confusing as hell. Is
>> it
>> one source program or multiple files again? I see what they are trying
>> to
>> say, but it really could be phrased more clearly ;)
>> 
>>Linus
> 
> At least we have a solution!

Thanks for all your help. I will try to do it over over the weekend.

Funny, I see that this “trick” (-Wa,[filename.s]) is already being used to
include arch/x86/boot/code16gcc.h so I feel “safer” to use it.

Regards,
Nadav

Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Arnaldo Carvalho de Melo
Em Fri, May 18, 2018 at 04:29:16PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, May 18, 2018 at 04:21:02PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Here it went from failing with:
> > 
> > [root@seventh ~]# perf test -v pton
> > 64: probe libc's inet_pton & backtrace it with ping   :
> > --- start ---
> > test child forked, pid 22590
> > ping 22607 [001] 12782.425689: probe_libc:inet_pton: (7f8686da4e40)
> > 7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> > FAIL: expected backtrace entry 1 
> > ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$" got 
> > "7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)"
> > test child finished with -1
> >  end 
> > probe libc's inet_pton & backtrace it with ping: FAILED!
> > [root@seventh ~]# 
> > 
> > To failing with:
> > 
> > [root@seventh ~]# perf test -v pton
> > 64: probe libc's inet_pton & backtrace it with ping   :
> > --- start ---
> > test child forked, pid 28954
> > ping 28971 [002] 14277.711200: probe_libc:inet_pton: (7fc9d66e3e40)
> > 7fc9d66e3e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> > 7fc9d66b02b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> > 56075cb98f40 [unknown] (/usr/bin/ping)
> > FAIL: expected backtrace entry 3 
> > ".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" got "56075cb98f40 
> > [unknown] (/usr/bin/ping)"
> > test child finished with -1
> >  end 
> > probe libc's inet_pton & backtrace it with ping: FAILED!
> > [root@seventh ~]# 
> > 
> > Trying to figure this out...
> 
> [root@seventh perf]# perf script
> ping 29170 [001] 14644.810782: probe_libc:inet_pton: (7f0bac71be40)
> 7f0bac71be40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> 7f0bac6e82b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> 5585e10adf40 [unknown] (/usr/bin/ping)
> 
> [root@seventh perf]# perf report --mmaps
> #  pid  tid ppid  comm
>  00   -1 |swapper
>  2917029170   -1 |ping
> 5585e10ab000-5585e12dd000 r-xp  
> 2771393 /usr/bin/ping
> 7f0babfcf000-7f0bac1ed000 r-xp  
> 2762913 /usr/lib64/libpthread-2.26.so
> 7f0bac1ed000-7f0bac3f1000 r-xp  
> 2753363 /usr/lib64/libdl-2.26.so
> 7f0bac3f1000-7f0bac608000 r-xp  
> 2760078 /usr/lib64/libz.so.1.2.11
> 7f0bac608000-7f0bac9be000 r-xp  
> 2753359 /usr/lib64/libc-2.26.so
> 7f0bac9be000-7f0bacd09000 r-xp  
> 2762901 /usr/lib64/libm-2.26.so
> 7f0bacd09000-7f0bacf2 r-xp  
> 2762915 /usr/lib64/libresolv-2.26.so
> 7f0bacf2-7f0bad3a8000 r-xp  
> 2764213 /usr/lib64/libcrypto.so.1.1.0h
> 7f0bad3a8000-7f0bad5dc000 r-xp  
> 2761070 /usr/lib64/libidn.so.11.6.18
> 7f0bad5dc000-7f0bad7e1000 r-xp  
> 2760189 /usr/lib64/libcap.so.2.25
> 7f0bad7e1000-7f0bada08000 r-xp  
> 2753353 /usr/lib64/ld-2.26.so
> 7ffe23d99000-7ffe23d9b000 r-xp  0 
> [vdso]
> [root@seventh perf]# 
> 
> So it is there, but I don't have debuginfo for ping, lets see if I add
> it...
> 
> [root@seventh perf]# rpm -qf `which ping`
> iputils-20161105-7.fc27.x86_64
> [root@seventh perf]# 
> [root@seventh perf]# dnf debuginfo-install iputils
> 
> [root@seventh perf]# perf script
> ping 29170 [001] 14644.810782: probe_libc:inet_pton: (7f0bac71be40)
> 7f0bac71be40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> 7f0bac6e82b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> 5585e10adf40 main+0x880 (/usr/bin/ping)
> 
> [root@seventh perf]# 
> 
> So we need to check if debuginfo is available, and if so, expect it to
> resolve that entry to main, if not, expect it to _not_ resolve and have
> [unknown]  instead...

But tis is not related to your recent patches, so I'm folding this last
one with the one adding the offsets and go on from there...

- Arnaldo


Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Arnaldo Carvalho de Melo
Em Fri, May 18, 2018 at 04:29:16PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, May 18, 2018 at 04:21:02PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Here it went from failing with:
> > 
> > [root@seventh ~]# perf test -v pton
> > 64: probe libc's inet_pton & backtrace it with ping   :
> > --- start ---
> > test child forked, pid 22590
> > ping 22607 [001] 12782.425689: probe_libc:inet_pton: (7f8686da4e40)
> > 7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> > FAIL: expected backtrace entry 1 
> > ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$" got 
> > "7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)"
> > test child finished with -1
> >  end 
> > probe libc's inet_pton & backtrace it with ping: FAILED!
> > [root@seventh ~]# 
> > 
> > To failing with:
> > 
> > [root@seventh ~]# perf test -v pton
> > 64: probe libc's inet_pton & backtrace it with ping   :
> > --- start ---
> > test child forked, pid 28954
> > ping 28971 [002] 14277.711200: probe_libc:inet_pton: (7fc9d66e3e40)
> > 7fc9d66e3e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> > 7fc9d66b02b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> > 56075cb98f40 [unknown] (/usr/bin/ping)
> > FAIL: expected backtrace entry 3 
> > ".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" got "56075cb98f40 
> > [unknown] (/usr/bin/ping)"
> > test child finished with -1
> >  end 
> > probe libc's inet_pton & backtrace it with ping: FAILED!
> > [root@seventh ~]# 
> > 
> > Trying to figure this out...
> 
> [root@seventh perf]# perf script
> ping 29170 [001] 14644.810782: probe_libc:inet_pton: (7f0bac71be40)
> 7f0bac71be40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> 7f0bac6e82b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> 5585e10adf40 [unknown] (/usr/bin/ping)
> 
> [root@seventh perf]# perf report --mmaps
> #  pid  tid ppid  comm
>  00   -1 |swapper
>  2917029170   -1 |ping
> 5585e10ab000-5585e12dd000 r-xp  
> 2771393 /usr/bin/ping
> 7f0babfcf000-7f0bac1ed000 r-xp  
> 2762913 /usr/lib64/libpthread-2.26.so
> 7f0bac1ed000-7f0bac3f1000 r-xp  
> 2753363 /usr/lib64/libdl-2.26.so
> 7f0bac3f1000-7f0bac608000 r-xp  
> 2760078 /usr/lib64/libz.so.1.2.11
> 7f0bac608000-7f0bac9be000 r-xp  
> 2753359 /usr/lib64/libc-2.26.so
> 7f0bac9be000-7f0bacd09000 r-xp  
> 2762901 /usr/lib64/libm-2.26.so
> 7f0bacd09000-7f0bacf2 r-xp  
> 2762915 /usr/lib64/libresolv-2.26.so
> 7f0bacf2-7f0bad3a8000 r-xp  
> 2764213 /usr/lib64/libcrypto.so.1.1.0h
> 7f0bad3a8000-7f0bad5dc000 r-xp  
> 2761070 /usr/lib64/libidn.so.11.6.18
> 7f0bad5dc000-7f0bad7e1000 r-xp  
> 2760189 /usr/lib64/libcap.so.2.25
> 7f0bad7e1000-7f0bada08000 r-xp  
> 2753353 /usr/lib64/ld-2.26.so
> 7ffe23d99000-7ffe23d9b000 r-xp  0 
> [vdso]
> [root@seventh perf]# 
> 
> So it is there, but I don't have debuginfo for ping, lets see if I add
> it...
> 
> [root@seventh perf]# rpm -qf `which ping`
> iputils-20161105-7.fc27.x86_64
> [root@seventh perf]# 
> [root@seventh perf]# dnf debuginfo-install iputils
> 
> [root@seventh perf]# perf script
> ping 29170 [001] 14644.810782: probe_libc:inet_pton: (7f0bac71be40)
> 7f0bac71be40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> 7f0bac6e82b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> 5585e10adf40 main+0x880 (/usr/bin/ping)
> 
> [root@seventh perf]# 
> 
> So we need to check if debuginfo is available, and if so, expect it to
> resolve that entry to main, if not, expect it to _not_ resolve and have
> [unknown]  instead...

But tis is not related to your recent patches, so I'm folding this last
one with the one adding the offsets and go on from there...

- Arnaldo


Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Arnaldo Carvalho de Melo
Em Fri, May 18, 2018 at 04:21:02PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, May 18, 2018 at 12:54:17PM +0530, Sandipan Das escreveu:
> > This test currently fails because the regular expressions for
> > matching the output of perf script do not consider the symbol
> > offsets to be part of the output.
> > 
> > The symbol offsets are seen because of the default behaviour
> > introduced by commit 4140d2ea74b3 ("perf script: Show symbol
> > offsets by default").
> > 
> > Before applying this patch:
> > 
> >   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> > 
> >   62: probe libc's inet_pton & backtrace it with ping   :
> >   --- start ---
> >   test child forked, pid 30389
> >   ping 30406 [002] 307144.280983: probe_libc:inet_pton: (7f4117adf220)
> >   7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
> >   FAIL: expected backtrace entry 1 
> > ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.25.so|inlined\)$" got 
> > "7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)"
> >   test child finished with -1
> >    end 
> >   probe libc's inet_pton & backtrace it with ping: FAILED!
> > 
> > After applying this patch:
> > 
> >   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> > 
> >   62: probe libc's inet_pton & backtrace it with ping   :
> >   --- start ---
> >   test child forked, pid 30539
> >   ping 30556 [003] 307254.313217: probe_libc:inet_pton: (7fe19ab10220)
> >   7fe19ab10220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
> >   7fe19aad5ebd getaddrinfo+0x11d (/usr/lib64/libc-2.25.so)
> >   56351e3c1c71 main+0x891 (/usr/bin/ping)
> >   test child finished with 0
> >    end 
> >   probe libc's inet_pton & backtrace it with ping: Ok
> 
> Here it went from failing with:
> 
> [root@seventh ~]# perf test -v pton
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 22590
> ping 22607 [001] 12782.425689: probe_libc:inet_pton: (7f8686da4e40)
> 7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> FAIL: expected backtrace entry 1 
> ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$" got 
> "7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)"
> test child finished with -1
>  end 
> probe libc's inet_pton & backtrace it with ping: FAILED!
> [root@seventh ~]# 
> 
> To failing with:
> 
> [root@seventh ~]# perf test -v pton
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 28954
> ping 28971 [002] 14277.711200: probe_libc:inet_pton: (7fc9d66e3e40)
> 7fc9d66e3e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> 7fc9d66b02b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> 56075cb98f40 [unknown] (/usr/bin/ping)
> FAIL: expected backtrace entry 3 
> ".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" got "56075cb98f40 
> [unknown] (/usr/bin/ping)"
> test child finished with -1
>  end 
> probe libc's inet_pton & backtrace it with ping: FAILED!
> [root@seventh ~]# 
> 
> Trying to figure this out...

[root@seventh perf]# perf script
ping 29170 [001] 14644.810782: probe_libc:inet_pton: (7f0bac71be40)
7f0bac71be40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
7f0bac6e82b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
5585e10adf40 [unknown] (/usr/bin/ping)

[root@seventh perf]# perf report --mmaps
#  pid  tid ppid  comm
 00   -1 |swapper
 2917029170   -1 |ping
5585e10ab000-5585e12dd000 r-xp  2771393 
/usr/bin/ping
7f0babfcf000-7f0bac1ed000 r-xp  2762913 
/usr/lib64/libpthread-2.26.so
7f0bac1ed000-7f0bac3f1000 r-xp  2753363 
/usr/lib64/libdl-2.26.so
7f0bac3f1000-7f0bac608000 r-xp  2760078 
/usr/lib64/libz.so.1.2.11
7f0bac608000-7f0bac9be000 r-xp  2753359 
/usr/lib64/libc-2.26.so
7f0bac9be000-7f0bacd09000 r-xp  2762901 
/usr/lib64/libm-2.26.so
7f0bacd09000-7f0bacf2 r-xp  2762915 
/usr/lib64/libresolv-2.26.so
7f0bacf2-7f0bad3a8000 r-xp  2764213 
/usr/lib64/libcrypto.so.1.1.0h
7f0bad3a8000-7f0bad5dc000 r-xp  2761070 
/usr/lib64/libidn.so.11.6.18
7f0bad5dc000-7f0bad7e1000 r-xp  2760189 
/usr/lib64/libcap.so.2.25
7f0bad7e1000-7f0bada08000 r-xp  2753353 
/usr/lib64/ld-2.26.so
7ffe23d99000-7ffe23d9b000 r-xp  0 [vdso]
[root@seventh perf]# 

So it is there, but I don't have debuginfo for ping, lets see if I add
it...

[root@seventh perf]# rpm -qf `which ping`
iputils-20161105-7.fc27.x86_64
[root@seventh perf]# 
[root@seventh perf]# dnf debuginfo-install 

Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Arnaldo Carvalho de Melo
Em Fri, May 18, 2018 at 04:21:02PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, May 18, 2018 at 12:54:17PM +0530, Sandipan Das escreveu:
> > This test currently fails because the regular expressions for
> > matching the output of perf script do not consider the symbol
> > offsets to be part of the output.
> > 
> > The symbol offsets are seen because of the default behaviour
> > introduced by commit 4140d2ea74b3 ("perf script: Show symbol
> > offsets by default").
> > 
> > Before applying this patch:
> > 
> >   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> > 
> >   62: probe libc's inet_pton & backtrace it with ping   :
> >   --- start ---
> >   test child forked, pid 30389
> >   ping 30406 [002] 307144.280983: probe_libc:inet_pton: (7f4117adf220)
> >   7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
> >   FAIL: expected backtrace entry 1 
> > ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.25.so|inlined\)$" got 
> > "7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)"
> >   test child finished with -1
> >    end 
> >   probe libc's inet_pton & backtrace it with ping: FAILED!
> > 
> > After applying this patch:
> > 
> >   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> > 
> >   62: probe libc's inet_pton & backtrace it with ping   :
> >   --- start ---
> >   test child forked, pid 30539
> >   ping 30556 [003] 307254.313217: probe_libc:inet_pton: (7fe19ab10220)
> >   7fe19ab10220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
> >   7fe19aad5ebd getaddrinfo+0x11d (/usr/lib64/libc-2.25.so)
> >   56351e3c1c71 main+0x891 (/usr/bin/ping)
> >   test child finished with 0
> >    end 
> >   probe libc's inet_pton & backtrace it with ping: Ok
> 
> Here it went from failing with:
> 
> [root@seventh ~]# perf test -v pton
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 22590
> ping 22607 [001] 12782.425689: probe_libc:inet_pton: (7f8686da4e40)
> 7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> FAIL: expected backtrace entry 1 
> ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$" got 
> "7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)"
> test child finished with -1
>  end 
> probe libc's inet_pton & backtrace it with ping: FAILED!
> [root@seventh ~]# 
> 
> To failing with:
> 
> [root@seventh ~]# perf test -v pton
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 28954
> ping 28971 [002] 14277.711200: probe_libc:inet_pton: (7fc9d66e3e40)
> 7fc9d66e3e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> 7fc9d66b02b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> 56075cb98f40 [unknown] (/usr/bin/ping)
> FAIL: expected backtrace entry 3 
> ".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" got "56075cb98f40 
> [unknown] (/usr/bin/ping)"
> test child finished with -1
>  end 
> probe libc's inet_pton & backtrace it with ping: FAILED!
> [root@seventh ~]# 
> 
> Trying to figure this out...

[root@seventh perf]# perf script
ping 29170 [001] 14644.810782: probe_libc:inet_pton: (7f0bac71be40)
7f0bac71be40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
7f0bac6e82b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
5585e10adf40 [unknown] (/usr/bin/ping)

[root@seventh perf]# perf report --mmaps
#  pid  tid ppid  comm
 00   -1 |swapper
 2917029170   -1 |ping
5585e10ab000-5585e12dd000 r-xp  2771393 
/usr/bin/ping
7f0babfcf000-7f0bac1ed000 r-xp  2762913 
/usr/lib64/libpthread-2.26.so
7f0bac1ed000-7f0bac3f1000 r-xp  2753363 
/usr/lib64/libdl-2.26.so
7f0bac3f1000-7f0bac608000 r-xp  2760078 
/usr/lib64/libz.so.1.2.11
7f0bac608000-7f0bac9be000 r-xp  2753359 
/usr/lib64/libc-2.26.so
7f0bac9be000-7f0bacd09000 r-xp  2762901 
/usr/lib64/libm-2.26.so
7f0bacd09000-7f0bacf2 r-xp  2762915 
/usr/lib64/libresolv-2.26.so
7f0bacf2-7f0bad3a8000 r-xp  2764213 
/usr/lib64/libcrypto.so.1.1.0h
7f0bad3a8000-7f0bad5dc000 r-xp  2761070 
/usr/lib64/libidn.so.11.6.18
7f0bad5dc000-7f0bad7e1000 r-xp  2760189 
/usr/lib64/libcap.so.2.25
7f0bad7e1000-7f0bada08000 r-xp  2753353 
/usr/lib64/ld-2.26.so
7ffe23d99000-7ffe23d9b000 r-xp  0 [vdso]
[root@seventh perf]# 

So it is there, but I don't have debuginfo for ping, lets see if I add
it...

[root@seventh perf]# rpm -qf `which ping`
iputils-20161105-7.fc27.x86_64
[root@seventh perf]# 
[root@seventh perf]# dnf debuginfo-install 

Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Sandipan Das
Hi Arnaldo,

On 05/19/2018 12:51 AM, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 18, 2018 at 12:54:17PM +0530, Sandipan Das escreveu:
>> This test currently fails because the regular expressions for
>> matching the output of perf script do not consider the symbol
>> offsets to be part of the output.
>>
>> The symbol offsets are seen because of the default behaviour
>> introduced by commit 4140d2ea74b3 ("perf script: Show symbol
>> offsets by default").
>>
>> Before applying this patch:
>>
>>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
>>
>>   62: probe libc's inet_pton & backtrace it with ping   :
>>   --- start ---
>>   test child forked, pid 30389
>>   ping 30406 [002] 307144.280983: probe_libc:inet_pton: (7f4117adf220)
>>   7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>>   FAIL: expected backtrace entry 1 
>> ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.25.so|inlined\)$" got 
>> "7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)"
>>   test child finished with -1
>>    end 
>>   probe libc's inet_pton & backtrace it with ping: FAILED!
>>
>> After applying this patch:
>>
>>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
>>
>>   62: probe libc's inet_pton & backtrace it with ping   :
>>   --- start ---
>>   test child forked, pid 30539
>>   ping 30556 [003] 307254.313217: probe_libc:inet_pton: (7fe19ab10220)
>>   7fe19ab10220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>>   7fe19aad5ebd getaddrinfo+0x11d (/usr/lib64/libc-2.25.so)
>>   56351e3c1c71 main+0x891 (/usr/bin/ping)
>>   test child finished with 0
>>    end 
>>   probe libc's inet_pton & backtrace it with ping: Ok
> 
> Here it went from failing with:
> 
> [root@seventh ~]# perf test -v pton
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 22590
> ping 22607 [001] 12782.425689: probe_libc:inet_pton: (7f8686da4e40)
> 7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> FAIL: expected backtrace entry 1 
> ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$" got 
> "7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)"
> test child finished with -1
>  end 
> probe libc's inet_pton & backtrace it with ping: FAILED!
> [root@seventh ~]# 
> 
> To failing with:
> 
> [root@seventh ~]# perf test -v pton
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 28954
> ping 28971 [002] 14277.711200: probe_libc:inet_pton: (7fc9d66e3e40)
> 7fc9d66e3e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> 7fc9d66b02b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> 56075cb98f40 [unknown] (/usr/bin/ping)
> FAIL: expected backtrace entry 3 
> ".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" got "56075cb98f40 
> [unknown] (/usr/bin/ping)"
> test child finished with -1
>  end 
> probe libc's inet_pton & backtrace it with ping: FAILED!
> [root@seventh ~]# 
> 
> Trying to figure this out...
> 

Looks like perf failed to resolve the symbol's name for the last entry
in the callchain. I did not consider this case. So, if this happens, we
would be better off using the original regex for the last line:

expected[3]=".*\(.*/bin/ping.*\)$"

- Sandipan



Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Sandipan Das
Hi Arnaldo,

On 05/19/2018 12:51 AM, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 18, 2018 at 12:54:17PM +0530, Sandipan Das escreveu:
>> This test currently fails because the regular expressions for
>> matching the output of perf script do not consider the symbol
>> offsets to be part of the output.
>>
>> The symbol offsets are seen because of the default behaviour
>> introduced by commit 4140d2ea74b3 ("perf script: Show symbol
>> offsets by default").
>>
>> Before applying this patch:
>>
>>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
>>
>>   62: probe libc's inet_pton & backtrace it with ping   :
>>   --- start ---
>>   test child forked, pid 30389
>>   ping 30406 [002] 307144.280983: probe_libc:inet_pton: (7f4117adf220)
>>   7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>>   FAIL: expected backtrace entry 1 
>> ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.25.so|inlined\)$" got 
>> "7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)"
>>   test child finished with -1
>>    end 
>>   probe libc's inet_pton & backtrace it with ping: FAILED!
>>
>> After applying this patch:
>>
>>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
>>
>>   62: probe libc's inet_pton & backtrace it with ping   :
>>   --- start ---
>>   test child forked, pid 30539
>>   ping 30556 [003] 307254.313217: probe_libc:inet_pton: (7fe19ab10220)
>>   7fe19ab10220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>>   7fe19aad5ebd getaddrinfo+0x11d (/usr/lib64/libc-2.25.so)
>>   56351e3c1c71 main+0x891 (/usr/bin/ping)
>>   test child finished with 0
>>    end 
>>   probe libc's inet_pton & backtrace it with ping: Ok
> 
> Here it went from failing with:
> 
> [root@seventh ~]# perf test -v pton
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 22590
> ping 22607 [001] 12782.425689: probe_libc:inet_pton: (7f8686da4e40)
> 7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> FAIL: expected backtrace entry 1 
> ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$" got 
> "7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)"
> test child finished with -1
>  end 
> probe libc's inet_pton & backtrace it with ping: FAILED!
> [root@seventh ~]# 
> 
> To failing with:
> 
> [root@seventh ~]# perf test -v pton
> 64: probe libc's inet_pton & backtrace it with ping   :
> --- start ---
> test child forked, pid 28954
> ping 28971 [002] 14277.711200: probe_libc:inet_pton: (7fc9d66e3e40)
> 7fc9d66e3e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
> 7fc9d66b02b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
> 56075cb98f40 [unknown] (/usr/bin/ping)
> FAIL: expected backtrace entry 3 
> ".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" got "56075cb98f40 
> [unknown] (/usr/bin/ping)"
> test child finished with -1
>  end 
> probe libc's inet_pton & backtrace it with ping: FAILED!
> [root@seventh ~]# 
> 
> Trying to figure this out...
> 

Looks like perf failed to resolve the symbol's name for the last entry
in the callchain. I did not consider this case. So, if this happens, we
would be better off using the original regex for the last line:

expected[3]=".*\(.*/bin/ping.*\)$"

- Sandipan



Re: [PATCH v2 1/2] i2c: core-smbus: fix a potential uninitialization bug

2018-05-18 Thread Wenwen Wang
Yes, this patch does not aim to "fix" all potential driver bugs but
adds an additional protection in case the implementation of
.master_xfer is incorrect.

>From this perspective, it is still necessary to apply this patch, as
pointed out by Peter.

Thanks,
Wenwen

On Mon, May 14, 2018 at 3:31 PM, Peter Rosin  wrote:
> On 2018-05-10 13:17, Wolfram Sang wrote:
>> On Sat, May 05, 2018 at 07:57:10AM -0500, Wenwen Wang wrote:
>>> In i2c_smbus_xfer_emulated(), there are two buffers: msgbuf0 and msgbuf1,
>>> which are used to save a series of messages, as mentioned in the comment.
>>> According to the value of the variable 'size', msgbuf0 is initialized to
>>> various values. In contrast, msgbuf1 is left uninitialized until the
>>> function i2c_transfer() is invoked. However, msgbuf1 is not always
>>> initialized on all possible execution paths (implementation) of
>>> i2c_transfer(). Thus, it is possible that msgbuf1 may still be
>>> uninitialized even after the invocation of the function i2c_transfer(),
>>> especially when the return value of ic2_transfer() is not checked properly.
>>> In the following execution, the uninitialized msgbuf1 will be used, such as
>>> for security checks. Since uninitialized values can be random and
>>> arbitrary, this will cause undefined behaviors or even check bypass. For
>>> example, it is expected that if the value of 'size' is
>>> I2C_SMBUS_BLOCK_PROC_CALL, the value of data->block[0] should not be larger
>>> than I2C_SMBUS_BLOCK_MAX. But, at the end of i2c_smbus_xfer_emulated(), the
>>> value read from msgbuf1 is assigned to data->block[0], which can
>>> potentially lead to invalid block write size, as demonstrated in the error
>>> message.
>>>
>>> This patch initializes the first byte of msgbuf1 with 0 to avoid such
>>> undefined behaviors or security issues.
>>>
>>> Signed-off-by: Wenwen Wang 
>>
>> From what I can tell, this patch is not needed anymore after patch 2 is
>> applied. Correct?
>
> AFAIU, it is only needed if there are bugs elsewhere. I.e. it's for extra
> protection. If all drivers implement .master_xfer correctly, msgbuf1 will
> be filled in and the return value will be the number of messages (i.e. 2)
> OR you get a negative return value and the msgbuf1 content will not matter.
>
> The patch does not magically fix all possible driver bugs, so in that
> sense this patch is still "needed".
>
> Also - again AFAIU - there is no known bug that actually gets caught by
> this extra check.
>
> Cheers,
> Peter


Re: [PATCH v2 1/2] i2c: core-smbus: fix a potential uninitialization bug

2018-05-18 Thread Wenwen Wang
Yes, this patch does not aim to "fix" all potential driver bugs but
adds an additional protection in case the implementation of
.master_xfer is incorrect.

>From this perspective, it is still necessary to apply this patch, as
pointed out by Peter.

Thanks,
Wenwen

On Mon, May 14, 2018 at 3:31 PM, Peter Rosin  wrote:
> On 2018-05-10 13:17, Wolfram Sang wrote:
>> On Sat, May 05, 2018 at 07:57:10AM -0500, Wenwen Wang wrote:
>>> In i2c_smbus_xfer_emulated(), there are two buffers: msgbuf0 and msgbuf1,
>>> which are used to save a series of messages, as mentioned in the comment.
>>> According to the value of the variable 'size', msgbuf0 is initialized to
>>> various values. In contrast, msgbuf1 is left uninitialized until the
>>> function i2c_transfer() is invoked. However, msgbuf1 is not always
>>> initialized on all possible execution paths (implementation) of
>>> i2c_transfer(). Thus, it is possible that msgbuf1 may still be
>>> uninitialized even after the invocation of the function i2c_transfer(),
>>> especially when the return value of ic2_transfer() is not checked properly.
>>> In the following execution, the uninitialized msgbuf1 will be used, such as
>>> for security checks. Since uninitialized values can be random and
>>> arbitrary, this will cause undefined behaviors or even check bypass. For
>>> example, it is expected that if the value of 'size' is
>>> I2C_SMBUS_BLOCK_PROC_CALL, the value of data->block[0] should not be larger
>>> than I2C_SMBUS_BLOCK_MAX. But, at the end of i2c_smbus_xfer_emulated(), the
>>> value read from msgbuf1 is assigned to data->block[0], which can
>>> potentially lead to invalid block write size, as demonstrated in the error
>>> message.
>>>
>>> This patch initializes the first byte of msgbuf1 with 0 to avoid such
>>> undefined behaviors or security issues.
>>>
>>> Signed-off-by: Wenwen Wang 
>>
>> From what I can tell, this patch is not needed anymore after patch 2 is
>> applied. Correct?
>
> AFAIU, it is only needed if there are bugs elsewhere. I.e. it's for extra
> protection. If all drivers implement .master_xfer correctly, msgbuf1 will
> be filled in and the return value will be the number of messages (i.e. 2)
> OR you get a negative return value and the msgbuf1 content will not matter.
>
> The patch does not magically fix all possible driver bugs, so in that
> sense this patch is still "needed".
>
> Also - again AFAIU - there is no known bug that actually gets caught by
> this extra check.
>
> Cheers,
> Peter


Re: [PATCH 4.16 00/55] 4.16.10-stable review

2018-05-18 Thread Naresh Kamboju
On 18 May 2018 at 13:44, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.16.10 release.
> There are 55 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun May 20 08:14:42 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.10-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.16.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary


kernel: 4.16.10-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.16.y
git commit: de63ee0383b7163aa4305ab30fd66d3e99ebfebe
git describe: v4.16.9-56-gde63ee0383b7
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.16-oe/build/v4.16.9-56-gde63ee0383b7


No regressions (compared to build v4.16.9-55-g2b695e770deb)

Boards, architectures and test suites:
-

dragonboard-410c - arm64
* boot - pass: 20, fail: 1,
* kselftest - pass: 40, fail: 1, skip: 26
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1017, skip: 133
* ltp-timers-tests - pass: 13,

hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - pass: 45, skip: 22
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 10, skip: 4
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1016, skip: 134
* ltp-timers-tests - pass: 13,

juno-r2 - arm64
* boot - pass: 21,
* kselftest - pass: 44, skip: 24
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 10, skip: 4
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1017, skip: 133
* ltp-timers-tests - pass: 13,

qemu_arm
* boot - pass: 20,
* kselftest - pass: 37, skip: 31
* libhugetlbfs - pass: 1,
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 63, skip: 18
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 58, skip: 5
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 7, skip: 7
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1055, fail: 3, skip: 92
* ltp-timers-tests - pass: 13,

qemu_arm64
* boot - pass: 20,
* kselftest - pass: 41, skip: 29
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 8, skip: 6
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 993, fail: 2, skip: 155
* ltp-timers-tests - pass: 13,

qemu_x86_64
* boot - pass: 20,
* 

Re: [PATCH 4.16 00/55] 4.16.10-stable review

2018-05-18 Thread Naresh Kamboju
On 18 May 2018 at 13:44, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.16.10 release.
> There are 55 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun May 20 08:14:42 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.10-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.16.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary


kernel: 4.16.10-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.16.y
git commit: de63ee0383b7163aa4305ab30fd66d3e99ebfebe
git describe: v4.16.9-56-gde63ee0383b7
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.16-oe/build/v4.16.9-56-gde63ee0383b7


No regressions (compared to build v4.16.9-55-g2b695e770deb)

Boards, architectures and test suites:
-

dragonboard-410c - arm64
* boot - pass: 20, fail: 1,
* kselftest - pass: 40, fail: 1, skip: 26
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1017, skip: 133
* ltp-timers-tests - pass: 13,

hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - pass: 45, skip: 22
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 10, skip: 4
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1016, skip: 134
* ltp-timers-tests - pass: 13,

juno-r2 - arm64
* boot - pass: 21,
* kselftest - pass: 44, skip: 24
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 10, skip: 4
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1017, skip: 133
* ltp-timers-tests - pass: 13,

qemu_arm
* boot - pass: 20,
* kselftest - pass: 37, skip: 31
* libhugetlbfs - pass: 1,
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 63, skip: 18
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 58, skip: 5
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 7, skip: 7
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1055, fail: 3, skip: 92
* ltp-timers-tests - pass: 13,

qemu_arm64
* boot - pass: 20,
* kselftest - pass: 41, skip: 29
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 8, skip: 6
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 993, fail: 2, skip: 155
* ltp-timers-tests - pass: 13,

qemu_x86_64
* boot - pass: 20,
* kselftest - pass: 50, skip: 

Re: [PATCHv2][SMB3] Add kernel trace support

2018-05-18 Thread Ralph Böhme
On Thu, May 17, 2018 at 09:36:36PM -0500, Steve French via samba-technical 
wrote:
> Patch updated with additional tracepoint locations and some formatting
> improvements. There are some obvious additional tracepoints that could
> be added, but this should be a reasonable group to start with.
> 
> From edc02d6f9dc24963d510c7ef59067428d3b082d3 Mon Sep 17 00:00:00 2001
> From: Steve French 
> Date: Thu, 17 May 2018 21:16:55 -0500
> Subject: [PATCH] smb3: Add ftrace tracepoints for improved SMB3 debugging
> 
> Although dmesg logs and wireshark network traces can be
> helpful, being able to dynamically enable/disable tracepoints
> (in this case via the kernel ftrace mechanism) can also be
> helpful in more quickly debugging problems, and more
> selectively tracing the events related to the bug report.
> 
> This patch adds 12 ftrace tracepoints to cifs.ko for SMB3 events
> in some obvious locations.  Subsequent patches will add more
> as needed.
> 
> Example use:
>trace-cmd record -e cifs
>
>trace-cmd show

pardon my ignorance, but are these tracepoints usable with other tracing
frameworks like Systemtap?

Last time I checked, Systemtap looked like *the* tool. Is there a generic trace
point infrastructure that tracing tools can consume, so we're not tied to
ftrace?

Thanks!
-slow

-- 
Ralph Boehme, Samba Team   https://samba.org/
Samba Developer, SerNet GmbH   https://sernet.de/en/samba/
GPG Key Fingerprint:   FAE2 C608 8A24 2520 51C5
   59E4 AA1E 9B71 2639 9E46


Re: [PATCHv2][SMB3] Add kernel trace support

2018-05-18 Thread Ralph Böhme
On Thu, May 17, 2018 at 09:36:36PM -0500, Steve French via samba-technical 
wrote:
> Patch updated with additional tracepoint locations and some formatting
> improvements. There are some obvious additional tracepoints that could
> be added, but this should be a reasonable group to start with.
> 
> From edc02d6f9dc24963d510c7ef59067428d3b082d3 Mon Sep 17 00:00:00 2001
> From: Steve French 
> Date: Thu, 17 May 2018 21:16:55 -0500
> Subject: [PATCH] smb3: Add ftrace tracepoints for improved SMB3 debugging
> 
> Although dmesg logs and wireshark network traces can be
> helpful, being able to dynamically enable/disable tracepoints
> (in this case via the kernel ftrace mechanism) can also be
> helpful in more quickly debugging problems, and more
> selectively tracing the events related to the bug report.
> 
> This patch adds 12 ftrace tracepoints to cifs.ko for SMB3 events
> in some obvious locations.  Subsequent patches will add more
> as needed.
> 
> Example use:
>trace-cmd record -e cifs
>
>trace-cmd show

pardon my ignorance, but are these tracepoints usable with other tracing
frameworks like Systemtap?

Last time I checked, Systemtap looked like *the* tool. Is there a generic trace
point infrastructure that tracing tools can consume, so we're not tied to
ftrace?

Thanks!
-slow

-- 
Ralph Boehme, Samba Team   https://samba.org/
Samba Developer, SerNet GmbH   https://sernet.de/en/samba/
GPG Key Fingerprint:   FAE2 C608 8A24 2520 51C5
   59E4 AA1E 9B71 2639 9E46


Re: [PATCH v2] selftests/cgroup: memory controller self-tests

2018-05-18 Thread Shuah Khan
On 05/11/2018 02:22 PM, Shuah Khan wrote:
> On 05/11/2018 02:02 PM, Tejun Heo wrote:
>> On Fri, May 11, 2018 at 01:33:26PM -0600, Shuah Khan wrote:
>>> Yeah. I see that. You have a switch for the KSFT_ values. Since there is no
>>> dependency on the cgroup tree, I would recommend having this patch go 
>>> through
>>> kselftest tree which is the normal process for tests anyway.
>>>
>>> This version is good and I can apply this to linux-kselftest next. I ran a
>>> quick test and the Skip case looks good.
>>>
>>> TAP version 13
>>> selftests: cgroup: test_memcontrol
>>> 
>>> 1..0 # Skipped: memory controller isn't available
>>> not ok 1..1 selftests: cgroup: test_memcontrol [SKIP]
>>>
>>>
>>> Tejun! Please send me your Ack.
>>
>> Sure, please feel free to add my ack.  I'll revert the original patch
>> from cgroup tree.
>>
>> Thanks.
>>
> 
> Thanks. I will apply this to linux-kselftest next for 4.18-rc1 with
> you Ack.
> 
> thanks,
> -- Shuah
> 

Hi Roman,

Could you please send me the fix for

[bug,report] selftests: cgroup: add memory controller self-tests

It is cc'ed to kselftest mailing list and not to me. Please send
it directly for me to apply it.

thanks,
-- Shuah


Re: [PATCH v2] selftests/cgroup: memory controller self-tests

2018-05-18 Thread Shuah Khan
On 05/11/2018 02:22 PM, Shuah Khan wrote:
> On 05/11/2018 02:02 PM, Tejun Heo wrote:
>> On Fri, May 11, 2018 at 01:33:26PM -0600, Shuah Khan wrote:
>>> Yeah. I see that. You have a switch for the KSFT_ values. Since there is no
>>> dependency on the cgroup tree, I would recommend having this patch go 
>>> through
>>> kselftest tree which is the normal process for tests anyway.
>>>
>>> This version is good and I can apply this to linux-kselftest next. I ran a
>>> quick test and the Skip case looks good.
>>>
>>> TAP version 13
>>> selftests: cgroup: test_memcontrol
>>> 
>>> 1..0 # Skipped: memory controller isn't available
>>> not ok 1..1 selftests: cgroup: test_memcontrol [SKIP]
>>>
>>>
>>> Tejun! Please send me your Ack.
>>
>> Sure, please feel free to add my ack.  I'll revert the original patch
>> from cgroup tree.
>>
>> Thanks.
>>
> 
> Thanks. I will apply this to linux-kselftest next for 4.18-rc1 with
> you Ack.
> 
> thanks,
> -- Shuah
> 

Hi Roman,

Could you please send me the fix for

[bug,report] selftests: cgroup: add memory controller self-tests

It is cc'ed to kselftest mailing list and not to me. Please send
it directly for me to apply it.

thanks,
-- Shuah


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread hpa
On May 18, 2018 12:21:00 PM PDT, Linus Torvalds  
wrote:
>On Fri, May 18, 2018 at 12:18 PM Nadav Amit  wrote:
>
>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>source
>> program. The source program is made up of one or more files.”
>
>Ok, that counts as documentation, although it's confusing as hell. Is
>it
>one source program or multiple files again? I see what they are trying
>to
>say, but it really could be phrased more clearly ;)
>
> Linus

At least we have a solution!
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread hpa
On May 18, 2018 12:21:00 PM PDT, Linus Torvalds  
wrote:
>On Fri, May 18, 2018 at 12:18 PM Nadav Amit  wrote:
>
>> Gnu ASM manual says: "Each time you run as it assembles exactly one
>source
>> program. The source program is made up of one or more files.”
>
>Ok, that counts as documentation, although it's confusing as hell. Is
>it
>one source program or multiple files again? I see what they are trying
>to
>say, but it really could be phrased more clearly ;)
>
> Linus

At least we have a solution!
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] kernel: sys: fix potential Spectre v1

2018-05-18 Thread Gustavo A. R. Silva



On 05/18/2018 02:04 PM, Gustavo A. R. Silva wrote:



On 05/15/2018 05:57 PM, Dan Williams wrote:
On Tue, May 15, 2018 at 3:29 PM, Thomas Gleixner  
wrote:

On Tue, 15 May 2018, Andrew Morton wrote:
On Mon, 14 May 2018 22:00:38 -0500 "Gustavo A. R. Silva" 
 wrote:



resource can be controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential
spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre 
issue

'get_current()->signal->rlim' (local cap)

Fix this by sanitizing *resource* before using it to index
current->signal->rlim

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].


hm.  Not my area, but I'm always willing to learn ;)


--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -69,6 +69,9 @@
  #include 
  #include 

+/* Hardening for Spectre-v1 */
+#include 
+
  #include "uid16.h"

  #ifndef SET_UNALIGN_CTL
@@ -1451,6 +1454,7 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, 
resource,

 if (resource >= RLIM_NLIMITS)
 return -EINVAL;

+   resource = array_index_nospec(resource, RLIM_NLIMITS);
 task_lock(current->group_leader);
 x = current->signal->rlim[resource];


Can the speculation proceed past the task_lock()?  Or is the policy to
ignore such happy happenstances even if they are available?


Locks are not in the way of speculation. Speculation has almost no 
limits

except serializing instructions. At least they respect the magic AND
limitation in array_index_nospec().


I'd say it another way, because they don't respect the magic AND, we
just make the result in the speculation path safe. So, it's controlled
speculation.



Dan,

What do you think about adding the following function to the nospec API:

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index e791ebc..81e9a77 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -55,4 +55,17 @@ static inline unsigned long 
array_index_mask_nospec(unsigned long index,

     \
     (typeof(_i)) (_i & _mask);  \
  })
+
+
+#ifndef sanitize_index_nospec
+inline bool sanitize_index_nospec(unsigned long index,
+ unsigned long size)
+{
+   if (index >= size)
+   return false;
+   index = array_index_nospec(index, size);
+
+   return true;
+}
+#endif
  #endif /* _LINUX_NOSPEC_H */



Oops, it seems I sent the wrong patch. The function would look like this:

#ifndef sanitize_index_nospec
inline bool sanitize_index_nospec(unsigned long *index,
  unsigned long size)
{
if (*index >= size)
return false;
*index = array_index_nospec(*index, size);

return true;
}
#endif

Thanks
--
Gustavo


Re: [PATCH] kernel: sys: fix potential Spectre v1

2018-05-18 Thread Gustavo A. R. Silva



On 05/18/2018 02:04 PM, Gustavo A. R. Silva wrote:



On 05/15/2018 05:57 PM, Dan Williams wrote:
On Tue, May 15, 2018 at 3:29 PM, Thomas Gleixner  
wrote:

On Tue, 15 May 2018, Andrew Morton wrote:
On Mon, 14 May 2018 22:00:38 -0500 "Gustavo A. R. Silva" 
 wrote:



resource can be controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential
spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre 
issue

'get_current()->signal->rlim' (local cap)

Fix this by sanitizing *resource* before using it to index
current->signal->rlim

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].


hm.  Not my area, but I'm always willing to learn ;)


--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -69,6 +69,9 @@
  #include 
  #include 

+/* Hardening for Spectre-v1 */
+#include 
+
  #include "uid16.h"

  #ifndef SET_UNALIGN_CTL
@@ -1451,6 +1454,7 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, 
resource,

 if (resource >= RLIM_NLIMITS)
 return -EINVAL;

+   resource = array_index_nospec(resource, RLIM_NLIMITS);
 task_lock(current->group_leader);
 x = current->signal->rlim[resource];


Can the speculation proceed past the task_lock()?  Or is the policy to
ignore such happy happenstances even if they are available?


Locks are not in the way of speculation. Speculation has almost no 
limits

except serializing instructions. At least they respect the magic AND
limitation in array_index_nospec().


I'd say it another way, because they don't respect the magic AND, we
just make the result in the speculation path safe. So, it's controlled
speculation.



Dan,

What do you think about adding the following function to the nospec API:

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index e791ebc..81e9a77 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -55,4 +55,17 @@ static inline unsigned long 
array_index_mask_nospec(unsigned long index,

     \
     (typeof(_i)) (_i & _mask);  \
  })
+
+
+#ifndef sanitize_index_nospec
+inline bool sanitize_index_nospec(unsigned long index,
+ unsigned long size)
+{
+   if (index >= size)
+   return false;
+   index = array_index_nospec(index, size);
+
+   return true;
+}
+#endif
  #endif /* _LINUX_NOSPEC_H */



Oops, it seems I sent the wrong patch. The function would look like this:

#ifndef sanitize_index_nospec
inline bool sanitize_index_nospec(unsigned long *index,
  unsigned long size)
{
if (*index >= size)
return false;
*index = array_index_nospec(*index, size);

return true;
}
#endif

Thanks
--
Gustavo


Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Arnaldo Carvalho de Melo
Em Fri, May 18, 2018 at 12:54:17PM +0530, Sandipan Das escreveu:
> This test currently fails because the regular expressions for
> matching the output of perf script do not consider the symbol
> offsets to be part of the output.
> 
> The symbol offsets are seen because of the default behaviour
> introduced by commit 4140d2ea74b3 ("perf script: Show symbol
> offsets by default").
> 
> Before applying this patch:
> 
>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> 
>   62: probe libc's inet_pton & backtrace it with ping   :
>   --- start ---
>   test child forked, pid 30389
>   ping 30406 [002] 307144.280983: probe_libc:inet_pton: (7f4117adf220)
>   7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>   FAIL: expected backtrace entry 1 
> ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.25.so|inlined\)$" got 
> "7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)"
>   test child finished with -1
>    end 
>   probe libc's inet_pton & backtrace it with ping: FAILED!
> 
> After applying this patch:
> 
>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> 
>   62: probe libc's inet_pton & backtrace it with ping   :
>   --- start ---
>   test child forked, pid 30539
>   ping 30556 [003] 307254.313217: probe_libc:inet_pton: (7fe19ab10220)
>   7fe19ab10220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>   7fe19aad5ebd getaddrinfo+0x11d (/usr/lib64/libc-2.25.so)
>   56351e3c1c71 main+0x891 (/usr/bin/ping)
>   test child finished with 0
>    end 
>   probe libc's inet_pton & backtrace it with ping: Ok

Here it went from failing with:

[root@seventh ~]# perf test -v pton
64: probe libc's inet_pton & backtrace it with ping   :
--- start ---
test child forked, pid 22590
ping 22607 [001] 12782.425689: probe_libc:inet_pton: (7f8686da4e40)
7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
FAIL: expected backtrace entry 1 
".*inet_pton[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$" got "7f8686da4e40 
__GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)"
test child finished with -1
 end 
probe libc's inet_pton & backtrace it with ping: FAILED!
[root@seventh ~]# 

To failing with:

[root@seventh ~]# perf test -v pton
64: probe libc's inet_pton & backtrace it with ping   :
--- start ---
test child forked, pid 28954
ping 28971 [002] 14277.711200: probe_libc:inet_pton: (7fc9d66e3e40)
7fc9d66e3e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
7fc9d66b02b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
56075cb98f40 [unknown] (/usr/bin/ping)
FAIL: expected backtrace entry 3 
".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" got "56075cb98f40 [unknown] 
(/usr/bin/ping)"
test child finished with -1
 end 
probe libc's inet_pton & backtrace it with ping: FAILED!
[root@seventh ~]# 

Trying to figure this out...

- Arnaldo


Re: [PATCH] perf tests: Fix regex for record+probe_libc_inet_pton.sh

2018-05-18 Thread Arnaldo Carvalho de Melo
Em Fri, May 18, 2018 at 12:54:17PM +0530, Sandipan Das escreveu:
> This test currently fails because the regular expressions for
> matching the output of perf script do not consider the symbol
> offsets to be part of the output.
> 
> The symbol offsets are seen because of the default behaviour
> introduced by commit 4140d2ea74b3 ("perf script: Show symbol
> offsets by default").
> 
> Before applying this patch:
> 
>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> 
>   62: probe libc's inet_pton & backtrace it with ping   :
>   --- start ---
>   test child forked, pid 30389
>   ping 30406 [002] 307144.280983: probe_libc:inet_pton: (7f4117adf220)
>   7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>   FAIL: expected backtrace entry 1 
> ".*inet_pton[[:space:]]\(/usr/lib64/libc-2.25.so|inlined\)$" got 
> "7f4117adf220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)"
>   test child finished with -1
>    end 
>   probe libc's inet_pton & backtrace it with ping: FAILED!
> 
> After applying this patch:
> 
>   # perf test -v "probe libc's inet_pton & backtrace it with ping"
> 
>   62: probe libc's inet_pton & backtrace it with ping   :
>   --- start ---
>   test child forked, pid 30539
>   ping 30556 [003] 307254.313217: probe_libc:inet_pton: (7fe19ab10220)
>   7fe19ab10220 __GI___inet_pton+0x0 (/usr/lib64/libc-2.25.so)
>   7fe19aad5ebd getaddrinfo+0x11d (/usr/lib64/libc-2.25.so)
>   56351e3c1c71 main+0x891 (/usr/bin/ping)
>   test child finished with 0
>    end 
>   probe libc's inet_pton & backtrace it with ping: Ok

Here it went from failing with:

[root@seventh ~]# perf test -v pton
64: probe libc's inet_pton & backtrace it with ping   :
--- start ---
test child forked, pid 22590
ping 22607 [001] 12782.425689: probe_libc:inet_pton: (7f8686da4e40)
7f8686da4e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
FAIL: expected backtrace entry 1 
".*inet_pton[[:space:]]\(/usr/lib64/libc-2.26.so|inlined\)$" got "7f8686da4e40 
__GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)"
test child finished with -1
 end 
probe libc's inet_pton & backtrace it with ping: FAILED!
[root@seventh ~]# 

To failing with:

[root@seventh ~]# perf test -v pton
64: probe libc's inet_pton & backtrace it with ping   :
--- start ---
test child forked, pid 28954
ping 28971 [002] 14277.711200: probe_libc:inet_pton: (7fc9d66e3e40)
7fc9d66e3e40 __GI___inet_pton+0x0 (/usr/lib64/libc-2.26.so)
7fc9d66b02b4 getaddrinfo+0x124 (/usr/lib64/libc-2.26.so)
56075cb98f40 [unknown] (/usr/bin/ping)
FAIL: expected backtrace entry 3 
".*\+0x[[:xdigit:]]+[[:space:]]\(.*/bin/ping.*\)$" got "56075cb98f40 [unknown] 
(/usr/bin/ping)"
test child finished with -1
 end 
probe libc's inet_pton & backtrace it with ping: FAILED!
[root@seventh ~]# 

Trying to figure this out...

- Arnaldo


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Linus Torvalds
On Fri, May 18, 2018 at 12:18 PM Nadav Amit  wrote:

> Gnu ASM manual says: "Each time you run as it assembles exactly one source
> program. The source program is made up of one or more files.”

Ok, that counts as documentation, although it's confusing as hell. Is it
one source program or multiple files again? I see what they are trying to
say, but it really could be phrased more clearly ;)

 Linus


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Linus Torvalds
On Fri, May 18, 2018 at 12:18 PM Nadav Amit  wrote:

> Gnu ASM manual says: "Each time you run as it assembles exactly one source
> program. The source program is made up of one or more files.”

Ok, that counts as documentation, although it's confusing as hell. Is it
one source program or multiple files again? I see what they are trying to
say, but it really could be phrased more clearly ;)

 Linus


Re: [PATCH] selftest: intel_pstate: debug support message from aperf.c and return value fix.

2018-05-18 Thread Shuah Khan
On 05/18/2018 08:05 AM, Jeffrin Jose T wrote:
> Additional message along with an error message which is more
> verbose for debug support from aperf.c and updated with the
> new return value "KSFT_SKIP".
> 
> Signed-off-by: Jeffrin Jose T 
> ---
>  tools/testing/selftests/intel_pstate/aperf.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/intel_pstate/aperf.c 
> b/tools/testing/selftests/intel_pstate/aperf.c
> index d21edea9c560..f6cd03a87493 100644
> --- a/tools/testing/selftests/intel_pstate/aperf.c
> +++ b/tools/testing/selftests/intel_pstate/aperf.c
> @@ -9,6 +9,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include "../kselftest.h"
>  
>  void usage(char *name) {
>   printf ("Usage: %s cpunum\n", name);
> @@ -41,8 +43,8 @@ int main(int argc, char **argv) {
>   fd = open(msr_file_name, O_RDONLY);
>  
>   if (fd == -1) {
> - perror("Failed to open");
> - return 1;
> + printf("/dev/cpu/%d/msr: %s\n", cpu, strerror(errno));
> + return KSFT_SKIP;
>   }
>  
>   CPU_ZERO();
> 

Thanks. I will queue this up for 4.18-rc1

thanks,
-- Shuah


Re: [PATCH] selftest: intel_pstate: debug support message from aperf.c and return value fix.

2018-05-18 Thread Shuah Khan
On 05/18/2018 08:05 AM, Jeffrin Jose T wrote:
> Additional message along with an error message which is more
> verbose for debug support from aperf.c and updated with the
> new return value "KSFT_SKIP".
> 
> Signed-off-by: Jeffrin Jose T 
> ---
>  tools/testing/selftests/intel_pstate/aperf.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/intel_pstate/aperf.c 
> b/tools/testing/selftests/intel_pstate/aperf.c
> index d21edea9c560..f6cd03a87493 100644
> --- a/tools/testing/selftests/intel_pstate/aperf.c
> +++ b/tools/testing/selftests/intel_pstate/aperf.c
> @@ -9,6 +9,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include "../kselftest.h"
>  
>  void usage(char *name) {
>   printf ("Usage: %s cpunum\n", name);
> @@ -41,8 +43,8 @@ int main(int argc, char **argv) {
>   fd = open(msr_file_name, O_RDONLY);
>  
>   if (fd == -1) {
> - perror("Failed to open");
> - return 1;
> + printf("/dev/cpu/%d/msr: %s\n", cpu, strerror(errno));
> + return KSFT_SKIP;
>   }
>  
>   CPU_ZERO();
> 

Thanks. I will queue this up for 4.18-rc1

thanks,
-- Shuah


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Nadav Amit
Linus Torvalds  wrote:

> On Fri, May 18, 2018 at 12:02 PM Nadav Amit  wrote:
> 
>> I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.
> 
> Oh, if it assembles things together, then that sounds optimal.
> 
> And yes, like hpa says, we should make sure that behavior is acknowledged
> by the GNU as people, so that they then don't come back and say "hey, now
> we assemble things as separate units".
> 
> That said, the "separate units" model really doesn't make much sense now
> that I think about it, because 'as' supports only one output file. So I
> guess the as people wouldn't have any issues with just accepting the
> "concatenate all input files" syntax.

Gnu ASM manual says: "Each time you run as it assembles exactly one source
program. The source program is made up of one or more files.”



Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Nadav Amit
Linus Torvalds  wrote:

> On Fri, May 18, 2018 at 12:02 PM Nadav Amit  wrote:
> 
>> I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.
> 
> Oh, if it assembles things together, then that sounds optimal.
> 
> And yes, like hpa says, we should make sure that behavior is acknowledged
> by the GNU as people, so that they then don't come back and say "hey, now
> we assemble things as separate units".
> 
> That said, the "separate units" model really doesn't make much sense now
> that I think about it, because 'as' supports only one output file. So I
> guess the as people wouldn't have any issues with just accepting the
> "concatenate all input files" syntax.

Gnu ASM manual says: "Each time you run as it assembles exactly one source
program. The source program is made up of one or more files.”



[PATCH v4 2/2] vfio/mdev: Re-order sysfs attribute creation

2018-05-18 Thread Alex Williamson
There exists a gap at the end of mdev_device_create() where the device
is visible to userspace, but we're not yet ready to handle removal, as
triggered through the 'remove' attribute.  We handle this properly in
mdev_device_remove() with an -EAGAIN return, but we can marginally
reduce this gap by adding this attribute as a final step of our sysfs
setup.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/mdev/mdev_sysfs.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 802df210929b..249472f05509 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -257,24 +257,24 @@ int  mdev_create_sysfs_files(struct device *dev, struct 
mdev_type *type)
 {
int ret;
 
-   ret = sysfs_create_files(>kobj, mdev_device_attrs);
-   if (ret)
-   return ret;
-
ret = sysfs_create_link(type->devices_kobj, >kobj, dev_name(dev));
if (ret)
-   goto device_link_failed;
+   return ret;
 
ret = sysfs_create_link(>kobj, >kobj, "mdev_type");
if (ret)
goto type_link_failed;
 
+   ret = sysfs_create_files(>kobj, mdev_device_attrs);
+   if (ret)
+   goto create_files_failed;
+
return ret;
 
+create_files_failed:
+   sysfs_remove_link(>kobj, "mdev_type");
 type_link_failed:
sysfs_remove_link(type->devices_kobj, dev_name(dev));
-device_link_failed:
-   sysfs_remove_files(>kobj, mdev_device_attrs);
return ret;
 }
 



[PATCH v4 1/2] vfio/mdev: Check globally for duplicate devices

2018-05-18 Thread Alex Williamson
When we create an mdev device, we check for duplicates against the
parent device and return -EEXIST if found, but the mdev device
namespace is global since we'll link all devices from the bus.  We do
catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
with it comes a kernel warning and stack trace for trying to create
duplicate sysfs links, which makes it an undesirable response.

Therefore we should really be looking for duplicates across all mdev
parent devices, or as implemented here, against our mdev device list.
Using mdev_list to prevent duplicates means that we can remove
mdev_parent.lock, but in order not to serialize mdev device creation
and removal globally, we add mdev_device.active which allows UUIDs to
be reserved such that we can drop the mdev_list_lock before the mdev
device is fully in place.

Two behavioral notes; first, mdev_parent.lock had the side-effect of
serializing mdev create and remove ops per parent device.  This was
an implementation detail, not an intentional guarantee provided to
the mdev vendor drivers.  Vendor drivers can trivially provide this
serialization internally if necessary.  Second, review comments note
the new -EAGAIN behavior when the device, and in particular the remove
attribute, becomes visible in sysfs.  If a remove is triggered prior
to completion of mdev_device_create() the user will see a -EAGAIN
error.  While the errno is different, receiving an error during this
period is not, the previous implementation returned -ENODEV for the
same condition.  Furthermore, the consistency to the user is improved
in the case where mdev_device_remove_ops() returns error.  Previously
concurrent calls to mdev_device_remove() could see the device
disappear with -ENODEV and return in the case of error.  Now a user
would see -EAGAIN while the device is in this transitory state.

Signed-off-by: Alex Williamson 
---
 Documentation/vfio-mediated-device.txt |5 ++
 drivers/vfio/mdev/mdev_core.c  |  102 +++-
 drivers/vfio/mdev/mdev_private.h   |2 -
 3 files changed, 42 insertions(+), 67 deletions(-)

diff --git a/Documentation/vfio-mediated-device.txt 
b/Documentation/vfio-mediated-device.txt
index 1b3950346532..c3f69bcaf96e 100644
--- a/Documentation/vfio-mediated-device.txt
+++ b/Documentation/vfio-mediated-device.txt
@@ -145,6 +145,11 @@ The functions in the mdev_parent_ops structure are as 
follows:
 * create: allocate basic resources in a driver for a mediated device
 * remove: free resources in a driver when a mediated device is destroyed
 
+(Note that mdev-core provides no implicit serialization of create/remove
+callbacks per mdev parent device, per mdev type, or any other categorization.
+Vendor drivers are expected to be fully asynchronous in this respect or
+provide their own internal resource protection.)
+
 The callbacks in the mdev_parent_ops structure are as follows:
 
 * open: open callback of mediated device
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 126991046eb7..0212f0ee8aea 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -66,34 +66,6 @@ uuid_le mdev_uuid(struct mdev_device *mdev)
 }
 EXPORT_SYMBOL(mdev_uuid);
 
-static int _find_mdev_device(struct device *dev, void *data)
-{
-   struct mdev_device *mdev;
-
-   if (!dev_is_mdev(dev))
-   return 0;
-
-   mdev = to_mdev_device(dev);
-
-   if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0)
-   return 1;
-
-   return 0;
-}
-
-static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid)
-{
-   struct device *dev;
-
-   dev = device_find_child(parent->dev, , _find_mdev_device);
-   if (dev) {
-   put_device(dev);
-   return true;
-   }
-
-   return false;
-}
-
 /* Should be called holding parent_list_lock */
 static struct mdev_parent *__find_parent_device(struct device *dev)
 {
@@ -221,7 +193,6 @@ int mdev_register_device(struct device *dev, const struct 
mdev_parent_ops *ops)
}
 
kref_init(>ref);
-   mutex_init(>lock);
 
parent->dev = dev;
parent->ops = ops;
@@ -297,6 +268,10 @@ static void mdev_device_release(struct device *dev)
 {
struct mdev_device *mdev = to_mdev_device(dev);
 
+   mutex_lock(_list_lock);
+   list_del(>next);
+   mutex_unlock(_list_lock);
+
dev_dbg(>dev, "MDEV: destroying\n");
kfree(mdev);
 }
@@ -304,7 +279,7 @@ static void mdev_device_release(struct device *dev)
 int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
 {
int ret;
-   struct mdev_device *mdev;
+   struct mdev_device *mdev, *tmp;
struct mdev_parent *parent;
struct mdev_type *type = to_mdev_type(kobj);
 
@@ -312,21 +287,28 @@ int mdev_device_create(struct kobject *kobj, struct 
device *dev, uuid_le uuid)
if (!parent)
return 

[PATCH v4 2/2] vfio/mdev: Re-order sysfs attribute creation

2018-05-18 Thread Alex Williamson
There exists a gap at the end of mdev_device_create() where the device
is visible to userspace, but we're not yet ready to handle removal, as
triggered through the 'remove' attribute.  We handle this properly in
mdev_device_remove() with an -EAGAIN return, but we can marginally
reduce this gap by adding this attribute as a final step of our sysfs
setup.

Signed-off-by: Alex Williamson 
---
 drivers/vfio/mdev/mdev_sysfs.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 802df210929b..249472f05509 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -257,24 +257,24 @@ int  mdev_create_sysfs_files(struct device *dev, struct 
mdev_type *type)
 {
int ret;
 
-   ret = sysfs_create_files(>kobj, mdev_device_attrs);
-   if (ret)
-   return ret;
-
ret = sysfs_create_link(type->devices_kobj, >kobj, dev_name(dev));
if (ret)
-   goto device_link_failed;
+   return ret;
 
ret = sysfs_create_link(>kobj, >kobj, "mdev_type");
if (ret)
goto type_link_failed;
 
+   ret = sysfs_create_files(>kobj, mdev_device_attrs);
+   if (ret)
+   goto create_files_failed;
+
return ret;
 
+create_files_failed:
+   sysfs_remove_link(>kobj, "mdev_type");
 type_link_failed:
sysfs_remove_link(type->devices_kobj, dev_name(dev));
-device_link_failed:
-   sysfs_remove_files(>kobj, mdev_device_attrs);
return ret;
 }
 



[PATCH v4 1/2] vfio/mdev: Check globally for duplicate devices

2018-05-18 Thread Alex Williamson
When we create an mdev device, we check for duplicates against the
parent device and return -EEXIST if found, but the mdev device
namespace is global since we'll link all devices from the bus.  We do
catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
with it comes a kernel warning and stack trace for trying to create
duplicate sysfs links, which makes it an undesirable response.

Therefore we should really be looking for duplicates across all mdev
parent devices, or as implemented here, against our mdev device list.
Using mdev_list to prevent duplicates means that we can remove
mdev_parent.lock, but in order not to serialize mdev device creation
and removal globally, we add mdev_device.active which allows UUIDs to
be reserved such that we can drop the mdev_list_lock before the mdev
device is fully in place.

Two behavioral notes; first, mdev_parent.lock had the side-effect of
serializing mdev create and remove ops per parent device.  This was
an implementation detail, not an intentional guarantee provided to
the mdev vendor drivers.  Vendor drivers can trivially provide this
serialization internally if necessary.  Second, review comments note
the new -EAGAIN behavior when the device, and in particular the remove
attribute, becomes visible in sysfs.  If a remove is triggered prior
to completion of mdev_device_create() the user will see a -EAGAIN
error.  While the errno is different, receiving an error during this
period is not, the previous implementation returned -ENODEV for the
same condition.  Furthermore, the consistency to the user is improved
in the case where mdev_device_remove_ops() returns error.  Previously
concurrent calls to mdev_device_remove() could see the device
disappear with -ENODEV and return in the case of error.  Now a user
would see -EAGAIN while the device is in this transitory state.

Signed-off-by: Alex Williamson 
---
 Documentation/vfio-mediated-device.txt |5 ++
 drivers/vfio/mdev/mdev_core.c  |  102 +++-
 drivers/vfio/mdev/mdev_private.h   |2 -
 3 files changed, 42 insertions(+), 67 deletions(-)

diff --git a/Documentation/vfio-mediated-device.txt 
b/Documentation/vfio-mediated-device.txt
index 1b3950346532..c3f69bcaf96e 100644
--- a/Documentation/vfio-mediated-device.txt
+++ b/Documentation/vfio-mediated-device.txt
@@ -145,6 +145,11 @@ The functions in the mdev_parent_ops structure are as 
follows:
 * create: allocate basic resources in a driver for a mediated device
 * remove: free resources in a driver when a mediated device is destroyed
 
+(Note that mdev-core provides no implicit serialization of create/remove
+callbacks per mdev parent device, per mdev type, or any other categorization.
+Vendor drivers are expected to be fully asynchronous in this respect or
+provide their own internal resource protection.)
+
 The callbacks in the mdev_parent_ops structure are as follows:
 
 * open: open callback of mediated device
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 126991046eb7..0212f0ee8aea 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -66,34 +66,6 @@ uuid_le mdev_uuid(struct mdev_device *mdev)
 }
 EXPORT_SYMBOL(mdev_uuid);
 
-static int _find_mdev_device(struct device *dev, void *data)
-{
-   struct mdev_device *mdev;
-
-   if (!dev_is_mdev(dev))
-   return 0;
-
-   mdev = to_mdev_device(dev);
-
-   if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0)
-   return 1;
-
-   return 0;
-}
-
-static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid)
-{
-   struct device *dev;
-
-   dev = device_find_child(parent->dev, , _find_mdev_device);
-   if (dev) {
-   put_device(dev);
-   return true;
-   }
-
-   return false;
-}
-
 /* Should be called holding parent_list_lock */
 static struct mdev_parent *__find_parent_device(struct device *dev)
 {
@@ -221,7 +193,6 @@ int mdev_register_device(struct device *dev, const struct 
mdev_parent_ops *ops)
}
 
kref_init(>ref);
-   mutex_init(>lock);
 
parent->dev = dev;
parent->ops = ops;
@@ -297,6 +268,10 @@ static void mdev_device_release(struct device *dev)
 {
struct mdev_device *mdev = to_mdev_device(dev);
 
+   mutex_lock(_list_lock);
+   list_del(>next);
+   mutex_unlock(_list_lock);
+
dev_dbg(>dev, "MDEV: destroying\n");
kfree(mdev);
 }
@@ -304,7 +279,7 @@ static void mdev_device_release(struct device *dev)
 int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
 {
int ret;
-   struct mdev_device *mdev;
+   struct mdev_device *mdev, *tmp;
struct mdev_parent *parent;
struct mdev_type *type = to_mdev_type(kobj);
 
@@ -312,21 +287,28 @@ int mdev_device_create(struct kobject *kobj, struct 
device *dev, uuid_le uuid)
if (!parent)
return -EINVAL;
 
-   

Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Linus Torvalds
On Fri, May 18, 2018 at 12:02 PM Nadav Amit  wrote:

> I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.

Oh, if it assembles things together, then that sounds optimal.

And yes, like hpa says, we should make sure that behavior is acknowledged
by the GNU as people, so that they then don't come back and say "hey, now
we assemble things as separate units".

That said, the "separate units" model really doesn't make much sense now
that I think about it, because 'as' supports only one output file. So I
guess the as people wouldn't have any issues with just accepting the
"concatenate all input files" syntax.

  Linus


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Linus Torvalds
On Fri, May 18, 2018 at 12:02 PM Nadav Amit  wrote:

> I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.

Oh, if it assembles things together, then that sounds optimal.

And yes, like hpa says, we should make sure that behavior is acknowledged
by the GNU as people, so that they then don't come back and say "hey, now
we assemble things as separate units".

That said, the "separate units" model really doesn't make much sense now
that I think about it, because 'as' supports only one output file. So I
guess the as people wouldn't have any issues with just accepting the
"concatenate all input files" syntax.

  Linus


Re: [PATCH 10/10] Dynamic fault injection

2018-05-18 Thread Kent Overstreet
On Fri, May 18, 2018 at 01:05:20PM -0600, Andreas Dilger wrote:
> On May 18, 2018, at 1:49 AM, Kent Overstreet  
> wrote:
> > 
> > Signed-off-by: Kent Overstreet 
> 
> I agree with Christoph that even if there was some explanation in the cover
> letter, there should be something at least as good in the patch itself.  The
> cover letter is not saved, but the commit stays around forever, and should
> explain how this should be added to code, and how to use it from userspace.
> 
> 
> That said, I think this is a useful functionality.  We have something similar
> in Lustre (OBD_FAIL_CHECK() and friends) that is necessary for being able to
> test a distributed filesystem, which is just a CPP macro with an unlikely()
> branch, while this looks more sophisticated.  This looks like it has some
> added functionality like having more than one fault enabled at a time.
> If this lands we could likely switch our code over to using this.

This is pretty much what I was looking for, I just wanted to know if this patch
was interesting enough to anyone that I should spend more time on it or just
drop it :) Agreed on documentation. I think it's also worth factoring out the
functionality for the elf section trick that dynamic debug uses too.

> Some things that are missing from this patch that is in our code:
> 
> - in addition to the basic "enabled" and "oneshot" mechanisms, we have:
>   - timeout: sleep for N msec to simulate network/disk/locking delays
>   - race: wait with one thread until a second thread hits matching check
> 
> We also have a "fail_val" that allows making the check conditional (e.g.
> only operation on server "N" should fail, only RPC opcode "N", etc).

Those all sound like good ideas... fail_val especially, I think with that we'd
have all the functionality the existing fault injection framework has (which is
way to heavyweight to actually get used, imo)


Re: [PATCH 10/10] Dynamic fault injection

2018-05-18 Thread Kent Overstreet
On Fri, May 18, 2018 at 01:05:20PM -0600, Andreas Dilger wrote:
> On May 18, 2018, at 1:49 AM, Kent Overstreet  
> wrote:
> > 
> > Signed-off-by: Kent Overstreet 
> 
> I agree with Christoph that even if there was some explanation in the cover
> letter, there should be something at least as good in the patch itself.  The
> cover letter is not saved, but the commit stays around forever, and should
> explain how this should be added to code, and how to use it from userspace.
> 
> 
> That said, I think this is a useful functionality.  We have something similar
> in Lustre (OBD_FAIL_CHECK() and friends) that is necessary for being able to
> test a distributed filesystem, which is just a CPP macro with an unlikely()
> branch, while this looks more sophisticated.  This looks like it has some
> added functionality like having more than one fault enabled at a time.
> If this lands we could likely switch our code over to using this.

This is pretty much what I was looking for, I just wanted to know if this patch
was interesting enough to anyone that I should spend more time on it or just
drop it :) Agreed on documentation. I think it's also worth factoring out the
functionality for the elf section trick that dynamic debug uses too.

> Some things that are missing from this patch that is in our code:
> 
> - in addition to the basic "enabled" and "oneshot" mechanisms, we have:
>   - timeout: sleep for N msec to simulate network/disk/locking delays
>   - race: wait with one thread until a second thread hits matching check
> 
> We also have a "fail_val" that allows making the check conditional (e.g.
> only operation on server "N" should fail, only RPC opcode "N", etc).

Those all sound like good ideas... fail_val especially, I think with that we'd
have all the functionality the existing fault injection framework has (which is
way to heavyweight to actually get used, imo)


Re: [PATCH 4.14 00/45] 4.14.42-stable review

2018-05-18 Thread Naresh Kamboju
On 18 May 2018 at 13:45, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.14.42 release.
> There are 45 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun May 20 08:15:14 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.42-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary


kernel: 4.14.42-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 879085a4c525940c6c3b08d6a2f49fc064bf02a8
git describe: v4.14.41-46-g879085a4c525
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.41-46-g879085a4c525


No regressions (compared to build v4.14.41-44-g1134b348cdb5)

Boards, architectures and test suites:
-

dragonboard-410c
* boot - pass: 20
* kselftest - skip: 27, pass: 41
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 134, pass: 1016
* ltp-timers-tests - pass: 13

hi6220-hikey - arm64
* boot - pass: 20
* kselftest - skip: 23, pass: 44
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 135, pass: 1015
* ltp-timers-tests - pass: 13

juno-r2 - arm64
* boot - pass: 20
* kselftest - skip: 25, pass: 43
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 134, pass: 1016
* ltp-timers-tests - pass: 13

qemu_arm
* boot - pass: 21
* kselftest - fail: 2, skip: 29, pass: 37
* libhugetlbfs - pass: 2
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 18, pass: 63
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 5, pass: 58
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 7, pass: 7
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - fail: 3, skip: 92, pass: 1055
* ltp-timers-tests - pass: 13

qemu_arm64
* boot - pass: 21
* kselftest - skip: 29, pass: 41
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 6, pass: 8
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - fail: 2, skip: 156, pass: 992
* ltp-timers-tests - pass: 13

qemu_x86_64
* boot - pass: 20
* kselftest - skip: 30, pass: 50
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests 

Re: [PATCH 4.14 00/45] 4.14.42-stable review

2018-05-18 Thread Naresh Kamboju
On 18 May 2018 at 13:45, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.14.42 release.
> There are 45 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun May 20 08:15:14 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.42-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.14.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary


kernel: 4.14.42-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: 879085a4c525940c6c3b08d6a2f49fc064bf02a8
git describe: v4.14.41-46-g879085a4c525
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.41-46-g879085a4c525


No regressions (compared to build v4.14.41-44-g1134b348cdb5)

Boards, architectures and test suites:
-

dragonboard-410c
* boot - pass: 20
* kselftest - skip: 27, pass: 41
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 134, pass: 1016
* ltp-timers-tests - pass: 13

hi6220-hikey - arm64
* boot - pass: 20
* kselftest - skip: 23, pass: 44
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 135, pass: 1015
* ltp-timers-tests - pass: 13

juno-r2 - arm64
* boot - pass: 20
* kselftest - skip: 25, pass: 43
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 134, pass: 1016
* ltp-timers-tests - pass: 13

qemu_arm
* boot - pass: 21
* kselftest - fail: 2, skip: 29, pass: 37
* libhugetlbfs - pass: 2
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 18, pass: 63
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 5, pass: 58
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 7, pass: 7
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - fail: 3, skip: 92, pass: 1055
* ltp-timers-tests - pass: 13

qemu_arm64
* boot - pass: 21
* kselftest - skip: 29, pass: 41
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 6, pass: 8
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - fail: 2, skip: 156, pass: 992
* ltp-timers-tests - pass: 13

qemu_x86_64
* boot - pass: 20
* kselftest - skip: 30, pass: 50
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* 

[PATCH v4 0/2] vfio/mdev: Device namespace protection

2018-05-18 Thread Alex Williamson
v4: Fix the 'create' racing 'remove' gap noted by Kirti by moving
removal from mdev_list to mdev_device_release().  Fix missing
mdev_put_parent() cases in mdev_device_create(), also noted
by Kirti.  Added documention update regarding serialization as
noted by Cornelia.  Added additional commit log comment about
-EAGAIN vs -ENODEV for 'remove' racing 'create'.  Added second
patch to re-order sysfs attributes, with this my targeted
scripts can no longer hit the gap where -EAGAIN is regurned.
BTW, the gap where the current code returns -ENODEV in this
race condition is about 50% easier to hit than it exists in
this series with patch 1 alone.

Thanks,
Alex

---

Alex Williamson (2):
  vfio/mdev: Check globally for duplicate devices
  vfio/mdev: Re-order sysfs attribute creation


 Documentation/vfio-mediated-device.txt |5 ++
 drivers/vfio/mdev/mdev_core.c  |  102 +++-
 drivers/vfio/mdev/mdev_private.h   |2 -
 drivers/vfio/mdev/mdev_sysfs.c |   14 ++--
 4 files changed, 49 insertions(+), 74 deletions(-)


[PATCH v4 0/2] vfio/mdev: Device namespace protection

2018-05-18 Thread Alex Williamson
v4: Fix the 'create' racing 'remove' gap noted by Kirti by moving
removal from mdev_list to mdev_device_release().  Fix missing
mdev_put_parent() cases in mdev_device_create(), also noted
by Kirti.  Added documention update regarding serialization as
noted by Cornelia.  Added additional commit log comment about
-EAGAIN vs -ENODEV for 'remove' racing 'create'.  Added second
patch to re-order sysfs attributes, with this my targeted
scripts can no longer hit the gap where -EAGAIN is regurned.
BTW, the gap where the current code returns -ENODEV in this
race condition is about 50% easier to hit than it exists in
this series with patch 1 alone.

Thanks,
Alex

---

Alex Williamson (2):
  vfio/mdev: Check globally for duplicate devices
  vfio/mdev: Re-order sysfs attribute creation


 Documentation/vfio-mediated-device.txt |5 ++
 drivers/vfio/mdev/mdev_core.c  |  102 +++-
 drivers/vfio/mdev/mdev_private.h   |2 -
 drivers/vfio/mdev/mdev_sysfs.c |   14 ++--
 4 files changed, 49 insertions(+), 74 deletions(-)


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread hpa
On May 18, 2018 12:02:50 PM PDT, Nadav Amit  wrote:
>h...@zytor.com wrote:
>
>> On May 18, 2018 11:50:12 AM PDT, Linus Torvalds
> wrote:
>>> On Fri, May 18, 2018 at 11:34 AM  wrote:
>>> 
 On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>>> torva...@linux-foundation.org> wrote:
>>> 
 Unfortunately gcc doesn't guarantee that global assembly inlines
>will
>>> appear at the top of the file.
>>> 
>>> Yeah. It really would be better to do the "asm version of -inline".
>>> 
>>> We already do something like that for real *.S files on some
>>> architectures
>>> (because their assembly really wants it, eg
>>> 
>>> arch/arm/Makefile:
>>> KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>>> -include
>>> asm/unified.h -msoft-float
>>> 
>>> but I do want to point out that KBUILD_AFLAGS is *not* used for
>>> compiler-generated assembly, only for actual *.S files.
>>> 
>>> Sadly, I don't actually know any way to make gcc call the 'as' phase
>>> with
>>> particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>>> assembler, but there is no "-include" option for GNU as afaik.
>>> 
>>> Can you perhaps define a macro symbol for "--defsym"? Probably not.
>>> 
>>>  Linus
>> 
>> I looked at this thing a long time ago; it's not there, and the best
>would probably be to get global asm() working properly in gcc.
>
>I can add a -Wa,[filename.s] switch. It works, but sort of
>undocumented.

Ah, I thought that would have assembled each file separately. We can request it 
be documented, that's usually much easier than getting a new feature 
implemented.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 10/10] Dynamic fault injection

2018-05-18 Thread Andreas Dilger
On May 18, 2018, at 1:49 AM, Kent Overstreet  wrote:
> 
> Signed-off-by: Kent Overstreet 

I agree with Christoph that even if there was some explanation in the cover
letter, there should be something at least as good in the patch itself.  The
cover letter is not saved, but the commit stays around forever, and should
explain how this should be added to code, and how to use it from userspace.


That said, I think this is a useful functionality.  We have something similar
in Lustre (OBD_FAIL_CHECK() and friends) that is necessary for being able to
test a distributed filesystem, which is just a CPP macro with an unlikely()
branch, while this looks more sophisticated.  This looks like it has some
added functionality like having more than one fault enabled at a time.
If this lands we could likely switch our code over to using this.


Some things that are missing from this patch that is in our code:

- in addition to the basic "enabled" and "oneshot" mechanisms, we have:
  - timeout: sleep for N msec to simulate network/disk/locking delays
  - race: wait with one thread until a second thread hits matching check

We also have a "fail_val" that allows making the check conditional (e.g.
only operation on server "N" should fail, only RPC opcode "N", etc).

Cheers, Andreas

> ---
> include/asm-generic/vmlinux.lds.h |   4 +
> include/linux/dynamic_fault.h | 117 +
> lib/Kconfig.debug |   5 +
> lib/Makefile  |   2 +
> lib/dynamic_fault.c   | 760 ++
> 5 files changed, 888 insertions(+)
> create mode 100644 include/linux/dynamic_fault.h
> create mode 100644 lib/dynamic_fault.c
> 
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index 1ab0e520d6..a4c9dfcbbd 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -246,6 +246,10 @@
>   VMLINUX_SYMBOL(__start___verbose) = .;  \
>   KEEP(*(__verbose))  \
>   VMLINUX_SYMBOL(__stop___verbose) = .;   \
> + . = ALIGN(8);   \
> + VMLINUX_SYMBOL(__start___faults) = .;   \
> + *(__faults) \
> + VMLINUX_SYMBOL(__stop___faults) = .;\
>   LIKELY_PROFILE()\
>   BRANCH_PROFILE()\
>   TRACE_PRINTKS() \
> diff --git a/include/linux/dynamic_fault.h b/include/linux/dynamic_fault.h
> new file mode 100644
> index 00..6e7bb56ae8
> --- /dev/null
> +++ b/include/linux/dynamic_fault.h
> @@ -0,0 +1,117 @@
> +#ifndef _DYNAMIC_FAULT_H
> +#define _DYNAMIC_FAULT_H
> +
> +#include 
> +#include 
> +#include 
> +
> +enum dfault_enabled {
> + DFAULT_DISABLED,
> + DFAULT_ENABLED,
> + DFAULT_ONESHOT,
> +};
> +
> +union dfault_state {
> + struct {
> + unsignedenabled:2;
> + unsignedcount:30;
> + };
> +
> + struct {
> + unsignedv;
> + };
> +};
> +
> +/*
> + * An instance of this structure is created in a special
> + * ELF section at every dynamic fault callsite.  At runtime,
> + * the special section is treated as an array of these.
> + */
> +struct _dfault {
> + const char  *modname;
> + const char  *function;
> + const char  *filename;
> + const char  *class;
> +
> + const u16   line;
> +
> + unsignedfrequency;
> + union dfault_state  state;
> +
> + struct static_key   enabled;
> +} __aligned(8);
> +
> +
> +#ifdef CONFIG_DYNAMIC_FAULT
> +
> +int dfault_add_module(struct _dfault *tab, unsigned int n, const char *mod);
> +int dfault_remove_module(char *mod_name);
> +bool __dynamic_fault_enabled(struct _dfault *);
> +
> +#define dynamic_fault(_class)
> \
> +({   \
> + static struct _dfault descriptor\
> + __used __aligned(8) __attribute__((section("__faults"))) = {\
> + .modname= KBUILD_MODNAME,   \
> + .function   = __func__, \
> + .filename   = __FILE__, \
> + .line   = __LINE__, \
> + .class  = _class,   \
> + };  \
> + \
> + 

Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread hpa
On May 18, 2018 12:02:50 PM PDT, Nadav Amit  wrote:
>h...@zytor.com wrote:
>
>> On May 18, 2018 11:50:12 AM PDT, Linus Torvalds
> wrote:
>>> On Fri, May 18, 2018 at 11:34 AM  wrote:
>>> 
 On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>>> torva...@linux-foundation.org> wrote:
>>> 
 Unfortunately gcc doesn't guarantee that global assembly inlines
>will
>>> appear at the top of the file.
>>> 
>>> Yeah. It really would be better to do the "asm version of -inline".
>>> 
>>> We already do something like that for real *.S files on some
>>> architectures
>>> (because their assembly really wants it, eg
>>> 
>>> arch/arm/Makefile:
>>> KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>>> -include
>>> asm/unified.h -msoft-float
>>> 
>>> but I do want to point out that KBUILD_AFLAGS is *not* used for
>>> compiler-generated assembly, only for actual *.S files.
>>> 
>>> Sadly, I don't actually know any way to make gcc call the 'as' phase
>>> with
>>> particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>>> assembler, but there is no "-include" option for GNU as afaik.
>>> 
>>> Can you perhaps define a macro symbol for "--defsym"? Probably not.
>>> 
>>>  Linus
>> 
>> I looked at this thing a long time ago; it's not there, and the best
>would probably be to get global asm() working properly in gcc.
>
>I can add a -Wa,[filename.s] switch. It works, but sort of
>undocumented.

Ah, I thought that would have assembled each file separately. We can request it 
be documented, that's usually much easier than getting a new feature 
implemented.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 10/10] Dynamic fault injection

2018-05-18 Thread Andreas Dilger
On May 18, 2018, at 1:49 AM, Kent Overstreet  wrote:
> 
> Signed-off-by: Kent Overstreet 

I agree with Christoph that even if there was some explanation in the cover
letter, there should be something at least as good in the patch itself.  The
cover letter is not saved, but the commit stays around forever, and should
explain how this should be added to code, and how to use it from userspace.


That said, I think this is a useful functionality.  We have something similar
in Lustre (OBD_FAIL_CHECK() and friends) that is necessary for being able to
test a distributed filesystem, which is just a CPP macro with an unlikely()
branch, while this looks more sophisticated.  This looks like it has some
added functionality like having more than one fault enabled at a time.
If this lands we could likely switch our code over to using this.


Some things that are missing from this patch that is in our code:

- in addition to the basic "enabled" and "oneshot" mechanisms, we have:
  - timeout: sleep for N msec to simulate network/disk/locking delays
  - race: wait with one thread until a second thread hits matching check

We also have a "fail_val" that allows making the check conditional (e.g.
only operation on server "N" should fail, only RPC opcode "N", etc).

Cheers, Andreas

> ---
> include/asm-generic/vmlinux.lds.h |   4 +
> include/linux/dynamic_fault.h | 117 +
> lib/Kconfig.debug |   5 +
> lib/Makefile  |   2 +
> lib/dynamic_fault.c   | 760 ++
> 5 files changed, 888 insertions(+)
> create mode 100644 include/linux/dynamic_fault.h
> create mode 100644 lib/dynamic_fault.c
> 
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index 1ab0e520d6..a4c9dfcbbd 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -246,6 +246,10 @@
>   VMLINUX_SYMBOL(__start___verbose) = .;  \
>   KEEP(*(__verbose))  \
>   VMLINUX_SYMBOL(__stop___verbose) = .;   \
> + . = ALIGN(8);   \
> + VMLINUX_SYMBOL(__start___faults) = .;   \
> + *(__faults) \
> + VMLINUX_SYMBOL(__stop___faults) = .;\
>   LIKELY_PROFILE()\
>   BRANCH_PROFILE()\
>   TRACE_PRINTKS() \
> diff --git a/include/linux/dynamic_fault.h b/include/linux/dynamic_fault.h
> new file mode 100644
> index 00..6e7bb56ae8
> --- /dev/null
> +++ b/include/linux/dynamic_fault.h
> @@ -0,0 +1,117 @@
> +#ifndef _DYNAMIC_FAULT_H
> +#define _DYNAMIC_FAULT_H
> +
> +#include 
> +#include 
> +#include 
> +
> +enum dfault_enabled {
> + DFAULT_DISABLED,
> + DFAULT_ENABLED,
> + DFAULT_ONESHOT,
> +};
> +
> +union dfault_state {
> + struct {
> + unsignedenabled:2;
> + unsignedcount:30;
> + };
> +
> + struct {
> + unsignedv;
> + };
> +};
> +
> +/*
> + * An instance of this structure is created in a special
> + * ELF section at every dynamic fault callsite.  At runtime,
> + * the special section is treated as an array of these.
> + */
> +struct _dfault {
> + const char  *modname;
> + const char  *function;
> + const char  *filename;
> + const char  *class;
> +
> + const u16   line;
> +
> + unsignedfrequency;
> + union dfault_state  state;
> +
> + struct static_key   enabled;
> +} __aligned(8);
> +
> +
> +#ifdef CONFIG_DYNAMIC_FAULT
> +
> +int dfault_add_module(struct _dfault *tab, unsigned int n, const char *mod);
> +int dfault_remove_module(char *mod_name);
> +bool __dynamic_fault_enabled(struct _dfault *);
> +
> +#define dynamic_fault(_class)
> \
> +({   \
> + static struct _dfault descriptor\
> + __used __aligned(8) __attribute__((section("__faults"))) = {\
> + .modname= KBUILD_MODNAME,   \
> + .function   = __func__, \
> + .filename   = __FILE__, \
> + .line   = __LINE__, \
> + .class  = _class,   \
> + };  \
> + \
> + static_key_false() &&\
> + 

Re: [PATCH] kernel: sys: fix potential Spectre v1

2018-05-18 Thread Gustavo A. R. Silva



On 05/15/2018 05:57 PM, Dan Williams wrote:

On Tue, May 15, 2018 at 3:29 PM, Thomas Gleixner  wrote:

On Tue, 15 May 2018, Andrew Morton wrote:

On Mon, 14 May 2018 22:00:38 -0500 "Gustavo A. R. Silva" 
 wrote:


resource can be controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential
spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue
'get_current()->signal->rlim' (local cap)

Fix this by sanitizing *resource* before using it to index
current->signal->rlim

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].


hm.  Not my area, but I'm always willing to learn ;)


--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -69,6 +69,9 @@
  #include 
  #include 

+/* Hardening for Spectre-v1 */
+#include 
+
  #include "uid16.h"

  #ifndef SET_UNALIGN_CTL
@@ -1451,6 +1454,7 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource,
 if (resource >= RLIM_NLIMITS)
 return -EINVAL;

+   resource = array_index_nospec(resource, RLIM_NLIMITS);
 task_lock(current->group_leader);
 x = current->signal->rlim[resource];


Can the speculation proceed past the task_lock()?  Or is the policy to
ignore such happy happenstances even if they are available?


Locks are not in the way of speculation. Speculation has almost no limits
except serializing instructions. At least they respect the magic AND
limitation in array_index_nospec().


I'd say it another way, because they don't respect the magic AND, we
just make the result in the speculation path safe. So, it's controlled
speculation.



Dan,

What do you think about adding the following function to the nospec API:

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index e791ebc..81e9a77 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -55,4 +55,17 @@ static inline unsigned long 
array_index_mask_nospec(unsigned long index,

\
(typeof(_i)) (_i & _mask);  \
 })
+
+
+#ifndef sanitize_index_nospec
+inline bool sanitize_index_nospec(unsigned long index,
+ unsigned long size)
+{
+   if (index >= size)
+   return false;
+   index = array_index_nospec(index, size);
+
+   return true;
+}
+#endif
 #endif /* _LINUX_NOSPEC_H */


And here is an example of its use:

diff --git a/drivers/media/i2c/tvp7002.c b/drivers/media/i2c/tvp7002.c
index 4599b7e..27b39c0 100644
--- a/drivers/media/i2c/tvp7002.c
+++ b/drivers/media/i2c/tvp7002.c
@@ -35,6 +35,8 @@
 #include 
 #include 

+#include 
+
 #include "tvp7002_reg.h"

 MODULE_DESCRIPTION("TI TVP7002 Video and Graphics Digitizer driver");
@@ -784,7 +786,7 @@ static int tvp7002_enum_dv_timings(struct 
v4l2_subdev *sd,

return -EINVAL;

/* Check requested format index is within range */
-   if (timings->index >= NUM_TIMINGS)
+   if (!sanitize_index_nospec(timings->index, NUM_TIMINGS))
return -EINVAL;

timings->timings = tvp7002_timings[timings->index].timings;

This patter is very common. So, it may be a good idea to unify both 
bounds checking and the serialization of instructions into a single 
function.


What do you think?

Thanks
--
Gustavo



Re: [PATCH] kernel: sys: fix potential Spectre v1

2018-05-18 Thread Gustavo A. R. Silva



On 05/15/2018 05:57 PM, Dan Williams wrote:

On Tue, May 15, 2018 at 3:29 PM, Thomas Gleixner  wrote:

On Tue, 15 May 2018, Andrew Morton wrote:

On Mon, 14 May 2018 22:00:38 -0500 "Gustavo A. R. Silva" 
 wrote:


resource can be controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential
spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue
'get_current()->signal->rlim' (local cap)

Fix this by sanitizing *resource* before using it to index
current->signal->rlim

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].


hm.  Not my area, but I'm always willing to learn ;)


--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -69,6 +69,9 @@
  #include 
  #include 

+/* Hardening for Spectre-v1 */
+#include 
+
  #include "uid16.h"

  #ifndef SET_UNALIGN_CTL
@@ -1451,6 +1454,7 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned int, resource,
 if (resource >= RLIM_NLIMITS)
 return -EINVAL;

+   resource = array_index_nospec(resource, RLIM_NLIMITS);
 task_lock(current->group_leader);
 x = current->signal->rlim[resource];


Can the speculation proceed past the task_lock()?  Or is the policy to
ignore such happy happenstances even if they are available?


Locks are not in the way of speculation. Speculation has almost no limits
except serializing instructions. At least they respect the magic AND
limitation in array_index_nospec().


I'd say it another way, because they don't respect the magic AND, we
just make the result in the speculation path safe. So, it's controlled
speculation.



Dan,

What do you think about adding the following function to the nospec API:

diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index e791ebc..81e9a77 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -55,4 +55,17 @@ static inline unsigned long 
array_index_mask_nospec(unsigned long index,

\
(typeof(_i)) (_i & _mask);  \
 })
+
+
+#ifndef sanitize_index_nospec
+inline bool sanitize_index_nospec(unsigned long index,
+ unsigned long size)
+{
+   if (index >= size)
+   return false;
+   index = array_index_nospec(index, size);
+
+   return true;
+}
+#endif
 #endif /* _LINUX_NOSPEC_H */


And here is an example of its use:

diff --git a/drivers/media/i2c/tvp7002.c b/drivers/media/i2c/tvp7002.c
index 4599b7e..27b39c0 100644
--- a/drivers/media/i2c/tvp7002.c
+++ b/drivers/media/i2c/tvp7002.c
@@ -35,6 +35,8 @@
 #include 
 #include 

+#include 
+
 #include "tvp7002_reg.h"

 MODULE_DESCRIPTION("TI TVP7002 Video and Graphics Digitizer driver");
@@ -784,7 +786,7 @@ static int tvp7002_enum_dv_timings(struct 
v4l2_subdev *sd,

return -EINVAL;

/* Check requested format index is within range */
-   if (timings->index >= NUM_TIMINGS)
+   if (!sanitize_index_nospec(timings->index, NUM_TIMINGS))
return -EINVAL;

timings->timings = tvp7002_timings[timings->index].timings;

This patter is very common. So, it may be a good idea to unify both 
bounds checking and the serialization of instructions into a single 
function.


What do you think?

Thanks
--
Gustavo



Re: [PATCH v3] vfio/mdev: Check globally for duplicate devices

2018-05-18 Thread Kirti Wankhede


On 5/18/2018 11:00 PM, Alex Williamson wrote:
> On Fri, 18 May 2018 12:34:03 +0530
> Kirti Wankhede  wrote:
> 
>> On 5/18/2018 3:07 AM, Alex Williamson wrote:
>>> On Fri, 18 May 2018 01:56:50 +0530
>>> Kirti Wankhede  wrote:
>>>   
 On 5/17/2018 9:50 PM, Alex Williamson wrote:  
> On Thu, 17 May 2018 21:25:22 +0530
> Kirti Wankhede  wrote:
> 
>> On 5/17/2018 1:39 PM, Cornelia Huck wrote:
>>> On Wed, 16 May 2018 21:30:19 -0600
>>> Alex Williamson  wrote:
>>>   
 When we create an mdev device, we check for duplicates against the
 parent device and return -EEXIST if found, but the mdev device
 namespace is global since we'll link all devices from the bus.  We do
 catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
 with it comes a kernel warning and stack trace for trying to create
 duplicate sysfs links, which makes it an undesirable response.

 Therefore we should really be looking for duplicates across all mdev
 parent devices, or as implemented here, against our mdev device list.
 Using mdev_list to prevent duplicates means that we can remove
 mdev_parent.lock, but in order not to serialize mdev device creation
 and removal globally, we add mdev_device.active which allows UUIDs to
 be reserved such that we can drop the mdev_list_lock before the mdev
 device is fully in place.

 NB. there was never intended to be any serialization guarantee
 provided by the mdev core with respect to creation and removal of mdev
 devices, mdev_parent.lock provided this only as a side-effect of the
 implementation for locking the namespace per parent.  That
 serialization is now removed.  
>>>   
>>
>> mdev_parent.lock is to serialize create and remove of that mdev device,
>> that handles race condition that Cornelia mentioned below.
>
> Previously it was stated:
>
> On Thu, 17 May 2018 01:01:40 +0530
> Kirti Wankhede  wrote:
>> Here lock is not for create/remove routines of vendor driver, its about
>> mdev device creation and device registration, which is a common code
>> path, and so is part of mdev core module.
>
> So the race condition was handled previously, but as a side-effect of
> protecting the namespace, aiui.  I'm trying to state above that the
> serialization of create/remove was never intended as a guarantee
> provided to mdev vendor drivers.  I don't see that there's a need to
> protect "mdev device creation and device registration" beyond conflicts
> in the UUID namespace, which is done here.  Are there others?
> 

 Sorry not being elaborative in my earlier response to
  
> If we can
> show that vendor drivers handle the create/remove paths themselves,
> perhaps we can refine the locking granularity.

 mdev_device_create() function does :
 - create mdev device
 - register device
 - call vendor driver->create
 - create sysfs files.

 mdev_device_remove() removes sysfs files, unregister device and delete
 device.

 There is common code in mdev_device_create() and mdev_device_remove()
 independent of what vendor driver does in its create()/remove()
 callback. Moving this code to each vendor driver to handle create/remove
 themselves doesn't make sense to me.  
>>>
>>> I don't see where anyone is suggesting that, I'm not.
>>>
 mdev_parent.lock here does take care of race conditions that could occur
 during mdev device creation and deletion in this common code path.  
>>>
>>> Exactly what races in the common code path is mdev_parent.lock
>>> preventing?  mdev_device_create() calls:
>>>
>>> device_register()
>>> mdev_device_create_ops()
>>>   parent->ops->create()
>>>   sysfs_create_groups()
>>> mdev_create_sysfs_files()
>>>   sysfs_create_files()
>>>   sysfs_create_link()
>>>   sysfs_create_link()
>>>
>>> mdev_parent.lock is certainly not serializing all calls across the
>>> entire kernel to device_register and sysfs_create_{groups,files,link}
>>> so what is it protecting other than serializing parent->ops->create()?
>>> Locks protect data, not code.  The data we're protecting is the shared
>>> mdev_list, there is no shared data once mdev_device_create() has its
>>> mdev_device with uuid reservation placed into that list.
>>>   
> 
> Thank you for enumerating these points below.
> 
>> This lock prevents race condition that could occur due to sysfs entries
>> 1. between write on 'create' and 'remove' sysfs file of mdev device
>>   As per current code without lock, mdev_create_sysfs_files() creates
>> 'remove' sysfs, but before adding this mdev device to mdev_list, if
>> 'remove' is 

Re: [PATCH v3] vfio/mdev: Check globally for duplicate devices

2018-05-18 Thread Kirti Wankhede


On 5/18/2018 11:00 PM, Alex Williamson wrote:
> On Fri, 18 May 2018 12:34:03 +0530
> Kirti Wankhede  wrote:
> 
>> On 5/18/2018 3:07 AM, Alex Williamson wrote:
>>> On Fri, 18 May 2018 01:56:50 +0530
>>> Kirti Wankhede  wrote:
>>>   
 On 5/17/2018 9:50 PM, Alex Williamson wrote:  
> On Thu, 17 May 2018 21:25:22 +0530
> Kirti Wankhede  wrote:
> 
>> On 5/17/2018 1:39 PM, Cornelia Huck wrote:
>>> On Wed, 16 May 2018 21:30:19 -0600
>>> Alex Williamson  wrote:
>>>   
 When we create an mdev device, we check for duplicates against the
 parent device and return -EEXIST if found, but the mdev device
 namespace is global since we'll link all devices from the bus.  We do
 catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
 with it comes a kernel warning and stack trace for trying to create
 duplicate sysfs links, which makes it an undesirable response.

 Therefore we should really be looking for duplicates across all mdev
 parent devices, or as implemented here, against our mdev device list.
 Using mdev_list to prevent duplicates means that we can remove
 mdev_parent.lock, but in order not to serialize mdev device creation
 and removal globally, we add mdev_device.active which allows UUIDs to
 be reserved such that we can drop the mdev_list_lock before the mdev
 device is fully in place.

 NB. there was never intended to be any serialization guarantee
 provided by the mdev core with respect to creation and removal of mdev
 devices, mdev_parent.lock provided this only as a side-effect of the
 implementation for locking the namespace per parent.  That
 serialization is now removed.  
>>>   
>>
>> mdev_parent.lock is to serialize create and remove of that mdev device,
>> that handles race condition that Cornelia mentioned below.
>
> Previously it was stated:
>
> On Thu, 17 May 2018 01:01:40 +0530
> Kirti Wankhede  wrote:
>> Here lock is not for create/remove routines of vendor driver, its about
>> mdev device creation and device registration, which is a common code
>> path, and so is part of mdev core module.
>
> So the race condition was handled previously, but as a side-effect of
> protecting the namespace, aiui.  I'm trying to state above that the
> serialization of create/remove was never intended as a guarantee
> provided to mdev vendor drivers.  I don't see that there's a need to
> protect "mdev device creation and device registration" beyond conflicts
> in the UUID namespace, which is done here.  Are there others?
> 

 Sorry not being elaborative in my earlier response to
  
> If we can
> show that vendor drivers handle the create/remove paths themselves,
> perhaps we can refine the locking granularity.

 mdev_device_create() function does :
 - create mdev device
 - register device
 - call vendor driver->create
 - create sysfs files.

 mdev_device_remove() removes sysfs files, unregister device and delete
 device.

 There is common code in mdev_device_create() and mdev_device_remove()
 independent of what vendor driver does in its create()/remove()
 callback. Moving this code to each vendor driver to handle create/remove
 themselves doesn't make sense to me.  
>>>
>>> I don't see where anyone is suggesting that, I'm not.
>>>
 mdev_parent.lock here does take care of race conditions that could occur
 during mdev device creation and deletion in this common code path.  
>>>
>>> Exactly what races in the common code path is mdev_parent.lock
>>> preventing?  mdev_device_create() calls:
>>>
>>> device_register()
>>> mdev_device_create_ops()
>>>   parent->ops->create()
>>>   sysfs_create_groups()
>>> mdev_create_sysfs_files()
>>>   sysfs_create_files()
>>>   sysfs_create_link()
>>>   sysfs_create_link()
>>>
>>> mdev_parent.lock is certainly not serializing all calls across the
>>> entire kernel to device_register and sysfs_create_{groups,files,link}
>>> so what is it protecting other than serializing parent->ops->create()?
>>> Locks protect data, not code.  The data we're protecting is the shared
>>> mdev_list, there is no shared data once mdev_device_create() has its
>>> mdev_device with uuid reservation placed into that list.
>>>   
> 
> Thank you for enumerating these points below.
> 
>> This lock prevents race condition that could occur due to sysfs entries
>> 1. between write on 'create' and 'remove' sysfs file of mdev device
>>   As per current code without lock, mdev_create_sysfs_files() creates
>> 'remove' sysfs, but before adding this mdev device to mdev_list, if
>> 'remove' is called, that would return -ENODEV even if the device is seen
>> in sysfs
> 
> mdev_parent.lock doesn't play a factor 

Re: [PATCH 4.9 00/33] 4.9.101-stable review

2018-05-18 Thread Naresh Kamboju
On 18 May 2018 at 13:45, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.9.101 release.
> There are 33 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun May 20 08:15:20 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.101-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary


kernel: 4.9.101-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: b0935d42e12ff5b9efbdded322b1bb7187705bd3
git describe: v4.9.100-35-gb0935d42e12f
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.100-35-gb0935d42e12f


No regressions (compared to build v4.9.100-34-g16c619d3e043)

Boards, architectures and test suites:
-

dragonboard-410c
* boot - pass: 20,
* kselftest - pass: 34, skip: 32
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1015, skip: 135
* ltp-timers-tests - pass: 13,

hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - pass: 35, skip: 28
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 10, skip: 4
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1014, skip: 136
* ltp-timers-tests - pass: 13,

juno-r2 - arm64
* boot - pass: 20,
* kselftest - pass: 37, skip: 29
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 10, skip: 4
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1015, skip: 135
* ltp-timers-tests - pass: 13,

qemu_arm
* boot - pass: 20,
* kselftest - pass: 34, skip: 32
* libhugetlbfs - pass: 1,
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 63, skip: 18
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 58, skip: 5
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 7, skip: 7
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1056, fail: 2, skip: 92
* ltp-timers-tests - pass: 13,

qemu_arm64
* boot - pass: 20, fail: 1,
* kselftest - pass: 35, skip: 33
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 8, skip: 6
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 991, fail: 2, skip: 157
* ltp-timers-tests - pass: 13,

qemu_x86_64
* boot - pass: 20,
* kselftest - pass: 

Re: [PATCH 4.9 00/33] 4.9.101-stable review

2018-05-18 Thread Naresh Kamboju
On 18 May 2018 at 13:45, Greg Kroah-Hartman  wrote:
> This is the start of the stable review cycle for the 4.9.101 release.
> There are 33 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Sun May 20 08:15:20 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.101-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.9.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

Summary


kernel: 4.9.101-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.9.y
git commit: b0935d42e12ff5b9efbdded322b1bb7187705bd3
git describe: v4.9.100-35-gb0935d42e12f
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/build/v4.9.100-35-gb0935d42e12f


No regressions (compared to build v4.9.100-34-g16c619d3e043)

Boards, architectures and test suites:
-

dragonboard-410c
* boot - pass: 20,
* kselftest - pass: 34, skip: 32
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 14,
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1015, skip: 135
* ltp-timers-tests - pass: 13,

hi6220-hikey - arm64
* boot - pass: 20,
* kselftest - pass: 35, skip: 28
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 10, skip: 4
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1014, skip: 136
* ltp-timers-tests - pass: 13,

juno-r2 - arm64
* boot - pass: 20,
* kselftest - pass: 37, skip: 29
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 10, skip: 4
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1015, skip: 135
* ltp-timers-tests - pass: 13,

qemu_arm
* boot - pass: 20,
* kselftest - pass: 34, skip: 32
* libhugetlbfs - pass: 1,
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 63, skip: 18
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 58, skip: 5
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 21, skip: 1
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 7, skip: 7
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 1056, fail: 2, skip: 92
* ltp-timers-tests - pass: 13,

qemu_arm64
* boot - pass: 20, fail: 1,
* kselftest - pass: 35, skip: 33
* libhugetlbfs - pass: 90, skip: 1
* ltp-cap_bounds-tests - pass: 2,
* ltp-containers-tests - pass: 64, skip: 17
* ltp-fcntl-locktests-tests - pass: 2,
* ltp-filecaps-tests - pass: 2,
* ltp-fs-tests - pass: 57, skip: 6
* ltp-fs_bind-tests - pass: 2,
* ltp-fs_perms_simple-tests - pass: 19,
* ltp-fsx-tests - pass: 2,
* ltp-hugetlb-tests - pass: 22,
* ltp-io-tests - pass: 3,
* ltp-ipc-tests - pass: 9,
* ltp-math-tests - pass: 11,
* ltp-nptl-tests - pass: 2,
* ltp-pty-tests - pass: 4,
* ltp-sched-tests - pass: 8, skip: 6
* ltp-securebits-tests - pass: 4,
* ltp-syscalls-tests - pass: 991, fail: 2, skip: 157
* ltp-timers-tests - pass: 13,

qemu_x86_64
* boot - pass: 20,
* kselftest - pass: 47, skip: 33
* libhugetlbfs 

Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Nadav Amit
h...@zytor.com wrote:

> On May 18, 2018 11:50:12 AM PDT, Linus Torvalds 
>  wrote:
>> On Fri, May 18, 2018 at 11:34 AM  wrote:
>> 
>>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>> torva...@linux-foundation.org> wrote:
>> 
>>> Unfortunately gcc doesn't guarantee that global assembly inlines will
>> appear at the top of the file.
>> 
>> Yeah. It really would be better to do the "asm version of -inline".
>> 
>> We already do something like that for real *.S files on some
>> architectures
>> (because their assembly really wants it, eg
>> 
>> arch/arm/Makefile:
>> KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>> -include
>> asm/unified.h -msoft-float
>> 
>> but I do want to point out that KBUILD_AFLAGS is *not* used for
>> compiler-generated assembly, only for actual *.S files.
>> 
>> Sadly, I don't actually know any way to make gcc call the 'as' phase
>> with
>> particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>> assembler, but there is no "-include" option for GNU as afaik.
>> 
>> Can you perhaps define a macro symbol for "--defsym"? Probably not.
>> 
>>  Linus
> 
> I looked at this thing a long time ago; it's not there, and the best would 
> probably be to get global asm() working properly in gcc.

I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.



Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Nadav Amit
h...@zytor.com wrote:

> On May 18, 2018 11:50:12 AM PDT, Linus Torvalds 
>  wrote:
>> On Fri, May 18, 2018 at 11:34 AM  wrote:
>> 
>>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>> torva...@linux-foundation.org> wrote:
>> 
>>> Unfortunately gcc doesn't guarantee that global assembly inlines will
>> appear at the top of the file.
>> 
>> Yeah. It really would be better to do the "asm version of -inline".
>> 
>> We already do something like that for real *.S files on some
>> architectures
>> (because their assembly really wants it, eg
>> 
>> arch/arm/Makefile:
>> KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>> -include
>> asm/unified.h -msoft-float
>> 
>> but I do want to point out that KBUILD_AFLAGS is *not* used for
>> compiler-generated assembly, only for actual *.S files.
>> 
>> Sadly, I don't actually know any way to make gcc call the 'as' phase
>> with
>> particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>> assembler, but there is no "-include" option for GNU as afaik.
>> 
>> Can you perhaps define a macro symbol for "--defsym"? Probably not.
>> 
>>  Linus
> 
> I looked at this thing a long time ago; it's not there, and the best would 
> probably be to get global asm() working properly in gcc.

I can add a -Wa,[filename.s] switch. It works, but sort of undocumented.



Re: Revert "dmaengine: pl330: add DMA_PAUSE feature"

2018-05-18 Thread Frank Mori Hess
On Thu, May 17, 2018 at 1:22 PM, Lars-Peter Clausen  wrote:
> On 05/17/2018 06:20 PM, Frank Mori Hess wrote:
> The problem is not so much on the software side but even more so on the
> hardware side. Not all hardware even supports aborting a transfer with no
> data loss because there is no precise measurement of how much data has been
> transferred. The residue that is reported is always the lower bound, this
> much has been transferred but it might actually have been more.

I'd just like to point out, if the pl330 driver actually did report a
upper bound on the residue (lower bound on bytes transferred) that
would also blow up Marek's samsung serial driver use case.  Instead of
being in a situation where data loss might occur rarely due to a race
condition, they would be in a situation where data loss occurs every
time they stop a transfer.  Not that such a residue reporting would be
incorrect though, the pl330 driver doesn't even advertise burst level
granularity with its residue reporting.


Re: Revert "dmaengine: pl330: add DMA_PAUSE feature"

2018-05-18 Thread Frank Mori Hess
On Thu, May 17, 2018 at 1:22 PM, Lars-Peter Clausen  wrote:
> On 05/17/2018 06:20 PM, Frank Mori Hess wrote:
> The problem is not so much on the software side but even more so on the
> hardware side. Not all hardware even supports aborting a transfer with no
> data loss because there is no precise measurement of how much data has been
> transferred. The residue that is reported is always the lower bound, this
> much has been transferred but it might actually have been more.

I'd just like to point out, if the pl330 driver actually did report a
upper bound on the residue (lower bound on bytes transferred) that
would also blow up Marek's samsung serial driver use case.  Instead of
being in a situation where data loss might occur rarely due to a race
condition, they would be in a situation where data loss occurs every
time they stop a transfer.  Not that such a residue reporting would be
incorrect though, the pl330 driver doesn't even advertise burst level
granularity with its residue reporting.


RE: [RFC PATCH 09/09] Introduce cache=rdma moutning option

2018-05-18 Thread Long Li
> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
> 
> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> > From: Long Li 
> >
> > When cache=rdma is enabled on mount options, CIFS do not allocate
> > internal data buffer pages for I/O, data is read/writen directly to user
> memory via RDMA.
> 
> I don't think this should be an option.  For direct I/O without signing or
> encryption CIFS should always use get_user_pages, with or without RDMA.

Yes this should be done for all transport. If there are no objections, I'll 
send patches to change this.


RE: [RFC PATCH 09/09] Introduce cache=rdma moutning option

2018-05-18 Thread Long Li
> Subject: Re: [RFC PATCH 09/09] Introduce cache=rdma moutning option
> 
> On Thu, May 17, 2018 at 05:22:14PM -0700, Long Li wrote:
> > From: Long Li 
> >
> > When cache=rdma is enabled on mount options, CIFS do not allocate
> > internal data buffer pages for I/O, data is read/writen directly to user
> memory via RDMA.
> 
> I don't think this should be an option.  For direct I/O without signing or
> encryption CIFS should always use get_user_pages, with or without RDMA.

Yes this should be done for all transport. If there are no objections, I'll 
send patches to change this.


Re: WARNING in ip_recv_error

2018-05-18 Thread Willem de Bruijn
On Fri, May 18, 2018 at 2:46 PM, Willem de Bruijn
 wrote:
> On Fri, May 18, 2018 at 2:44 PM, Willem de Bruijn
>  wrote:
>> On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn
>>  wrote:
>>> On Fri, May 18, 2018 at 11:44 AM, David Miller  wrote:
 From: Eric Dumazet 
 Date: Fri, 18 May 2018 08:30:43 -0700

> We probably need to revert Willem patch 
> (7ce875e5ecb8562fd44040f69bda96c999e38bbc)

 Is it really valid to reach ip_recv_err with an ipv6 socket?
>>>
>>> I guess the issue is that setsockopt IPV6_ADDRFORM is not an
>>> atomic operation, so that the socket is neither fully ipv4 nor fully
>>> ipv6 by the time it reaches ip_recv_error.
>>>
>>>   sk->sk_socket->ops = _dgram_ops;
>>>   < HERE >
>>>   sk->sk_family = PF_INET;
>>>
>>> Even calling inet_recv_error to demux would not necessarily help.
>>>
>>> Safest would be to look up by skb->protocol, similar to what
>>> ipv6_recv_error does to handle v4-mapped-v6.
>>>
>>> Or to make that function safe with PF_INET and swap the order
>>> of the above two operations.
>>>
>>> All sound needlessly complicated for this rare socket option, but
>>> I don't have a better idea yet. Dropping on the floor is not nice,
>>> either.
>>
>> Ensuring that ip_recv_error correctly handles packets from either
>> socket and removing the warning should indeed be good.
>>
>> It is robust against v4-mapped packets from an AF_INET6 socket,
>> but see caveat on reconnect below.
>>
>> The code between ipv6_recv_error for v4-mapped addresses and
>> ip_recv_error is essentially the same, the main difference being
>> whether to return network headers as sockaddr_in with SOL_IP
>> or sockaddr_in6 with SOL_IPV6.
>>
>> There are very few other locations in the stack that explicitly test
>> sk_family in this way and thus would be vulnerable to races with
>> IPV6_ADDRFORM.
>>
>> I'm not sure whether it is possible for a udpv6 socket to queue a
>> real ipv6 packet on the error queue, disconnect, connect to an
>> ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
>> on a true ipv6 packet. That would return buggy data, e.g., in
>> msg_name.
>
> In do_ipv6_setsockopt IPV6_ADDRFORM we can test that the
> error queue is empty, and then take its lock for the duration of the
> operation.

Actually, no reason to hold the lock. This setsockopt holds the socket
lock, which connect would need, too. So testing that the queue
is empty after testing that it is connected to a v4 address is
sufficient to ensure that no ipv6 packets are queued for reception.

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 4d780c7f0130..a975d6311341 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -199,6 +199,11 @@ static int do_ipv6_setsockopt(struct sock *sk,
int level, int optname,

if (ipv6_only_sock(sk) ||
!ipv6_addr_v4mapped(>sk_v6_daddr)) {
retv = -EADDRNOTAVAIL;
break;
}

+   if (!skb_queue_empty(>sk_error_queue)) {
+   retv = -EBUSY;
+   break;
+   }
+
fl6_free_socklist(sk);
__ipv6_sock_mc_close(sk);

After this it should be safe to remove the warning in ip_recv_error.


Re: WARNING in ip_recv_error

2018-05-18 Thread Willem de Bruijn
On Fri, May 18, 2018 at 2:46 PM, Willem de Bruijn
 wrote:
> On Fri, May 18, 2018 at 2:44 PM, Willem de Bruijn
>  wrote:
>> On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn
>>  wrote:
>>> On Fri, May 18, 2018 at 11:44 AM, David Miller  wrote:
 From: Eric Dumazet 
 Date: Fri, 18 May 2018 08:30:43 -0700

> We probably need to revert Willem patch 
> (7ce875e5ecb8562fd44040f69bda96c999e38bbc)

 Is it really valid to reach ip_recv_err with an ipv6 socket?
>>>
>>> I guess the issue is that setsockopt IPV6_ADDRFORM is not an
>>> atomic operation, so that the socket is neither fully ipv4 nor fully
>>> ipv6 by the time it reaches ip_recv_error.
>>>
>>>   sk->sk_socket->ops = _dgram_ops;
>>>   < HERE >
>>>   sk->sk_family = PF_INET;
>>>
>>> Even calling inet_recv_error to demux would not necessarily help.
>>>
>>> Safest would be to look up by skb->protocol, similar to what
>>> ipv6_recv_error does to handle v4-mapped-v6.
>>>
>>> Or to make that function safe with PF_INET and swap the order
>>> of the above two operations.
>>>
>>> All sound needlessly complicated for this rare socket option, but
>>> I don't have a better idea yet. Dropping on the floor is not nice,
>>> either.
>>
>> Ensuring that ip_recv_error correctly handles packets from either
>> socket and removing the warning should indeed be good.
>>
>> It is robust against v4-mapped packets from an AF_INET6 socket,
>> but see caveat on reconnect below.
>>
>> The code between ipv6_recv_error for v4-mapped addresses and
>> ip_recv_error is essentially the same, the main difference being
>> whether to return network headers as sockaddr_in with SOL_IP
>> or sockaddr_in6 with SOL_IPV6.
>>
>> There are very few other locations in the stack that explicitly test
>> sk_family in this way and thus would be vulnerable to races with
>> IPV6_ADDRFORM.
>>
>> I'm not sure whether it is possible for a udpv6 socket to queue a
>> real ipv6 packet on the error queue, disconnect, connect to an
>> ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
>> on a true ipv6 packet. That would return buggy data, e.g., in
>> msg_name.
>
> In do_ipv6_setsockopt IPV6_ADDRFORM we can test that the
> error queue is empty, and then take its lock for the duration of the
> operation.

Actually, no reason to hold the lock. This setsockopt holds the socket
lock, which connect would need, too. So testing that the queue
is empty after testing that it is connected to a v4 address is
sufficient to ensure that no ipv6 packets are queued for reception.

diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 4d780c7f0130..a975d6311341 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -199,6 +199,11 @@ static int do_ipv6_setsockopt(struct sock *sk,
int level, int optname,

if (ipv6_only_sock(sk) ||
!ipv6_addr_v4mapped(>sk_v6_daddr)) {
retv = -EADDRNOTAVAIL;
break;
}

+   if (!skb_queue_empty(>sk_error_queue)) {
+   retv = -EBUSY;
+   break;
+   }
+
fl6_free_socklist(sk);
__ipv6_sock_mc_close(sk);

After this it should be safe to remove the warning in ip_recv_error.


Re: [PATCH] riscv: Fix the bug in memory access fixup code

2018-05-18 Thread Palmer Dabbelt

On Mon, 07 May 2018 19:59:33 PDT (-0700), alan...@andestech.com wrote:

A piece of fixup code is currently shared by __copy_user and
__clear_user.  It first disables the access to user-space memory
and then returns the "n" argument, which represents #(bytes not processed).
However,__copy_user's "n" is in register a2, while __clear_user's in a1,
and thus it causes errors for programs like setdomainname02 testcase in LTP.

This patch fixes this issue by separating their fixup code and returning
the right value for the kernel to handle a relative fault properly.

Signed-off-by: Alan Kao 
Cc: Greentime Hu 
Cc: Zong Li 
Cc: Vincent Chen 
---
 arch/riscv/lib/uaccess.S | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index 58fb2877c865..0173ea296baa 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -84,7 +84,7 @@ ENTRY(__clear_user)
bgeu t0, t1, 2f
bltu a0, t0, 4f
 1:
-   fixup REG_S, zero, (a0), 10f
+   fixup REG_S, zero, (a0), 11f
addi a0, a0, SZREG
bltu a0, t1, 1b
 2:
@@ -96,12 +96,12 @@ ENTRY(__clear_user)
li a0, 0
ret
 4: /* Edge case: unalignment */
-   fixup sb, zero, (a0), 10f
+   fixup sb, zero, (a0), 11f
addi a0, a0, 1
bltu a0, t0, 4b
j 1b
 5: /* Edge case: remainder */
-   fixup sb, zero, (a0), 10f
+   fixup sb, zero, (a0), 11f
addi a0, a0, 1
bltu a0, a3, 5b
j 3b
@@ -109,9 +109,14 @@ ENDPROC(__clear_user)

.section .fixup,"ax"
.balign 4
+   /* Fixup code for __copy_user(10) and __clear_user(11) */
 10:
/* Disable access to user memory */
csrs sstatus, t6
-   sub a0, a3, a0
+   mv a0, a2
+   ret
+11:
+   csrs sstatus, t6
+   mv a0, a1
ret
.previous


Thanks!


Re: [PATCH] riscv: Fix the bug in memory access fixup code

2018-05-18 Thread Palmer Dabbelt

On Mon, 07 May 2018 19:59:33 PDT (-0700), alan...@andestech.com wrote:

A piece of fixup code is currently shared by __copy_user and
__clear_user.  It first disables the access to user-space memory
and then returns the "n" argument, which represents #(bytes not processed).
However,__copy_user's "n" is in register a2, while __clear_user's in a1,
and thus it causes errors for programs like setdomainname02 testcase in LTP.

This patch fixes this issue by separating their fixup code and returning
the right value for the kernel to handle a relative fault properly.

Signed-off-by: Alan Kao 
Cc: Greentime Hu 
Cc: Zong Li 
Cc: Vincent Chen 
---
 arch/riscv/lib/uaccess.S | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index 58fb2877c865..0173ea296baa 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -84,7 +84,7 @@ ENTRY(__clear_user)
bgeu t0, t1, 2f
bltu a0, t0, 4f
 1:
-   fixup REG_S, zero, (a0), 10f
+   fixup REG_S, zero, (a0), 11f
addi a0, a0, SZREG
bltu a0, t1, 1b
 2:
@@ -96,12 +96,12 @@ ENTRY(__clear_user)
li a0, 0
ret
 4: /* Edge case: unalignment */
-   fixup sb, zero, (a0), 10f
+   fixup sb, zero, (a0), 11f
addi a0, a0, 1
bltu a0, t0, 4b
j 1b
 5: /* Edge case: remainder */
-   fixup sb, zero, (a0), 10f
+   fixup sb, zero, (a0), 11f
addi a0, a0, 1
bltu a0, a3, 5b
j 3b
@@ -109,9 +109,14 @@ ENDPROC(__clear_user)

.section .fixup,"ax"
.balign 4
+   /* Fixup code for __copy_user(10) and __clear_user(11) */
 10:
/* Disable access to user memory */
csrs sstatus, t6
-   sub a0, a3, a0
+   mv a0, a2
+   ret
+11:
+   csrs sstatus, t6
+   mv a0, a1
ret
.previous


Thanks!


Re: Revert "dmaengine: pl330: add DMA_PAUSE feature"

2018-05-18 Thread Frank Mori Hess
On Fri, May 18, 2018 at 12:03 AM, Vinod  wrote:
>
> You are simply mixing things up!

It certainly feels like I'm mixed up.  If I have to resolve this, I'd
like to be a little less mixed up before I submit more patches which
are going to inevitably result in subtly broken code suddenly becoming
prominently and unignorably broken code.  Unfortunately I get the
impression I'm exhausting your patience to answer my questions, and
I've failed to fully communicate what the question is.


> On Pause we don't expect data loss, as user can
> resume the transfer. This means as you rightly guessed, the DMA HW should not 
> drop
> any data, nor should SW.
>
> Now if you want to read residue at this point it is perfectly valid. But if 
> you
> decide to terminate the channel (yes it is terminate_all API), we abort and 
> don't
> have context to report back!

I understand the residue can't be read after terminate, that's why
reading the residue is step 2 in pause/residue/terminate.  My question
was whether the entire sequence pause/residue/terminate taken as a
whole can or cannot lose data.  Saying that individual steps can or
can't lose data is not enough, context is required.  The key point is
whether pause flushes in-flight data to its destination or not.  If it
does, and our residue is accurate, the terminate cannot cause data
loss.  If pause doesn't flush, an additional step of flush_sync as
Lars suggested is required.  So pause/flush_sync/residue/terminate
would be the safe sequence that cannot lose data.


> As Lars rightly pointed out, residue calculation are very tricky, DMA fifo may
> have data, some data may be in device FIFO, so residue is always from DMA 
> point
> of view and may differ from device view (more or less depending upon 
> direction)
>
> Now if you require to add more features for your usecase, please do feel free 
> to
> send a patch. The framework can always be improved, we haven't solved world
> hunger yet!

World hunger?  I don't see how asking questions about a dma engine's
data integrity guarantees is either unreasonable or out of scope.

-- 
Frank


Re: Revert "dmaengine: pl330: add DMA_PAUSE feature"

2018-05-18 Thread Frank Mori Hess
On Fri, May 18, 2018 at 12:03 AM, Vinod  wrote:
>
> You are simply mixing things up!

It certainly feels like I'm mixed up.  If I have to resolve this, I'd
like to be a little less mixed up before I submit more patches which
are going to inevitably result in subtly broken code suddenly becoming
prominently and unignorably broken code.  Unfortunately I get the
impression I'm exhausting your patience to answer my questions, and
I've failed to fully communicate what the question is.


> On Pause we don't expect data loss, as user can
> resume the transfer. This means as you rightly guessed, the DMA HW should not 
> drop
> any data, nor should SW.
>
> Now if you want to read residue at this point it is perfectly valid. But if 
> you
> decide to terminate the channel (yes it is terminate_all API), we abort and 
> don't
> have context to report back!

I understand the residue can't be read after terminate, that's why
reading the residue is step 2 in pause/residue/terminate.  My question
was whether the entire sequence pause/residue/terminate taken as a
whole can or cannot lose data.  Saying that individual steps can or
can't lose data is not enough, context is required.  The key point is
whether pause flushes in-flight data to its destination or not.  If it
does, and our residue is accurate, the terminate cannot cause data
loss.  If pause doesn't flush, an additional step of flush_sync as
Lars suggested is required.  So pause/flush_sync/residue/terminate
would be the safe sequence that cannot lose data.


> As Lars rightly pointed out, residue calculation are very tricky, DMA fifo may
> have data, some data may be in device FIFO, so residue is always from DMA 
> point
> of view and may differ from device view (more or less depending upon 
> direction)
>
> Now if you require to add more features for your usecase, please do feel free 
> to
> send a patch. The framework can always be improved, we haven't solved world
> hunger yet!

World hunger?  I don't see how asking questions about a dma engine's
data integrity guarantees is either unreasonable or out of scope.

-- 
Frank


[PATCH v2] schedutil: Allow cpufreq requests to be made even when kthread kicked

2018-05-18 Thread Joel Fernandes (Google.)
From: "Joel Fernandes (Google)" 

Currently there is a chance of a schedutil cpufreq update request to be
dropped if there is a pending update request. This pending request can
be delayed if there is a scheduling delay of the irq_work and the wake
up of the schedutil governor kthread.

A very bad scenario is when a schedutil request was already just made,
such as to reduce the CPU frequency, then a newer request to increase
CPU frequency (even sched deadline urgent frequency increase requests)
can be dropped, even though the rate limits suggest that its Ok to
process a request. This is because of the way the work_in_progress flag
is used.

This patch improves the situation by allowing new requests to happen
even though the old one is still being processed. Note that in this
approach, if an irq_work was already issued, we just update next_freq
and don't bother to queue another request so there's no extra work being
done to make this happen.

I had brought up this issue at the OSPM conference and Claudio had a
discussion RFC with an alternate approach [1]. I prefer the approach as
done in the patch below since it doesn't need any new flags and doesn't
cause any other extra overhead.

[1] https://patchwork.kernel.org/patch/10384261/

LGTMed-by: Viresh Kumar 
LGTMed-by: Juri Lelli 
CC: Viresh Kumar 
CC: Rafael J. Wysocki 
CC: Peter Zijlstra 
CC: Ingo Molnar 
CC: Patrick Bellasi 
CC: Juri Lelli 
Cc: Luca Abeni 
CC: Joel Fernandes 
CC: Todd Kjos 
CC: clau...@evidence.eu.com
CC: kernel-t...@android.com
CC: linux...@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) 
---
v1 -> v2: Minor style related changes.

 kernel/sched/cpufreq_schedutil.c | 34 
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e13df951aca7..5c482ec38610 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy 
*sg_policy, u64 time)
!cpufreq_can_do_remote_dvfs(sg_policy->policy))
return false;
 
-   if (sg_policy->work_in_progress)
-   return false;
-
if (unlikely(sg_policy->need_freq_update)) {
sg_policy->need_freq_update = false;
/*
@@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy 
*sg_policy, u64 time,
 
policy->cur = next_freq;
trace_cpu_frequency(next_freq, smp_processor_id());
-   } else {
+   } else if (!sg_policy->work_in_progress) {
sg_policy->work_in_progress = true;
irq_work_queue(_policy->irq_work);
}
@@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data 
*hook, u64 time,
 
ignore_dl_rate_limit(sg_cpu, sg_policy);
 
+   /*
+* For slow-switch systems, single policy requests can't run at the
+* moment if update is in progress, unless we acquire update_lock.
+*/
+   if (sg_policy->work_in_progress)
+   return;
+
if (!sugov_should_update_freq(sg_policy, time))
return;
 
@@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 
time, unsigned int flags)
 static void sugov_work(struct kthread_work *work)
 {
struct sugov_policy *sg_policy = container_of(work, struct 
sugov_policy, work);
+   unsigned int freq;
+   unsigned long flags;
+
+   /*
+* Hold sg_policy->update_lock shortly to handle the case where:
+* incase sg_policy->next_freq is read here, and then updated by
+* sugov_update_shared just before work_in_progress is set to false
+* here, we may miss queueing the new update.
+*
+* Note: If a work was queued after the update_lock is released,
+* sugov_work will just be called again by kthread_work code; and the
+* request will be proceed before the sugov thread sleeps.
+*/
+   raw_spin_lock_irqsave(_policy->update_lock, flags);
+   freq = sg_policy->next_freq;
+   sg_policy->work_in_progress = false;
+   raw_spin_unlock_irqrestore(_policy->update_lock, flags);
 
mutex_lock(_policy->work_lock);
-   __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
-   CPUFREQ_RELATION_L);
+   __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
mutex_unlock(_policy->work_lock);
-
-   sg_policy->work_in_progress = false;
 }
 
 static void sugov_irq_work(struct irq_work *irq_work)
-- 
2.17.0.441.gb46fe60e1d-goog



[PATCH v2] schedutil: Allow cpufreq requests to be made even when kthread kicked

2018-05-18 Thread Joel Fernandes (Google.)
From: "Joel Fernandes (Google)" 

Currently there is a chance of a schedutil cpufreq update request to be
dropped if there is a pending update request. This pending request can
be delayed if there is a scheduling delay of the irq_work and the wake
up of the schedutil governor kthread.

A very bad scenario is when a schedutil request was already just made,
such as to reduce the CPU frequency, then a newer request to increase
CPU frequency (even sched deadline urgent frequency increase requests)
can be dropped, even though the rate limits suggest that its Ok to
process a request. This is because of the way the work_in_progress flag
is used.

This patch improves the situation by allowing new requests to happen
even though the old one is still being processed. Note that in this
approach, if an irq_work was already issued, we just update next_freq
and don't bother to queue another request so there's no extra work being
done to make this happen.

I had brought up this issue at the OSPM conference and Claudio had a
discussion RFC with an alternate approach [1]. I prefer the approach as
done in the patch below since it doesn't need any new flags and doesn't
cause any other extra overhead.

[1] https://patchwork.kernel.org/patch/10384261/

LGTMed-by: Viresh Kumar 
LGTMed-by: Juri Lelli 
CC: Viresh Kumar 
CC: Rafael J. Wysocki 
CC: Peter Zijlstra 
CC: Ingo Molnar 
CC: Patrick Bellasi 
CC: Juri Lelli 
Cc: Luca Abeni 
CC: Joel Fernandes 
CC: Todd Kjos 
CC: clau...@evidence.eu.com
CC: kernel-t...@android.com
CC: linux...@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) 
---
v1 -> v2: Minor style related changes.

 kernel/sched/cpufreq_schedutil.c | 34 
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e13df951aca7..5c482ec38610 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy 
*sg_policy, u64 time)
!cpufreq_can_do_remote_dvfs(sg_policy->policy))
return false;
 
-   if (sg_policy->work_in_progress)
-   return false;
-
if (unlikely(sg_policy->need_freq_update)) {
sg_policy->need_freq_update = false;
/*
@@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy 
*sg_policy, u64 time,
 
policy->cur = next_freq;
trace_cpu_frequency(next_freq, smp_processor_id());
-   } else {
+   } else if (!sg_policy->work_in_progress) {
sg_policy->work_in_progress = true;
irq_work_queue(_policy->irq_work);
}
@@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data 
*hook, u64 time,
 
ignore_dl_rate_limit(sg_cpu, sg_policy);
 
+   /*
+* For slow-switch systems, single policy requests can't run at the
+* moment if update is in progress, unless we acquire update_lock.
+*/
+   if (sg_policy->work_in_progress)
+   return;
+
if (!sugov_should_update_freq(sg_policy, time))
return;
 
@@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 
time, unsigned int flags)
 static void sugov_work(struct kthread_work *work)
 {
struct sugov_policy *sg_policy = container_of(work, struct 
sugov_policy, work);
+   unsigned int freq;
+   unsigned long flags;
+
+   /*
+* Hold sg_policy->update_lock shortly to handle the case where:
+* incase sg_policy->next_freq is read here, and then updated by
+* sugov_update_shared just before work_in_progress is set to false
+* here, we may miss queueing the new update.
+*
+* Note: If a work was queued after the update_lock is released,
+* sugov_work will just be called again by kthread_work code; and the
+* request will be proceed before the sugov thread sleeps.
+*/
+   raw_spin_lock_irqsave(_policy->update_lock, flags);
+   freq = sg_policy->next_freq;
+   sg_policy->work_in_progress = false;
+   raw_spin_unlock_irqrestore(_policy->update_lock, flags);
 
mutex_lock(_policy->work_lock);
-   __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
-   CPUFREQ_RELATION_L);
+   __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
mutex_unlock(_policy->work_lock);
-
-   sg_policy->work_in_progress = false;
 }
 
 static void sugov_irq_work(struct irq_work *irq_work)
-- 
2.17.0.441.gb46fe60e1d-goog



Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread hpa
On May 18, 2018 11:50:12 AM PDT, Linus Torvalds  
wrote:
>On Fri, May 18, 2018 at 11:34 AM  wrote:
>
>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>torva...@linux-foundation.org> wrote:
>
>> Unfortunately gcc doesn't guarantee that global assembly inlines will
>appear at the top of the file.
>
>Yeah. It really would be better to do the "asm version of -inline".
>
>We already do something like that for real *.S files on some
>architectures
>(because their assembly really wants it, eg
>
>  arch/arm/Makefile:
>KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>-include
>asm/unified.h -msoft-float
>
>but I do want to point out that KBUILD_AFLAGS is *not* used for
>compiler-generated assembly, only for actual *.S files.
>
>Sadly, I don't actually know any way to make gcc call the 'as' phase
>with
>particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>assembler, but there is no "-include" option for GNU as afaik.
>
>Can you perhaps define a macro symbol for "--defsym"? Probably not.
>
>   Linus

I looked at this thing a long time ago; it's not there, and the best would 
probably be to get global asm() working properly in gcc.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread hpa
On May 18, 2018 11:50:12 AM PDT, Linus Torvalds  
wrote:
>On Fri, May 18, 2018 at 11:34 AM  wrote:
>
>> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
>torva...@linux-foundation.org> wrote:
>
>> Unfortunately gcc doesn't guarantee that global assembly inlines will
>appear at the top of the file.
>
>Yeah. It really would be better to do the "asm version of -inline".
>
>We already do something like that for real *.S files on some
>architectures
>(because their assembly really wants it, eg
>
>  arch/arm/Makefile:
>KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y)
>-include
>asm/unified.h -msoft-float
>
>but I do want to point out that KBUILD_AFLAGS is *not* used for
>compiler-generated assembly, only for actual *.S files.
>
>Sadly, I don't actually know any way to make gcc call the 'as' phase
>with
>particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
>assembler, but there is no "-include" option for GNU as afaik.
>
>Can you perhaps define a macro symbol for "--defsym"? Probably not.
>
>   Linus

I looked at this thing a long time ago; it's not there, and the best would 
probably be to get global asm() working properly in gcc.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Linus Torvalds
On Fri, May 18, 2018 at 11:34 AM  wrote:

> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
torva...@linux-foundation.org> wrote:

> Unfortunately gcc doesn't guarantee that global assembly inlines will
appear at the top of the file.

Yeah. It really would be better to do the "asm version of -inline".

We already do something like that for real *.S files on some architectures
(because their assembly really wants it, eg

  arch/arm/Makefile:
KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y) -include
asm/unified.h -msoft-float

but I do want to point out that KBUILD_AFLAGS is *not* used for
compiler-generated assembly, only for actual *.S files.

Sadly, I don't actually know any way to make gcc call the 'as' phase with
particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
assembler, but there is no "-include" option for GNU as afaik.

Can you perhaps define a macro symbol for "--defsym"? Probably not.

   Linus


Re: [PATCH 2/6] x86: bug: prevent gcc distortions

2018-05-18 Thread Linus Torvalds
On Fri, May 18, 2018 at 11:34 AM  wrote:

> On May 18, 2018 11:25:32 AM PDT, Linus Torvalds <
torva...@linux-foundation.org> wrote:

> Unfortunately gcc doesn't guarantee that global assembly inlines will
appear at the top of the file.

Yeah. It really would be better to do the "asm version of -inline".

We already do something like that for real *.S files on some architectures
(because their assembly really wants it, eg

  arch/arm/Makefile:
KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y) -include
asm/unified.h -msoft-float

but I do want to point out that KBUILD_AFLAGS is *not* used for
compiler-generated assembly, only for actual *.S files.

Sadly, I don't actually know any way to make gcc call the 'as' phase with
particular options. We can use "-Wa,xyzzy" to pass in xyzzy to the
assembler, but there is no "-include" option for GNU as afaik.

Can you perhaps define a macro symbol for "--defsym"? Probably not.

   Linus


[PULL REQUEST] i2c for 4.17

2018-05-18 Thread Wolfram Sang
Linus,

I2C has a bunch of driver bugfixes and a MAINTAINERS addition.

Please pull.

Thanks,

   Wolfram


The following changes since commit 75bc37fefc4471e718ba8e651aa74673d4e0a9eb:

  Linux 4.17-rc4 (2018-05-06 16:57:38 -1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git 
i2c/for-current-fixed

for you to fetch changes up to 22aac3eb0c465dd9ea7f06ee1ed8ad933890f2a3:

  MAINTAINERS: add entry for STM32 I2C driver (2018-05-17 16:02:19 +0200)


Alexander Monakov (1):
  i2c: designware: fix poll-after-enable regression

Bartosz Golaszewski (1):
  eeprom: at24: fix retrieving the at24_chip_data structure

Hans de Goede (2):
  i2c: core: ACPI: Improve OpRegion read errors
  i2c: core: ACPI: Log device not acking errors at dbg loglevel

Peter Rosin (3):
  i2c: pmcmsp: return message count on master_xfer success
  i2c: pmcmsp: fix error return from master_xfer
  i2c: viperboard: return message count on master_xfer success

Pierre-Yves MORDRET (1):
  MAINTAINERS: add entry for STM32 I2C driver

Wolfram Sang (1):
  Merge tag 'at24-4.17-rc5-fixes-for-wolfram' of 
git://git.kernel.org/.../brgl/linux into i2c/for-current


with much appreciated quality assurance from

Ben Gardner (1):
  (Test) i2c: designware: fix poll-after-enable regression

Mika Westerberg (2):
  (Rev.) i2c: core: ACPI: Log device not acking errors at dbg loglevel
  (Rev.) i2c: core: ACPI: Improve OpRegion read errors

 MAINTAINERS|  6 ++
 drivers/i2c/busses/i2c-designware-master.c |  5 -
 drivers/i2c/busses/i2c-pmcmsp.c|  4 ++--
 drivers/i2c/busses/i2c-viperboard.c|  2 +-
 drivers/i2c/i2c-core-acpi.c| 13 ++---
 drivers/misc/eeprom/at24.c |  2 +-
 6 files changed, 24 insertions(+), 8 deletions(-)


signature.asc
Description: PGP signature


[PULL REQUEST] i2c for 4.17

2018-05-18 Thread Wolfram Sang
Linus,

I2C has a bunch of driver bugfixes and a MAINTAINERS addition.

Please pull.

Thanks,

   Wolfram


The following changes since commit 75bc37fefc4471e718ba8e651aa74673d4e0a9eb:

  Linux 4.17-rc4 (2018-05-06 16:57:38 -1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git 
i2c/for-current-fixed

for you to fetch changes up to 22aac3eb0c465dd9ea7f06ee1ed8ad933890f2a3:

  MAINTAINERS: add entry for STM32 I2C driver (2018-05-17 16:02:19 +0200)


Alexander Monakov (1):
  i2c: designware: fix poll-after-enable regression

Bartosz Golaszewski (1):
  eeprom: at24: fix retrieving the at24_chip_data structure

Hans de Goede (2):
  i2c: core: ACPI: Improve OpRegion read errors
  i2c: core: ACPI: Log device not acking errors at dbg loglevel

Peter Rosin (3):
  i2c: pmcmsp: return message count on master_xfer success
  i2c: pmcmsp: fix error return from master_xfer
  i2c: viperboard: return message count on master_xfer success

Pierre-Yves MORDRET (1):
  MAINTAINERS: add entry for STM32 I2C driver

Wolfram Sang (1):
  Merge tag 'at24-4.17-rc5-fixes-for-wolfram' of 
git://git.kernel.org/.../brgl/linux into i2c/for-current


with much appreciated quality assurance from

Ben Gardner (1):
  (Test) i2c: designware: fix poll-after-enable regression

Mika Westerberg (2):
  (Rev.) i2c: core: ACPI: Log device not acking errors at dbg loglevel
  (Rev.) i2c: core: ACPI: Improve OpRegion read errors

 MAINTAINERS|  6 ++
 drivers/i2c/busses/i2c-designware-master.c |  5 -
 drivers/i2c/busses/i2c-pmcmsp.c|  4 ++--
 drivers/i2c/busses/i2c-viperboard.c|  2 +-
 drivers/i2c/i2c-core-acpi.c| 13 ++---
 drivers/misc/eeprom/at24.c |  2 +-
 6 files changed, 24 insertions(+), 8 deletions(-)


signature.asc
Description: PGP signature


Re: WARNING in ip_recv_error

2018-05-18 Thread Willem de Bruijn
On Fri, May 18, 2018 at 2:44 PM, Willem de Bruijn
 wrote:
> On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn
>  wrote:
>> On Fri, May 18, 2018 at 11:44 AM, David Miller  wrote:
>>> From: Eric Dumazet 
>>> Date: Fri, 18 May 2018 08:30:43 -0700
>>>
 We probably need to revert Willem patch 
 (7ce875e5ecb8562fd44040f69bda96c999e38bbc)
>>>
>>> Is it really valid to reach ip_recv_err with an ipv6 socket?
>>
>> I guess the issue is that setsockopt IPV6_ADDRFORM is not an
>> atomic operation, so that the socket is neither fully ipv4 nor fully
>> ipv6 by the time it reaches ip_recv_error.
>>
>>   sk->sk_socket->ops = _dgram_ops;
>>   < HERE >
>>   sk->sk_family = PF_INET;
>>
>> Even calling inet_recv_error to demux would not necessarily help.
>>
>> Safest would be to look up by skb->protocol, similar to what
>> ipv6_recv_error does to handle v4-mapped-v6.
>>
>> Or to make that function safe with PF_INET and swap the order
>> of the above two operations.
>>
>> All sound needlessly complicated for this rare socket option, but
>> I don't have a better idea yet. Dropping on the floor is not nice,
>> either.
>
> Ensuring that ip_recv_error correctly handles packets from either
> socket and removing the warning should indeed be good.
>
> It is robust against v4-mapped packets from an AF_INET6 socket,
> but see caveat on reconnect below.
>
> The code between ipv6_recv_error for v4-mapped addresses and
> ip_recv_error is essentially the same, the main difference being
> whether to return network headers as sockaddr_in with SOL_IP
> or sockaddr_in6 with SOL_IPV6.
>
> There are very few other locations in the stack that explicitly test
> sk_family in this way and thus would be vulnerable to races with
> IPV6_ADDRFORM.
>
> I'm not sure whether it is possible for a udpv6 socket to queue a
> real ipv6 packet on the error queue, disconnect, connect to an
> ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
> on a true ipv6 packet. That would return buggy data, e.g., in
> msg_name.

In do_ipv6_setsockopt IPV6_ADDRFORM we can test that the
error queue is empty, and then take its lock for the duration of the
operation.


Re: WARNING in ip_recv_error

2018-05-18 Thread Willem de Bruijn
On Fri, May 18, 2018 at 2:44 PM, Willem de Bruijn
 wrote:
> On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn
>  wrote:
>> On Fri, May 18, 2018 at 11:44 AM, David Miller  wrote:
>>> From: Eric Dumazet 
>>> Date: Fri, 18 May 2018 08:30:43 -0700
>>>
 We probably need to revert Willem patch 
 (7ce875e5ecb8562fd44040f69bda96c999e38bbc)
>>>
>>> Is it really valid to reach ip_recv_err with an ipv6 socket?
>>
>> I guess the issue is that setsockopt IPV6_ADDRFORM is not an
>> atomic operation, so that the socket is neither fully ipv4 nor fully
>> ipv6 by the time it reaches ip_recv_error.
>>
>>   sk->sk_socket->ops = _dgram_ops;
>>   < HERE >
>>   sk->sk_family = PF_INET;
>>
>> Even calling inet_recv_error to demux would not necessarily help.
>>
>> Safest would be to look up by skb->protocol, similar to what
>> ipv6_recv_error does to handle v4-mapped-v6.
>>
>> Or to make that function safe with PF_INET and swap the order
>> of the above two operations.
>>
>> All sound needlessly complicated for this rare socket option, but
>> I don't have a better idea yet. Dropping on the floor is not nice,
>> either.
>
> Ensuring that ip_recv_error correctly handles packets from either
> socket and removing the warning should indeed be good.
>
> It is robust against v4-mapped packets from an AF_INET6 socket,
> but see caveat on reconnect below.
>
> The code between ipv6_recv_error for v4-mapped addresses and
> ip_recv_error is essentially the same, the main difference being
> whether to return network headers as sockaddr_in with SOL_IP
> or sockaddr_in6 with SOL_IPV6.
>
> There are very few other locations in the stack that explicitly test
> sk_family in this way and thus would be vulnerable to races with
> IPV6_ADDRFORM.
>
> I'm not sure whether it is possible for a udpv6 socket to queue a
> real ipv6 packet on the error queue, disconnect, connect to an
> ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
> on a true ipv6 packet. That would return buggy data, e.g., in
> msg_name.

In do_ipv6_setsockopt IPV6_ADDRFORM we can test that the
error queue is empty, and then take its lock for the duration of the
operation.


Re: WARNING in ip_recv_error

2018-05-18 Thread Willem de Bruijn
On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn
 wrote:
> On Fri, May 18, 2018 at 11:44 AM, David Miller  wrote:
>> From: Eric Dumazet 
>> Date: Fri, 18 May 2018 08:30:43 -0700
>>
>>> We probably need to revert Willem patch 
>>> (7ce875e5ecb8562fd44040f69bda96c999e38bbc)
>>
>> Is it really valid to reach ip_recv_err with an ipv6 socket?
>
> I guess the issue is that setsockopt IPV6_ADDRFORM is not an
> atomic operation, so that the socket is neither fully ipv4 nor fully
> ipv6 by the time it reaches ip_recv_error.
>
>   sk->sk_socket->ops = _dgram_ops;
>   < HERE >
>   sk->sk_family = PF_INET;
>
> Even calling inet_recv_error to demux would not necessarily help.
>
> Safest would be to look up by skb->protocol, similar to what
> ipv6_recv_error does to handle v4-mapped-v6.
>
> Or to make that function safe with PF_INET and swap the order
> of the above two operations.
>
> All sound needlessly complicated for this rare socket option, but
> I don't have a better idea yet. Dropping on the floor is not nice,
> either.

Ensuring that ip_recv_error correctly handles packets from either
socket and removing the warning should indeed be good.

It is robust against v4-mapped packets from an AF_INET6 socket,
but see caveat on reconnect below.

The code between ipv6_recv_error for v4-mapped addresses and
ip_recv_error is essentially the same, the main difference being
whether to return network headers as sockaddr_in with SOL_IP
or sockaddr_in6 with SOL_IPV6.

There are very few other locations in the stack that explicitly test
sk_family in this way and thus would be vulnerable to races with
IPV6_ADDRFORM.

I'm not sure whether it is possible for a udpv6 socket to queue a
real ipv6 packet on the error queue, disconnect, connect to an
ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
on a true ipv6 packet. That would return buggy data, e.g., in
msg_name.


Re: WARNING in ip_recv_error

2018-05-18 Thread Willem de Bruijn
On Fri, May 18, 2018 at 1:09 PM, Willem de Bruijn
 wrote:
> On Fri, May 18, 2018 at 11:44 AM, David Miller  wrote:
>> From: Eric Dumazet 
>> Date: Fri, 18 May 2018 08:30:43 -0700
>>
>>> We probably need to revert Willem patch 
>>> (7ce875e5ecb8562fd44040f69bda96c999e38bbc)
>>
>> Is it really valid to reach ip_recv_err with an ipv6 socket?
>
> I guess the issue is that setsockopt IPV6_ADDRFORM is not an
> atomic operation, so that the socket is neither fully ipv4 nor fully
> ipv6 by the time it reaches ip_recv_error.
>
>   sk->sk_socket->ops = _dgram_ops;
>   < HERE >
>   sk->sk_family = PF_INET;
>
> Even calling inet_recv_error to demux would not necessarily help.
>
> Safest would be to look up by skb->protocol, similar to what
> ipv6_recv_error does to handle v4-mapped-v6.
>
> Or to make that function safe with PF_INET and swap the order
> of the above two operations.
>
> All sound needlessly complicated for this rare socket option, but
> I don't have a better idea yet. Dropping on the floor is not nice,
> either.

Ensuring that ip_recv_error correctly handles packets from either
socket and removing the warning should indeed be good.

It is robust against v4-mapped packets from an AF_INET6 socket,
but see caveat on reconnect below.

The code between ipv6_recv_error for v4-mapped addresses and
ip_recv_error is essentially the same, the main difference being
whether to return network headers as sockaddr_in with SOL_IP
or sockaddr_in6 with SOL_IPV6.

There are very few other locations in the stack that explicitly test
sk_family in this way and thus would be vulnerable to races with
IPV6_ADDRFORM.

I'm not sure whether it is possible for a udpv6 socket to queue a
real ipv6 packet on the error queue, disconnect, connect to an
ipv4 address, call IPV6_ADDRFORM and then call ip_recv_error
on a true ipv6 packet. That would return buggy data, e.g., in
msg_name.


Re: [PATCH] locking/atomics: Simplify the op definitions in atomic.h some more

2018-05-18 Thread Palmer Dabbelt
On Sun, 06 May 2018 07:57:27 PDT (-0700), mi...@kernel.org wrote:
>
> * Andrea Parri  wrote:
>
>> Hi Ingo,
>>
>> > From 5affbf7e91901143f84f1b2ca64f4afe70e210fd Mon Sep 17 00:00:00 2001
>> > From: Ingo Molnar 
>> > Date: Sat, 5 May 2018 10:23:23 +0200
>> > Subject: [PATCH] locking/atomics: Simplify the op definitions in atomic.h 
>> > some more
>> >
>> > Before:
>> >
>> >  #ifndef atomic_fetch_dec_relaxed
>> >  # ifndef atomic_fetch_dec
>> >  #  define atomic_fetch_dec(v) atomic_fetch_sub(1, (v))
>> >  #  define atomic_fetch_dec_relaxed(v) 
>> > atomic_fetch_sub_relaxed(1, (v))
>> >  #  define atomic_fetch_dec_acquire(v) 
>> > atomic_fetch_sub_acquire(1, (v))
>> >  #  define atomic_fetch_dec_release(v) 
>> > atomic_fetch_sub_release(1, (v))
>> >  # else
>> >  #  define atomic_fetch_dec_relaxedatomic_fetch_dec
>> >  #  define atomic_fetch_dec_acquireatomic_fetch_dec
>> >  #  define atomic_fetch_dec_releaseatomic_fetch_dec
>> >  # endif
>> >  #else
>> >  # ifndef atomic_fetch_dec_acquire
>> >  #  define atomic_fetch_dec_acquire(...)   
>> > __atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)
>> >  # endif
>> >  # ifndef atomic_fetch_dec_release
>> >  #  define atomic_fetch_dec_release(...)   
>> > __atomic_op_release(atomic_fetch_dec, __VA_ARGS__)
>> >  # endif
>> >  # ifndef atomic_fetch_dec
>> >  #  define atomic_fetch_dec(...)   
>> > __atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)
>> >  # endif
>> >  #endif
>> >
>> > After:
>> >
>> >  #ifndef atomic_fetch_dec_relaxed
>> >  # ifndef atomic_fetch_dec
>> >  #  define atomic_fetch_dec(v) atomic_fetch_sub(1, (v))
>> >  #  define atomic_fetch_dec_relaxed(v) 
>> > atomic_fetch_sub_relaxed(1, (v))
>> >  #  define atomic_fetch_dec_acquire(v) 
>> > atomic_fetch_sub_acquire(1, (v))
>> >  #  define atomic_fetch_dec_release(v) 
>> > atomic_fetch_sub_release(1, (v))
>> >  # else
>> >  #  define atomic_fetch_dec_relaxedatomic_fetch_dec
>> >  #  define atomic_fetch_dec_acquireatomic_fetch_dec
>> >  #  define atomic_fetch_dec_releaseatomic_fetch_dec
>> >  # endif
>> >  #else
>> >  # ifndef atomic_fetch_dec
>> >  #  define atomic_fetch_dec(...)   
>> > __atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)
>> >  #  define atomic_fetch_dec_acquire(...)   
>> > __atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)
>> >  #  define atomic_fetch_dec_release(...)   
>> > __atomic_op_release(atomic_fetch_dec, __VA_ARGS__)
>> >  # endif
>> >  #endif
>> >
>> > The idea is that because we already group these APIs by certain defines
>> > such as atomic_fetch_dec_relaxed and atomic_fetch_dec in the primary
>> > branches - we can do the same in the secondary branch as well.
>> >
>> > ( Also remove some unnecessarily duplicate comments, as the API
>> >   group defines are now pretty much self-documenting. )
>> >
>> > No change in functionality.
>> >
>> > Cc: Peter Zijlstra 
>> > Cc: Linus Torvalds 
>> > Cc: Andrew Morton 
>> > Cc: Thomas Gleixner 
>> > Cc: Paul E. McKenney 
>> > Cc: Will Deacon 
>> > Cc: linux-kernel@vger.kernel.org
>> > Signed-off-by: Ingo Molnar 
>>
>> This breaks compilation on RISC-V. (For some of its atomics, the arch
>> currently defines the _relaxed and the full variants and it relies on
>> the generic definitions for the _acquire and the _release variants.)
>
> I don't have cross-compilation for RISC-V, which is a relatively new arch.
> (Is there any RISC-V set of cross-compilation tools on kernel.org somewhere?)

Arnd added RISC-V to the cross compiler list a month or two ago when he updated 
them all.  I use the "make.cross" script from the Intel test robot, which will 
fetch the cross compilers for you.  It looks like I made a Git Hub pull request 
to update the script for RISC-V, it fetches from kernel.org


https://github.com/palmer-dabbelt/lkp-tests/blob/e14f4236ccd0572f4b87ffd480fecefee412dedc/sbin/make.cross
http://cdn.kernel.org/pub/tools/crosstool/files/bin/

http://cdn.kernel.org/pub/tools/crosstool/files/bin/x86_64/7.3.0/x86_64-gcc-7.3.0-nolibc_riscv64-linux.tar.gz

> Could you please send a patch that defines those variants against Linus's 
> tree,
> like the PowerPC patch that does something similar:
>
>   0476a632cb3a: locking/atomics/powerpc: Move cmpxchg helpers to 
> asm/cmpxchg.h and define the full set of cmpxchg APIs
>
> ?
>
> ... and I'll integrate it into the proper place to make it all bisectable, 
> etc.

Sorry, I got buried in email again.  Did this get merged, or is there a current 
version of the patch set I should look at?


Re: [PATCH] locking/atomics: Simplify the op definitions in atomic.h some more

2018-05-18 Thread Palmer Dabbelt
On Sun, 06 May 2018 07:57:27 PDT (-0700), mi...@kernel.org wrote:
>
> * Andrea Parri  wrote:
>
>> Hi Ingo,
>>
>> > From 5affbf7e91901143f84f1b2ca64f4afe70e210fd Mon Sep 17 00:00:00 2001
>> > From: Ingo Molnar 
>> > Date: Sat, 5 May 2018 10:23:23 +0200
>> > Subject: [PATCH] locking/atomics: Simplify the op definitions in atomic.h 
>> > some more
>> >
>> > Before:
>> >
>> >  #ifndef atomic_fetch_dec_relaxed
>> >  # ifndef atomic_fetch_dec
>> >  #  define atomic_fetch_dec(v) atomic_fetch_sub(1, (v))
>> >  #  define atomic_fetch_dec_relaxed(v) 
>> > atomic_fetch_sub_relaxed(1, (v))
>> >  #  define atomic_fetch_dec_acquire(v) 
>> > atomic_fetch_sub_acquire(1, (v))
>> >  #  define atomic_fetch_dec_release(v) 
>> > atomic_fetch_sub_release(1, (v))
>> >  # else
>> >  #  define atomic_fetch_dec_relaxedatomic_fetch_dec
>> >  #  define atomic_fetch_dec_acquireatomic_fetch_dec
>> >  #  define atomic_fetch_dec_releaseatomic_fetch_dec
>> >  # endif
>> >  #else
>> >  # ifndef atomic_fetch_dec_acquire
>> >  #  define atomic_fetch_dec_acquire(...)   
>> > __atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)
>> >  # endif
>> >  # ifndef atomic_fetch_dec_release
>> >  #  define atomic_fetch_dec_release(...)   
>> > __atomic_op_release(atomic_fetch_dec, __VA_ARGS__)
>> >  # endif
>> >  # ifndef atomic_fetch_dec
>> >  #  define atomic_fetch_dec(...)   
>> > __atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)
>> >  # endif
>> >  #endif
>> >
>> > After:
>> >
>> >  #ifndef atomic_fetch_dec_relaxed
>> >  # ifndef atomic_fetch_dec
>> >  #  define atomic_fetch_dec(v) atomic_fetch_sub(1, (v))
>> >  #  define atomic_fetch_dec_relaxed(v) 
>> > atomic_fetch_sub_relaxed(1, (v))
>> >  #  define atomic_fetch_dec_acquire(v) 
>> > atomic_fetch_sub_acquire(1, (v))
>> >  #  define atomic_fetch_dec_release(v) 
>> > atomic_fetch_sub_release(1, (v))
>> >  # else
>> >  #  define atomic_fetch_dec_relaxedatomic_fetch_dec
>> >  #  define atomic_fetch_dec_acquireatomic_fetch_dec
>> >  #  define atomic_fetch_dec_releaseatomic_fetch_dec
>> >  # endif
>> >  #else
>> >  # ifndef atomic_fetch_dec
>> >  #  define atomic_fetch_dec(...)   
>> > __atomic_op_fence(atomic_fetch_dec, __VA_ARGS__)
>> >  #  define atomic_fetch_dec_acquire(...)   
>> > __atomic_op_acquire(atomic_fetch_dec, __VA_ARGS__)
>> >  #  define atomic_fetch_dec_release(...)   
>> > __atomic_op_release(atomic_fetch_dec, __VA_ARGS__)
>> >  # endif
>> >  #endif
>> >
>> > The idea is that because we already group these APIs by certain defines
>> > such as atomic_fetch_dec_relaxed and atomic_fetch_dec in the primary
>> > branches - we can do the same in the secondary branch as well.
>> >
>> > ( Also remove some unnecessarily duplicate comments, as the API
>> >   group defines are now pretty much self-documenting. )
>> >
>> > No change in functionality.
>> >
>> > Cc: Peter Zijlstra 
>> > Cc: Linus Torvalds 
>> > Cc: Andrew Morton 
>> > Cc: Thomas Gleixner 
>> > Cc: Paul E. McKenney 
>> > Cc: Will Deacon 
>> > Cc: linux-kernel@vger.kernel.org
>> > Signed-off-by: Ingo Molnar 
>>
>> This breaks compilation on RISC-V. (For some of its atomics, the arch
>> currently defines the _relaxed and the full variants and it relies on
>> the generic definitions for the _acquire and the _release variants.)
>
> I don't have cross-compilation for RISC-V, which is a relatively new arch.
> (Is there any RISC-V set of cross-compilation tools on kernel.org somewhere?)

Arnd added RISC-V to the cross compiler list a month or two ago when he updated 
them all.  I use the "make.cross" script from the Intel test robot, which will 
fetch the cross compilers for you.  It looks like I made a Git Hub pull request 
to update the script for RISC-V, it fetches from kernel.org


https://github.com/palmer-dabbelt/lkp-tests/blob/e14f4236ccd0572f4b87ffd480fecefee412dedc/sbin/make.cross
http://cdn.kernel.org/pub/tools/crosstool/files/bin/

http://cdn.kernel.org/pub/tools/crosstool/files/bin/x86_64/7.3.0/x86_64-gcc-7.3.0-nolibc_riscv64-linux.tar.gz

> Could you please send a patch that defines those variants against Linus's 
> tree,
> like the PowerPC patch that does something similar:
>
>   0476a632cb3a: locking/atomics/powerpc: Move cmpxchg helpers to 
> asm/cmpxchg.h and define the full set of cmpxchg APIs
>
> ?
>
> ... and I'll integrate it into the proper place to make it all bisectable, 
> etc.

Sorry, I got buried in email again.  Did this get merged, or is there a current 
version of the patch set I should look at?


Tasks RCU vs Preempt RCU

2018-05-18 Thread Joel Fernandes
Hi,

I was thinking about tasks-RCU and why its needed. Since preempt-RCU allows
tasks to be preempted in read-sections, can we not just reuse that mechanism
for the trampolines since we track all preempted tasks so we would wait on
all tasks preempted within a trampoline?

I am trying to understand what will _not_ work if we did that.. I'm guessing
the answer is that that would mean the trampoline has to be wrapped with
rcu_read_{lock,unlock} which may add some overhead, but please let me know
if I'm missing something else..

The advantage I guess is possible elimination of an RCU variant, and also
possibly eliminating the tasks RCU thread that monitors.. Anyway I was
thinking more in terms of the effort of reduction of the RCU flavors etc and
reducing complexity ideas.

thanks!

- Joel



Tasks RCU vs Preempt RCU

2018-05-18 Thread Joel Fernandes
Hi,

I was thinking about tasks-RCU and why its needed. Since preempt-RCU allows
tasks to be preempted in read-sections, can we not just reuse that mechanism
for the trampolines since we track all preempted tasks so we would wait on
all tasks preempted within a trampoline?

I am trying to understand what will _not_ work if we did that.. I'm guessing
the answer is that that would mean the trampoline has to be wrapped with
rcu_read_{lock,unlock} which may add some overhead, but please let me know
if I'm missing something else..

The advantage I guess is possible elimination of an RCU variant, and also
possibly eliminating the tasks RCU thread that monitors.. Anyway I was
thinking more in terms of the effort of reduction of the RCU flavors etc and
reducing complexity ideas.

thanks!

- Joel



Re: seq_putc is preferable to seq_puts

2018-05-18 Thread Joe Perches
On Fri, 2018-05-18 at 10:50 -0700, Joe Perches wrote:
> and a trivial script to convert them all treewide
> 
> $ git grep -P --name-only 
> '\b(?:seq_puts|seq_printf)\s*\(\s*[^,]+,\s*"(?:\\.|.)"\s*\)\s*;' | \
>   xargs perl -i -e 'local $/; while (<>) { 
> s/\b(?:seq_puts|seq_printf)\s*\(\s*([^,]+),\s*"(\\.|.)"\s*\)\s*;/seq_putc(\1, 
> '"'\2'"')/g; print;}'

Hmm, pity this was missing a trailing semicolon in the substitution

$ git grep -P --name-only 
'\b(?:seq_puts|seq_printf)\s*\(\s*[^,]+,\s*"(?:\\.|.)"\s*\)\s*;' | \
  xargs perl -i -e 'local $/; while (<>) { 
s/\b(?:seq_puts|seq_printf)\s*\(\s*([^,]+),\s*"(\\.|.)"\s*\)\s*;/seq_putc(\1, 
'"'\2'"');/g; print;}'



Re: seq_putc is preferable to seq_puts

2018-05-18 Thread Joe Perches
On Fri, 2018-05-18 at 10:50 -0700, Joe Perches wrote:
> and a trivial script to convert them all treewide
> 
> $ git grep -P --name-only 
> '\b(?:seq_puts|seq_printf)\s*\(\s*[^,]+,\s*"(?:\\.|.)"\s*\)\s*;' | \
>   xargs perl -i -e 'local $/; while (<>) { 
> s/\b(?:seq_puts|seq_printf)\s*\(\s*([^,]+),\s*"(\\.|.)"\s*\)\s*;/seq_putc(\1, 
> '"'\2'"')/g; print;}'

Hmm, pity this was missing a trailing semicolon in the substitution

$ git grep -P --name-only 
'\b(?:seq_puts|seq_printf)\s*\(\s*[^,]+,\s*"(?:\\.|.)"\s*\)\s*;' | \
  xargs perl -i -e 'local $/; while (<>) { 
s/\b(?:seq_puts|seq_printf)\s*\(\s*([^,]+),\s*"(\\.|.)"\s*\)\s*;/seq_putc(\1, 
'"'\2'"');/g; print;}'



Re: [PATCH v11 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64

2018-05-18 Thread hpa
On May 18, 2018 11:00:05 AM PDT, Bart Van Assche  wrote:
>The next patch in this series introduces a call to cmpxchg64()
>in the block layer core for those architectures on which this
>functionality is available. Make it possible to test whether
>cmpxchg64() is available by introducing CONFIG_ARCH_HAVE_CMPXCHG64.
>
>Signed-off-by: Bart Van Assche 
>Cc: Catalin Marinas 
>Cc: Will Deacon 
>Cc: Tony Luck 
>Cc: Fenghua Yu 
>Cc: Geert Uytterhoeven 
>Cc: "James E.J. Bottomley" 
>Cc: Helge Deller 
>Cc: Benjamin Herrenschmidt 
>Cc: Paul Mackerras 
>Cc: Michael Ellerman 
>Cc: Martin Schwidefsky 
>Cc: Heiko Carstens 
>Cc: David S. Miller 
>Cc: Thomas Gleixner 
>Cc: Ingo Molnar 
>Cc: H. Peter Anvin 
>Cc: Chris Zankel 
>Cc: Max Filippov 
>Cc: Arnd Bergmann 
>Cc: Jonathan Corbet 
>---
>.../features/locking/cmpxchg64/arch-support.txt| 33
>++
> arch/Kconfig   |  4 +++
> arch/arm/Kconfig   |  1 +
> arch/ia64/Kconfig  |  1 +
> arch/m68k/Kconfig  |  1 +
> arch/mips/Kconfig  |  1 +
> arch/parisc/Kconfig|  1 +
> arch/riscv/Kconfig |  1 +
> arch/sparc/Kconfig |  1 +
> arch/x86/Kconfig   |  1 +
> arch/xtensa/Kconfig|  1 +
> 11 files changed, 46 insertions(+)
>create mode 100644
>Documentation/features/locking/cmpxchg64/arch-support.txt
>
>diff --git a/Documentation/features/locking/cmpxchg64/arch-support.txt
>b/Documentation/features/locking/cmpxchg64/arch-support.txt
>new file mode 100644
>index ..84bfef7242b2
>--- /dev/null
>+++ b/Documentation/features/locking/cmpxchg64/arch-support.txt
>@@ -0,0 +1,33 @@
>+#
>+# Feature name:  cmpxchg64
>+# Kconfig:   ARCH_HAVE_CMPXCHG64
>+# description:   arch supports the cmpxchg64() API
>+#
>+---
>+| arch |status|
>+---
>+|   alpha: |  ok  |
>+| arc: |  ..  |
>+| arm: |  ok  |
>+|   arm64: |  ok  |
>+| c6x: |  ..  |
>+|   h8300: |  ..  |
>+| hexagon: |  ..  |
>+|ia64: |  ok  |
>+|m68k: |  ok  |
>+|  microblaze: |  ..  |
>+|mips: |  ok  |
>+|   nds32: |  ..  |
>+|   nios2: |  ..  |
>+|openrisc: |  ..  |
>+|  parisc: |  ok  |
>+| powerpc: |  ok  |
>+|   riscv: |  ok  |
>+|s390: |  ok  |
>+|  sh: |  ..  |
>+|   sparc: |  ok  |
>+|  um: |  ..  |
>+|   unicore32: |  ..  |
>+| x86: |  ok  |
>+|  xtensa: |  ok  |
>+---
>diff --git a/arch/Kconfig b/arch/Kconfig
>index 8e0d665c8d53..9840b2577af1 100644
>--- a/arch/Kconfig
>+++ b/arch/Kconfig
>@@ -358,6 +358,10 @@ config HAVE_ALIGNED_STRUCT_PAGE
> on a struct page for better performance. However selecting this
> might increase the size of a struct page by a word.
> 
>+config ARCH_HAVE_CMPXCHG64
>+  bool
>+  default y if 64BIT
>+
> config HAVE_CMPXCHG_LOCAL
>   bool
> 
>diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>index a7f8e7f4b88f..02c75697176e 100644
>--- a/arch/arm/Kconfig
>+++ b/arch/arm/Kconfig
>@@ -13,6 +13,7 @@ config ARM
>   select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
>   select ARCH_HAS_STRICT_MODULE_RWX if MMU
>   select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>+  select ARCH_HAVE_CMPXCHG64 if !THUMB2_KERNEL
>   select ARCH_HAVE_CUSTOM_GPIO_H
>   select ARCH_HAS_GCOV_PROFILE_ALL
>   select ARCH_MIGHT_HAVE_PC_PARPORT
>diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
>index bbe12a038d21..31c49e1482e2 100644
>--- a/arch/ia64/Kconfig
>+++ b/arch/ia64/Kconfig
>@@ -41,6 +41,7 @@ config IA64
>   select GENERIC_PENDING_IRQ if SMP
>   select GENERIC_IRQ_SHOW
>   select GENERIC_IRQ_LEGACY
>+  select ARCH_HAVE_CMPXCHG64
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   select GENERIC_IOMAP
>   select GENERIC_SMP_IDLE_THREAD
>diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
>index 785612b576f7..7b87cda3bbed 100644
>--- a/arch/m68k/Kconfig
>+++ b/arch/m68k/Kconfig
>@@ -11,6 +11,7 @@ config M68K
>   select GENERIC_ATOMIC64
>   select HAVE_UID16
>   select VIRT_TO_BUS
>+  select 

Re: [PATCH v11 1/2] arch/*: Add CONFIG_ARCH_HAVE_CMPXCHG64

2018-05-18 Thread hpa
On May 18, 2018 11:00:05 AM PDT, Bart Van Assche  wrote:
>The next patch in this series introduces a call to cmpxchg64()
>in the block layer core for those architectures on which this
>functionality is available. Make it possible to test whether
>cmpxchg64() is available by introducing CONFIG_ARCH_HAVE_CMPXCHG64.
>
>Signed-off-by: Bart Van Assche 
>Cc: Catalin Marinas 
>Cc: Will Deacon 
>Cc: Tony Luck 
>Cc: Fenghua Yu 
>Cc: Geert Uytterhoeven 
>Cc: "James E.J. Bottomley" 
>Cc: Helge Deller 
>Cc: Benjamin Herrenschmidt 
>Cc: Paul Mackerras 
>Cc: Michael Ellerman 
>Cc: Martin Schwidefsky 
>Cc: Heiko Carstens 
>Cc: David S. Miller 
>Cc: Thomas Gleixner 
>Cc: Ingo Molnar 
>Cc: H. Peter Anvin 
>Cc: Chris Zankel 
>Cc: Max Filippov 
>Cc: Arnd Bergmann 
>Cc: Jonathan Corbet 
>---
>.../features/locking/cmpxchg64/arch-support.txt| 33
>++
> arch/Kconfig   |  4 +++
> arch/arm/Kconfig   |  1 +
> arch/ia64/Kconfig  |  1 +
> arch/m68k/Kconfig  |  1 +
> arch/mips/Kconfig  |  1 +
> arch/parisc/Kconfig|  1 +
> arch/riscv/Kconfig |  1 +
> arch/sparc/Kconfig |  1 +
> arch/x86/Kconfig   |  1 +
> arch/xtensa/Kconfig|  1 +
> 11 files changed, 46 insertions(+)
>create mode 100644
>Documentation/features/locking/cmpxchg64/arch-support.txt
>
>diff --git a/Documentation/features/locking/cmpxchg64/arch-support.txt
>b/Documentation/features/locking/cmpxchg64/arch-support.txt
>new file mode 100644
>index ..84bfef7242b2
>--- /dev/null
>+++ b/Documentation/features/locking/cmpxchg64/arch-support.txt
>@@ -0,0 +1,33 @@
>+#
>+# Feature name:  cmpxchg64
>+# Kconfig:   ARCH_HAVE_CMPXCHG64
>+# description:   arch supports the cmpxchg64() API
>+#
>+---
>+| arch |status|
>+---
>+|   alpha: |  ok  |
>+| arc: |  ..  |
>+| arm: |  ok  |
>+|   arm64: |  ok  |
>+| c6x: |  ..  |
>+|   h8300: |  ..  |
>+| hexagon: |  ..  |
>+|ia64: |  ok  |
>+|m68k: |  ok  |
>+|  microblaze: |  ..  |
>+|mips: |  ok  |
>+|   nds32: |  ..  |
>+|   nios2: |  ..  |
>+|openrisc: |  ..  |
>+|  parisc: |  ok  |
>+| powerpc: |  ok  |
>+|   riscv: |  ok  |
>+|s390: |  ok  |
>+|  sh: |  ..  |
>+|   sparc: |  ok  |
>+|  um: |  ..  |
>+|   unicore32: |  ..  |
>+| x86: |  ok  |
>+|  xtensa: |  ok  |
>+---
>diff --git a/arch/Kconfig b/arch/Kconfig
>index 8e0d665c8d53..9840b2577af1 100644
>--- a/arch/Kconfig
>+++ b/arch/Kconfig
>@@ -358,6 +358,10 @@ config HAVE_ALIGNED_STRUCT_PAGE
> on a struct page for better performance. However selecting this
> might increase the size of a struct page by a word.
> 
>+config ARCH_HAVE_CMPXCHG64
>+  bool
>+  default y if 64BIT
>+
> config HAVE_CMPXCHG_LOCAL
>   bool
> 
>diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>index a7f8e7f4b88f..02c75697176e 100644
>--- a/arch/arm/Kconfig
>+++ b/arch/arm/Kconfig
>@@ -13,6 +13,7 @@ config ARM
>   select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
>   select ARCH_HAS_STRICT_MODULE_RWX if MMU
>   select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>+  select ARCH_HAVE_CMPXCHG64 if !THUMB2_KERNEL
>   select ARCH_HAVE_CUSTOM_GPIO_H
>   select ARCH_HAS_GCOV_PROFILE_ALL
>   select ARCH_MIGHT_HAVE_PC_PARPORT
>diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
>index bbe12a038d21..31c49e1482e2 100644
>--- a/arch/ia64/Kconfig
>+++ b/arch/ia64/Kconfig
>@@ -41,6 +41,7 @@ config IA64
>   select GENERIC_PENDING_IRQ if SMP
>   select GENERIC_IRQ_SHOW
>   select GENERIC_IRQ_LEGACY
>+  select ARCH_HAVE_CMPXCHG64
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG
>   select GENERIC_IOMAP
>   select GENERIC_SMP_IDLE_THREAD
>diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
>index 785612b576f7..7b87cda3bbed 100644
>--- a/arch/m68k/Kconfig
>+++ b/arch/m68k/Kconfig
>@@ -11,6 +11,7 @@ config M68K
>   select GENERIC_ATOMIC64
>   select HAVE_UID16
>   select VIRT_TO_BUS
>+  select ARCH_HAVE_CMPXCHG64
>   select ARCH_HAVE_NMI_SAFE_CMPXCHG if RMW_INSNS
>   select GENERIC_CPU_DEVICES
>   select GENERIC_IOMAP
>diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
>index 225c95da23ce..088bca0fd9f2 100644
>--- a/arch/mips/Kconfig
>+++ b/arch/mips/Kconfig
>@@ -7,6 +7,7 @@ config MIPS
>   select ARCH_DISCARD_MEMBLOCK
>   select ARCH_HAS_ELF_RANDOMIZE
>   select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
>+  select 

<    1   2   3   4   5   6   7   8   9   10   >