Re: [PATCH v12 7/9] ACPI: Translate the I/O range of non-MMIO devices before scanning

2018-02-03 Thread Rafael J. Wysocki
On Tue, Jan 23, 2018 at 5:36 PM, John Garry  wrote:
> On some platforms (such as Hip06/Hip07), the legacy ISA/LPC devices access
> I/O through special host-local I/O ports, as known on x86. As their I/O space
> is not memory mapped like that of PCI/PCIe MMIO host bridges, this patch is meant
> to support a new class of I/O host controllers where the local I/O ports of
> the child devices are translated into the indirect I/O address space.
>
> Through the handler attach callback, all the I/O translations are done
> before starting the enumeration on children devices and the translated
> addresses are replaced in the children resources.

The changelog is somewhat dry for a patch adding over 300 lines of new
code and a new file for that matter.
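The translation described in the changelog (bus-local I/O ports of each child mapped into the logical PIO space before enumeration) boils down to an offset within a registered window. A minimal userspace sketch of that arithmetic, assuming one window per host; struct pio_range, pio_translate(), struct io_res and translate_child_res() are hypothetical names, not the logic_pio API, and the bounds check is an assumption about what such a translation should verify.

```c
#include <stdint.h>

/* Hypothetical window of one indirect-IO host: bus-local ports
 * [hw_start, hw_start + size) appear at logical ports
 * [io_start, io_start + size). Not the kernel's logic_pio structure. */
struct pio_range {
	uint64_t hw_start;
	uint64_t size;
	uint64_t io_start;
};

#define PIO_BAD_ADDR UINT64_MAX

/* Translate bus-local range [hw_addr, hw_addr + len) to a logical port,
 * or return PIO_BAD_ADDR if it does not fit in the host's window. */
static uint64_t pio_translate(const struct pio_range *r, uint64_t hw_addr,
			      uint64_t len)
{
	if (hw_addr < r->hw_start || len > r->size ||
	    hw_addr - r->hw_start > r->size - len)
		return PIO_BAD_ADDR;
	return r->io_start + (hw_addr - r->hw_start);
}

/* Hypothetical resource with an inclusive end, like struct resource. */
struct io_res {
	uint64_t start, end;
};

/* Rewrite a child's resource in place with its logical addresses. */
static int translate_child_res(const struct pio_range *r, struct io_res *res)
{
	uint64_t len = res->end - res->start + 1; /* end is inclusive */
	uint64_t port = pio_translate(r, res->start, len);

	if (port == PIO_BAD_ADDR)
		return -1;
	res->start = port;
	res->end = port + len - 1;
	return 0;
}
```

With a window {0x0, 0x1000, 0x10000}, a child range 0x2f8-0x2ff becomes 0x102f8-0x102ff, while a range crossing the end of the window is rejected.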

> Signed-off-by: John Garry 
> Signed-off-by: Zhichang Yuan 
> Signed-off-by: Gabriele Paoloni 
> ---
>  drivers/acpi/arm64/Makefile  |   1 +
>  drivers/acpi/arm64/acpi_indirectio.c | 273 +++
>  drivers/acpi/internal.h  |   5 +
>  drivers/acpi/scan.c  |   1 +
>  4 files changed, 280 insertions(+)
>  create mode 100644 drivers/acpi/arm64/acpi_indirectio.c
>
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 1017def..f4a7f46 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -1,2 +1,3 @@
>  obj-$(CONFIG_ACPI_IORT)+= iort.o
>  obj-$(CONFIG_ACPI_GTDT)+= gtdt.o
> +obj-$(CONFIG_INDIRECT_PIO) += acpi_indirectio.o
> diff --git a/drivers/acpi/arm64/acpi_indirectio.c b/drivers/acpi/arm64/acpi_indirectio.c
> new file mode 100644
> index 000..2649f57
> --- /dev/null
> +++ b/drivers/acpi/arm64/acpi_indirectio.c
> @@ -0,0 +1,273 @@

SPDX license identifier here?

> +/*
> + * ACPI support for indirect-IO bus.
> + *
> + * Copyright (C) 2017 HiSilicon Limited, All Rights Reserved.
> + * Author: Gabriele Paoloni 
> + * Author: Zhichang Yuan 
> + * Author: John Garry 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.

And then you can skip the above.

Also, I would like to see a short description of what this file
contains here.
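For illustration, the file header could look like this with an SPDX tag replacing the license boilerplate and a description drawn from the changelog. Kernel .c files carry the tag as a // comment on the first line; GPL-2.0 matches the "version 2" wording of the boilerplate. The description text is only a suggested wording.

```c
// SPDX-License-Identifier: GPL-2.0
/*
 * ACPI support for indirect-IO bus.
 *
 * Copyright (C) 2017 HiSilicon Limited, All Rights Reserved.
 * Author: Gabriele Paoloni
 * Author: Zhichang Yuan
 * Author: John Garry
 *
 * On some platforms (such as Hip06/Hip07), legacy ISA/LPC devices use
 * host-local I/O ports that are not memory mapped. Before the children of
 * such an indirect-IO host are enumerated, the handler attach callback
 * translates their bus-local I/O ranges into the logical PIO space and
 * replaces the translated addresses in their resources.
 */
```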

> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +ACPI_MODULE_NAME("indirect IO");
> +
> +#define ACPI_INDIRECTIO_NAME_LENGTH 255
> +
> +#define INDIRECT_IO_INFO(desc) ((unsigned long))
> +
> +struct acpi_indirectio_mfd_cell {
> +   struct mfd_cell_acpi_match acpi_match;
> +   char name[ACPI_INDIRECTIO_NAME_LENGTH];
> +   char pnpid[ACPI_INDIRECTIO_NAME_LENGTH];
> +};
> +
> +struct acpi_indirectio_host_data {
> +   resource_size_t io_size;
> +   resource_size_t io_start;
> +};
> +
> +struct acpi_indirectio_device_desc {

Why don't you use a consistent naming convention and call this
acpi_indirect_io_device_desc (and analogously everywhere above and
below)?

> +   struct acpi_indirectio_host_data pdata; /* device relevant info data */
> +   int (*pre_setup)(struct acpi_device *adev,
> +struct acpi_indirectio_host_data *pdata);
> +};
> +
> +static int acpi_translate_logicio_res(struct acpi_device *adev,
> +   struct acpi_device *host, struct resource *resource)
> +{
> +   unsigned long sys_port;
> +   struct device *dev = >dev;
> +   resource_size_t length = resource->end - resource->start;
> +
> +   sys_port = logic_pio_trans_hwaddr(>fwnode, resource->start,
> +   length);
> +
> +   if (sys_port == -1) {

Would if (sys_port < 0) not work here?

> +   dev_err(dev, "translate bus-addr(0x%llx) fail!\n",
> +   resource->start);

That's not a very informative message.  What are users expected to do
in response to seeing it?

> +   return -EFAULT;
> +   }
> +
> +   resource->start = sys_port;
> +   resource->end = sys_port + length;
> +
> +   return 0;
> +}
> +
> +/*
> + * update/set the current I/O resource of the designated device node.
> + * after this calling, the enumeration can be started as the I/O resource
> + * had been translated to logicial I/O from bus-local I/O.
> + *
> + * @child: the device node to be updated the I/O resource;
> + * @hostdev: the device node where 'adev' is attached, which can be not
> + *  the parent of 'adev';
> + * @res: double pointer to be set to the address of the updated resources
> + * @num_res: address of the variable to contain the number of updated resources
> + *
> + * return 0 when successful, negative is for failure.
> + */

The above should be a proper kerneldoc comment.
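As a sketch, the same comment in kernel-doc form, reusing the parameter descriptions it already gives (with the stray 'adev' references pointed at @child). The quoted parameter list is truncated, so the int * type of @num_res is an assumption.

```c
struct device;
struct resource;

/**
 * acpi_indirectio_set_logicio_res() - translate a child's I/O resources
 * @child: device whose I/O resources are to be updated
 * @hostdev: device of the indirect-IO host that @child is attached to;
 *           this is not necessarily the parent of @child
 * @res: set to the address of the updated resources
 * @num_res: set to the number of updated resources
 *
 * Translate the bus-local I/O resources of @child into logical I/O, so
 * that enumeration of @child can start afterwards.
 *
 * Return: 0 on success, a negative error code on failure.
 */
int acpi_indirectio_set_logicio_res(struct device *child,
				    struct device *hostdev,
				    const struct resource **res,
				    int *num_res); /* assumed type */
```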

> +int acpi_indirectio_set_logicio_res(struct device *child,
> +struct device *hostdev,
> +const struct resource **res,
> +

[PATCH] mm/migrate: Rename various page allocation helper functions

2018-02-03 Thread Anshuman Khandual
Allocation helper functions for migrate_pages() remain scattered, with
similar names that make them really confusing. Rename these functions based
on the type of the intended migration. Function alloc_misplaced_dst_page()
remains unchanged as it is highly specialized. The renamed functions are
listed below. Functionality of migration remains unchanged.

1. alloc_migrate_target -> new_page_alloc
2. new_node_page -> new_page_alloc_othernode
3. new_page (mm/memory-failure.c) -> new_page_alloc_keepnode
4. alloc_new_node_page -> new_page_alloc_node
5. new_page (mm/mempolicy.c) -> new_page_alloc_mempolicy
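The helpers being renamed are the allocation callbacks handed to migrate_pages(): the caller picks where destination pages come from purely by which callback it passes. A toy userspace model of that pattern, assuming the two-argument (page, private) shape seen in this diff; every model_* name is hypothetical, and the "other node" choice is simplified to "next node" (the real helper builds a nodemask).

```c
/* Toy page: records only the NUMA node it lives on. */
struct model_page {
	int nid;
};

/* Same shape as the (page, private) helpers in this patch. */
typedef struct model_page *model_new_page_t(struct model_page *page,
					    unsigned long private);

#define MODEL_NR_NODES 4
static struct model_page model_pool[MODEL_NR_NODES]; /* one page per node */

static struct model_page *model_alloc_on(int nid)
{
	model_pool[nid].nid = nid;
	return &model_pool[nid];
}

/* Like new_page_alloc_keepnode(): destination on the source page's node. */
static struct model_page *model_keepnode(struct model_page *page,
					 unsigned long private)
{
	(void)private;
	return model_alloc_on(page->nid);
}

/* Like new_page_alloc_othernode(): prefer a different node. */
static struct model_page *model_othernode(struct model_page *page,
					  unsigned long private)
{
	(void)private;
	return model_alloc_on((page->nid + 1) % MODEL_NR_NODES);
}

/* Stand-in for migrate_pages(): asks the callback for a destination. */
static struct model_page *model_migrate_one(struct model_page *src,
					    model_new_page_t *get_new_page,
					    unsigned long private)
{
	return get_new_page(src, private);
}
```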

Signed-off-by: Anshuman Khandual 
---
- Just renamed these functions as suggested
- Previous RFC discussions (https://patchwork.kernel.org/patch/10191331/)

 include/linux/page-isolation.h |  2 +-
 mm/internal.h  |  2 +-
 mm/memory-failure.c| 11 ++-
 mm/memory_hotplug.c|  5 +++--
 mm/mempolicy.c | 15 +--
 mm/migrate.c   |  2 +-
 mm/page_alloc.c|  2 +-
 mm/page_isolation.c|  2 +-
 8 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 4ae347cbc36d..2e77a88a37fc 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -63,6 +63,6 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
bool skip_hwpoisoned_pages);
 
-struct page *alloc_migrate_target(struct page *page, unsigned long private);
+struct page *new_page_alloc(struct page *page, unsigned long private);
 
 #endif
diff --git a/mm/internal.h b/mm/internal.h
index 62d8c34e63d5..ef03f0eed209 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -540,5 +540,5 @@ static inline bool is_migrate_highatomic_page(struct page *page)
 }
 
 void setup_zone_pageset(struct zone *zone);
-extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
+extern struct page *new_page_alloc_node(struct page *page, unsigned long node);
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 264e020ef60c..30789042e3cd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1499,7 +1499,8 @@ int unpoison_memory(unsigned long pfn)
 }
 EXPORT_SYMBOL(unpoison_memory);
 
-static struct page *new_page(struct page *p, unsigned long private)
+static struct page *new_page_alloc_keepnode(struct page *p,
+   unsigned long private)
 {
int nid = page_to_nid(p);
 
@@ -1600,8 +1601,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
return -EBUSY;
}
 
-   ret = migrate_pages(, new_page, NULL, MPOL_MF_MOVE_ALL,
-   MIGRATE_SYNC, MR_MEMORY_FAILURE);
+   ret = migrate_pages(, new_page_alloc_keepnode, NULL,
+   MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (ret) {
pr_info("soft offline: %#lx: hugepage migration failed %d, type %lx (%pGp)\n",
pfn, ret, page->flags, >flags);
@@ -1678,8 +1679,8 @@ static int __soft_offline_page(struct page *page, int flags)
inc_node_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
list_add(>lru, );
-   ret = migrate_pages(, new_page, NULL, MPOL_MF_MOVE_ALL,
-   MIGRATE_SYNC, MR_MEMORY_FAILURE);
+   ret = migrate_pages(, new_page_alloc_keepnode, NULL,
+   MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (ret) {
if (!list_empty())
putback_movable_pages();
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6a9bee33ffa7..f1dc28f5057e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1341,7 +1341,8 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
return 0;
 }
 
-static struct page *new_node_page(struct page *page, unsigned long private)
+static struct page *new_page_alloc_othernode(struct page *page,
+   unsigned long private)
 {
int nid = page_to_nid(page);
nodemask_t nmask = node_states[N_MEMORY];
@@ -1428,7 +1429,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
}
 
/* Allocate a new page from the nearest neighbor node */
-   ret = migrate_pages(, new_node_page, NULL, 0,
+   ret = migrate_pages(, new_page_alloc_othernode, NULL, 0,
MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
if (ret)
putback_movable_pages();
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a8b7d59002e8..fd3fd1de9b3d 100644
--- a/mm/mempolicy.c

Re: [PATCH 4.14 103/156] scripts/faddr2line: extend usage on generic arch

2018-02-03 Thread Liu, Changcheng
Hi Greg Kroah-Hartman,
The commit below is needed to resolve an issue in this patch:
Upstream commit 4cc90b4cc3d4955f79eae4f7f9d64e67e17b468e

B.R.
Changcheng

On 17:58 Fri 02 Feb, Greg Kroah-Hartman wrote:
> 4.14-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> From: "Liu, Changcheng" 
> 
> 
> [ Upstream commit 95a87982541932503d3f59aba4c30b0bde0a6294 ]
> 
> When cross-compiling, faddr2line should use the binary tools for the
> target system rather than those of the host.
> 
> Link: http://lkml.kernel.org/r/20171121092911.GA150711@sofia
> Signed-off-by: Liu Changcheng 
> Cc: Kate Stewart 
> Cc: NeilBrown 
> Cc: Thomas Gleixner 
> Cc: Greg Kroah-Hartman 
> Signed-off-by: Andrew Morton 
> Signed-off-by: Linus Torvalds 
> Signed-off-by: Sasha Levin 
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  scripts/faddr2line |   21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> --- a/scripts/faddr2line
> +++ b/scripts/faddr2line
> @@ -44,9 +44,16 @@
>  set -o errexit
>  set -o nounset
>  
> +READELF="${CROSS_COMPILE}readelf"
> +ADDR2LINE="${CROSS_COMPILE}addr2line"
> +SIZE="${CROSS_COMPILE}size"
> +NM="${CROSS_COMPILE}nm"
> +
>  command -v awk >/dev/null 2>&1 || die "awk isn't installed"
> -command -v readelf >/dev/null 2>&1 || die "readelf isn't installed"
> -command -v addr2line >/dev/null 2>&1 || die "addr2line isn't installed"
> +command -v ${READELF} >/dev/null 2>&1 || die "readelf isn't installed"
> +command -v ${ADDR2LINE} >/dev/null 2>&1 || die "addr2line isn't installed"
> +command -v ${SIZE} >/dev/null 2>&1 || die "size isn't installed"
> +command -v ${NM} >/dev/null 2>&1 || die "nm isn't installed"
>  
>  usage() {
>   echo "usage: faddr2line   ..." >&2
> @@ -69,10 +76,10 @@ die() {
>  find_dir_prefix() {
>   local objfile=$1
>  
> - local start_kernel_addr=$(readelf -sW $objfile | awk '$8 == "start_kernel" {printf "0x%s", $2}')
> + local start_kernel_addr=$(${READELF} -sW $objfile | awk '$8 == "start_kernel" {printf "0x%s", $2}')
>   [[ -z $start_kernel_addr ]] && return
>  
> - local file_line=$(addr2line -e $objfile $start_kernel_addr)
> + local file_line=$(${ADDR2LINE} -e $objfile $start_kernel_addr)
>   [[ -z $file_line ]] && return
>  
>   local prefix=${file_line%init/main.c:*}
> @@ -104,7 +111,7 @@ __faddr2line() {
>  
>   # Go through each of the object's symbols which match the func name.
>   # In rare cases there might be duplicates.
> - file_end=$(size -Ax $objfile | awk '$1 == ".text" {print $2}')
> + file_end=$(${SIZE} -Ax $objfile | awk '$1 == ".text" {print $2}')
>   while read symbol; do
>   local fields=($symbol)
>   local sym_base=0x${fields[0]}
> @@ -156,10 +163,10 @@ __faddr2line() {
>  
>   # pass real address to addr2line
>   echo "$func+$offset/$sym_size:"
> - addr2line -fpie $objfile $addr | sed "s; $dir_prefix\(\./\)*; ;"
> + ${ADDR2LINE} -fpie $objfile $addr | sed "s; $dir_prefix\(\./\)*; ;"
>   DONE=1
>  
> - done < <(nm -n $objfile | awk -v fn=$func -v end=$file_end '$3 == fn { found=1; line=$0; start=$1; next } found == 1 { found=0; print line, "0x"$1 } END {if (found == 1) print line, end; }')
> + done < <(${NM} -n $objfile | awk -v fn=$func -v end=$file_end '$3 == fn { found=1; line=$0; start=$1; next } found == 1 { found=0; print line, "0x"$1 } END {if (found == 1) print line, end; }')
>  }
>  
>  [[ $# -lt 2 ]] && usage
> 
> 
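The patch derives each tool name from ${CROSS_COMPILE}, so an empty or unset CROSS_COMPILE should fall back to the host tools. One detail worth keeping in mind: the script runs with set -o nounset, and under that option bash reports an error when expanding an unset variable such as ${CROSS_COMPILE}, whereas ${CROSS_COMPILE:-} expands to an empty string. A small C model of the intended lookup; tool_name() is a hypothetical helper, not part of the script.

```c
#include <stdio.h>

/* Build "<prefix><tool>" into buf, as the script builds READELF,
 * ADDR2LINE, SIZE and NM from ${CROSS_COMPILE}. A NULL prefix (variable
 * unset) is treated as empty, which selects the host tool. Returns the
 * length of the name, or -1 if it does not fit in buf. */
static int tool_name(char *buf, size_t len, const char *prefix,
		     const char *tool)
{
	int n = snprintf(buf, len, "%s%s", prefix ? prefix : "", tool);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

A caller would pass getenv("CROSS_COMPILE") as the prefix; getenv() returns NULL when the variable is unset, which this helper maps to the host tool name.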


[PATCH] x86/build: Add arch/x86/tools/insn_decoder_test to gitignore

2018-02-03 Thread Progyan Bhattacharya
The file is generated by make and should not be tracked in the source tree.

Signed-off-by: Progyan Bhattacharya 
---
 arch/x86/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/.gitignore b/arch/x86/.gitignore
index aff152c87cf4..5a82bac5e0bc 100644
--- a/arch/x86/.gitignore
+++ b/arch/x86/.gitignore
@@ -1,6 +1,7 @@
 boot/compressed/vmlinux
 tools/test_get_len
 tools/insn_sanity
+tools/insn_decoder_test
 purgatory/kexec-purgatory.c
 purgatory/purgatory.ro
 
-- 
2.15.1




[PATCH 2/2] kernel: compat: fixed a trailing whitespaces code style issue

2018-02-03 Thread devesh . pradhan1
From: devesh pradhan 

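Remove the space before the closing parenthesis and put each case body on
its own line in get_compat_sigset() and put_compat_sigset().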

Signed-off-by: devesh pradhan 
---
 kernel/compat.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/kernel/compat.c b/kernel/compat.c
index d1cee656a7ed..d40c83792ae9 100644
--- a/kernel/compat.c
+++ b/kernel/compat.c
@@ -475,10 +475,14 @@ get_compat_sigset(sigset_t *set, const compat_sigset_t __user *compat)
if (copy_from_user(, compat, sizeof(compat_sigset_t)))
return -EFAULT;
switch (_NSIG_WORDS) {
-   case 4: set->sig[3] = v.sig[6] | (((long)v.sig[7]) << 32 );
-   case 3: set->sig[2] = v.sig[4] | (((long)v.sig[5]) << 32 );
-   case 2: set->sig[1] = v.sig[2] | (((long)v.sig[3]) << 32 );
-   case 1: set->sig[0] = v.sig[0] | (((long)v.sig[1]) << 32 );
+   case 4:
+   set->sig[3] = v.sig[6] | (((long)v.sig[7]) << 32);
+   case 3:
+   set->sig[2] = v.sig[4] | (((long)v.sig[5]) << 32);
+   case 2:
+   set->sig[1] = v.sig[2] | (((long)v.sig[3]) << 32);
+   case 1:
+   set->sig[0] = v.sig[0] | (((long)v.sig[1]) << 32);
}
 #else
if (copy_from_user(set, compat, sizeof(compat_sigset_t)))
@@ -496,10 +500,14 @@ put_compat_sigset(compat_sigset_t __user *compat, const sigset_t *set,
 #ifdef __BIG_ENDIAN
compat_sigset_t v;
switch (_NSIG_WORDS) {
-   case 4: v.sig[7] = (set->sig[3] >> 32); v.sig[6] = set->sig[3];
-   case 3: v.sig[5] = (set->sig[2] >> 32); v.sig[4] = set->sig[2];
-   case 2: v.sig[3] = (set->sig[1] >> 32); v.sig[2] = set->sig[1];
-   case 1: v.sig[1] = (set->sig[0] >> 32); v.sig[0] = set->sig[0];
+   case 4:
+   v.sig[7] = (set->sig[3] >> 32); v.sig[6] = set->sig[3];
+   case 3:
+   v.sig[5] = (set->sig[2] >> 32); v.sig[4] = set->sig[2];
+   case 2:
+   v.sig[3] = (set->sig[1] >> 32); v.sig[2] = set->sig[1];
+   case 1:
+   v.sig[1] = (set->sig[0] >> 32); v.sig[0] = set->sig[0];
}
return copy_to_user(compat, , size) ? -EFAULT : 0;
 #else
-- 
2.14.1
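In both helpers the cases fall through deliberately: with _NSIG_WORDS == 4, execution starts at case 4 and runs every assignment down to case 1. Each native word is assembled from two 32-bit compat words, low half first. A userspace sketch of the same mapping written as loops, assuming 64-bit native words and 32-bit compat words; compat_to_native() and native_to_compat() are hypothetical names.

```c
#include <stdint.h>

/* Native word i = compat word 2i (low half) | compat word 2i+1 (high half). */
static void compat_to_native(uint64_t *sig, const uint32_t *csig, int nwords)
{
	for (int i = 0; i < nwords; i++)
		sig[i] = (uint64_t)csig[2 * i] |
			 ((uint64_t)csig[2 * i + 1] << 32);
}

/* Inverse: split each native word back into its two compat halves. */
static void native_to_compat(uint32_t *csig, const uint64_t *sig, int nwords)
{
	for (int i = 0; i < nwords; i++) {
		csig[2 * i] = (uint32_t)sig[i];
		csig[2 * i + 1] = (uint32_t)(sig[i] >> 32);
	}
}
```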



[PATCH] libata: fix length validation of ATAPI-relayed SCSI commands

2018-02-03 Thread Eric Biggers
From: Eric Biggers 

syzkaller reported a crash in ata_bmdma_fill_sg() when writing to
/dev/sg1.  The immediate cause was that the ATA command's scatterlist
was not DMA-mapped, which causes 'pi - 1' to underflow, resulting in a
write to 'qc->ap->bmdma_prd[0x]'.

Strangely though, the flag ATA_QCFLAG_DMAMAP was set in qc->flags.  The
root cause is that when __ata_scsi_queuecmd() is preparing to relay a
SCSI command to an ATAPI device, it doesn't correctly validate the CDB
length before copying it into the 16-byte buffer 'cdb' in 'struct
ata_queued_cmd'.  Namely, it validates the fixed CDB length expected
based on the SCSI opcode but not the actual CDB length, which can be
larger due to the use of the SG_NEXT_CMD_LEN ioctl.  Since 'flags' is
the next member in ata_queued_cmd, a buffer overflow corrupts it.

Fix it by requiring that the actual CDB length be <= 16 (ATAPI_CDB_LEN).

[Really it seems the length should be required to be <= dev->cdb_len,
but the current behavior seems to have been intentionally introduced by
commit 607126c2a21c ("libata-scsi: be tolerant of 12-byte ATAPI commands
in 16-byte CDBs") to work around a userspace bug in mplayer.  Probably
the workaround is no longer needed (mplayer was fixed in 2007), but
continuing to allow lengths up to 16 appears harmless for now.]
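As a sketch of what the extra condition changes, the following userspace model mirrors the check with plain integers. The helper names are hypothetical: opcode_len, cmd_len and dev_cdb_len stand in for COMMAND_SIZE(scsi_op), scmd->cmd_len and dev->cdb_len, and the example values assume a 12-byte opcode length and a device cdb_len of 12.

```c
#include <assert.h>
#include <stdbool.h>

/* Size of the 'cdb' buffer in struct ata_queued_cmd. */
#define ATAPI_CDB_LEN 16

/*
 * Hypothetical standalone model of the ATAPI relay check in
 * __ata_scsi_queuecmd():
 *   opcode_len  ~ COMMAND_SIZE(scsi_op), length implied by the opcode
 *   cmd_len     ~ scmd->cmd_len, length actually supplied by the caller
 *   dev_cdb_len ~ dev->cdb_len, length the device accepts
 */
static bool atapi_cdb_len_ok_old(int opcode_len, int cmd_len, int dev_cdb_len)
{
	assert(opcode_len > 0);
	/* Only the opcode-implied length is bounded; cmd_len may exceed 16. */
	return !(opcode_len > cmd_len || opcode_len > dev_cdb_len);
}

static bool atapi_cdb_len_ok_new(int opcode_len, int cmd_len, int dev_cdb_len)
{
	assert(opcode_len > 0);
	/* Additionally bound the actual length by the 16-byte cdb buffer. */
	return !(opcode_len > cmd_len ||
		 opcode_len > dev_cdb_len ||
		 cmd_len > ATAPI_CDB_LEN);
}
```

With cmd_len = 17, as the reproducer requests via SG_NEXT_CMD_LEN, the old check accepts the command under the assumed 12-byte values, while the new one rejects it before the copy into the 16-byte buffer can overflow.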

Here's a reproducer that works in QEMU when /dev/sg1 refers to the
CD-ROM drive that qemu-system-x86_64 creates by default:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define SG_NEXT_CMD_LEN 0x2283

int main()
{
char buf[53] = { [36] = 0x7e, [52] = 0x02 };
int fd = open("/dev/sg1", O_RDWR);
ioctl(fd, SG_NEXT_CMD_LEN, &(int){ 17 });
write(fd, buf, sizeof(buf));
}

The crash was:

BUG: unable to handle kernel paging request at 8cb97db37ffc
IP: ata_bmdma_fill_sg drivers/ata/libata-sff.c:2623 [inline]
IP: ata_bmdma_qc_prep+0xa4/0xc0 drivers/ata/libata-sff.c:2727
PGD fb6c067 P4D fb6c067 PUD 0
Oops: 0002 [#1] SMP
CPU: 1 PID: 150 Comm: syz_ata_bmdma_q Not tainted 4.15.0-next-20180202 #99
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[...]
Call Trace:
 ata_qc_issue+0x100/0x1d0 drivers/ata/libata-core.c:5421
 ata_scsi_translate+0xc9/0x1a0 drivers/ata/libata-scsi.c:2024
 __ata_scsi_queuecmd drivers/ata/libata-scsi.c:4326 [inline]
 ata_scsi_queuecmd+0x8c/0x210 drivers/ata/libata-scsi.c:4375
 scsi_dispatch_cmd+0xa2/0xe0 drivers/scsi/scsi_lib.c:1727
 scsi_request_fn+0x24c/0x530 drivers/scsi/scsi_lib.c:1865
 __blk_run_queue_uncond block/blk-core.c:412 [inline]
 __blk_run_queue+0x3a/0x60 block/blk-core.c:432
 blk_execute_rq_nowait+0x93/0xc0 block/blk-exec.c:78
 sg_common_write.isra.7+0x272/0x5a0 drivers/scsi/sg.c:806
 sg_write+0x1ef/0x340 drivers/scsi/sg.c:677
 __vfs_write+0x31/0x160 fs/read_write.c:480
 vfs_write+0xa7/0x160 fs/read_write.c:544
 SYSC_write fs/read_write.c:589 [inline]
 SyS_write+0x4d/0xc0 fs/read_write.c:581
 do_syscall_64+0x5e/0x110 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x21/0x86

Fixes: 607126c2a21c ("libata-scsi: be tolerant of 12-byte ATAPI commands in 16-byte CDBs")
Reported-by: syzbot+1ff6f9fcc3c35f1c72a95e26528c8e7e3276e...@syzkaller.appspotmail.com
Cc:  # v2.6.24+
Signed-off-by: Eric Biggers 
---
 drivers/ata/libata-scsi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 66be961c93a4e..47d421666451c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4309,7 +4309,9 @@ static inline int __ata_scsi_queuecmd(struct scsi_cmnd *scmd,
 		if (likely((scsi_op != ATA_16) || !atapi_passthru16)) {
 			/* relay SCSI command to ATAPI device */
 			int len = COMMAND_SIZE(scsi_op);
-			if (unlikely(len > scmd->cmd_len || len > dev->cdb_len))
+			if (unlikely(len > scmd->cmd_len ||
+				     len > dev->cdb_len ||
+				     scmd->cmd_len > ATAPI_CDB_LEN))
 				goto bad_cdb_len;
 
 			xlat_func = atapi_xlat;
-- 
2.16.1



Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events

2018-02-03 Thread Masami Hiramatsu
On Sat, 3 Feb 2018 10:27:59 -0500
Steven Rostedt  wrote:

> On Sat, 3 Feb 2018 22:38:17 +0900
> Masami Hiramatsu  wrote:
> 
> > This seems very similar feature of what kprobe-based event does.
> 
> It is similar, but not the same as kprobes. It only focuses on
> functions and their arguments, and should not require any disassembling
> (no knowledge of regs required). Any need to see anything within the
> function will still require kprobe based help.

Yes, I see that point.

> > Earlier version of kprobe-based event supported Nth argument, but I decided
> > to drop it because we can not ensure the "function signature" from kernel
> > binary. It has been passed to "perf probe", so that we can define line-level
> > granularity. 
> 
> Sure, and if we get a wrong function, which can happen, the code is set
> up not to break anything. You only get garbage output.
> 
> > 
> > Of course if it is easy to support "argN" again as it does if the arch's
> > calling convention is clearly stated. (and now kprobe is based on ftrace
> > if it is on the entry of functions)
> > 
> > So my question is, what is the difference of those from the user 
> > perspective?
> > Only event syntax is a problem?
> > I'm very confused...
> 
> Again, this is not a kprobe replacement. It is somewhat of a syntax
> issue. I want this to be very simple and not need a disassembler. For
> indexing of structures one may use gdb, but that's about it. You could
> get the same info from counting what's in a structure itself.

OK, so it is a simpler version of function event...

> I based some of the code from kprobes too. But I wanted this to be
> simpler, and as such, not as powerful as kprobes. More of a "poor man's"
> kprobe ;-) Where you are limited to functions and their arguments. If
> you need more power, switch to kprobes. In other words, it's just an
> added stepping stone.
> 
> Also, this should work without kprobe support, only ftrace, and function
> args from the arch.

Hmm, but the implementation seems very far from the current probe events, so
we need to consider how to unify them. Anyway, this is a very good time to do
so, because I found that the current probe-event fetch method performs poorly
with retpoline/IBRS, since it is full of indirect calls.

I would like to convert it to eBPF if possible. That would be good for
performance with the JIT, and we could share the same code with the BPF
people.

Thank you,

-- 
Masami Hiramatsu 


Re: [PATCH 13/18] tracing: Add array type to function based events

2018-02-03 Thread Masami Hiramatsu
On Sat, 3 Feb 2018 10:29:03 -0500
Steven Rostedt  wrote:

> On Sat, 3 Feb 2018 22:56:15 +0900
> Masami Hiramatsu  wrote:
> 
> > This idea looks good for kprobe events too.
> > I'll try to implement same one :)
> 
> We should have the two re-use the same code.
> 
> In other words, I would like as much code sharing as possible.

Eventually it will, but currently the two code bases are too far apart.
I would like to implement it on the current code first to avoid
breaking anything, and refactor afterwards.

I think the current fetch function implementation may be too heavy
with retpoline. I would like to reimplement it with eBPF :)

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu 


[PATCH 1/2] trace-cmd: Fix the detection for swig

2018-02-03 Thread sztsian
From: Zamir SUN 

The current detection for swig produces the output

/usr/bin/swig
y

so it will never be equal to y. With this patch, the swig path is
removed from the output, so the detection works as expected.
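Why the comparison fails can be modelled in a few lines of C. This is a simplified sketch, assuming GNU make's documented $(shell) behaviour of converting newlines in the command output to spaces and dropping the trailing newline; the function names are hypothetical and carriage returns are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of how $(shell ...) post-processes command output:
 * a trailing newline is dropped and remaining newlines become spaces.
 * Writes at most out_size - 1 characters plus a terminating NUL.
 */
static void make_shell_output(const char *raw, char *out, size_t out_size)
{
	size_t len, i;

	assert(raw != NULL && out != NULL);
	if (out_size == 0)
		return;
	len = strlen(raw);
	/* Drop a trailing newline, as $(shell) does. */
	if (len > 0 && raw[len - 1] == '\n')
		len--;
	if (len > out_size - 1)
		len = out_size - 1;
	for (i = 0; i < len; i++)
		out[i] = (raw[i] == '\n') ? ' ' : raw[i];
	out[len] = '\0';
}

/* Mirrors the Makefile test: ifeq ($(shell ...), y) */
static bool detection_succeeds(const char *raw)
{
	char buf[256];

	make_shell_output(raw, buf, sizeof(buf));
	return strcmp(buf, "y") == 0;
}
```

Fed "/usr/bin/swig\ny\n", the model yields "/usr/bin/swig y", which never matches "y"; once `which swig` is redirected to /dev/null, the output is just "y\n" and the test passes.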

Fixes: 3bf187a43b7e6302592552ecbc294e5820249687

Signed-off-by: Zamir SUN (Red Hat) 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index a5d2c38..7c0d1a6 100644
--- a/Makefile
+++ b/Makefile
@@ -105,7 +105,7 @@ PYTHON_GUI  := ctracecmd.so ctracecmdgui.so
 PYTHON_VERS ?= python
 
 # Can build python?
-ifeq ($(shell sh -c "pkg-config --cflags $(PYTHON_VERS) > /dev/null 2>&1 && which swig && echo y"), y)
+ifeq ($(shell sh -c "pkg-config --cflags $(PYTHON_VERS) > /dev/null 2>&1 && which swig > /dev/null && echo y"), y)
PYTHON_PLUGINS := plugin_python.so
BUILD_PYTHON := $(PYTHON) $(PYTHON_PLUGINS)
PYTHON_SO_INSTALL := ctracecmd.install
-- 
2.14.3



[PATCH 2/2] trace-cmd: Change the way of getting python ldflags.

2018-02-03 Thread sztsian
From: Zamir SUN 

Prior to this patch, the Makefile detects python ldflags using a hardcoded
python2 command. This will cause problems when building against python3
in the future, since the ldflags for python2 and python3 are different.
With this patch, python ldflags are detected by the corresponding
python{,3}-config, which reports the right config for python plugins.

Signed-off-by: Zamir SUN (Red Hat) 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 7c0d1a6..f41e399 100644
--- a/Makefile
+++ b/Makefile
@@ -636,7 +636,7 @@ report_noswig: force
 
 PYTHON_INCLUDES = `pkg-config --cflags $(PYTHON_VERS)`
 PYTHON_LDFLAGS = `pkg-config --libs $(PYTHON_VERS)` \
-   $(shell python2 -c "import distutils.sysconfig; print distutils.sysconfig.get_config_var('LINKFORSHARED')")
+   $(shell $(PYTHON_VERS)-config --ldflags)
 PYGTK_CFLAGS = `pkg-config --cflags pygtk-2.0`
 
 ctracecmd.so: $(TCMD_LIB_OBJS) ctracecmd.i
-- 
2.14.3



[PATCH 0/2] trace-cmd: Improves python plugin support

2018-02-03 Thread sztsian
From: Zamir SUN (Red Hat) 

This patch set improves trace-cmd python plugin support.

Patch 1 fixes the detection for swig. With this patch, python-plugin can
be built successfully.

Patch 2 improves the detection for python ldflags. With this patch,
ldflags can be detected with python3 in the future.

Zamir SUN (2):
  trace-cmd: Fix the detection for swig
  trace-cmd: Change the way of getting python ldflags.

 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.14.3



Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events

2018-02-03 Thread Namhyung Kim
On Sun, Feb 4, 2018 at 6:30 AM, Alexei Starovoitov
 wrote:
> On Sat, Feb 03, 2018 at 04:08:24PM -0500, Steven Rostedt wrote:
>> OK, so no new development in this was wanted? So the entire talk about
>> getting tracepoints into vfs and scheduling wasn't needed?
>
> I don't know who wants tracepoints in vfs.

AFAIK some people wanted to get some info (e.g. filename) from vfs.


>> Not if you are working in the embedded space and only have busybox as
>> your interface.
>
> did you notice bpfd project that does remote kprobe+bpf into an android phone?
> or phone is not an embedded space?

I haven't tried bpfd yet, but it looks promising.
It'd be nice to have such a setup working in a typical Yocto environment.
Anyway, I'd say that an Android phone is different from typical embedded
systems. :)

Thanks,
Namhyung


Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events

2018-02-03 Thread Namhyung Kim
Hi Steve and Alexei,

On Sun, Feb 4, 2018 at 6:17 AM, Steven Rostedt  wrote:
> On Sat, 3 Feb 2018 12:52:08 -0800
> Alexei Starovoitov  wrote:
>
>> It's a user space job.
>
> BTW, I asked around at DevConf.cz, and nobody I talked with (besides
> Arnaldo), have used eBPF. The "path to hello world" is quite high. This
> interface is extremely simple to use, and one doesn't need to install
> LLVM or other tools to interface with it.
>
> I used the analogy, that eBPF is like C, and this is like Bash. One is
> much easier to get "Hello World!" out than the other.
>
> So personally, this is something I know I would use (note, I have
> never used eBPF either). But if I'm the only one to use this
> interface then I'll stop here (and not bother with the function graph
> return interface). If others think this would be helpful, I would ask
> them to speak up now.

I'm interested in this.  From my understanding, it's basically
function tracing + filter + custom argument info, right?

Supporting arguments with complex types could be error-prone.
We need to prevent malfunctions caused by invalid inputs.

Thanks,
Namhyung


Re: [PATCH AUTOSEL for 4.14 006/110] KVM/x86: Check input paging mode when cs.l is set

2018-02-03 Thread Sasha Levin
On Sat, Feb 03, 2018 at 05:24:29PM -0800, Eric Biggers wrote:
>On Sat, Feb 03, 2018 at 06:00:29PM +, Sasha Levin wrote:
>> From: Lan Tianyu 
>>
>> [ Upstream commit f29810335965ac1f7bcb501ee2af5f039f792416 ]
>>
>> Reported by syzkaller:
>> WARNING: CPU: 0 PID: 27962 at arch/x86/kvm/emulate.c:5631 
>> x86_emulate_insn+0x557/0x15f0 [kvm]
>> Modules linked in: kvm_intel kvm [last unloaded: kvm]
>> CPU: 0 PID: 27962 Comm: syz-executor Tainted: GB   W
>> 4.15.0-rc2-next-20171208+ #32
>> Hardware name: Intel Corporation S1200SP/S1200SP, BIOS 
>> S1200SP.86B.01.03.0006.040720161253 04/07/2016
>> RIP: 0010:x86_emulate_insn+0x557/0x15f0 [kvm]
>> RSP: 0018:8807234476d0 EFLAGS: 00010282
>> RAX:  RBX: 88072d0237a0 RCX: a0065c4d
>> RDX: 1100e5a046f9 RSI: 0003 RDI: 88072d0237c8
>> RBP: 880723447728 R08: 88072d02 R09: a008d240
>> R10: 0002 R11: ed00e7d87db3 R12: 88072d0237c8
>> R13: 88072d023870 R14: 88072d0238c2 R15: a008d080
>> FS:  7f8a68666700() GS:88080220() 
>> knlGS:
>> CS:  0010 DS:  ES:  CR0: 80050033
>> CR2: 2009506c CR3: 00071fec4005 CR4: 003626f0
>> Call Trace:
>>  x86_emulate_instruction+0x3bc/0xb70 [kvm]
>>  ? reexecute_instruction.part.162+0x130/0x130 [kvm]
>>  vmx_handle_exit+0x46d/0x14f0 [kvm_intel]
>>  ? trace_event_raw_event_kvm_entry+0xe7/0x150 [kvm]
>>  ? handle_vmfunc+0x2f0/0x2f0 [kvm_intel]
>>  ? wait_lapic_expire+0x25/0x270 [kvm]
>>  vcpu_enter_guest+0x720/0x1ef0 [kvm]
>>  ...
>>
>> When CS.L is set, vcpu should run in the 64 bit paging mode.
>> Current kvm set_sregs function doesn't have such check when
>> userspace inputs sreg values. This will lead unexpected behavior.
>> This patch is to add checks for CS.L, EFER.LME, EFER.LMA and
>> CR4.PAE when get SREG inputs from userspace in order to avoid
>> unexpected behavior.
>>
>> Suggested-by: Paolo Bonzini 
>> Reported-by: Dmitry Vyukov 
>> Cc: Paolo Bonzini 
>> Cc: Radim Krčmář 
>> Cc: Dmitry Vyukov 
>> Cc: Jim Mattson 
>> Signed-off-by: Tianyu Lan 
>> Signed-off-by: Paolo Bonzini 
>> Signed-off-by: Sasha Levin 
>> ---
>>  arch/x86/kvm/x86.c | 26 ++
>>  1 file changed, 26 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 8c28023a43b1..ad0f18107c74 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -7473,6 +7473,29 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 
>> tss_selector, int idt_index,
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_task_switch);
>>
>> +int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>> +{
>> +if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG_BIT)) {
>> +/*
>> + * When EFER.LME and CR0.PG are set, the processor is in
>> + * 64-bit mode (though maybe in a 32-bit code segment).
>> + * CR4.PAE and EFER.LMA must be set.
>> + */
>> +if (!(sregs->cr4 & X86_CR4_PAE_BIT)
>> +|| !(sregs->efer & EFER_LMA))
>> +return -EINVAL;
>> +} else {
>> +/*
>> + * Not in 64-bit mode: EFER.LMA is clear and the code
>> + * segment cannot be 64-bit.
>> + */
>> +if (sregs->efer & EFER_LMA || sregs->cs.l)
>> +return -EINVAL;
>> +}
>> +
>> +return 0;
>> +}
>
>This commit is broken and there was a fix for it merged: 37b95951c58 ("KVM/x86:
>Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in
>kvm_valid_sregs()").  Shouldn't your script have picked that up too?

It should have, yes. I tried to figure out why it didn't and it looks
like the "Fixes:" line is a bit messed up in that commit, and my script
didn't parse it right:

Fixes: f29810335965a(KVM/x86: Check input paging mode when cs.l
is set)

-- 

Thanks,
Sasha

Re: [PATCH net 1/1 v1] rtnetlink: require unique netns identifier

2018-02-03 Thread David Ahern
On 2/3/18 12:17 PM, Stephen Hemminger wrote:
> On Sat,  3 Feb 2018 14:29:04 +0100
> Christian Brauner  wrote:
> 
>> +static int rtnl_ensure_unique_netns_attr(const struct sock *sk,
>> + struct nlattr *tb[],
>> + struct netlink_ext_ack *extack)
>> +{
>> +int ret = -EINVAL;
>> +struct net *net = NULL, *unique_net = NULL;
>> +
>> +/* Requests without network namespace ids have been able to specify
>> + * multiple properties referring to different network namespaces so
>> + * don't regress them.
>> + */
>> +if (!tb[IFLA_IF_NETNSID])
>> +return 0;
>> +
>> +if (!tb[IFLA_NET_NS_PID] && !tb[IFLA_NET_NS_FD])
>> +return 0;
> 
> Isn't this an error?
> 
>> +
>> +unique_net = get_net_ns_by_id(sock_net(sk), nla_get_s32(tb[IFLA_IF_NETNSID]));
>> +if (!unique_net)
>> +return -1;
> 
> Other paths are returning errno, so why -1 here?
> 

extack needs to be filled in too.


Re: [PATCH 5/5] USB: serial: f81232: fix bulk_in/out size

2018-02-03 Thread Johan Hovold
On Thu, Feb 01, 2018 at 01:50:55PM +0800, Ji-Ze Hong (Peter Hong) wrote:
> Hi Johan,
> 
> Johan Hovold 於 2018/1/30 下午 12:11 寫道:
> > On Mon, Jan 22, 2018 at 03:58:47PM +0800, Ji-Ze Hong (Peter Hong) wrote:
> >> diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
> >> index a054f69446fd..f3ee537d643c 100644
> >> --- a/drivers/usb/serial/f81232.c
> >> +++ b/drivers/usb/serial/f81232.c
> >> @@ -769,8 +769,7 @@ static struct usb_serial_driver f81232_device = {
> >>},
> >>.id_table = id_table,
> >>.num_ports =1,
> >> -  .bulk_in_size = 256,
> >> -  .bulk_out_size =256,
> >> +  .bulk_out_size =16,
> > 
> > So it seems you should really be setting bulk_in_size to 64 here (and
> > possibly leave bulk_out_size unset) as that would appear to match your
> > device buffer sizes.
> 
> Yes, we want to set the bulk_in_size as 64. The public datasheet has
> some error with bulk in/out, the correct size is 64.
> 
> We had test the bulk_out_size set the same with internal TX FIFO will
> make the best performance in tests, but it's ok to set 64. In my opinion
> , I'll prefer to set 16.

Having larger URB buffers than the endpoint size is typically more
efficient, but sometimes there are hardware issues that need to be
worked around.
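Some rough numbers illustrate Johan's efficiency point. Assume the device streams full-size packets. A bulk-in URB then completes only when its buffer is full, and a short packet would end it early. The number of completions, and therefore the completion handling and resubmission work, scales with total bytes divided by the URB buffer size. `urb_completions` is a hypothetical helper for that idealised case. It is not a usb-serial API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: URB completions needed to receive `total` bytes,
 * assuming every URB fills its buffer before completing (no short
 * packets except possibly the last). Rounds up.
 */
static size_t urb_completions(size_t total, size_t urb_buf_size)
{
    return (total + urb_buf_size - 1) / urb_buf_size;
}
```

With a 64-byte endpoint, receiving 4096 bytes takes 64 completions with 64-byte URB buffers but only 16 with 256-byte buffers. Each larger URB carries up to four max-size packets.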

Johan


Re: [PATCH 5/5] USB: serial: f81232: fix bulk_in/out size

2018-02-03 Thread Johan Hovold
On Thu, Feb 01, 2018 at 01:50:55PM +0800, Ji-Ze Hong (Peter Hong) wrote:
> Hi Johan,
> 
> Johan Hovold 於 2018/1/30 下午 12:11 寫道:
> > On Mon, Jan 22, 2018 at 03:58:47PM +0800, Ji-Ze Hong (Peter Hong) wrote:
> >> diff --git a/drivers/usb/serial/f81232.c b/drivers/usb/serial/f81232.c
> >> index a054f69446fd..f3ee537d643c 100644
> >> --- a/drivers/usb/serial/f81232.c
> >> +++ b/drivers/usb/serial/f81232.c
> >> @@ -769,8 +769,7 @@ static struct usb_serial_driver f81232_device = {
> >>},
> >>.id_table = id_table,
> >>.num_ports =1,
> >> -  .bulk_in_size = 256,
> >> -  .bulk_out_size =256,
> >> +  .bulk_out_size =16,
> > 
> > So it seems you should really be setting bulk_in_size to 64 here (and
> > possibly leave bulk_out_size unset) as that would appear to match your
> > device buffer sizes.
> 
> Yes, we want to set the bulk_in_size as 64. The public datasheet has
> some error with bulk in/out, the correct size is 64.
> 
> We had test the bulk_out_size set the same with internal TX FIFO will
> make the best performance in tests, but it's ok to set 64. In my opinion
> , I'll prefer to set 16.

Having larger URB buffers than the endpoint size is typically more
efficient, but sometimes there are hardware issues that needs to be
worked around.

Johan


Re: [PATCH 3/5] USB: serial: f81232: enable remote wakeup via RX/RI pin

2018-02-03 Thread Johan Hovold
On Thu, Feb 01, 2018 at 11:13:01AM +0800, Ji-Ze Hong (Peter Hong) wrote:
> Hi Johan,
> 
> Johan Hovold 於 2018/1/30 上午 11:57 寫道:
> > On Mon, Jan 22, 2018 at 03:58:45PM +0800, Ji-Ze Hong (Peter Hong) wrote:
> >> The F81232 can do remote wakeup via RX/RI pin with pulse.
> >> This patch will use device_set_wakeup_enable to enable this
> >> feature.
> > 
> > This is a policy decision that should be made by user space by setting
> > the power/wakeup attribute, and not something that something that
> > drivers should enable directly themselves.
> > 
> > Perhaps you really wanted to use device_set_wakeup_capable()? But then
> > you also need to honour the current setting in suspend() as well.
> > 
> > How have you tested this feature?
> > 
> 
> Our USB-To-Serial support RI/ RX remote wakeup by Modem, Fax or
> other peripherals and we had tested it by following procedure with
> device_set_wakeup_enable() enabled:
>  1. Using pm-suspend to S3
>  2. Trigger a pulse to RI/RX to wake up system.
> 
> In our test, we can do remote wakeup only with
> device_set_wakeup_enable() enabled.

Yeah, but you need to enable it through sysfs. Not every device should be
able to wake the system up. That's a decision left for user space.

> Should we add device_set_wakeup_capable() & device_set_wakeup_enable()
> like following link??
> https://elixir.free-electrons.com/linux/latest/source/drivers/media/rc/mceusb.c#L1476

No, your driver should not call device_set_wakeup_enable() itself. Just
set the wakeup capable flag in probe. And if you can disable the wake up
feature, this needs to be done at suspend depending on what user space
has requested.
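The userspace model below sketches the split Johan describes, based on the kernel's two per-device wakeup flags. The driver only declares the capability in probe, through device_set_wakeup_capable(). User space sets the policy through power/wakeup. Suspend then arms the hardware according to device_may_wakeup(), which is true only when the device is both capable and enabled. All `model_*` names are hypothetical, and no real f81232 code is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device's wakeup state. */
struct wakeup_model {
    bool capable;        /* driver: device_set_wakeup_capable() in probe */
    bool enabled;        /* user space: power/wakeup = enabled/disabled */
    bool hw_wake_armed;  /* what suspend programmed into the chip */
};

static void model_probe(struct wakeup_model *d)
{
    /* Advertise the capability only; do not enable it. */
    d->capable = true;
}

/* Policy writes only make sense for wakeup-capable devices. */
static bool model_sysfs_set_wakeup(struct wakeup_model *d, bool enable)
{
    if (!d->capable)
        return false;
    d->enabled = enable;
    return true;
}

/* Mirrors device_may_wakeup(): capable AND enabled. */
static bool model_may_wakeup(const struct wakeup_model *d)
{
    return d->capable && d->enabled;
}

static void model_suspend(struct wakeup_model *d)
{
    /* Honour the user's choice: arm RI/RX wakeup only if allowed. */
    d->hw_wake_armed = model_may_wakeup(d);
}
```

In this model, a device that has only been probed does not wake the system. It does so only after user space opts in.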

Johan


Re: [PATCH 3/5] USB: serial: f81232: enable remote wakeup via RX/RI pin

2018-02-03 Thread Johan Hovold
On Thu, Feb 01, 2018 at 11:13:01AM +0800, Ji-Ze Hong (Peter Hong) wrote:
> Hi Johan,
> 
> Johan Hovold 於 2018/1/30 上午 11:57 寫道:
> > On Mon, Jan 22, 2018 at 03:58:45PM +0800, Ji-Ze Hong (Peter Hong) wrote:
> >> The F81232 can do remote wakeup via RX/RI pin with pulse.
> >> This patch will use device_set_wakeup_enable to enable this
> >> feature.
> > 
> > This is a policy decision that should be made by user space by setting
> > the power/wakeup attribute, and not something that something that
> > drivers should enable directly themselves.
> > 
> > Perhaps you really wanted to use device_set_wakeup_capable()? But then
> > you also need to honour the current setting in suspend() as well.
> > 
> > How have you tested this feature?
> > 
> 
> Our USB-To-Serial support RI/ RX remote wakeup by Modem, Fax or
> other peripherals and we had tested it by following procedure with
> device_set_wakeup_enable() enabled:
>  1. Using pm-suspend to S3
>  2. Trigger a pulse to RI/RX to wake up system.
> 
> In our test, we can do remote wakeup only with
> device_set_wakeup_enable() enabled.

Yeah, but you need to enable it though sysfs. Not every device should be
able to wake the system up. That's a decision left for user space.

> Should we add device_set_wakeup_capable() & device_set_wakeup_enable()
> like following link??
> https://elixir.free-electrons.com/linux/latest/source/drivers/media/rc/mceusb.c#L1476

No, your driver should not call device_set_wakeup_enable() itself. Just
set the wakeup capable flag in probe. And if you can disable the wake up
feature, this needs to be done at suspend depending on what user space
has requested.

Johan


Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels

2018-02-03 Thread Andy Lutomirski
On Sun, Feb 4, 2018 at 1:25 AM, Dan Williams  wrote:
> On Sat, Feb 3, 2018 at 4:14 PM, Andy Lutomirski  wrote:
>> On Sat, Feb 3, 2018 at 11:21 PM, Dan Williams  wrote:
>>> At entry userspace may have populated the extra registers outside the
>>> syscall calling convention with values that could be useful in a
>>> speculative execution attack. Clear them to minimize the kernel's attack
>>> surface. Note, this only clears the extra registers and not the unused
>>> registers for syscalls less than 6 arguments since those registers are
>>> likely to be clobbered well before their values could be put to use
>>> under speculation.
>>>
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: "H. Peter Anvin" 
>>> Cc: x...@kernel.org
>>> Cc: Andy Lutomirski 
>>> Suggested-by: Linus Torvalds 
>>> Reported-by: Andi Kleen 
>>> Signed-off-by: Dan Williams 
>>> ---
>>>  arch/x86/entry/calling.h  |   17 +
>>>  arch/x86/entry/entry_64.S |1 +
>>>  2 files changed, 18 insertions(+)
>>>
>>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
>>> index 3f48f695d5e6..daee2d19e73d 100644
>>> --- a/arch/x86/entry/calling.h
>>> +++ b/arch/x86/entry/calling.h
>>> @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with
>>> UNWIND_HINT_REGS offset=\offset
>>> .endm
>>>
>>> +   /*
>>> +* Sanitize extra registers of values that a speculation attack
>>> +* might want to exploit. In the CONFIG_FRAME_POINTER=y case,
>>> +* the expectation is that %ebp will be clobbered before it
>>> +* could be used.
>>> +*/
>>> +   .macro CLEAR_EXTRA_REGS_NOSPEC
>>> +   xorq %r15, %r15
>>> +   xorq %r14, %r14
>>> +   xorq %r13, %r13
>>> +   xorq %r12, %r12
>>> +   xorl %ebx, %ebx
>>> +#ifndef CONFIG_FRAME_POINTER
>>> +   xorl %ebp, %ebp
>>> +#endif
>>> +   .endm
>>> +
>>
>> Can we make the clears only happen if we have CONFIG_RETPOLINE?  Or is
>> there maybe some reason why we want this even without retpolines on?
>
> We have the other Spectre variant1 mitigations on by default. I'm not
> opposed to adding a config to turn them all off, but I think we should
> be consistent either way, and I don't think CONFIG_RETPOLINE is the
> right config to gate those.

Fair enough.

>
>>> .macro POP_EXTRA_REGS
>>> popq %r15
>>> popq %r14
>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>> index c752abe89d80..46260e951da6 100644
>>> --- a/arch/x86/entry/entry_64.S
>>> +++ b/arch/x86/entry/entry_64.S
>>> @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
>>> TRACE_IRQS_OFF
>>>
>>> /* IRQs are off. */
>>> +   CLEAR_EXTRA_REGS_NOSPEC
>>
>> Please put the clears before TRACE_IRQS_OFF to protect users that use tracing.
>
> Ok.
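The macro clears %rbx and %rbp with the 32-bit `xorl` rather than `xorq`. This is sufficient because, on x86-64, any write to a 32-bit register zero-extends into the full 64-bit register. `xorl %ebx, %ebx` therefore leaves all of %rbx zero, and it has a shorter encoding. The GNU C sketch below checks this on an x86-64 host. `xor32_self` is a hypothetical helper. On other architectures, the fallback branch only returns the architectural result and does not execute any x86 instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Self-XOR the 32-bit view of the register holding v; return all 64 bits. */
static uint64_t xor32_self(uint64_t v)
{
#if defined(__x86_64__)
    /* %k0 prints the 32-bit name (e.g. %ebx) of operand 0's register. */
    __asm__("xorl %k0, %k0" : "+r"(v) : : "cc");
#else
    v = 0;  /* non-x86-64 fallback: models the x86-64 result only */
#endif
    return v;
}
```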


Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels

2018-02-03 Thread Andy Lutomirski
On Sun, Feb 4, 2018 at 1:25 AM, Dan Williams  wrote:
> On Sat, Feb 3, 2018 at 4:14 PM, Andy Lutomirski  wrote:
>> On Sat, Feb 3, 2018 at 11:21 PM, Dan Williams  
>> wrote:
>>> At entry userspace may have populated the extra registers outside the
>>> syscall calling convention with values that could be useful in a
>>> speculative execution attack. Clear them to minimize the kernel's attack
>>> surface. Note, this only clears the extra registers and not the unused
>>> registers for syscalls less than 6 arguments since those registers are
>>> likely to be clobbered well before their values could be put to use
>>> under speculation.
>>>
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: "H. Peter Anvin" 
>>> Cc: x...@kernel.org
>>> Cc: Andy Lutomirski 
>>> Suggested-by: Linus Torvalds 
>>> Reported-by: Andi Kleen 
>>> Signed-off-by: Dan Williams 
>>> ---
>>>  arch/x86/entry/calling.h  |   17 +
>>>  arch/x86/entry/entry_64.S |1 +
>>>  2 files changed, 18 insertions(+)
>>>
>>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
>>> index 3f48f695d5e6..daee2d19e73d 100644
>>> --- a/arch/x86/entry/calling.h
>>> +++ b/arch/x86/entry/calling.h
>>> @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel 
>>> is built with
>>> UNWIND_HINT_REGS offset=\offset
>>> .endm
>>>
>>> +   /*
>>> +* Sanitize extra registers of values that a speculation attack
>>> +* might want to exploit. In the CONFIG_FRAME_POINTER=y case,
>>> +* the expectation is that %ebp will be clobbered before it
>>> +* could be used.
>>> +*/
>>> +   .macro CLEAR_EXTRA_REGS_NOSPEC
>>> +   xorq %r15, %r15
>>> +   xorq %r14, %r14
>>> +   xorq %r13, %r13
>>> +   xorq %r12, %r12
>>> +   xorl %ebx, %ebx
>>> +#ifndef CONFIG_FRAME_POINTER
>>> +   xorl %ebp, %ebp
>>> +#endif
>>> +   .endm
>>> +
>>
>> Can we make the clears only happen if we have CONFIG_RETPOLINE?  Or is
>> there maybe some reason why we want this even without retpolines on?
>
> We have the other Spectre variant1 mitigations on by default. I'm not
> opposed to adding a config to turn them all off, but I think we should
> be consistent either way, and I don't think CONFIG_RETPOLINE is the
> right config to gate those.

Fair enough.

>
>>> .macro POP_EXTRA_REGS
>>> popq %r15
>>> popq %r14
>>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>>> index c752abe89d80..46260e951da6 100644
>>> --- a/arch/x86/entry/entry_64.S
>>> +++ b/arch/x86/entry/entry_64.S
>>> @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
>>> TRACE_IRQS_OFF
>>>
>>> /* IRQs are off. */
>>> +   CLEAR_EXTRA_REGS_NOSPEC
>>
>> Please put the clears before TRACE_IRQS_OFF to protect users that use 
>> tracing.
>
> Ok.


Re: [PATCH] cifs: silence compiler warnings showing up with gcc-8.0.0

2018-02-03 Thread Steve French
merged into cifs-2.6.git for-next

On Fri, Feb 2, 2018 at 9:48 AM, Arnd Bergmann  wrote:
> This bug was fixed before, but came up again with the latest
> compiler in another function:
>
> fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA':
> fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 4] [-Werror=array-bounds]
>strncpy(parm_data->list[0].name, ea_name, name_len);
>
> Let's apply the same fix that was used for the other instances.
>
> Fixes: b2a3ad9ca502 ("cifs: silence compiler warnings showing up with gcc-4.7.0")
> Signed-off-by: Arnd Bergmann 
> ---
>  fs/cifs/cifssmb.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 72d71703f1e8..78bc86c315ec 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -6343,9 +6343,7 @@ CIFSSMBSetEA(const unsigned int xid, struct cifs_tcon *tcon,
> pSMB->InformationLevel =
> cpu_to_le16(SMB_SET_FILE_EA);
>
> -   parm_data =
> -   (struct fealist *) (((char *) &pSMB->hdr.Protocol) +
> -  offset);
> +   parm_data = (void *)pSMB + offsetof(struct smb_hdr, Protocol) + offset;
> pSMB->ParameterOffset = cpu_to_le16(param_offset);
> pSMB->DataOffset = cpu_to_le16(offset);
> pSMB->SetupCount = 1;
> --
> 2.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Thanks,

Steve
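Arnd's fix works because `hdr` is the first member of the request, so `&pSMB->hdr.Protocol` and `(char *)pSMB + offsetof(struct smb_hdr, Protocol)` are the same address. As far as I can tell, the difference for gcc-8 is where the pointer comes from. A pointer derived from the 4-byte Protocol array makes the later write at offset 8 look out of bounds, which is the "[0, 4]" in the warning. A pointer derived from the start of the whole request does not. The sketch below uses simplified, hypothetical structs, not the real `struct smb_hdr`. It also uses `unsigned char *` arithmetic in place of the kernel's GNU `void *` arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical layouts; the real struct smb_hdr has more fields. */
struct hdr_model {
    uint8_t smb_buf_length[4];
    uint8_t Protocol[4];
    uint8_t Command;
};

struct req_model {
    struct hdr_model hdr;   /* header must be the first member */
    uint8_t payload[64];
};

/*
 * Same address as &p->hdr.Protocol[0] + offset, but computed from the
 * start of the whole request, so it stays within one object.
 */
static unsigned char *param_ptr(struct req_model *p, size_t offset)
{
    return (unsigned char *)p + offsetof(struct hdr_model, Protocol) + offset;
}
```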


Re: [PATCH] cifs: silence compiler warnings showing up with gcc-8.0.0

2018-02-03 Thread Steve French
merged into cifs-2.6.git for-next

On Fri, Feb 2, 2018 at 9:48 AM, Arnd Bergmann  wrote:
> This bug was fixed before, but came up again with the latest
> compiler in another function:
>
> fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA':
> fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 
> 4] [-Werror=array-bounds]
>strncpy(parm_data->list[0].name, ea_name, name_len);
>
> Let's apply the same fix that was used for the other instances.
>
> Fixes: b2a3ad9ca502 ("cifs: silence compiler warnings showing up with 
> gcc-4.7.0")
> Signed-off-by: Arnd Bergmann 
> ---
>  fs/cifs/cifssmb.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index 72d71703f1e8..78bc86c315ec 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -6343,9 +6343,7 @@ CIFSSMBSetEA(const unsigned int xid, struct cifs_tcon 
> *tcon,
> pSMB->InformationLevel =
> cpu_to_le16(SMB_SET_FILE_EA);
>
> -   parm_data =
> -   (struct fealist *) (((char *) >hdr.Protocol) +
> -  offset);
> +   parm_data = (void *)pSMB + offsetof(struct smb_hdr, Protocol) + 
> offset;
> pSMB->ParameterOffset = cpu_to_le16(param_offset);
> pSMB->DataOffset = cpu_to_le16(offset);
> pSMB->SetupCount = 1;
> --
> 2.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Thanks,

Steve


Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels

2018-02-03 Thread Dan Williams
On Sat, Feb 3, 2018 at 4:14 PM, Andy Lutomirski  wrote:
> On Sat, Feb 3, 2018 at 11:21 PM, Dan Williams  
> wrote:
>> At entry userspace may have populated the extra registers outside the
>> syscall calling convention with values that could be useful in a
>> speculative execution attack. Clear them to minimize the kernel's attack
>> surface. Note, this only clears the extra registers and not the unused
>> registers for syscalls less than 6 arguments since those registers are
>> likely to be clobbered well before their values could be put to use
>> under speculation.
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: "H. Peter Anvin" 
>> Cc: x...@kernel.org
>> Cc: Andy Lutomirski 
>> Suggested-by: Linus Torvalds 
>> Reported-by: Andi Kleen 
>> Signed-off-by: Dan Williams 
>> ---
>>  arch/x86/entry/calling.h  |   17 +
>>  arch/x86/entry/entry_64.S |1 +
>>  2 files changed, 18 insertions(+)
>>
>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
>> index 3f48f695d5e6..daee2d19e73d 100644
>> --- a/arch/x86/entry/calling.h
>> +++ b/arch/x86/entry/calling.h
>> @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with
>> UNWIND_HINT_REGS offset=\offset
>> .endm
>>
>> +   /*
>> +* Sanitize extra registers of values that a speculation attack
>> +* might want to exploit. In the CONFIG_FRAME_POINTER=y case,
>> +* the expectation is that %ebp will be clobbered before it
>> +* could be used.
>> +*/
>> +   .macro CLEAR_EXTRA_REGS_NOSPEC
>> +   xorq %r15, %r15
>> +   xorq %r14, %r14
>> +   xorq %r13, %r13
>> +   xorq %r12, %r12
>> +   xorl %ebx, %ebx
>> +#ifndef CONFIG_FRAME_POINTER
>> +   xorl %ebp, %ebp
>> +#endif
>> +   .endm
>> +
>
> Can we make the clears only happen if we have CONFIG_RETPOLINE?  Or is
> there maybe some reason why we want this even without retpolines on?

We have the other Spectre variant1 mitigations on by default. I'm not
opposed to adding a config to turn them all off, but I think we should
be consistent either way, and I don't think CONFIG_RETPOLINE is the
right config to gate those.

>> .macro POP_EXTRA_REGS
>> popq %r15
>> popq %r14
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index c752abe89d80..46260e951da6 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
>> TRACE_IRQS_OFF
>>
>> /* IRQs are off. */
>> +   CLEAR_EXTRA_REGS_NOSPEC
>
> Please put the clears before TRACE_IRQS_OFF to protect users that use tracing.

Ok.


Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels

2018-02-03 Thread Dan Williams
On Sat, Feb 3, 2018 at 4:14 PM, Andy Lutomirski  wrote:
> On Sat, Feb 3, 2018 at 11:21 PM, Dan Williams  
> wrote:
>> At entry userspace may have populated the extra registers outside the
>> syscall calling convention with values that could be useful in a
>> speculative execution attack. Clear them to minimize the kernel's attack
>> surface. Note, this only clears the extra registers and not the unused
>> registers for syscalls less than 6 arguments since those registers are
>> likely to be clobbered well before their values could be put to use
>> under speculation.
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: "H. Peter Anvin" 
>> Cc: x...@kernel.org
>> Cc: Andy Lutomirski 
>> Suggested-by: Linus Torvalds 
>> Reported-by: Andi Kleen 
>> Signed-off-by: Dan Williams 
>> ---
>>  arch/x86/entry/calling.h  |   17 +
>>  arch/x86/entry/entry_64.S |1 +
>>  2 files changed, 18 insertions(+)
>>
>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
>> index 3f48f695d5e6..daee2d19e73d 100644
>> --- a/arch/x86/entry/calling.h
>> +++ b/arch/x86/entry/calling.h
>> @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel 
>> is built with
>> UNWIND_HINT_REGS offset=\offset
>> .endm
>>
>> +   /*
>> +* Sanitize extra registers of values that a speculation attack
>> +* might want to exploit. In the CONFIG_FRAME_POINTER=y case,
>> +* the expectation is that %ebp will be clobbered before it
>> +* could be used.
>> +*/
>> +   .macro CLEAR_EXTRA_REGS_NOSPEC
>> +   xorq %r15, %r15
>> +   xorq %r14, %r14
>> +   xorq %r13, %r13
>> +   xorq %r12, %r12
>> +   xorl %ebx, %ebx
>> +#ifndef CONFIG_FRAME_POINTER
>> +   xorl %ebp, %ebp
>> +#endif
>> +   .endm
>> +
>
> Can we make the clears only happen if we have CONFIG_RETPOLINE?  Or is
> there maybe some reason why we want this even without retpolines on?

We have the other Spectre variant1 mitigations on by default. I'm not
opposed to adding a config to turn them all off, but I think we should
be consistent either way, and I don't think CONFIG_RETPOLINE is the
right config to gate those.

>> .macro POP_EXTRA_REGS
>> popq %r15
>> popq %r14
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index c752abe89d80..46260e951da6 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
>> TRACE_IRQS_OFF
>>
>> /* IRQs are off. */
>> +   CLEAR_EXTRA_REGS_NOSPEC
>
> Please put the clears before TRACE_IRQS_OFF to protect users that use tracing.

Ok.


Re: [PATCH AUTOSEL for 4.14 006/110] KVM/x86: Check input paging mode when cs.l is set

2018-02-03 Thread Eric Biggers
On Sat, Feb 03, 2018 at 06:00:29PM +, Sasha Levin wrote:
> From: Lan Tianyu 
> 
> [ Upstream commit f29810335965ac1f7bcb501ee2af5f039f792416 ]
> 
> Reported by syzkaller:
> WARNING: CPU: 0 PID: 27962 at arch/x86/kvm/emulate.c:5631 x86_emulate_insn+0x557/0x15f0 [kvm]
> Modules linked in: kvm_intel kvm [last unloaded: kvm]
> CPU: 0 PID: 27962 Comm: syz-executor Tainted: GB   W        4.15.0-rc2-next-20171208+ #32
> Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.01.03.0006.040720161253 04/07/2016
> RIP: 0010:x86_emulate_insn+0x557/0x15f0 [kvm]
> RSP: 0018:8807234476d0 EFLAGS: 00010282
> RAX:  RBX: 88072d0237a0 RCX: a0065c4d
> RDX: 1100e5a046f9 RSI: 0003 RDI: 88072d0237c8
> RBP: 880723447728 R08: 88072d02 R09: a008d240
> R10: 0002 R11: ed00e7d87db3 R12: 88072d0237c8
> R13: 88072d023870 R14: 88072d0238c2 R15: a008d080
> FS:  7f8a68666700() GS:88080220() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 2009506c CR3: 00071fec4005 CR4: 003626f0
> Call Trace:
>  x86_emulate_instruction+0x3bc/0xb70 [kvm]
>  ? reexecute_instruction.part.162+0x130/0x130 [kvm]
>  vmx_handle_exit+0x46d/0x14f0 [kvm_intel]
>  ? trace_event_raw_event_kvm_entry+0xe7/0x150 [kvm]
>  ? handle_vmfunc+0x2f0/0x2f0 [kvm_intel]
>  ? wait_lapic_expire+0x25/0x270 [kvm]
>  vcpu_enter_guest+0x720/0x1ef0 [kvm]
>  ...
> 
> When CS.L is set, vcpu should run in the 64 bit paging mode.
> Current kvm set_sregs function doesn't have such check when
> userspace inputs sreg values. This will lead unexpected behavior.
> This patch is to add checks for CS.L, EFER.LME, EFER.LMA and
> CR4.PAE when get SREG inputs from userspace in order to avoid
> unexpected behavior.
> 
> Suggested-by: Paolo Bonzini 
> Reported-by: Dmitry Vyukov 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Dmitry Vyukov 
> Cc: Jim Mattson 
> Signed-off-by: Tianyu Lan 
> Signed-off-by: Paolo Bonzini 
> Signed-off-by: Sasha Levin 
> ---
>  arch/x86/kvm/x86.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8c28023a43b1..ad0f18107c74 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7473,6 +7473,29 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
>  }
>  EXPORT_SYMBOL_GPL(kvm_task_switch);
>  
> +int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
> +{
> + if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG_BIT)) {
> + /*
> +  * When EFER.LME and CR0.PG are set, the processor is in
> +  * 64-bit mode (though maybe in a 32-bit code segment).
> +  * CR4.PAE and EFER.LMA must be set.
> +  */
> + if (!(sregs->cr4 & X86_CR4_PAE_BIT)
> + || !(sregs->efer & EFER_LMA))
> + return -EINVAL;
> + } else {
> + /*
> +  * Not in 64-bit mode: EFER.LMA is clear and the code
> +  * segment cannot be 64-bit.
> +  */
> + if (sregs->efer & EFER_LMA || sregs->cs.l)
> + return -EINVAL;
> + }
> +
> + return 0;
> +}

This commit is broken and there was a fix for it merged: 37b95951c58 ("KVM/x86:
Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in
kvm_valid_sregs()").  Shouldn't your script have picked that up too?
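The bug Eric points at is that the quoted check ANDs the registers with bit *numbers* rather than bit *masks*. With the kernel's values, `cr0 & X86_CR0_PG_BIT` is `cr0 & 31`, which tests PE/MP/EM/TS/ET and not PG. Likewise, `cr4 & X86_CR4_PAE_BIT` is `cr4 & 5`, which tests VME/TSD and not PAE. Going by the fix's title, the merged fix switches to the mask macros. The userspace model below does the same. It redefines the constants locally with the kernel's values and uses a hypothetical subset of `struct kvm_sregs`. It is not the merged kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies with the same values as the kernel's macros. */
#define CR0_PG_BIT   31                    /* X86_CR0_PG_BIT  */
#define CR0_PG       (1ULL << CR0_PG_BIT)  /* X86_CR0_PG      */
#define CR4_PAE_BIT  5                     /* X86_CR4_PAE_BIT */
#define CR4_PAE      (1ULL << CR4_PAE_BIT) /* X86_CR4_PAE     */
#define EFER_LME_M   (1ULL << 8)           /* EFER_LME        */
#define EFER_LMA_M   (1ULL << 10)          /* EFER_LMA        */

/* Hypothetical subset of struct kvm_sregs. */
struct sregs_model {
    uint64_t cr0, cr4, efer;
    bool cs_l;
};

static int valid_sregs_model(const struct sregs_model *s)
{
    /* Test masks (1 << 31, 1 << 5), not bit numbers (31, 5). */
    if ((s->efer & EFER_LME_M) && (s->cr0 & CR0_PG)) {
        /* Long mode paging: CR4.PAE and EFER.LMA must both be set. */
        if (!(s->cr4 & CR4_PAE) || !(s->efer & EFER_LMA_M))
            return -EINVAL;
    } else {
        /* Not in long mode: no EFER.LMA and no 64-bit code segment. */
        if ((s->efer & EFER_LMA_M) || s->cs_l)
            return -EINVAL;
    }
    return 0;
}
```

For example, a state with only PG set in CR0 has `CR0_PG & CR0_PG_BIT == 0`, so the buggy check would not see paging as enabled at all.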

- Eric


Re: [PATCH AUTOSEL for 4.14 006/110] KVM/x86: Check input paging mode when cs.l is set

2018-02-03 Thread Eric Biggers
On Sat, Feb 03, 2018 at 06:00:29PM +, Sasha Levin wrote:
> From: Lan Tianyu 
> 
> [ Upstream commit f29810335965ac1f7bcb501ee2af5f039f792416 ]
> 
> Reported by syzkaller:
> WARNING: CPU: 0 PID: 27962 at arch/x86/kvm/emulate.c:5631 
> x86_emulate_insn+0x557/0x15f0 [kvm]
> Modules linked in: kvm_intel kvm [last unloaded: kvm]
> CPU: 0 PID: 27962 Comm: syz-executor Tainted: GB   W
> 4.15.0-rc2-next-20171208+ #32
> Hardware name: Intel Corporation S1200SP/S1200SP, BIOS 
> S1200SP.86B.01.03.0006.040720161253 04/07/2016
> RIP: 0010:x86_emulate_insn+0x557/0x15f0 [kvm]
> RSP: 0018:8807234476d0 EFLAGS: 00010282
> RAX:  RBX: 88072d0237a0 RCX: a0065c4d
> RDX: 1100e5a046f9 RSI: 0003 RDI: 88072d0237c8
> RBP: 880723447728 R08: 88072d02 R09: a008d240
> R10: 0002 R11: ed00e7d87db3 R12: 88072d0237c8
> R13: 88072d023870 R14: 88072d0238c2 R15: a008d080
> FS:  7f8a68666700() GS:88080220() 
> knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 2009506c CR3: 00071fec4005 CR4: 003626f0
> Call Trace:
>  x86_emulate_instruction+0x3bc/0xb70 [kvm]
>  ? reexecute_instruction.part.162+0x130/0x130 [kvm]
>  vmx_handle_exit+0x46d/0x14f0 [kvm_intel]
>  ? trace_event_raw_event_kvm_entry+0xe7/0x150 [kvm]
>  ? handle_vmfunc+0x2f0/0x2f0 [kvm_intel]
>  ? wait_lapic_expire+0x25/0x270 [kvm]
>  vcpu_enter_guest+0x720/0x1ef0 [kvm]
>  ...
> 
> When CS.L is set, vcpu should run in the 64 bit paging mode.
> Current kvm set_sregs function doesn't have such check when
> userspace inputs sreg values. This will lead unexpected behavior.
> This patch is to add checks for CS.L, EFER.LME, EFER.LMA and
> CR4.PAE when get SREG inputs from userspace in order to avoid
> unexpected behavior.
> 
> Suggested-by: Paolo Bonzini 
> Reported-by: Dmitry Vyukov 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Dmitry Vyukov 
> Cc: Jim Mattson 
> Signed-off-by: Tianyu Lan 
> Signed-off-by: Paolo Bonzini 
> Signed-off-by: Sasha Levin 
> ---
>  arch/x86/kvm/x86.c | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8c28023a43b1..ad0f18107c74 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7473,6 +7473,29 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 
> tss_selector, int idt_index,
>  }
>  EXPORT_SYMBOL_GPL(kvm_task_switch);
>  
> +int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
> +{
> + if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG_BIT)) {
> + /*
> +  * When EFER.LME and CR0.PG are set, the processor is in
> +  * 64-bit mode (though maybe in a 32-bit code segment).
> +  * CR4.PAE and EFER.LMA must be set.
> +  */
> + if (!(sregs->cr4 & X86_CR4_PAE_BIT)
> + || !(sregs->efer & EFER_LMA))
> + return -EINVAL;
> + } else {
> + /*
> +  * Not in 64-bit mode: EFER.LMA is clear and the code
> +  * segment cannot be 64-bit.
> +  */
> + if (sregs->efer & EFER_LMA || sregs->cs.l)
> + return -EINVAL;
> + }
> +
> + return 0;
> +}

This commit is broken and there was a fix for it merged: 37b95951c58 ("KVM/x86:
Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in
kvm_valid_sregs()").  Shouldn't your script have picked that up too?

- Eric


[PATCH] PCI: update location of pci.ids file

2018-02-03 Thread Randy Dunlap
From: Randy Dunlap 

Update the URL for the pci.ids file and add locations for its mirrors.

Signed-off-by: Randy Dunlap 
Cc: Martin Mares 
Cc: Michal Vaner 
---
 Documentation/PCI/pci.txt |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- linux-next-20180202.orig/Documentation/PCI/pci.txt
+++ linux-next-20180202/Documentation/PCI/pci.txt
@@ -570,7 +570,9 @@ your driver if they're helpful, or just
 The device IDs are arbitrary hex numbers (vendor controlled) and normally used
 only in a single location, the pci_device_id table.
 
-Please DO submit new vendor/device IDs to http://pciids.sourceforge.net/.
+Please DO submit new vendor/device IDs to http://pci-ids.ucw.cz/.
+There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
+and https://github.com/pciutils/pciids.
 
 
 



[PATCH] PCI: update location of pci.ids file

2018-02-03 Thread Randy Dunlap
From: Randy Dunlap 

Update the URL for the pci.ids file and add locations for its mirrors.

Signed-off-by: Randy Dunlap 
Cc: Martin Mares 
Cc: Michal Vaner 
---
 Documentation/PCI/pci.txt |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- linux-next-20180202.orig/Documentation/PCI/pci.txt
+++ linux-next-20180202/Documentation/PCI/pci.txt
@@ -570,7 +570,9 @@ your driver if they're helpful, or just
 The device IDs are arbitrary hex numbers (vendor controlled) and normally used
 only in a single location, the pci_device_id table.
 
-Please DO submit new vendor/device IDs to http://pciids.sourceforge.net/.
+Please DO submit new vendor/device IDs to http://pci-ids.ucw.cz/.
+There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
+and https://github.com/pciutils/pciids.
 
 
 



[GIT PULL] fscrypt updates for 4.16

2018-02-03 Thread Theodore Ts'o
Note: there will be a merge conflict; please just take the chunk
which calls fscrypt_encrypt_symlink() from the fscrypt tree.  This
will end up dropping the kzalloc() -> f2fs_kzalloc() change, which
means the fscrypt-specific allocation won't get tested by f2fs's
kmalloc error injection system, which is fine.

The ubifs and f2fs changes have been reviewed by their respective
maintainers and I got their approval to run all of these changes
through the fscrypt tree.

 - Ted
 
The following changes since commit 1291a0d5049dbc06baaaf66a9ff3f53db493b19b:

  Linux 4.15-rc4 (2017-12-17 18:59:59 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt.git tags/fscrypt_for_linus

for you to fetch changes up to 0b1dfa4cc6c60052b2c30ead316fa84c46d3c43c:

  fscrypt: fix build with pre-4.6 gcc versions (2018-02-01 10:51:18 -0500)


Refactor support for encrypted symlinks to move common code to fscrypt.


Eric Biggers (26):
  fscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers
  fscrypt: move fscrypt_control_page() to supp/notsupp headers
  fscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h
  fscrypt: move fscrypt_ctx declaration to fscrypt_supp.h
  fscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions
  fscrypt: move fscrypt_operations declaration to fscrypt_supp.h
  fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
  fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
  fscrypt: trim down fscrypt.h includes
  fscrypt: new helper functions for ->symlink()
  fscrypt: new helper function - fscrypt_get_symlink()
  ext4: switch to fscrypt ->symlink() helper functions
  ext4: switch to fscrypt_get_symlink()
  f2fs: switch to fscrypt ->symlink() helper functions
  f2fs: switch to fscrypt_get_symlink()
  ubifs: free the encrypted symlink target
  ubifs: switch to fscrypt ->symlink() helper functions
  ubifs: switch to fscrypt_get_symlink()
  fscrypt: remove fscrypt_fname_usr_to_disk()
  fscrypt: move fscrypt_symlink_data to fscrypt_private.h
  fscrypt: calculate NUL-padding length in one place only
  fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
  fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
  fscrypt: document symlink length restriction
  fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info()
  fscrypt: fix build with pre-4.6 gcc versions

 Documentation/filesystems/fscrypt.rst |  10 -
 fs/crypto/crypto.c|   1 +
 fs/crypto/fname.c | 140 +++--
 fs/crypto/fscrypt_private.h   |  31 ++
 fs/crypto/hooks.c | 158 +
 fs/crypto/keyinfo.c   |  17 ++--
 fs/ext4/namei.c   |  58 +++---
 fs/ext4/super.c   |   4 +-
 fs/ext4/symlink.c |  43 +++
 fs/f2fs/inode.c   |   2 +-
 fs/f2fs/namei.c   | 132 +++--
 fs/ubifs/dir.c|  63 +++-
 fs/ubifs/file.c   |  36 +---
 fs/ubifs/super.c  |   4 +-
 include/linux/fscrypt.h   | 174 +---
 include/linux/fscrypt_notsupp.h   |  59 ++
 include/linux/fscrypt_supp.h  |  68 +++---
 17 files changed, 500 insertions(+), 500 deletions(-)


Re: [PATCH AUTOSEL for 4.14 065/110] led: core: Fix brightness setting when setting delay_off=0

2018-02-03 Thread Sasha Levin
On Sat, Feb 03, 2018 at 09:35:26PM +0100, Pavel Machek wrote:
>On Sat 2018-02-03 18:00:59, Sasha Levin wrote:
>> From: Matthieu CASTET 
>>
>> [ Upstream commit 2b83ff96f51d0b039c4561b9f95c824d7bddb85c ]
>>
>> With the current code, the following sequence won't work:
>> echo timer > trigger
>>
>> echo 0 > delay_off
>> * at this point we call
>> ** led_delay_off_store
>> ** led_blink_set
>> *** stop timer
>> ** led_blink_setup
>> ** led_set_software_blink
>> *** if !delay_on, led off
>> *** if !delay_off, set led_set_brightness_nosleep <--- LED_BLINK_SW is set but the timer is stopped
>> *** otherwise start timer/set LED_BLINK_SW flag
>>
>> echo xxx > brightness
>> * led_set_brightness
>> ** if LED_BLINK_SW
>> *** if brightness=0, led off
>> *** else apply brightness on the next timer tick <--- the timer is stopped, so the new setting is never applied
>> ** otherwise set led_set_brightness_nosleep
>>
>> To fix that, when we delete the timer, we should clear LED_BLINK_SW.
>
>Can you run the tests on the affected stable kernels? I have a feeling
>that the problem described might not be present there.

Hm, I don't seem to have HW to test that out. Maybe someone else does?

-- 

Thanks,
Sasha
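The failing sequence in the commit message can be reproduced with a small userspace model. This is a hedged sketch, not the LED core: `struct led_model` and the `model_*` functions are hypothetical stand-ins for led_blink_set(), led_set_brightness() and the blink timer, and the timer tick is reduced to "apply any deferred brightness". The `clear_on_stop` switch toggles the fix (clearing the software-blink flag when the timer is stopped).

```c
#include <stdbool.h>

/* Hypothetical model of an LED class device; the names echo the commit
 * message but this is NOT the kernel's struct led_classdev. */
struct led_model {
	bool blink_sw;       /* stands in for the LED_BLINK_SW flag */
	bool timer_running;  /* stands in for the blink timer */
	int hw_brightness;   /* brightness currently applied */
	int pending;         /* value deferred to the next timer tick */
	bool clear_on_stop;  /* true = model the fix */
};

/* led_blink_set() -> led_blink_setup() -> led_set_software_blink() */
static void model_blink_set(struct led_model *led, unsigned long delay_on,
			    unsigned long delay_off, int blink_brightness)
{
	led->timer_running = false;          /* "stop timer" */
	if (led->clear_on_stop)
		led->blink_sw = false;       /* the fix: drop the stale flag */

	if (!delay_on) {                     /* never on: LED off */
		led->hw_brightness = 0;
		return;
	}
	if (!delay_off) {                    /* never off: solid brightness */
		led->hw_brightness = blink_brightness;
		return;
	}
	led->blink_sw = true;                /* start timer, set flag */
	led->timer_running = true;
}

/* led_set_brightness(): while the flag is set, non-zero values wait for
 * the next timer tick instead of being applied directly. */
static void model_set_brightness(struct led_model *led, int brightness)
{
	if (led->blink_sw) {
		if (brightness == 0) {       /* simplified: stop blink, LED off */
			led->blink_sw = false;
			led->timer_running = false;
			led->hw_brightness = 0;
		} else {
			led->pending = brightness;
		}
		return;
	}
	led->hw_brightness = brightness;
}

/* Simplified timer tick: applies a deferred value, but only fires
 * while the timer is running. */
static void model_timer_tick(struct led_model *led)
{
	if (!led->timer_running)
		return;
	if (led->pending) {
		led->hw_brightness = led->pending;
		led->pending = 0;
	}
}
```

Driving it with delay_on=500/delay_off=500, then delay_off=0, then a brightness of 100 leaves the unfixed model stuck at the old brightness, because the new value is parked for a timer tick that never comes; with `clear_on_stop` set, the write is applied immediately.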

[PATCH v3] Input: matrix_keypad - fix keypad does not response

2018-02-03 Thread Zhang Bo
If matrix_keypad_stop() is running when the keypad interrupt fires,
disable_row_irqs() can be called by both matrix_keypad_interrupt() and
matrix_keypad_stop() at the same time, so the row IRQs are disabled
twice. If the device then enters suspend before keypad->work has run,
the keypad is restarted on resume and the IRQs are enabled only once.
Because they were disabled twice but enabled only once, they remain
disabled and the keypad no longer responds.

Take keypad->lock around the update of keypad->stopped so that it is
serialized against matrix_keypad_interrupt(), which checks the flag
under the same lock; the IRQs are then disabled only once.

Signed-off-by: Zhang Bo 
---
Changes in v3:
  - Drop the unneeded lock protection; only take the lock in matrix_keypad_stop().
Changes in v2:
  - Change commit message and full name in the signed-off-by tag.

 drivers/input/keyboard/matrix_keypad.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/input/keyboard/matrix_keypad.c 
b/drivers/input/keyboard/matrix_keypad.c
index 1f316d66e6f7..41614c185918 100644
--- a/drivers/input/keyboard/matrix_keypad.c
+++ b/drivers/input/keyboard/matrix_keypad.c
@@ -218,8 +218,10 @@ static void matrix_keypad_stop(struct input_dev *dev)
 {
struct matrix_keypad *keypad = input_get_drvdata(dev);
 
+   spin_lock_irq(&keypad->lock);
keypad->stopped = true;
-   mb();
+   spin_unlock_irq(&keypad->lock);
+
flush_work(&keypad->work.work);
/*
 * matrix_keypad_scan() will leave IRQs enabled;
-- 
2.14.3
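Why a double disable is fatal follows from how disable_irq()/enable_irq() nest in the kernel: each disable increments a per-line depth, and the line is only unmasked again when the depth returns to zero. A minimal userspace model of that counter makes the arithmetic explicit; the `irq_line` type and `model_*` helpers are hypothetical, not kernel API.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt line. As with the kernel's
 * disable_irq()/enable_irq(), disables nest: the line only fires again
 * once every disable has been matched by an enable. */
struct irq_line {
	int depth;  /* 0 = enabled, >0 = outstanding disables */
};

static void model_disable_irq(struct irq_line *l)
{
	l->depth++;
}

static void model_enable_irq(struct irq_line *l)
{
	if (l->depth > 0)   /* the kernel warns on an unbalanced enable */
		l->depth--;
}

static bool model_irq_enabled(const struct irq_line *l)
{
	return l->depth == 0;
}
```

Two disables (one from matrix_keypad_interrupt(), one from matrix_keypad_stop()) followed by the single enable done after resume leave the depth at 1, so the rows stay masked; with only one disable, the same enable brings the line back.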




Re: [PATCH 1/3] perf tools: Fix period/freq terms setup

2018-02-03 Thread Stephane Eranian
On Sat, Feb 3, 2018 at 7:30 AM, Jiri Olsa  wrote:
> On Fri, Feb 02, 2018 at 10:45:46AM -0800, Stephane Eranian wrote:
>> Jiri,
>>
>> On Thu, Feb 1, 2018 at 12:38 AM, Jiri Olsa  wrote:
>> > Stephane reported that we don't properly set the PERIOD
>> > sample type for events with a period term defined.
>> >
>> > Before:
>> >   $ perf record -e cpu/cpu-cycles,period=1000/u ls
>> >   $ perf evlist -v
>> >   cpu/cpu-cycles,period=1000/u: ... sample_type: IP|TID|TIME|PERIOD, ...
>> >
>> > After:
>> >   $ perf record -e cpu/cpu-cycles,period=1000/u ls
>> >   $ perf evlist -v
>> >   cpu/cpu-cycles,period=1000/u: ... sample_type: IP|TID|TIME, ...
>> >
>> > Set the PERIOD sample type based on the period term setup.
>> >
>> There is still one problem remaining here. It has to do with the handling
>> of cycles:pp or :p or :ppp. Suppose I want to set a period for it while I am
>> also sampling on other events, something like:
>>
>> $ perf record -e cycles:pp,instructions,cpu/event=0xd0,umask=0x81,period=10/ .
>>
>> I want to set the period for cycles:pp, but not for instructions. I cannot
>> use -c because it would also force a period on instructions. I could use
>> the raw hardware event code for cycles:pp, but that does not work because
>> recent kernels prevent the use of hardware filters on events programmed
>> for PEBS, e.g., cpu/event=0xc2,umask=0x1,cmask=16,inv/pp is rejected.
>> PEBS does not support filters. It works in the case of cycles:pp simply
>> by the nature of the underlying event and the stalls.
>>
>> To get precise cycles, the only event syntax you can use is cycles:pp,
>> but then you cannot specify an event-specific period. This needs to be
>> fixed as well.
>
> you can put the p modifier after the term list, like: cpu/.../pp
>
>>
>> I'd like to be able to say:
>>
>> $ perf record -e cycles:pp:period=1001,cpu/event=0xd0,umask=0x81,period=10/
>>
>> Or something equivalent.
>
> and you can specify terms for hw events like 'cycles'
>
> [jolsa@krava perf]$ sudo ./perf record -e 'cycles/period=1001/pp,cpu/event=0xd0,umask=0x81,period=10/'
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.579 MB perf.data (722 samples) ]
>
Ok, I did not know about this syntax. It looks bizarre because you are
using an event name as a PMU instance.
Works for me.
Thanks.

> [jolsa@krava perf]$ ./perf evlist -v
> cycles/period=1001/pp: size: 112, { sample_period, sample_freq }: 1001, sample_type: IP|TID|TIME|ID|CPU, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, task: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
> cpu/event=0xd0,umask=0x81,period=10/: type: 4, size: 112, config: 0x81d0, { sample_period, sample_freq }: 10, sample_type: IP|TID|TIME|ID|CPU, read_format: ID, disabled: 1, inherit: 1, sample_id_all: 1, exclude_guest: 1
>
> jirka
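For reference, the `perf evlist -v` output in this exchange corresponds to `struct perf_event_attr` settings along these lines. This is a hedged sketch reconstructed from that output, not from the perf tool's source: `fill_precise_cycles()` and `fill_raw_event()` are hypothetical helpers, the tracking bits perf also sets on the first event (mmap, comm, task, ...) are left out, and the raw encoding assumes the x86 core PMU format (event in config bits 0-7, umask in bits 8-15), which reproduces the `config: 0x81d0` shown. No perf_event_open() call is made, so it runs without PMU access.

```c
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Roughly what 'cycles/period=1001/pp' turns into. */
static void fill_precise_cycles(struct perf_event_attr *attr, uint64_t period)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);        /* header-dependent, not always 112 */
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = period;      /* fixed period: no PERIOD sample */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
			    PERF_SAMPLE_TIME | PERF_SAMPLE_ID |
			    PERF_SAMPLE_CPU;
	attr->read_format = PERF_FORMAT_ID;
	attr->precise_ip = 2;              /* the 'pp' modifier */
	attr->disabled = 1;
	attr->inherit = 1;
	attr->sample_id_all = 1;
	attr->exclude_guest = 1;
}

/* Roughly what 'cpu/event=0xd0,umask=0x81,period=10/' turns into. */
static void fill_raw_event(struct perf_event_attr *attr, uint8_t event,
			   uint8_t umask, uint64_t period)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_RAW;        /* 4, as in the evlist output */
	attr->config = (uint64_t)event | ((uint64_t)umask << 8);
	attr->sample_period = period;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
			    PERF_SAMPLE_TIME | PERF_SAMPLE_ID |
			    PERF_SAMPLE_CPU;
	attr->read_format = PERF_FORMAT_ID;
	attr->disabled = 1;
	attr->inherit = 1;
	attr->sample_id_all = 1;
	attr->exclude_guest = 1;
}
```

The two events carry independent `sample_period` values and only the first has `precise_ip` set, which is exactly what a global `-c` could not express.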


Re: [PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels

2018-02-03 Thread Andy Lutomirski
On Sat, Feb 3, 2018 at 11:21 PM, Dan Williams  wrote:
> At entry, userspace may have populated the extra registers outside the
> syscall calling convention with values that could be useful in a
> speculative execution attack. Clear them to minimize the kernel's attack
> surface. Note that this only clears the extra registers, not the unused
> argument registers of syscalls with fewer than 6 arguments, since those
> registers are likely to be clobbered well before their values could be
> put to use under speculation.
>
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: x...@kernel.org
> Cc: Andy Lutomirski 
> Suggested-by: Linus Torvalds 
> Reported-by: Andi Kleen 
> Signed-off-by: Dan Williams 
> ---
>  arch/x86/entry/calling.h  |   17 +
>  arch/x86/entry/entry_64.S |1 +
>  2 files changed, 18 insertions(+)
>
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 3f48f695d5e6..daee2d19e73d 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with
> UNWIND_HINT_REGS offset=\offset
> .endm
>
> +   /*
> +* Sanitize extra registers of values that a speculation attack
> +* might want to exploit. In the CONFIG_FRAME_POINTER=y case,
> +* the expectation is that %ebp will be clobbered before it
> +* could be used.
> +*/
> +   .macro CLEAR_EXTRA_REGS_NOSPEC
> +   xorq %r15, %r15
> +   xorq %r14, %r14
> +   xorq %r13, %r13
> +   xorq %r12, %r12
> +   xorl %ebx, %ebx
> +#ifndef CONFIG_FRAME_POINTER
> +   xorl %ebp, %ebp
> +#endif
> +   .endm
> +

Can we make the clears only happen if we have CONFIG_RETPOLINE?  Or is
there maybe some reason why we want this even without retpolines on?

> .macro POP_EXTRA_REGS
> popq %r15
> popq %r14
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index c752abe89d80..46260e951da6 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
> TRACE_IRQS_OFF
>
> /* IRQs are off. */
> +   CLEAR_EXTRA_REGS_NOSPEC

Please put the clears before TRACE_IRQS_OFF to protect users that use tracing.
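The effect of CLEAR_EXTRA_REGS_NOSPEC on the register file can be written down as a small C model. It is only an illustration: `struct gpr_model` and `clear_extra_regs_nospec()` are hypothetical and do not exist in the kernel. The argument registers listed are the x86-64 Linux syscall ones (%rax for the number; %rdi, %rsi, %rdx, %r10, %r8, %r9 for arguments), which the macro leaves alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of x86-64 general-purpose registers at syscall
 * entry (not the kernel's struct pt_regs). */
struct gpr_model {
	/* syscall number and argument registers: left untouched */
	uint64_t rax, rdi, rsi, rdx, r10, r8, r9;
	/* "extra" registers outside the syscall calling convention */
	uint64_t rbx, rbp, r12, r13, r14, r15;
};

/* What the macro does, expressed in C. In 64-bit mode a 32-bit xorl on
 * %ebx/%ebp zero-extends, so it clears the full 64-bit register too. */
static void clear_extra_regs_nospec(struct gpr_model *r, bool frame_pointer)
{
	r->r15 = 0;
	r->r14 = 0;
	r->r13 = 0;
	r->r12 = 0;
	r->rbx = 0;
	if (!frame_pointer)   /* with CONFIG_FRAME_POINTER=y, %rbp is kept */
		r->rbp = 0;
}
```

The `frame_pointer` parameter mirrors the `#ifndef CONFIG_FRAME_POINTER` guard in the macro; the CONFIG_RETPOLINE question raised above is not modelled.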


Re: [PATCH v2] Input: matrix_keypad - fix keypad does not response

2018-02-03 Thread Zhang Bo
On Sat, 2018-02-03 at 11:35 -0800, Dmitry Torokhov wrote:
Hi Dmitry Torokhov
> > If matrix_keypad_stop() is running when the keypad interrupt fires,
> > disable_row_irqs() can be called by both matrix_keypad_interrupt() and
> > matrix_keypad_stop() at the same time. The row IRQs are then disabled
> > twice, and the device enters suspend before keypad->work has run. On
> > resume, the keypad is restarted and the IRQs are enabled only once, so
> > they stay disabled: they were disabled twice but enabled only once.
> > 
> > Take the lock around keypad->stopped and the delayed-work queueing in
> > matrix_keypad_start() and matrix_keypad_stop() so that the IRQ
> > operations and the scheduling of the scan work are atomic.
> > 
> > Signed-off-by: Zhang Bo 
> > ---
> > Changes in v2:
> >   - Change commit message and full name in the signed-off-by tag.
> > 
> >  drivers/input/keyboard/matrix_keypad.c | 18 --
> >  1 file changed, 12 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/input/keyboard/matrix_keypad.c
> > b/drivers/input/keyboard/matrix_keypad.c
> > index 1f316d66e6f7..13fe51824637 100644
> > --- a/drivers/input/keyboard/matrix_keypad.c
> > +++ b/drivers/input/keyboard/matrix_keypad.c
> > @@ -169,7 +169,8 @@ static void matrix_keypad_scan(struct work_struct *work)
> > /* Enable IRQs again */
> > spin_lock_irq(&keypad->lock);
> > keypad->scan_pending = false;
> > -   enable_row_irqs(keypad);
> > +   if (keypad->stopped == false)
> > +   enable_row_irqs(keypad);
> > spin_unlock_irq(&keypad->lock);
> >  }
> >  
> > @@ -202,14 +203,16 @@ static int matrix_keypad_start(struct input_dev *dev)
> >  {
> > struct matrix_keypad *keypad = input_get_drvdata(dev);
> >  
> > +   spin_lock_irq(&keypad->lock);
> > keypad->stopped = false;
> > -   mb();
> >  
> > /*
> >  * Schedule an immediate key scan to capture current key state;
> >  * columns will be activated and IRQs be enabled after the scan.
> >  */
> > -   schedule_delayed_work(&keypad->work, 0);
> > +   if (keypad->scan_pending == false)
> 
> How can we have a pending scan if the keypad was disabled?
> 
> > +   schedule_delayed_work(&keypad->work, 0);
> > +   spin_unlock_irq(&keypad->lock);
> 
> I do not think the change to matrix_keypad_start() is needed. If the
> device is quiesced, we do not have the issue of the ISR racing with us
> here.
> 
You are right: the IRQs are disabled and the worker has finished in
matrix_keypad_stop(), so there is no pending scan, and neither the if
condition here nor the lock is needed.
> >  
> > return 0;
> >  }
> > @@ -218,14 +221,17 @@ static void matrix_keypad_stop(struct input_dev *dev)
> >  {
> > struct matrix_keypad *keypad = input_get_drvdata(dev);
> >  
> > +   spin_lock_irq(&keypad->lock);
> > keypad->stopped = true;
> > -   mb();
> > -   flush_work(&keypad->work.work);
> > /*
> >  * matrix_keypad_scan() will leave IRQs enabled;
> >  * we should disable them now.
> >  */
> > -   disable_row_irqs(keypad);
> > +   if (keypad->scan_pending == false)
> > +   disable_row_irqs(keypad);
> > +   spin_unlock_irq(&keypad->lock);
> > +
> > +   flush_work(&keypad->work.work);
> 
> This is wrong, you should not have moved the flush_work() here. The
> logic is as follows:
> 
> - set the "stopped" flag
> - ensure that ISR has completed
> - ensure that work item has finished (by doing flush_work()) - this will
>   make sure that interrupts are enabled (either ISR noticed "stopped
>   flag" and did not touch them, or ISR scheduled work and work item
>   re-enabled them)
> - finally disable IRQs
> 
> Your change breaks this.
> 
> As far as I can see, the only change that is needed is this at the
> beginning of matrix_keypad_stop():
> 
>   spin_lock_irq(&keypad->lock);
>   keypad->stopped = true;
>   spin_unlock_irq(&keypad->lock);
> 
> Thanks.
> 
Yes, protecting only keypad->stopped ensures the IRQs are disabled only
once.
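The ordering Dmitry describes can be checked with a single-threaded model. This is a hedged sketch, not the driver: `struct keypad_model` and the `model_*` functions are hypothetical, each call stands for a section that runs under keypad->lock in matrix_keypad.c, and the flush is simplified to "run any queued scan now" (the real flush_work() call is not modelled in detail). Once `stopped` is written under the lock, the interrupt either completes before matrix_keypad_stop() takes the lock or sees `stopped` and backs off, so the rows end up disabled exactly once.

```c
#include <stdbool.h>

/* Hypothetical, single-threaded model of the driver state. */
struct keypad_model {
	bool stopped;
	bool scan_pending;   /* set by the ISR, cleared by the scan */
	bool work_queued;    /* scan work queued but not yet run */
	int irq_depth;       /* nested disables, 0 = row IRQs enabled */
};

/* matrix_keypad_interrupt(): back off if a scan is already pending or
 * the device is being stopped; otherwise mask rows and queue a scan. */
static void model_isr(struct keypad_model *k)
{
	if (k->scan_pending || k->stopped)
		return;
	k->irq_depth++;          /* disable_row_irqs() */
	k->scan_pending = true;
	k->work_queued = true;   /* schedule_delayed_work() */
}

/* matrix_keypad_scan(): the work item always re-enables the rows. */
static void model_run_work(struct keypad_model *k)
{
	if (!k->work_queued)
		return;
	k->work_queued = false;
	k->scan_pending = false;
	if (k->irq_depth > 0)
		k->irq_depth--;      /* enable_row_irqs() */
}

/* matrix_keypad_stop() in the order listed in the review. */
static void model_stop(struct keypad_model *k)
{
	k->stopped = true;       /* written under keypad->lock */
	model_run_work(k);       /* flush: a queued scan re-enables rows */
	k->irq_depth++;          /* finally disable_row_irqs() */
}

/* matrix_keypad_start(): clear the flag and kick an immediate scan. */
static void model_start(struct keypad_model *k)
{
	k->stopped = false;
	k->work_queued = true;   /* schedule_delayed_work(..., 0) */
	model_run_work(k);       /* simplification: the scan runs at once */
}
```

Whether the interrupt lands just before the stop or after `stopped` has been set, the model ends with a disable depth of 1 after stop and 0 after the next start.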



[PATCH] parport_pc: Add support for WCH CH382L PCI-E single parallel port card.

2018-02-03 Thread Alexander Gerasiov
WCH CH382L is a PCI-E adapter with 1 parallel port. It is similar to the CH382,
but the serial ports are not soldered on the board. Detected as
Serial controller: Device 1c00:3050 (rev 10) (prog-if 05 [16850])

Signed-off-by: Alexander Gerasiov 
---
 drivers/parport/parport_pc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/parport/parport_pc.c b/drivers/parport/parport_pc.c
index 489492b608cf..380916bff9e0 100644
--- a/drivers/parport/parport_pc.c
+++ b/drivers/parport/parport_pc.c
@@ -2646,6 +2646,7 @@ enum parport_pc_pci_cards {
netmos_9901,
netmos_9865,
quatech_sppxp100,
+   wch_ch382l,
 };
 
 
@@ -2708,6 +2709,7 @@ static struct parport_pc_pci {
/* netmos_9901 */   { 1, { { 0, -1 }, } },
/* netmos_9865 */   { 1, { { 0, -1 }, } },
/* quatech_sppxp100 */  { 1, { { 0, 1 }, } },
+   /* wch_ch382l */{ 1, { { 2, -1 }, } },
 };
 
 static const struct pci_device_id parport_pc_pci_tbl[] = {
@@ -2797,6 +2799,8 @@ static const struct pci_device_id parport_pc_pci_tbl[] = {
/* Quatech SPPXP-100 Parallel port PCI ExpressCard */
{ PCI_VENDOR_ID_QUATECH, PCI_DEVICE_ID_QUATECH_SPPXP_100,
  PCI_ANY_ID, PCI_ANY_ID, 0, 0, quatech_sppxp100 },
+   /* WCH CH382L PCI-E single parallel port card */
+   { 0x1c00, 0x3050, 0x1c00, 0x3050, 0, 0, wch_ch382l },
{ 0, } /* terminate list */
 };
 MODULE_DEVICE_TABLE(pci, parport_pc_pci_tbl);
-- 
2.11.0
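The two table entries added above work together: the PCI ID entry selects the `wch_ch382l` index, and the card entry says the board has one port whose registers sit behind BAR 2, with no second BAR (-1). A userspace sketch of that lookup follows; the `id_model`/`card_model` types are hypothetical mirrors of the fields the patch uses (they are not the kernel's `struct pci_device_id` or `struct parport_pc_pci`), and class matching is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as the kernel's PCI_ANY_ID, i.e. (~0) in a 32-bit field. */
#define MODEL_ANY_ID 0xffffffffu

/* Hypothetical mirror of a parport_pc_pci entry: number of ports, then
 * per port the BAR of the base registers (lo) and of the second
 * register block (hi), -1 when there is none. */
struct card_model {
	int numports;
	struct { int lo, hi; } addr[4];
};

/* Hypothetical mirror of the pci_device_id fields the entry uses. */
struct id_model {
	uint32_t vendor, device, subvendor, subdevice;
	int card;                  /* index into cards[] */
};

enum { WCH_CH382L };

static const struct card_model cards[] = {
	[WCH_CH382L] = { 1, { { 2, -1 }, } },
};

static const struct id_model ids[] = {
	{ 0x1c00, 0x3050, 0x1c00, 0x3050, WCH_CH382L },
	{ 0, 0, 0, 0, 0 },         /* terminate list */
};

static int field_matches(uint32_t want, uint32_t have)
{
	return want == MODEL_ANY_ID || want == have;
}

/* Return the card description for a device, or NULL if nothing matches. */
static const struct card_model *lookup(uint32_t vendor, uint32_t device,
				       uint32_t subvendor, uint32_t subdevice)
{
	for (const struct id_model *id = ids; id->vendor; id++) {
		if (field_matches(id->vendor, vendor) &&
		    field_matches(id->device, device) &&
		    field_matches(id->subvendor, subvendor) &&
		    field_matches(id->subdevice, subdevice))
			return &cards[id->card];
	}
	return NULL;
}
```

Because the entry pins the subsystem IDs as well as vendor/device, a board reporting different subsystem IDs would not match; PCI_ANY_ID in those two fields would widen the match.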



[PATCH] parport_pc: Add support for WCH CH382L PCI-E single parallel port card.

2018-02-03 Thread Alexander Gerasiov
WCH CH382L is a PCI-E adapter with 1 parallel port. It is similar to the CH382,
but the serial ports are not soldered on the board. Detected as
Serial controller: Device 1c00:3050 (rev 10) (prog-if 05 [16850])

Signed-off-by: Alexander Gerasiov 
---
 drivers/parport/parport_pc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/parport/parport_pc.c b/drivers/parport/parport_pc.c
index 489492b608cf..aa6bb50384ee 100644
--- a/drivers/parport/parport_pc.c
+++ b/drivers/parport/parport_pc.c
@@ -2646,6 +2646,7 @@ enum parport_pc_pci_cards {
netmos_9901,
netmos_9865,
quatech_sppxp100,
+   wch_ch382l,
 };
 
 
@@ -2708,6 +2709,7 @@ static struct parport_pc_pci {
/* netmos_9901 */   { 1, { { 0, -1 }, } },
/* netmos_9865 */   { 1, { { 0, -1 }, } },
/* quatech_sppxp100 */  { 1, { { 0, 1 }, } },
+   /* wch_ch382l */{ 1, { { 2, -1 }, } },
 };
 
 static const struct pci_device_id parport_pc_pci_tbl[] = {
@@ -2797,6 +2799,8 @@ static const struct pci_device_id parport_pc_pci_tbl[] = {
/* Quatech SPPXP-100 Parallel port PCI ExpressCard */
{ PCI_VENDOR_ID_QUATECH, PCI_DEVICE_ID_QUATECH_SPPXP_100,
  PCI_ANY_ID, PCI_ANY_ID, 0, 0, quatech_sppxp100 },
+   /* WCH CH382L PCI-E single parallel port card */
+   { 0x1c00, 0x3050, 0x1c00, 0x3050, 0, 0, wch_ch382l},
{ 0, } /* terminate list */
 };
 MODULE_DEVICE_TABLE(pci, parport_pc_pci_tbl);
-- 
2.11.0



[PATCH 3/3] x86/entry: Clear registers for compat syscalls

2018-02-03 Thread Dan Williams
From: Andi Kleen 

At entry, userspace may have populated registers with values that could
be useful in a speculative execution attack. Clear them to minimize the
kernel's attack surface.

[djbw: rename the macro, only clear the extra registers]
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: Andy Lutomirski 
Signed-off-by: Andi Kleen 
Signed-off-by: Dan Williams 
---
 arch/x86/entry/entry_64_compat.S |3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 98d5358e4041..f55b018a580b 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -95,6 +95,7 @@ ENTRY(entry_SYSENTER_compat)
pushq   $0  /* pt_regs->r14 = 0 */
pushq   $0  /* pt_regs->r15 = 0 */
cld
+   CLEAR_EXTRA_REGS_NOSPEC
 
/*
 * SYSENTER doesn't filter flags, so we need to clear NT and AC
@@ -223,6 +224,7 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
pushq   $0  /* pt_regs->r13 = 0 */
pushq   $0  /* pt_regs->r14 = 0 */
pushq   $0  /* pt_regs->r15 = 0 */
+   CLEAR_EXTRA_REGS_NOSPEC
 
/*
 * User mode is traced as though IRQs are on, and SYSENTER
@@ -348,6 +350,7 @@ ENTRY(entry_INT80_compat)
pushq   %r14  /* pt_regs->r14 */
pushq   %r15  /* pt_regs->r15 */
cld
+   CLEAR_EXTRA_REGS_NOSPEC
 
/*
 * User mode is traced as though IRQs are on, and the interrupt



[PATCH 2/3] x86/entry: Clear registers for 64bit exceptions/interrupts

2018-02-03 Thread Dan Williams
From: Andi Kleen 

Clear the 'extra' registers on entering the 64bit kernel for exceptions
and interrupts. The common registers are not cleared since they are
likely clobbered well before they can be exploited in a speculative
execution attack.

Signed-off-by: Andi Kleen 
Signed-off-by: Dan Williams 
---
 arch/x86/entry/entry_64.S |5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 46260e951da6..d73eedf1eb47 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -563,6 +563,7 @@ END(irq_entries_start)
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
+   CLEAR_EXTRA_REGS_NOSPEC
ENCODE_FRAME_POINTER
 
testb   $3, CS(%rsp)
@@ -1121,6 +1122,7 @@ ENTRY(xen_failsafe_callback)
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
+   CLEAR_EXTRA_REGS_NOSPEC
ENCODE_FRAME_POINTER
jmp error_exit
 END(xen_failsafe_callback)
@@ -1166,6 +1168,7 @@ ENTRY(paranoid_entry)
cld
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
+   CLEAR_EXTRA_REGS_NOSPEC
ENCODE_FRAME_POINTER 8
movl    $1, %ebx
movl    $MSR_GS_BASE, %ecx
@@ -1218,6 +1221,7 @@ ENTRY(error_entry)
cld
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
+   CLEAR_EXTRA_REGS_NOSPEC
ENCODE_FRAME_POINTER 8
xorl    %ebx, %ebx
testb   $3, CS+8(%rsp)
@@ -1416,6 +1420,7 @@ ENTRY(nmi)
pushq   %r14  /* pt_regs->r14 */
pushq   %r15  /* pt_regs->r15 */
UNWIND_HINT_REGS
+   CLEAR_EXTRA_REGS_NOSPEC
ENCODE_FRAME_POINTER
 
/*



[PATCH 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels

2018-02-03 Thread Dan Williams
At entry, userspace may have populated the extra registers outside the
syscall calling convention with values that could be useful in a
speculative execution attack. Clear them to minimize the kernel's attack
surface. Note that this only clears the extra registers, not the unused
argument registers of syscalls with fewer than 6 arguments, since those
registers are likely to be clobbered well before their values could be
put to use under speculation.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: Andy Lutomirski 
Suggested-by: Linus Torvalds 
Reported-by: Andi Kleen 
Signed-off-by: Dan Williams 
---
 arch/x86/entry/calling.h  |   17 +
 arch/x86/entry/entry_64.S |1 +
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 3f48f695d5e6..daee2d19e73d 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -147,6 +147,23 @@ For 32-bit we have the following conventions - kernel is built with
UNWIND_HINT_REGS offset=\offset
.endm
 
+   /*
+* Sanitize extra registers of values that a speculation attack
+* might want to exploit. In the CONFIG_FRAME_POINTER=y case,
+* the expectation is that %ebp will be clobbered before it
+* could be used.
+*/
+   .macro CLEAR_EXTRA_REGS_NOSPEC
+   xorq %r15, %r15
+   xorq %r14, %r14
+   xorq %r13, %r13
+   xorq %r12, %r12
+   xorl %ebx, %ebx
+#ifndef CONFIG_FRAME_POINTER
+   xorl %ebp, %ebp
+#endif
+   .endm
+
.macro POP_EXTRA_REGS
popq %r15
popq %r14
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c752abe89d80..46260e951da6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -247,6 +247,7 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
TRACE_IRQS_OFF
 
/* IRQs are off. */
+   CLEAR_EXTRA_REGS_NOSPEC
movq    %rsp, %rdi
call    do_syscall_64   /* returns with IRQs disabled */
 



[PATCH 0/3] x86/entry: clear registers to sanitize speculative usages

2018-02-03 Thread Dan Williams
At entry, userspace may have populated callee-saved registers with values
that could be useful in a speculative execution attack. Clear them to
minimize the kernel's attack surface.

Note that this is done to make it harder to find or manipulate exploitable
sequences in the kernel.

The clearing is limited to the 64-bit 'extra' registers since those are
the most likely to survive with user-populated values deep into the call
chain. Normal register pressure likely clobbers values in the lower
registers, as well as in the 32-bit case.

As for cycle impact, my Sandy Bridge test system handles the xor
sequence at 3.5 instructions per cycle.

---

Andi Kleen (2):
  x86/entry: Clear registers for 64bit exceptions/interrupts
  x86/entry: Clear registers for compat syscalls

Dan Williams (1):
  x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels


 arch/x86/entry/calling.h |   17 +
 arch/x86/entry/entry_64.S|6 ++
 arch/x86/entry/entry_64_compat.S |3 +++
 3 files changed, 26 insertions(+)


[tip:x86/pti] KVM/x86: Add IBPB support

2018-02-03 Thread tip-bot for Ashok Raj
Commit-ID:  15d45071523d89b3fb7372e2135fbd72f6af9506
Gitweb: https://git.kernel.org/tip/15d45071523d89b3fb7372e2135fbd72f6af9506
Author: Ashok Raj 
AuthorDate: Thu, 1 Feb 2018 22:59:43 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 3 Feb 2018 23:06:51 +0100

KVM/x86: Add IBPB support

The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
control mechanism. It keeps earlier branches from influencing
later ones.

Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
It's a command that ensures predicted branch targets aren't used after
the barrier. Although IBRS and IBPB are enumerated by the same CPUID
enumeration, IBPB is very different.

IBPB helps mitigate against three potential attacks:

* Mitigate guests from being attacked by other guests.
  - This is addressed by issuing IBPB when we do a guest switch.

* Mitigate attacks from guest/ring3->host/ring3.
  These would require an IBPB during context switch in host, or after
  VMEXIT. The host process has two ways to mitigate:
  - Either it can be compiled with retpoline
  - If it's going through a context switch and has set !dumpable, then
    there is an IBPB in that path.
(Tim's patch: https://patchwork.kernel.org/patch/10192871)
  - The case where after a VMEXIT you return back to Qemu might make
Qemu attackable from guest when Qemu isn't compiled with retpoline.
  There are issues reported when doing IBPB on every VMEXIT that resulted
  in some tsc calibration woes in guest.

* Mitigate guest/ring0->host/ring0 attacks.
  When the host kernel is using retpoline, it is safe against these attacks.
  If the host kernel isn't using retpoline, we might need to do an IBPB flush on
  every VMEXIT.

Even when using retpoline for indirect calls, in certain conditions 'ret'
can use the BTB on Skylake-era CPUs. There are other mitigations
available like RSB stuffing/clearing.

* IBPB is issued only for SVM during svm_free_vcpu().
  VMX has a vmclear and SVM doesn't.  Follow discussion here:
  https://lkml.org/lkml/2018/1/15/146

Please refer to the following spec for more details on the enumeration
and control.

Refer here for documentation about mitigations:

https://software.intel.com/en-us/side-channel-security-support

[peterz: rebase and changelog rewrite]
[karahmed: - rebase
   - vmx: expose PRED_CMD if guest has it in CPUID
   - svm: only pass through IBPB if guest has it in CPUID
   - vmx: support !cpu_has_vmx_msr_bitmap()]
   - vmx: support nested]
[dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
PRED_CMD is a write-only MSR]

Signed-off-by: Ashok Raj 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: David Woodhouse 
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Konrad Rzeszutek Wilk 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: k...@vger.kernel.org
Cc: Asit Mallick 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Arjan Van De Ven 
Cc: Greg KH 
Cc: Jun Nakajima 
Cc: Paolo Bonzini 
Cc: Dan Williams 
Cc: Tim Chen 
Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok@intel.com
Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karah...@amazon.de

---
 arch/x86/kvm/cpuid.c | 11 +++-
 arch/x86/kvm/svm.c   | 28 ++
 arch/x86/kvm/vmx.c   | 80 ++--
 3 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c0eb337..033004d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
+   /* cpuid 0x80000008.ebx */
+   const u32 kvm_cpuid_8000_0008_ebx_x86_features =
+   F(IBPB);
+
/* cpuid 0xC0000001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
@@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
if (!g_phys_as)
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
-   entry->ebx = entry->edx = 0;
+   entry->edx = 0;
+   /* IBPB isn't necessarily present in hardware cpuid */
+   if (boot_cpu_has(X86_FEATURE_IBPB))
+   entry->ebx |= F(IBPB);
+   entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
+   cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
}
case 0x80000019:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f40d0da..254eefb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -249,6 +249,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_CSTAR,  

[tip:x86/pti] KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL

2018-02-03 Thread tip-bot for KarimAllah Ahmed
Commit-ID:  b2ac58f90540e39324e7a29a7ad471407ae0bf48
Gitweb: https://git.kernel.org/tip/b2ac58f90540e39324e7a29a7ad471407ae0bf48
Author: KarimAllah Ahmed 
AuthorDate: Sat, 3 Feb 2018 15:56:23 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 3 Feb 2018 23:06:52 +0100

KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL

[ Based on a patch from Paolo Bonzini  ]

... basically doing exactly what we do for VMX:

- Pass through SPEC_CTRL to guests (if enabled in guest CPUID)
- Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
  actually used it.

Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Darren Kenny 
Reviewed-by: Konrad Rzeszutek Wilk 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Jun Nakajima 
Cc: k...@vger.kernel.org
Cc: Dave Hansen 
Cc: Tim Chen 
Cc: Andy Lutomirski 
Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Greg KH 
Cc: Paolo Bonzini 
Cc: Dan Williams 
Cc: Linus Torvalds 
Cc: Ashok Raj 
Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karah...@amazon.de

---
 arch/x86/kvm/svm.c | 88 ++
 1 file changed, 88 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 254eefb..4e3c795 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -184,6 +184,8 @@ struct vcpu_svm {
u64 gs_base;
} host;
 
+   u64 spec_ctrl;
+
u32 *msrpm;
 
ulong nmi_iret_rip;
@@ -249,6 +251,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_CSTAR,   .always = true  },
{ .index = MSR_SYSCALL_MASK,.always = true  },
 #endif
+   { .index = MSR_IA32_SPEC_CTRL,  .always = false },
{ .index = MSR_IA32_PRED_CMD,   .always = false },
{ .index = MSR_IA32_LASTBRANCHFROMIP,   .always = false },
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
@@ -882,6 +885,25 @@ static bool valid_msr_intercept(u32 index)
return false;
 }
 
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr)
+{
+   u8 bit_write;
+   unsigned long tmp;
+   u32 offset;
+   u32 *msrpm;
+
+   msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm:
+ to_svm(vcpu)->msrpm;
+
+   offset    = svm_msrpm_offset(msr);
+   bit_write = 2 * (msr & 0x0f) + 1;
+   tmp       = msrpm[offset];
+
+   BUG_ON(offset == MSR_INVALID);
+
+   return !!test_bit(bit_write, &tmp);
+}
+
 static void set_msr_interception(u32 *msrpm, unsigned msr,
 int read, int write)
 {
@@ -1584,6 +1606,8 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
u32 dummy;
u32 eax = 1;
 
+   svm->spec_ctrl = 0;
+
if (!init_event) {
svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
   MSR_IA32_APICBASE_ENABLE;
@@ -3605,6 +3629,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_VM_CR:
msr_info->data = svm->nested.vm_cr_msr;
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   msr_info->data = svm->spec_ctrl;
+   break;
case MSR_IA32_UCODE_REV:
msr_info->data = 0x01000065;
break;
@@ -3696,6 +3727,33 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
case MSR_IA32_TSC:
kvm_write_tsc(vcpu, msr);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS))
+   return 1;
+
+   /* The STIBP bit doesn't fault even if it's not advertised */
+   if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP))
+   return 1;
+
+   svm->spec_ctrl = data;
+
+   if (!data)
+   break;
+
+   /*
+* For non-nested:
+* When it's written (to non-zero) for the first time, pass
+* it through.
+*
+* For nested:
+* The handling of the MSR bitmap for L2 guests is done in
+* nested_svm_vmrun_msrpm.
+* We update the L1 MSR bit as well since it will end up
+* touching the MSR anyway now.
+*/
+   set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+   break;
case MSR_IA32_PRED_CMD:
if (!msr->host_initiated &&
!guest_cpuid_has(vcpu, X86_FEATURE_IBPB))
@@ -4964,6 +5022,15 @@ static void 

[tip:x86/pti] KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL

2018-02-03 Thread tip-bot for KarimAllah Ahmed
Commit-ID:  d28b387fb74da95d69d2615732f50cceb38e9a4d
Gitweb: https://git.kernel.org/tip/d28b387fb74da95d69d2615732f50cceb38e9a4d
Author: KarimAllah Ahmed 
AuthorDate: Thu, 1 Feb 2018 22:59:45 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 3 Feb 2018 23:06:52 +0100

KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL

[ Based on a patch from Ashok Raj  ]

Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
be using a retpoline+IBPB based approach.

To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
guests that do not actually use the MSR, only start saving and restoring
when a non-zero value is written to it.

No attempt is made to handle STIBP here, intentionally. Filtering STIBP
may be added in a future patch, which may require trapping all writes
if we don't want to pass it through directly to the guest.

[dwmw2: Clean up CPUID bits, save/restore manually, handle reset]

Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Darren Kenny 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Jim Mattson 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Jun Nakajima 
Cc: k...@vger.kernel.org
Cc: Dave Hansen 
Cc: Tim Chen 
Cc: Andy Lutomirski 
Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Greg KH 
Cc: Paolo Bonzini 
Cc: Dan Williams 
Cc: Linus Torvalds 
Cc: Ashok Raj 
Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karah...@amazon.de

---
 arch/x86/kvm/cpuid.c |   9 +++--
 arch/x86/kvm/vmx.c   | 105 ++-
 arch/x86/kvm/x86.c   |   2 +-
 3 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1909635..13f5d42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
/* cpuid 0x80000008.ebx */
const u32 kvm_cpuid_8000_0008_ebx_x86_features =
-   F(IBPB);
+   F(IBPB) | F(IBRS);
 
/* cpuid 0xC0000001.edx */
const u32 kvm_cpuid_C000_0001_edx_x86_features =
@@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
+   F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
g_phys_as = phys_as;
entry->eax = g_phys_as | (virt_as << 8);
entry->edx = 0;
-   /* IBPB isn't necessarily present in hardware cpuid */
+   /* IBRS and IBPB aren't necessarily present in hardware cpuid */
if (boot_cpu_has(X86_FEATURE_IBPB))
entry->ebx |= F(IBPB);
+   if (boot_cpu_has(X86_FEATURE_IBRS))
+   entry->ebx |= F(IBRS);
entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features;
cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX);
break;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e5f75eb..bee4c49 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -595,6 +595,7 @@ struct vcpu_vmx {
 #endif
 
u64   arch_capabilities;
+   u64   spec_ctrl;
 
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
@@ -1911,6 +1912,29 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Check if MSR is intercepted for currently loaded MSR bitmap.
+ */
+static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
+{
+   unsigned long *msr_bitmap;
+   int f = sizeof(unsigned long);
+
+   if (!cpu_has_vmx_msr_bitmap())
+   return true;
+
+   msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
+
+   if (msr <= 0x1fff) {
+   return !!test_bit(msr, msr_bitmap + 0x800 / f);
+   } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
+   msr &= 0x1fff;
+   return !!test_bit(msr, msr_bitmap + 0xc00 / f);
+   }
+
+   return true;
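+}
+
+/*
  * Check if MSR is intercepted for L01 MSR bitmap.
  */
 static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
@@ -3262,6 +3286,14 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_SPEC_CTRL:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) &&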

[tip:x86/pti] KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

2018-02-03 Thread tip-bot for KarimAllah Ahmed
Commit-ID:  28c1c9fabf48d6ad596273a11c46e0d0da3e14cd
Gitweb: https://git.kernel.org/tip/28c1c9fabf48d6ad596273a11c46e0d0da3e14cd
Author: KarimAllah Ahmed 
AuthorDate: Thu, 1 Feb 2018 22:59:44 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 3 Feb 2018 23:06:52 +0100

KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES

Intel processors use the MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
(bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default its
contents come directly from the hardware, but user space can still
override them.

[dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]

Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Darren Kenny 
Reviewed-by: Jim Mattson 
Reviewed-by: Konrad Rzeszutek Wilk 
Cc: Andrea Arcangeli 
Cc: Andi Kleen 
Cc: Jun Nakajima 
Cc: k...@vger.kernel.org
Cc: Dave Hansen 
Cc: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Asit Mallick 
Cc: Arjan Van De Ven 
Cc: Greg KH 
Cc: Dan Williams 
Cc: Tim Chen 
Cc: Ashok Raj 
Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karah...@amazon.de

---
 arch/x86/kvm/cpuid.c |  2 +-
 arch/x86/kvm/vmx.c   | 15 +++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 033004d..1909635 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 73acdcf..e5f75eb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -594,6 +594,8 @@ struct vcpu_vmx {
u64   msr_guest_kernel_gs_base;
 #endif
 
+   u64   arch_capabilities;
+
u32 vm_entry_controls_shadow;
u32 vm_exit_controls_shadow;
u32 secondary_exec_control;
@@ -3260,6 +3262,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_TSC:
msr_info->data = guest_read_tsc(vcpu);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated &&
+   !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+   return 1;
+   msr_info->data = to_vmx(vcpu)->arch_capabilities;
+   break;
case MSR_IA32_SYSENTER_CS:
msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
@@ -3395,6 +3403,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD,
  MSR_TYPE_W);
break;
+   case MSR_IA32_ARCH_CAPABILITIES:
+   if (!msr_info->host_initiated)
+   return 1;
+   vmx->arch_capabilities = data;
+   break;
case MSR_IA32_CR_PAT:
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
@@ -5657,6 +5670,8 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}
 
+   if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+   rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
 
vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c53298d..4ec142e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = {
 #endif
MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA,
MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX,
+   MSR_IA32_ARCH_CAPABILITIES
 };
 
 static unsigned num_msrs_to_save;


[tip:x86/pti] KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX

2018-02-03 Thread tip-bot for KarimAllah Ahmed
Commit-ID:  b7b27aa011a1df42728d1768fc181d9ce69e6911
Gitweb: https://git.kernel.org/tip/b7b27aa011a1df42728d1768fc181d9ce69e6911
Author: KarimAllah Ahmed 
AuthorDate: Thu, 1 Feb 2018 22:59:42 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 3 Feb 2018 23:06:51 +0100

KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX

[dwmw2: Stop using KF() for bits in it, too]
Signed-off-by: KarimAllah Ahmed 
Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Jim Mattson 
Cc: k...@vger.kernel.org
Cc: Radim Krčmář 
Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karah...@amazon.de

---
 arch/x86/kvm/cpuid.c | 8 +++-
 arch/x86/kvm/cpuid.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0099e10..c0eb337 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
-/* These are scattered features in cpufeatures.h. */
-#define KVM_CPUID_BIT_AVX512_4VNNIW 2
-#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+/* For scattered features from cpufeatures.h; we currently expose none */
 #define KF(x) bit(KVM_CPUID_BIT_##x)
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 
/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
-   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+   F(AVX512_4VNNIW) | F(AVX512_4FMAPS);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
@@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
entry->ecx &= ~F(PKU);
entry->edx &= kvm_cpuid_7_0_edx_x86_features;
-   entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
+   cpuid_mask(&entry->edx, CPUID_7_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2cea66..9a327d5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
[CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
[CPUID_7_ECX] = { 7, 0, CPUID_ECX},
[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
+   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
 };
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature)


[PATCH 3/3] x86/Kconfig : Explicitly enumerate i686-class cpus in Kconfig

2018-02-03 Thread Matthew Whitehead
The X86_P6_NOP config class leaves out many i686-class CPUs. Instead,
explicitly enumerate all of these CPUs.

Using a configuration with M686 currently sets X86_MINIMUM_CPU_FAMILY=5
instead of the correct value 6.

When such a kernel is booted on an i586, it fails to print the expected
"This kernel requires an i686 CPU, but only detected an i586 CPU" message
and halt. Instead, it silently hangs when it reaches i686-specific
instructions.

Signed-off-by: Matthew Whitehead 
---
 arch/x86/Kconfig.cpu | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index ec64aa7..8b8d229 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -385,7 +385,7 @@ config X86_CMOV
 config X86_MINIMUM_CPU_FAMILY
int
default "64" if X86_64
-   default "6" if X86_32 && X86_P6_NOP
+   default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MCRUSOE || MCORE2 || MK7 || MK8)
default "5" if X86_32 && X86_CMPXCHG64
default "4"
 
-- 
1.8.3.1



[PATCH 1/3] x86/Kconfig : Add missing i586-class cpus to X86_CMPXCHG64 Kconfig group

2018-02-03 Thread Matthew Whitehead
Several i586-class CPUs that support the CMPXCHG8B instruction are
missing from the X86_CMPXCHG64 config group.

Using a configuration with either M586TSC or M586MMX currently sets
X86_MINIMUM_CPU_FAMILY=4 instead of the correct value 5.

When such a kernel is booted on an i486, it fails to print the expected
"This kernel requires an i586 CPU, but only detected an i486 CPU" message
and halt. Instead, it silently hangs when it reaches i586-specific
instructions.

M586 is deliberately left out of this list because at least the Cyrix
5x86, and perhaps other CPUs, lack this instruction.

Signed-off-by: Matthew Whitehead 
---
 arch/x86/Kconfig.cpu | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 65a9a47..ec64aa7 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -374,7 +374,7 @@ config X86_TSC
 
 config X86_CMPXCHG64
def_bool y
-   depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
+   depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586TSC || M586MMX || MATOM || MGEODE_LX || MGEODEGX1 || MK6 || MK7 || MK8
 
 # this should be set for all -march=.. options where the compiler
 # generates cmov.
-- 
1.8.3.1



[PATCH 2/3] x86/Kconfig : Exclude i586-class cpus lacking PAE support from HIGHMEM64G Kconfig group

2018-02-03 Thread Matthew Whitehead
i586-class machines also lack support for Physical Address Extension (PAE),
so add them to the HIGHMEM64G exclusion list.

Signed-off-by: Matthew Whitehead 
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 423e4b6..c45fe6d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1359,7 +1359,7 @@ config HIGHMEM4G
 
 config HIGHMEM64G
bool "64GB"
+   depends on !M486 && !M586 && !M586TSC && !M586MMX && !MGEODE_LX && !MGEODEGX1 && !MCYRIXIII && !MELAN && !MWINCHIPC6 && !WINCHIP3D && !MK6
select X86_PAE
---help---
  Select this if you have a 32-bit processor and more than 4
-- 
1.8.3.1



Re: [PATCH] audit: update bugtracker and source URIs

2018-02-03 Thread Paul Moore
On Sat, Feb 3, 2018 at 12:33 AM, Richard Guy Briggs  wrote:
> Since the Linux Audit project has transitioned completely over to
> github, update the MAINTAINERS file and the primary audit source file to
> reflect that reality.
>
> Signed-off-by: Richard Guy Briggs 
> ---
>  MAINTAINERS| 1 -
>  kernel/audit.c | 3 ++-
>  2 files changed, 2 insertions(+), 2 deletions(-)

Thanks for the revision, especially considering it was a really small
nit.  I'll queue this up for after the merge window.

> diff --git a/MAINTAINERS b/MAINTAINERS
> index 845fc25..fba4875 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2479,7 +2479,6 @@ M:Paul Moore 
>  M: Eric Paris 
>  L: linux-au...@redhat.com (moderated for non-subscribers)
>  W: https://github.com/linux-audit
> -W: https://people.redhat.com/sgrubb/audit
>  T: git git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git
>  S: Supported
>  F: include/linux/audit.h
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 227db99..5c25449 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -38,7 +38,8 @@
>   *   6) Support low-overhead kernel-based filtering to minimize the
>   *  information that must be passed to user-space.
>   *
> - * Example user-space utilities: http://people.redhat.com/sgrubb/audit/
> + * Audit userspace, documentation, tests, and bug/issue trackers:
> + * https://github.com/linux-audit
>   */
>
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> --
> 1.8.3.1
>
> --
> Linux-audit mailing list
> linux-au...@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit



-- 
paul moore
www.paul-moore.com


Re: [PATCH AUTOSEL for 4.9 28/52] led: core: Fix brightness setting when setting delay_off=0

2018-02-03 Thread Jacek Anaszewski
On 02/03/2018 10:22 PM, Jacek Anaszewski wrote:
> Hi Sasha,
> 
> All of 3.18, 4.4 and 4.9 also require the follow-up
> patch [0], just as autosel picked it for 4.14,
> since this one alone breaks the other use case.

Actually, after taking a closer look, it turns out that
patch [0] applies cleanly only to 4.14.

4.9 also requires the patch:

eb1610b4c273 ("led: core: Fix blink_brightness setting race")

3.18 and 4.4 shouldn't have the issue, since it was introduced
along with the LED_BLINK_SW flag, which was added in 4.7.

Thanks,
Jacek Anaszewski

> [0] https://lkml.org/lkml/2018/2/3/249
> 
> On 02/03/2018 07:03 PM, Sasha Levin wrote:
>> From: Matthieu CASTET 
>>
>> [ Upstream commit 2b83ff96f51d0b039c4561b9f95c824d7bddb85c ]
>>
>> With the current code, the following sequence won't work:
>> echo timer > trigger
>>
>> echo 0 > delay_off
>> * at this point we call
>> ** led_delay_off_store
>> ** led_blink_set
>> *** stop timer
>> ** led_blink_setup
>> ** led_set_software_blink
>> *** if !delay_on, led off
>> *** if !delay_off, set led_set_brightness_nosleep <--- LED_BLINK_SW is set but timer is stopped
>> *** otherwise start timer/set LED_BLINK_SW flag
>>
>> echo xxx > brightness
>> * led_set_brightness
>> ** if LED_BLINK_SW
>> *** if brightness=0, led off
>> *** else apply brightness on next timer <--- timer is stopped, and will never apply new setting
>> ** otherwise set led_set_brightness_nosleep
>>
>> To fix that, when we delete the timer, we should clear LED_BLINK_SW.
>>
>> Cc: linux-l...@vger.kernel.org
>> Signed-off-by: Matthieu CASTET 
>> Signed-off-by: Jacek Anaszewski 
>> Signed-off-by: Sasha Levin 
>> ---
>>  drivers/leds/led-core.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
>> index 3bce44893021..d70d4a5273b8 100644
>> --- a/drivers/leds/led-core.c
>> +++ b/drivers/leds/led-core.c
>> @@ -186,7 +186,7 @@ void led_blink_set(struct led_classdev *led_cdev,
>> unsigned long *delay_on,
>> unsigned long *delay_off)
>>  {
>> -del_timer_sync(&led_cdev->blink_timer);
>> +led_stop_software_blink(led_cdev);
>>  
>>  led_cdev->flags &= ~LED_BLINK_ONESHOT;
>>  led_cdev->flags &= ~LED_BLINK_ONESHOT_STOP;
>>
> 



Re: [PATCH AUTOSEL for 4.9 28/52] led: core: Fix brightness setting when setting delay_off=0

2018-02-03 Thread Jacek Anaszewski
On 02/03/2018 10:22 PM, Jacek Anaszewski wrote:
> Hi Sasha,
> 
> All 3.18, 4.4 and 4.9 also require the follow-up
> patch [0], similarly like autosel did it for 4.14,
> since this one alone breaks the other use case.

Actually after taking closer look, it turns out that
the patch [0] applies cleanly only to 4.14.

4.9 requires also the patch :

eb1610b4c273 ("led: core: Fix blink_brightness setting race")

3.18 and 4.4 shouldn't have the issue since it was introduced
along with LED_BLINK_SW flag added in 4.7.

Thanks,
Jacek Anaszewski

> [0] https://lkml.org/lkml/2018/2/3/249
> 
> On 02/03/2018 07:03 PM, Sasha Levin wrote:
>> From: Matthieu CASTET 
>>
>> [ Upstream commit 2b83ff96f51d0b039c4561b9f95c824d7bddb85c ]
>>
>> With the current code, the following sequence won't work :
>> echo timer > trigger
>>
>> echo 0 >  delay_off
>> * at this point we call
>> ** led_delay_off_store
>> ** led_blink_set
>> *** stop timer
>> ** led_blink_setup
>> ** led_set_software_blink
>> *** if !delay_on, led off
>> *** if !delay_off, set led_set_brightness_nosleep <--- LED_BLINK_SW is set 
>> but timer is stop
>> *** otherwise start timer/set LED_BLINK_SW flag
>>
>> echo xxx > brightness
>> * led_set_brightness
>> ** if LED_BLINK_SW
>> *** if brightness=0, led off
>> *** else apply brightness if next timer <--- timer is stop, and will never 
>> apply new setting
>> ** otherwise set led_set_brightness_nosleep
>>
>> To fix that, when we delete the timer, we should clear LED_BLINK_SW.
>>
>> Cc: linux-l...@vger.kernel.org
>> Signed-off-by: Matthieu CASTET 
>> Signed-off-by: Jacek Anaszewski 
>> Signed-off-by: Sasha Levin 
>> ---
>>  drivers/leds/led-core.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/leds/led-core.c b/drivers/leds/led-core.c
>> index 3bce44893021..d70d4a5273b8 100644
>> --- a/drivers/leds/led-core.c
>> +++ b/drivers/leds/led-core.c
>> @@ -186,7 +186,7 @@ void led_blink_set(struct led_classdev *led_cdev,
>> unsigned long *delay_on,
>> unsigned long *delay_off)
>>  {
>> -del_timer_sync(&led_cdev->blink_timer);
>> +led_stop_software_blink(led_cdev);
>>  
>>  led_cdev->flags &= ~LED_BLINK_ONESHOT;
>>  led_cdev->flags &= ~LED_BLINK_ONESHOT_STOP;
>>
> 



[PULL REQUEST] i2c for 4.16

2018-02-03 Thread Wolfram Sang
Linus,

I2C has the following changes for you:

* new flag to mark DMA safe buffers in i2c_msg. Also, some
  infrastructure around it. And docs.
* huge refactoring of the at24 driver led by the new maintainer Bartosz
* update I2C bus recovery to send STOP after recovery
* conversion from gpio to gpiod for I2C bus recovery
* adding a fault-injector to the i2c-gpio driver
* lots of small driver improvements, and bigger ones to i2c-sh_mobile

There was a small merge conflict in MAINTAINERS in linux-next, but that
should be easy to fix.

Please pull.

Thanks,

   Wolfram


The following changes since commit 30a7acd573899fd8b8ac39236eff6468b195ac7d:

  Linux 4.15-rc6 (2017-12-31 14:47:43 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-4.16

for you to fetch changes up to e38c85644e11c6dc5a39305c96b617f63403423d:

  i2c: mv64xxx: Add myself as maintainer for this driver (2018-01-26 18:51:29 +0100)


Adrian Fiergolski (1):
  i2c: mux: pca954x: add support for NXP PCA984x family

Andrzej Hajda (1):
  i2c: exynos5: change internal transmission timeout to 100ms

Andy Shevchenko (1):
  i2c: ismt: Use %pad specifier for dma_addr_t variables

Arnd Bergmann (1):
  i2c: acorn: add MODULE_LICENSE tag

Arseny Solokha (4):
  i2c: mpc: get MPC8xxx I2C clock prescaler before using it in calculations
  i2c: mpc: unify obtaining the MPC8533/44 I2C clock prescaler w/ MPC8xxx
  i2c: mpc: fix PORDEVSR2 mask for MPC8533/44
  i2c: mpc: always determine I2C clock prescaler at runtime

Bartosz Golaszewski (12):
  MAINTAINERS: add git URL for at24
  eeprom: at24: fix coding style issues
  eeprom: at24: use a common prefix for all symbols in at24.c
  eeprom: at24: code shrink
  dt-bindings: at24: new optional property - wp-gpios
  eeprom: at24: add support for the write-protect pin
  eeprom: at24: fix a whitespace error in platform data
  dt-bindings: at24: consistently document the compatible property
  dt-bindings: at24: fix formatting and style
  dt-bindings: at24: extend the list of supported chips
  eeprom: at24: extend the list of chips supported in DT
  i2c: davinci: fix the cpufreq transition

Fugang Duan (1):
  i2c: imx-lpi2c: add runtime pm support

Gregory CLEMENT (3):
  i2c: mv64xxx: Remove useless test before clk_disable_unprepare
  i2c: mv64xxx: Fix clock resource by adding an optional bus clock
  i2c: mv64xxx: Add myself as maintainer for this driver

Gustavo A. R. Silva (1):
  i2c: mxs: use true and false for boolean values

Heiner Kallweit (7):
  eeprom: at24: add basic regmap_i2c support
  eeprom: at24: change at24_translate_offset return type
  eeprom: at24: add regmap-based write function
  eeprom: at24: remove old write functions
  eeprom: at24: add regmap-based read function
  eeprom: at24: remove old read functions
  eeprom: at24: remove now unneeded smbus-related code

Jan Kundrát (1):
  i2c: gpio: Enable working over slow can_sleep GPIOs

Jarkko Nikula (1):
  i2c: designware: Don't set SCL timings and speed mode when in slave mode

Jian Hu (2):
  dt-bindings: i2c: update documentation for the Meson-AXG
  i2c: meson: add configurable divider factors

Julia Lawall (1):
  i2c: rk3x: account for const type of of_device_id.data

Jun Gao (3):
  dt-bindings: i2c: Add MediaTek MT2712 i2c binding
  i2c: mediatek: Add i2c compatible for MediaTek MT2712
  i2c: mediatek: Enable i2c module clock before i2c registers access.

Linus Walleij (2):
  i2c: imx: Include the right GPIO header
  i2c/ARM: davinci: Deep refactoring of I2C recovery

Phil Reid (8):
  i2c: Switch to using gpiod interface for gpio bus recovery
  i2c: designware: move i2c_dw_plat_prepare_clk to common
  i2c: designware: rename i2c_dw_plat_prepare_clk to i2c_dw_prepare_clk
  i2c: imx: switch to using gpiod for bus recovery gpios
  i2c: davinci: switch to using gpiod for bus recovery gpios
  i2c: remove legacy integer scl/sda gpio for recovery
  i2c: core: fix compile issue related to incorrect gpio header
  i2c: designware: fix building driver as module

Radu Rendec (2):
  i2c: ismt: dump registers at the end of transactions
  i2c: ismt: 16-byte align the DMA buffer address

Stefan Lengfeld (1):
  i2c: use macro IS_ENABLED in header i2c.h

Sven Van Asbroeck (4):
  dt-bindings: add eeprom "no-read-rollover" property
  eeprom: at24: support eeproms that do not auto-rollover reads
  eeprom: at24: convert magic numbers to structs
  eeprom: at24: remove temporary fix for at24mac402 size

Tim Sander (1):
  i2c: designware: add i2c gpio recovery option

Tomasz Bachorski (1):
  i2c: mux: reg: don't log an error for probe deferral

Wolfram Sang (41):
  i2c: sh_mobile: remove redundant 

Re: [kernel-hardening] [PATCH 4/6] Protectable Memory

2018-02-03 Thread Boris Lukashev
On Sat, Feb 3, 2018 at 3:32 PM, Igor Stoppa  wrote:
>
>
> On 03/02/18 22:12, Boris Lukashev wrote:
>
>> Regarding the notion of validated protected memory, is there a method
>> by which the resulting checksum could be used in a lookup
>> table/function to resolve the location of the protected data?
>
> What I have in mind is a checksum at page/vmap_area level, so there
> would be no 1:1 mapping between a specific allocation and the checksum.
>
> An extreme case would be the one where an allocation crosses one or more
> page boundaries, while the checksum refers to a (partially) overlapping
> memory area.
>
> Code accessing a pool could perform one (relatively expensive)
> validation. But still something that would require a more sophisticated
> attack, to subvert.
>
>> Effectively a hash table of protected allocations, with a benefit of
>> dedup since any data matching the same key would be the same data
>> (multiple identical cred structs being pushed around). Should leave
>> the resolver address/csum in recent memory to check against, right?
>
> I see where you are trying to land, but I do not see how it would work
> without a further intermediate step.
>
> pmalloc dishes out virtual memory addresses, when called.
>
> It doesn't know what the user of the allocation will put in it.
> The user, otoh, has the direct address of the memory it got.
>
> What you are suggesting, if I have understood it correctly, is that,
> when the pool is protected, the addresses already given out, will become
> traps that get resolved through a lookup table that is built based on
> the content of each allocation.
>
> That seems to generate a lot of overhead, not to mention the fact that
> it might not play very well with the MMU.

That is effectively what I'm suggesting - as a form of protection for
consumers against direct reads of data which may have been corrupted
by some irrelevant means. In the context of pmalloc, it would probably
be a separate type of ro+verified pool which consumers would
explicitly opt into. Say there's a maintenance cycle on a  and it wants to make sure that the
instructions it read in are what they should have been before running
them, those consumers might well take the penalty if it keeps  from doing .
If such a resolver could be implemented in a manner which doesn't break
all the things (including acceptable performance for at least a
significant number of workloads), it might be useful as a general tool
for handing out memory to userspace, even in rw, as it provides
execution context in which other requirements can be forcibly
resolved, preventing unauthorized access to pages the consumer
shouldn't get in a very generic way. Spectre comes to mind as a
potential class of issues to be addressed this way, since speculative
load could be prevented if the resolution were to fail.

>
> If I misunderstood, then I'd need a step by step description of what
> happens, because it's not clear to me how else the data would be
> accessed if not through the address that was obtained when pmalloc was
> invoked.
>
> --
> igor



-- 
Boris Lukashev
Systems Architect
Semper Victus
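
Igor's page-level checksum idea, as quoted above, can be illustrated with a toy pool. This is an assumption-laden sketch, not pmalloc. struct model_pool and the model_* helpers are hypothetical, and "protection" is only a flag: nothing is actually made read-only. Pages are 64 bytes to keep the model small, and FNV-1a stands in for whatever checksum would really be used (it is not tamper-resistant). The sketch shows that checksums are kept per page, so an allocation that straddles a page boundary has no checksum of its own, and that a single pass over the pool validates everything at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64  /* hypothetical tiny "page" */
#define MODEL_PAGES     4

struct model_pool {
    unsigned char mem[MODEL_PAGE_SIZE * MODEL_PAGES];
    size_t used;                  /* bump-allocator cursor */
    uint32_t csum[MODEL_PAGES];   /* one checksum per page, not per allocation */
    int sealed;                   /* set once the pool is "protected" */
};

/* FNV-1a, chosen only because it is short; not a security-grade MAC. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Allocations may straddle page boundaries; nothing ties them to a checksum. */
static void *model_alloc(struct model_pool *pool, size_t size)
{
    if (pool->sealed || size > sizeof(pool->mem) - pool->used)
        return NULL;
    void *p = pool->mem + pool->used;
    pool->used += size;
    return p;
}

/* "Protect" the pool: refuse further allocations and record a checksum per page. */
static void model_protect(struct model_pool *pool)
{
    for (size_t i = 0; i < MODEL_PAGES; i++)
        pool->csum[i] = fnv1a(pool->mem + i * MODEL_PAGE_SIZE, MODEL_PAGE_SIZE);
    pool->sealed = 1;
}

/* One (relatively expensive) pass over the whole pool: returns the index of the
 * first page whose contents no longer match, or -1 if everything validates. */
static int model_validate(const struct model_pool *pool)
{
    for (size_t i = 0; i < MODEL_PAGES; i++)
        if (fnv1a(pool->mem + i * MODEL_PAGE_SIZE, MODEL_PAGE_SIZE) != pool->csum[i])
            return (int)i;
    return -1;
}
```

In this model, two 40-byte allocations occupy offsets 0..39 and 40..79, so the second one spans pages 0 and 1. Flipping a bit in its second half is reported as a mismatch on page 1, not on "allocation 2".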


Re: [GIT PULL tools] Linux kernel memory model

2018-02-03 Thread Alan Stern
On Sat, 3 Feb 2018, Paul E. McKenney wrote:

> Please see below for an initial patch to this effect.  This activity
> proved to be more productive than expected for these tests, which certainly
> supports our assertion that locking needs more testing...
> 
> MP+polocks.litmus
> MP+porevlocks.litmus
> 
>   These are allowed by the current model, which surprised me a bit,
>   given that even powerpc would forbid them.  Is the rationale
>   that a lock-savvy compiler could pull accesses into the lock's
>   critical section and then reorder those accesses?  Or does this
>   constitute a bug in our model of locking?
> 
>   (And these were allowed when I wrote recipes.txt, embarrassingly
>   enough...)
> 
> Z6.0+pooncelock+poonceLock+pombonce.litmus
> 
>   This was forbidden when I wrote recipes.txt, but now is allowed.
>   The header comment for smp_mb__after_spinlock() makes it pretty
>   clear that it must be forbidden.  So this one is a bug in our
>   model of locking.

I just tried testing these under the most recent version of herd, and 
all three were forbidden.

Alan
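
For readers unfamiliar with the test names, assume MP+polocks has the usual message-passing shape: P0 writes x, then writes y inside a critical section, and P1 reads y inside a critical section, then reads x. The question is whether r0 == 1 && r1 == 0 can happen. The sketch below runs that assumed shape with C11 relaxed atomics standing in for WRITE_ONCE()/READ_ONCE() and a pthread mutex standing in for the spinlock. It says nothing about what herd or the LKMM allow. It only illustrates the outcome being tested, which the usual acquire/release reading of mutex lock/unlock forbids: if P1 sees y == 1, P0's unlock precedes P1's lock, so P0's earlier write to x must also be visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;   /* relaxed atomics play WRITE_ONCE()/READ_ONCE() */
static pthread_mutex_t mylock = PTHREAD_MUTEX_INITIALIZER;
static int r0, r1;        /* P1's observations, read after pthread_join() */

static void *p0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    pthread_mutex_lock(&mylock);
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    pthread_mutex_unlock(&mylock);
    return NULL;
}

static void *p1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mylock);
    r0 = atomic_load_explicit(&y, memory_order_relaxed);
    pthread_mutex_unlock(&mylock);
    r1 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the test `iters` times and counts runs that hit the "exists" clause
 * r0 == 1 && r1 == 0. Thread-creation errors are not handled in this sketch. */
static int count_forbidden_outcomes(int iters)
{
    int hits = 0;
    for (int i = 0; i < iters; i++) {
        pthread_t t0, t1;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&t0, NULL, p0, NULL);
        pthread_create(&t1, NULL, p1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r0 == 1 && r1 == 0)
            hits++;
    }
    return hits;
}
```

A count of zero is expected here. Note that a tool like herd explores all executions the model permits, whereas repeated runs like these only sample what one machine happens to do.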



Re: [PATCH v2 1/2] arm64: dts: rockchip: add i2s0-2ch-bus pins on rk3399

2018-02-03 Thread Heiko Stuebner
Am Samstag, 3. Februar 2018, 16:50:15 CET schrieb Klaus Goger:
> Add pin definition for I2S0 if used as a 2-channel only bus.
> 
> Signed-off-by: Klaus Goger 

applied for 4.17


Thanks
Heiko


Re: [PATCH 00/18] [ANNOUNCE] Dynamically created function based events

2018-02-03 Thread Linus Torvalds
On Sat, Feb 3, 2018 at 9:04 AM, Mathieu Desnoyers
 wrote:
>
> The approach proposed here will introduce an expectation that internal
> function signatures never change in the kernel, else it would break user-space
> tools hooking on those functions.

No, I really don't think so.

There's two reasons for that:

The first is purely about kernel development. I, and every sane kernel
engineer, will simply laugh in the face of somebody who comes to us
and says "hey, I had this script that did low-level function tracing
on your kernel, and then you changed something, and now the random
function I was tracing has a new name and different arguments".

We'll just go "yeah, tough, change your script". Or more likely, not
even bother to reply at all.

But the bigger issue is actually simply just psychology. Exactly
*because* this is all implicit, and there are no explicit trace
points, it's _obvious_ to any user that there isn't something
long-term dependable that they hang their hat on.

Everybody *understands* that this is like a debugger: if you have a
gdb script that shows some information, and then you go around and
change the source code, then *obviously* you'll have to change your
debugger script too. You don't keep the source code static just to
make your gdb script happy. That would be silly.

In contrast, the explicit tracepoints really made people believe that
they have some long-term meaning.

So yes, we'll  make it obvious that hell no, random kernel functions
are not a long-term ABI. But honestly, I don't think we even need to
have a lot of "education" on this, simply because it's so obvious that
anybody who thinks it's some ABI is not going to be somebody we'll
have to worry about.

Because the kind of person thinking "Ooh, this is a stable ABI" won't
be doing interesting work anyway. That kind of person will be sitting
in a corner eating paste, not doing interesting kernel tracing.

 Linus

